Statistical design - Anova Table


으르미 2022. 3. 15. 04:25

First, what is ANOVA?

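According to Google, ANOVA (analysis of variance) is a collection of statistical models, together with their associated estimation procedures (the sources of variation), used to analyse differences between means.
The ANOVA table is the table that lays out the variation due to residuals (error) and the variation due to treatments.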

To judge which model fits better, we compare a reduced model with a general model.

Let the null hypothesis $ H_0 $ be the reduced model and the alternative hypothesis $ H_1 $ be the general model.

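Under the null model (= reduced model) the mean structure $ X\beta $ collapses to a single common mean, so the fitted values are $ X_0 \hat \mu $ with $ X_0 = 1 $ (the column of ones) and $ \hat \mu = \bar y $.

So the SSE comes out a little differently from the one derived in the previous post.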

$ SSE_0 = (y-X_0\hat \mu)^T(y-X_0\hat \mu) = y^TAy $ where $ A = I_n-\frac 1 n 11^T $

$ A*A = (I_n- \frac 1 n 11^T) * ( I_n- \frac 1 n 11^T ) = I_n - \frac 1 n 11^T - \frac 1 n 11^T + \frac 1 {n^2} 11^T11^T = I_n - \frac 2 n 11^T + \frac 1 n 11^T = I_n-\frac 1 n 11^T = A $ (∵ $ 1^T1 = n $)

So A is idempotent as well, and rank(A) = tr(A) = n-1, which has to match the degrees of freedom of $ SSE_0 $.
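The two properties of A can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the vector `y` is made-up illustration data, not from the course.

```python
import numpy as np

n = 5
ones = np.ones((n, 1))
A = np.eye(n) - ones @ ones.T / n  # centering matrix I_n - (1/n) 1 1^T

# Idempotent: A @ A equals A up to floating-point error
is_idempotent = np.allclose(A @ A, A)

# For an idempotent matrix the rank equals the trace; both should be n - 1
rank_A = np.linalg.matrix_rank(A)
trace_A = np.trace(A)

# y^T A y is the sum of squared deviations from the overall mean
y = np.array([3.0, 5.0, 4.0, 8.0, 10.0])  # hypothetical data
sse0_quadratic = y @ A @ y
sse0_direct = np.sum((y - y.mean()) ** 2)

print(is_idempotent, rank_A, trace_A, sse0_quadratic, sse0_direct)
```

Here `sse0_quadratic` and `sse0_direct` agree, which is just $ SSE_0 = y^TAy = \sum_i (y_i - \bar y)^2 $ written two ways.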

Comparing the SSE of the null model and the alternative model

A model with fewer parameters can never fit better than the general model that contains it, so $ SSE_0 \ge SSE $ always holds.

$ SSE_0 - SSE = y^TAy - y^T(I_n-XGX^T)y = y^TBy $ where $ B = XGX^T - \frac 1 n 11^T $ (G denotes a generalized inverse of $ X^TX $)

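By the distribution theory for quadratic forms with an idempotent matrix, $ SSE_0 / \sigma^2 = y^TAy / \sigma^2 $ ~ chi-square (rank(A), λ), i.e. a noncentral chi-square with rank(A) = n-1 degrees of freedom and noncentrality parameter λ.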
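The test statistic that these quadratic forms lead to can be written out as follows. This is a sketch under assumptions not spelled out in the post: normal errors with variance $ \sigma^2 $, $ r = \mathrm{rank}(X) $ (the symbol $ r $ is introduced here for illustration), and the column of ones lying in the column space of $ X $. Then $ y^TBy $ and $ SSE $ are independent, with $ r-1 $ and $ n-r $ degrees of freedom.

$$ F = \frac{(SSE_0 - SSE)/(r-1)}{SSE/(n-r)} \sim F_{r-1,\,n-r} \quad \text{under } H_0 $$

The unknown $ \sigma^2 $ cancels between numerator and denominator, which is why the ratio can be computed from the data alone.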

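In other words, if the F value computed from the difference between the two models is larger than the critical value of the F distribution -> reject the null hypothesis = the alternative model fits significantly better.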
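The comparison can be sketched numerically for a one-way layout. This is a minimal example, assuming NumPy; the data are hypothetical, `G` is taken to be the Moore-Penrose pseudo-inverse (one possible generalized inverse), and the degrees of freedom `r - 1` and `n - r` assume normal errors.

```python
import numpy as np

# Hypothetical data: 3 treatments, 3 observations each
y = np.array([1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0])
groups = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
n = len(y)

# General model: intercept plus one indicator column per treatment
# (deliberately rank-deficient, hence the generalized inverse)
X = np.column_stack([np.ones(n)] + [(groups == g).astype(float) for g in range(3)])
G = np.linalg.pinv(X.T @ X, rcond=1e-10)  # one choice of generalized inverse
P = X @ G @ X.T                           # projection onto the column space of X
r = int(round(np.trace(P)))               # rank of a projection equals its trace

A = np.eye(n) - np.ones((n, n)) / n
sse0 = y @ A @ y               # reduced (null) model
sse = y @ (np.eye(n) - P) @ y  # general (alternative) model

f_stat = ((sse0 - sse) / (r - 1)) / (sse / (n - r))
print(sse0, sse, r, f_stat)
```

For these numbers the group means are 2, 3 and 7 around an overall mean of 4, so $ SSE_0 = 48 $, $ SSE = 6 $ and $ F = 21 $ on (2, 6) degrees of freedom. The tabulated 5% critical value for that F distribution is about 5.14, so the null model would be rejected here.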

Blocking

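Purpose: to reduce the variance of the experimental error and so increase the power to detect treatment effects, while still being able to generalise the results to the whole population!

* Here, power means 1 - P(type II error)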

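The experimental units being compared should be as similar to each other as possible.

Even when the units differ, we can split them into subgroups with similar characteristics before randomising the treatment factor levels!

=> This whole procedure is called blocking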

For example, agricultural experiments usually form blocks by proximity (plots that are close together).

Experiments on animal data group genetically similar animals together.
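The idea of grouping first and randomising afterwards can be sketched with the standard library alone. This is an illustrative sketch, assuming each block has exactly as many units as there are treatments; the function name `randomise_within_blocks` and the plot and block names are hypothetical.

```python
import random

def randomise_within_blocks(blocks, treatments, seed=None):
    """Return {block: {unit: treatment}}, using every treatment once per block."""
    rng = random.Random(seed)
    plan = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {block!r} needs exactly {len(treatments)} units")
        shuffled = list(treatments)
        rng.shuffle(shuffled)  # a separate random order inside each block
        plan[block] = dict(zip(units, shuffled))
    return plan

# Hypothetical field trial: plots grouped by proximity into three blocks
blocks = {
    "north": ["plot1", "plot2", "plot3"],
    "middle": ["plot4", "plot5", "plot6"],
    "south": ["plot7", "plot8", "plot9"],
}
plan = randomise_within_blocks(blocks, ["A", "B", "C"], seed=1)
for block, assignment in plan.items():
    print(block, assignment)
```

Randomisation happens only inside a block, so differences between blocks (for example soil quality in the north versus the south) cannot be confused with treatment differences.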

RCB(Randomised complete block design)

2022.03.12 - [Data science - sem 2/Statistical design of investigation] - statistical design - Mean model vs. Treatment effects model


I summarised the treatment effects model in the earlier post.

All we need to do is add a block effect to it:

$ y_{ij} = \mu + b_i + \tau_j + \epsilon_{ij} $

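$ \tau_j $ : the effect of treatment j,

$ \epsilon_{ij} $ : the experimental error, independent of each other and normally distributed with mean 0 and variance $ \sigma^2 $.

$ b_i $ : the effect of block i, assumed to have no interaction with $ \tau_j $

In any case, if a block-by-treatment interaction were included, the SSE would have 0 degrees of freedom, because each treatment appears only once in each block.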
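The sums of squares and degrees of freedom for this model can be sketched as below. This is a minimal example, assuming NumPy and one observation per block-treatment combination; the response values are hypothetical.

```python
import numpy as np

# Hypothetical responses: rows are blocks (i), columns are treatments (j)
y = np.array([
    [10.0, 12.0, 15.0],
    [8.0, 11.0, 13.0],
    [9.0, 13.0, 14.0],
    [7.0, 10.0, 12.0],
])
b, t = y.shape
grand = y.mean()
block_means = y.mean(axis=1)
trt_means = y.mean(axis=0)

ss_total = np.sum((y - grand) ** 2)
ss_block = t * np.sum((block_means - grand) ** 2)
ss_trt = b * np.sum((trt_means - grand) ** 2)
ss_error = ss_total - ss_block - ss_trt

df_block, df_trt = b - 1, t - 1
df_error = (b - 1) * (t - 1)
# An interaction term would use up all (b-1)(t-1) of these degrees of freedom
df_error_with_interaction = (b * t - 1) - df_block - df_trt - (b - 1) * (t - 1)

f_trt = (ss_trt / df_trt) / (ss_error / df_error)

print("source      df  SS")
print(f"blocks      {df_block}   {ss_block:.3f}")
print(f"treatments  {df_trt}   {ss_trt:.3f}")
print(f"error       {df_error}   {ss_error:.3f}")
print(f"total       {b * t - 1}  {ss_total:.3f}")
print("error df with interaction:", df_error_with_interaction)
```

The last line prints 0, which is the degrees-of-freedom argument for leaving the interaction out: with a single observation per cell there would be nothing left to estimate $ \sigma^2 $ from.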
