Up to now we have considered **between-subjects** experimental designs; that is, experimental set-ups where a single subject contributes a single observation to the data set. However, for good reason many researchers adopt **within-subjects** experimental designs: designs where a single subject contributes more than one observation to the data. There are two primary advantages to within-subjects designs for us:

- A within-subjects design allows us to get more data from each subject. Practically speaking, this means greater power per subject run. This means to reach a given power level, you need to run fewer subjects, saving time, energy, and money.
- A within-subjects design allows you to remove variance due to between-participant differences from the error term in the ANOVA. This is another way in which within-subjects designs increase power: each subject essentially serves as his/her own control.

Let’s go back to our simple \(3\times1\) one-way experiment from Vasishth and Broe. We recorded three observations in each of three treatments:

A | B | C |
---|---|---|
9 | 10 | 8 |
1 | 3 | 1 |
4 | 5 | 2 |

If this were a between-subjects design, then each observation would come from a different participant, meaning we would have run nine subjects in total. Let’s run the ANOVA on the between-subjects version of this data and see how it goes:

```
data <- data.frame(value=c(9,1,4,10,3,5,8,1,2),
                   treatment=c('a','a','a','b','b','b','c','c','c'),
                   subject=c(1,2,3,1,2,3,1,2,3))
data$subject <- as.factor(data$subject)
my.anova <- aov(value~treatment,data=data)
summary(my.anova)
```

```
## Df Sum Sq Mean Sq F value Pr(>F)
## treatment 2 8.22 4.111 0.282 0.763
## Residuals 6 87.33 14.556
```

No significant differences between groups.

Suppose, though, that we had a within-subjects design instead of a between-subjects design. A within-subjects version of this same experiment might involve measuring a subject’s response to each of the treatments A, B, and C:

Subj | A | B | C |
---|---|---|---|
S1 | 9 | 10 | 8 |
S2 | 1 | 3 | 1 |
S3 | 4 | 5 | 2 |

What goes wrong if we analyze this with our between-subjects ANOVA model?

\(x_{ij} = \mu + \alpha_i + \epsilon_{ij}\)

Well, let’s look at the errors \(\epsilon_{ij}\) that would arise with this model, given this data:

```
group.means <- tapply(data$value,data$treatment,mean)
data$errors <- data$value-group.means[data$treatment]
data
```

```
## value treatment subject errors
## 1 9 a 1 4.3333333
## 2 1 a 2 -3.6666667
## 3 4 a 3 -0.6666667
## 4 10 b 1 4.0000000
## 5 3 b 2 -3.0000000
## 6 5 b 3 -1.0000000
## 7 8 c 1 4.3333333
## 8 1 c 2 -2.6666667
## 9 2 c 3 -1.6666667
```

We can see a pretty striking pattern here. Within any subject, the errors are strongly correlated: within subject one, for instance, the errors are always strongly positive. This makes sense: it’s plausible that a subject who gives high scores in one condition will give high scores in another.

But this correlation of errors is not innocent. Remember that one of the core assumptions of ANOVA is that our errors are normally distributed and independent of each other. What’s worse, ANOVA is not robust to violations of the independence assumption. So the high correlation among the errors, due to the subject groupings in the data, is a substantial violation of our assumptions. We’re also losing power unnecessarily: our error term, which contributes to the denominator in our *F*-statistic, includes one source of variance, by-subject variance, that is not really unexplained. We don’t really want that to be considered part of our error term.

To deal with this situation, we’re going to expand our model to include the effect of subjects in the model:

\(x_{ij} = \mu + \alpha_i + \pi_j +\epsilon_{ij}\)

This looks very similar to our two-way ANOVA model from last class: there is an effect of our treatment and an effect of subject. We are essentially treating the subject label as if it were a factor like any other.

Now, one thing to note is that the interaction of subject and treatment is essentially indistinguishable from the error term here, so we’ll just leave it out. (Though we will return below to the issue of treatment by subject interactions.)

As usual, our null hypothesis is that the \(\alpha_i\) are all equal to zero.

There is an important difference between these models, however. In the two-way between-subjects model, our two factors were **fixed effects**: they represented fixed levels of the factors that we were interested in, and we built a model to account specifically for these effects. If we re-ran the experiment, we’d include exactly those same effects. The same is true of our treatments here. However, our factor of subject is not a fixed effect: we are not interested in estimating the behavior of this particular group of subjects. Their selection was random, and they are simply meant to represent the population from which they were drawn. If we re-ran the experiment, we’d likely get different subjects. Subject, we will say, is a **random factor**. The model for our within-subjects design is therefore a mixed model, containing both random and fixed factors.

So how do we test our null hypothesis with this model? Conceptually, we do it the same way as in the between-subjects design. We first partition the sum of squares into between- and within-treatment SS:

\(SS_{total} = SS_{between} + SS_{within}\)

The difference in our mixed model is that we can take the within group variance and break it down into two components:

\(SS_{within} = SS_{subjects} + SS_{error}\)

That is, there is within-group variance that we know to be the effect of subjects, and then there is random, unexplained within-group variance.
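To see how much of the within-treatment variance subjects account for in our toy data: the subject means are \(9\), \(5/3\), and \(11/3\), and the grand mean is \(43/9\), so

\(SS_{subjects} = 3\left[(9-\tfrac{43}{9})^2 + (\tfrac{5}{3}-\tfrac{43}{9})^2 + (\tfrac{11}{3}-\tfrac{43}{9})^2\right] \approx 86.22\)

which is nearly all of the \(SS_{within} = 87.33\) that sat in the error term of the between-subjects ANOVA above.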

As before, we can calculate mean squared error for our treatment, and for our error effect now:

\(MS_{between} = \dfrac{SS_{between}}{a-1}\)

\(MS_{error} = \dfrac{SS_{error}}{(a-1)(n-1)}\)

Where *a* is the number of levels in our factor, and *n* is the number of subjects in the experiment. Our *F*-statistic is calculated as before:

\(F = \dfrac{MS_{between}}{MS_{error}}\)
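For our toy data, we can carry this calculation out by hand. The treatment means are \(14/3\), \(6\), and \(11/3\), and the grand mean is \(43/9 \approx 4.78\), so

\(SS_{between} = 3\left[(\tfrac{14}{3}-\tfrac{43}{9})^2 + (6-\tfrac{43}{9})^2 + (\tfrac{11}{3}-\tfrac{43}{9})^2\right] \approx 8.22\)

Subtracting \(SS_{between}\) and the by-subject sum of squares (\(\approx 86.22\), computed from the subject means \(9\), \(5/3\), and \(11/3\)) from \(SS_{total} \approx 95.56\) leaves \(SS_{error} \approx 1.11\). Then:

\(MS_{between} = \tfrac{8.22}{3-1} \approx 4.11, \qquad MS_{error} = \tfrac{1.11}{(3-1)(3-1)} \approx 0.28\)

\(F = \tfrac{4.11}{0.28} \approx 14.8\)

far larger than the \(F = 0.282\) we got when the by-subject variance stayed in the error term.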

To calculate this in R, we need to explicitly indicate to R that the error term needs to be split up:

```
my.aov <- aov(value~treatment+Error(subject/treatment),data=data)
summary(my.aov)
```

```
##
## Error: subject
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 2 86.22 43.11
##
## Error: subject:treatment
## Df Sum Sq Mean Sq F value Pr(>F)
## treatment 2 8.222 4.111 14.8 0.0142 *
## Residuals 4 1.111 0.278
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The calculation of this ANOVA proceeds essentially the same as the one-way ANOVA. The only new term is:

\(SS_{subj} = a\sum_j (\bar{x}_j - \bar{x})^2\)

That is, the sum of squares associated with subjects is the sum of squared deviations of the subject means \(\bar{x}_j\) from the grand mean \(\bar{x}\), multiplied by *a*, the number of treatment levels.

```
group.means <- tapply(data$value,data$treatment,mean)
subj.means <- tapply(data$value,data$subject,mean)
grand.mean <- mean(data$value)
ss.between <- sum((group.means[data$treatment]-grand.mean)^2)
ss.subj <- sum((subj.means[data$subject]-grand.mean)^2)
ss.total <- sum((data$value-grand.mean)^2)
ss.error <- ss.total-ss.between-ss.subj
ms.between <- ss.between / (3-1)
ms.error <- ss.error / ((3-1)*(3-1))
F = ms.between/ms.error
1-pf(F,2,4)
```

`## [1] 0.01417234`

Earlier, I noted that the error term here was essentially the interaction of subject and treatment. This is in fact the case here, although it’s not explicit in the way we calculated it just now. We can check that this is the case, though. Remember how we calculated the SS for an interaction term in our two way ANOVA, given factors A and B:

\(SS_{A:B} = n\sum_i\sum_j (\bar{x}_{ij}-\bar{x}_j - \bar{x}_i+\bar{x})^2\)

In the same fashion, we could calculate the interaction of subject and treatment in our data. Here each subject contributes a single observation per treatment, so \(n = 1\) and \(\bar{x}_{ij}\) is just the individual observation \(x_{ij}\):

\(SS_{Treatment:S} = n\sum_i\sum_j (\bar{x}_{ij}-\bar{x}_j - \bar{x}_i+\bar{x})^2\)

We can confirm that this SS gives us the same value for the error term that our ANOVA did:

`ss.error`

`## [1] 1.111111`

```
ss.error <- sum((data$value-group.means[data$treatment]-subj.means[data$subject]+grand.mean)^2)
ss.error
```

`## [1] 1.111111`

A thing to note is that our usual post hoc comparison schemes don’t currently work with repeated-measures ANOVAs in R. We’re limited, for the time being, to using paired *t*-tests to test for differences between treatment groups:

`t.test(data[data$treatment=="a","value"],data[data$treatment=="b","value"],paired=T)`

```
##
## Paired t-test
##
## data: data[data$treatment == "a", "value"] and data[data$treatment == "b", "value"]
## t = -4, df = 2, p-value = 0.05719
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.7675509 0.1008842
## sample estimates:
## mean of the differences
## -1.333333
```
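As a sanity check on the first of these, the A vs. B comparison: the within-subject differences are \(9-10=-1\), \(1-3=-2\), and \(4-5=-1\), so

\(\bar{d} = -\tfrac{4}{3}, \qquad s_d = \sqrt{\tfrac{1}{2}\left[(\tfrac{1}{3})^2 + (-\tfrac{2}{3})^2 + (\tfrac{1}{3})^2\right]} = \sqrt{\tfrac{1}{3}} \approx 0.577\)

\(t = \dfrac{\bar{d}}{s_d/\sqrt{3}} = \dfrac{-4/3}{1/3} = -4\)

on \(n-1 = 2\) degrees of freedom, matching `t = -4, df = 2` above.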

`t.test(data[data$treatment=="a","value"],data[data$treatment=="c","value"],paired=T)`

```
##
## Paired t-test
##
## data: data[data$treatment == "a", "value"] and data[data$treatment == "c", "value"]
## t = 1.7321, df = 2, p-value = 0.2254
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.484138 3.484138
## sample estimates:
## mean of the differences
## 1
```

`t.test(data[data$treatment=="b","value"],data[data$treatment=="c","value"],paired=T)`

```
##
## Paired t-test
##
## data: data[data$treatment == "b", "value"] and data[data$treatment == "c", "value"]
## t = 7, df = 2, p-value = 0.0198
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.8991158 3.7675509
## sample estimates:
## mean of the differences
## 2.333333
```

Let’s re-examine the NPI experiment from the other day, and now analyze it as a within-subjects experiment, using our repeated measures ANOVA. First, we might read in the raw data from the experiment:

```
npi.data <- read.table('http://people.umass.edu/bwdillon/NPI.SPR.txt',header=T)
npi.data$Item <- as.factor(npi.data$Item)
npi.data$Subject <- as.factor(npi.data$Subject)
npi.data.critical <- droplevels(subset(npi.data,Region=="k1"))
### How do we prep data for rm ANOVA?
library(plyr)
data.bysubjs <- ddply(npi.data.critical,.(Subject,Condition),summarize,RT=mean(RT))
aov.bysubjs <- aov(RT~Condition+Error(Subject/Condition),data=data.bysubjs)
summary(aov.bysubjs)
```

```
##
## Error: Subject
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 52 2924315 56237
##
## Error: Subject:Condition
## Df Sum Sq Mean Sq F value Pr(>F)
## Condition 2 295357 147678 6.076 0.0032 **
## Residuals 104 2527880 24307
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

`t.test(data.bysubjs[data.bysubjs$Condition=="a","RT"],data.bysubjs[data.bysubjs$Condition=="d","RT"],paired=T)`

```
##
## Paired t-test
##
## data: data.bysubjs[data.bysubjs$Condition == "a", "RT"] and data.bysubjs[data.bysubjs$Condition == "d", "RT"]
## t = -2.9133, df = 52, p-value = 0.005261
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -168.59840 -31.06827
## sample estimates:
## mean of the differences
## -99.83333
```

`t.test(data.bysubjs[data.bysubjs$Condition=="a","RT"],data.bysubjs[data.bysubjs$Condition=="g","RT"],paired=T)`

```
##
## Paired t-test
##
## data: data.bysubjs[data.bysubjs$Condition == "a", "RT"] and data.bysubjs[data.bysubjs$Condition == "g", "RT"]
## t = -3.6531, df = 52, p-value = 0.0006028
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -123.40328 -35.89861
## sample estimates:
## mean of the differences
## -79.65094
```

`t.test(data.bysubjs[data.bysubjs$Condition=="d","RT"],data.bysubjs[data.bysubjs$Condition=="g","RT"],paired=T)`

```
##
## Paired t-test
##
## data: data.bysubjs[data.bysubjs$Condition == "d", "RT"] and data.bysubjs[data.bysubjs$Condition == "g", "RT"]
## t = 0.60799, df = 52, p-value = 0.5458
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -46.42928 86.79406
## sample estimates:
## mean of the differences
## 20.18239
```

This analysis allows us to establish that there is a reliable effect of our condition. Recall that we treated subject as a random factor in this model. Our significant *F*-test gives us license to generalize our effect to other levels of our random factor. In other words, if we drew some other random sample from the same population, we would expect that sample to exhibit the same response to our treatment.

Interestingly, since the 1970s psycholinguists have taken it that items in a psycholinguistic experiment of this sort can also be treated as random factors. That is, we may want to think of the particular sentences that we tested in our experiment as random samples from a population of possible sentences, each instantiating some treatment of interest. From this point of view, we might be interested in asking whether our effect of condition would generalize to a new sample of items. In the previous example, we tested something similar for subjects by treating subject as a random factor in our model. We can do the same thing for items, and run the analysis the same way: get the average RT by condition for each item, and then use item as the random factor in our ANOVA:

```
data.byitems <- ddply(npi.data.critical,.(Item,Condition),summarize,RT=mean(RT))
aov.byitems <- aov(RT~Condition+Error(Item/Condition),data=data.byitems)
summary(aov.byitems)
```

```
##
## Error: Item
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 26 314819 12108
##
## Error: Item:Condition
## Df Sum Sq Mean Sq F value Pr(>F)
## Condition 2 160176 80088 6.425 0.00321 **
## Residuals 52 648151 12464
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

`t.test(data.byitems[data.byitems$Condition=="a","RT"],data.byitems[data.byitems$Condition=="d","RT"],paired=T)`

```
##
## Paired t-test
##
## data: data.byitems[data.byitems$Condition == "a", "RT"] and data.byitems[data.byitems$Condition == "d", "RT"]
## t = -3.1373, df = 26, p-value = 0.004206
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -173.88168 -36.22414
## sample estimates:
## mean of the differences
## -105.0529
```

`t.test(data.byitems[data.byitems$Condition=="a","RT"],data.byitems[data.byitems$Condition=="g","RT"],paired=T)`

```
##
## Paired t-test
##
## data: data.byitems[data.byitems$Condition == "a", "RT"] and data.byitems[data.byitems$Condition == "g", "RT"]
## t = -4.5138, df = 26, p-value = 0.0001213
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -112.73071 -42.18464
## sample estimates:
## mean of the differences
## -77.45767
```

`t.test(data.byitems[data.byitems$Condition=="d","RT"],data.byitems[data.byitems$Condition=="g","RT"],paired=T)`

```
##
## Paired t-test
##
## data: data.byitems[data.byitems$Condition == "d", "RT"] and data.byitems[data.byitems$Condition == "g", "RT"]
## t = 0.74989, df = 26, p-value = 0.4601
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -48.0467 103.2372
## sample estimates:
## mean of the differences
## 27.59524
```

By running two separate analyses, one treating subjects as a random factor, and the other treating items as a random factor, we have some degree of confidence that the effect we observe will generalize to new items and subjects. In other words, we have reason to believe our effect isn’t due to our particular group of subjects or items. By-subject test stats are typically subscripted with 1, as in \(F_1\) or \(t_1\). By-item stats are subscripted with 2: \(F_2\) or \(t_2\).

At this point, we need to be clear about an additional statistical assumption that repeated measures ANOVAs make that our regular independent ANOVAs did not. Repeated measures ANOVAs assume that **the variance of the differences between conditions is the same for all pairwise comparisons of levels within a factor**. When this assumption is met, we say that **sphericity** holds. If you have more than three levels in a factor in a repeated measures ANOVA, it is customary to check that sphericity holds. If it does not, then we need to apply a correction to our *F*-statistic.
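For our three treatments A, B, and C, sphericity amounts to the requirement that

\(\mathrm{Var}(x_A - x_B) = \mathrm{Var}(x_A - x_C) = \mathrm{Var}(x_B - x_C)\)

where each difference is taken within a subject.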

We can check sphericity easily by using a new package called `ez`:

```
#install.packages('ez')
library(ez)
my.aov <- ezANOVA(data.bysubjs,dv=.(RT),within=.(Condition),wid=.(Subject))
my.aov
```

```
## $ANOVA
## Effect DFn DFd F p p<.05 ges
## 2 Condition 2 104 6.075661 0.00319505 * 0.05138824
##
## $`Mauchly's Test for Sphericity`
## Effect W p p<.05
## 2 Condition 0.7658908 0.001112385 *
##
## $`Sphericity Corrections`
## Effect GGe p[GG] p[GG]<.05 HFe p[HF]
## 2 Condition 0.8103011 0.006015691 * 0.8326014 0.005583207
## p[HF]<.05
## 2 *
```

`ezANOVA` will run an ANOVA on the data contained in a data frame if we specify the dependent variable (`dv=.(...)`), within-groups factors (`within=.(...)`), and between-groups factors (`between=.(...)`). If there are within-groups factors, we should further specify which factor determines the groups (`wid=.(...)`). Thus, the call above says ’run a repeated measures ANOVA with `Condition` as a within-group fixed effect and with `Subject` as a random effect.’

The first part of the printout has exactly the same information as we found in our regular `aov` analysis. It additionally gives us a generalized \(\eta^2\) measure (`ges`), which is a measure of effect size that is sometimes reported with the results of an ANOVA.

The second part has a statistical test called **Mauchly’s test for sphericity.** As the name suggests, this tests whether or not the sphericity assumption is met in our data. A significant effect at *p* < 0.05 indicates that sphericity is most likely *not* met in our data, meaning that we should be suspicious of the *F*-value we observe in our test.

Given that sphericity does not hold, we need to apply a correction to the degrees of freedom. On the third line, two different corrections are offered: the **Greenhouse-Geisser** (`GG`) correction and the **Huynh-Feldt** (`HF`) correction. Both corrections provide a value \(\epsilon\), a proposed multiplier for the degrees of freedom. Given one of these two \(\epsilon\) values, we can calculate corrected degrees of freedom, and so get an adjusted *p*-value:

`1-pf(6.076,2,104)`

`## [1] 0.003194079`

```
epsilon <- 0.8103011
1-pf(6.076,2*epsilon,104*epsilon)
```

`## [1] 0.006014162`

`### F(1.6,84.3) = 6.08, p < 0.05`

The two corrections provide slightly different answers. A rough rule of thumb that lots of researchers use is to check the Greenhouse-Geisser \(\epsilon\), and see if it is greater or less than .75. If it is greater than .75, then the Huynh-Feldt correction is recommended. Less than .75, and the Greenhouse-Geisser correction is recommended.

`ezANOVA` is a really useful tool. It can actually compute ANOVAs for you if you haven’t gone through the preprocessing steps to calculate cell means before analysis:

```
my.aov <- ezANOVA(data.bysubjs,dv=.(RT),within=.(Condition),wid=.(Subject))
my.aov
```

```
## $ANOVA
## Effect DFn DFd F p p<.05 ges
## 2 Condition 2 104 6.075661 0.00319505 * 0.05138824
##
## $`Mauchly's Test for Sphericity`
## Effect W p p<.05
## 2 Condition 0.7658908 0.001112385 *
##
## $`Sphericity Corrections`
## Effect GGe p[GG] p[GG]<.05 HFe p[HF]
## 2 Condition 0.8103011 0.006015691 * 0.8326014 0.005583207
## p[HF]<.05
## 2 *
```

`my.aov <- ezANOVA(npi.data.critical,dv=.(RT),within=.(Condition),wid=.(Subject))`

```
## Warning: Collapsing data to cell means. *IF* the requested effects are a
## subset of the full design, you must use the "within_full" argument, else
## results may be inaccurate.
```

`my.aov`

```
## $ANOVA
## Effect DFn DFd F p p<.05 ges
## 2 Condition 2 104 6.075661 0.00319505 * 0.05138824
##
## $`Mauchly's Test for Sphericity`
## Effect W p p<.05
## 2 Condition 0.7658908 0.001112385 *
##
## $`Sphericity Corrections`
## Effect GGe p[GG] p[GG]<.05 HFe p[HF]
## 2 Condition 0.8103011 0.006015691 * 0.8326014 0.005583207
## p[HF]<.05
## 2 *
```

This makes it a snap to run by-subject and by-item ANOVAs:

`f1.analysis <- ezANOVA(npi.data.critical,dv=.(RT),within=.(Condition),wid=.(Subject))`

```
## Warning: Collapsing data to cell means. *IF* the requested effects are a
## subset of the full design, you must use the "within_full" argument, else
## results may be inaccurate.
```

`f2.analysis <- ezANOVA(npi.data.critical,dv=.(RT),within=.(Condition),wid=.(Item))`

```
## Warning: Collapsing data to cell means. *IF* the requested effects are a
## subset of the full design, you must use the "within_full" argument, else
## results may be inaccurate.
```

It’s also possible to have `ezANOVA` explicitly calculate an `aov` object for you, and return it:

`f1.analysis <- ezANOVA(npi.data.critical,dv=.(RT),within=.(Condition),wid=.(Subject),return_aov=T)`

```
## Warning: Collapsing data to cell means. *IF* the requested effects are a
## subset of the full design, you must use the "within_full" argument, else
## results may be inaccurate.
```

```
f1.aov <- f1.analysis$aov
summary(f1.aov)
```

```
##
## Error: Subject
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 52 2924315 56237
##
## Error: Subject:Condition
## Df Sum Sq Mean Sq F value Pr(>F)
## Condition 2 295357 147678 6.076 0.0032 **
## Residuals 104 2527880 24307
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

What about two-way, within-subjects designs?

Suppose we had two two-level factors, A and B, that we manipulated within subjects. As before, the strategy will be very much the same: factor out subject variance from the error term, and develop *F*-tests based on the SS for each of our treatments and their interaction.

```
data <- data.frame(value=c(9,1,4,10,3,5,8,1,2,10,3,4),
                   A=c('Y','Y','Y','N','N','N','Y','Y','Y','N','N','N'),
                   B=c('Y','Y','Y','Y','Y','Y','N','N','N','N','N','N'),
                   subject=c(1,2,3,1,2,3,1,2,3,1,2,3))
data$subject <- as.factor(data$subject)
tapply(data$value,list(data$A,data$B),mean)
```

```
## N Y
## N 5.666667 6.000000
## Y 3.666667 4.666667
```

```
my.anova <- aov(value~A*B+Error(subject/(A*B)),data=data)
summary(my.anova)
```

```
##
## Error: subject
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 2 114.5 57.25
##
## Error: subject:A
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 8.333 8.333 100 0.00985 **
## Residuals 2 0.167 0.083
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: subject:B
## Df Sum Sq Mean Sq F value Pr(>F)
## B 1 1.333 1.3333 2.286 0.27
## Residuals 2 1.167 0.5833
##
## Error: subject:A:B
## Df Sum Sq Mean Sq F value Pr(>F)
## A:B 1 0.3333 0.3333 4 0.184
## Residuals 2 0.1667 0.0833
```

The output here looks a little different: what’s going on? When we break up the sum of squares for a two-way, within-subjects design, we make explicit something that was only implicit in the one-way ANOVA: **our error term represents the interaction of the treatment factor with the subject factor**. That is, the variance that goes into the error term in the one-way repeated measures analysis can be thought of as the difference between the predicted effect of treatment and the actual effect of treatment observed for any given subject. This makes intuitive sense of our *F*-statistic: for a repeated measures ANOVA, we want to know how consistent our treatment effect is across levels of our random factor. Remember, our statistic is the mean squared error for our treatment divided by the mean squared error of the treatment-by-subject interaction. So to the extent that there is little variation in the effect across subjects, the *F*-statistic will be large, and we will reject the null hypothesis.

In a two-way, within-subjects design, we split up the error term into interactions with subjects *for each factor and their interaction*:

\(SS_{total} = SS_{subj}+SS_{A}+SS_{A:subj}+SS_{B}+SS_{B:subj}+SS_{A:B}+SS_{A:B:subj}\)
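For the toy data analyzed above, the pieces of this decomposition can be read off the ANOVA table, and they do sum (to rounding) to the total sum of squares, which is \(\sum (x - \bar{x})^2 = 126\) for these twelve values:

\(126 = 114.5 + 8.33 + 0.17 + 1.33 + 1.17 + 0.33 + 0.17\)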

The SS for interaction term for A:S, or the by-subject error associated with the A factor, is calculated just like the SS for an interaction in the two-way ANOVA:

\(SS_{A:subj} = b\sum_i\sum_j (\bar{x}_{ij}-\bar{x}_i-\bar{x}_j+\bar{x})^2\)

Here *i* ranges over the levels of A and *j* over the levels of the subject factor; *b* is the number of levels in factor B.

In order to calculate MS for each of these terms, we divide by the relevant degrees of freedom. For the main effects of A and B and their interaction, the d.f. are as in the two-way ANOVA: \((a-1)\), \((b-1)\), and \((a-1)(b-1)\), respectively. The d.f. for the interactions with subjects are these values multiplied by \((n-1)\), where *n* is the number of subjects:

\(MS_{A:subj} = SS_{A:subj}/((n-1)(a-1))\)

\(MS_{B:subj} = SS_{B:subj}/((n-1)(b-1))\)

\(MS_{A:B:subj} = SS_{A:B:subj}/((n-1)(b-1)(a-1))\)

Now we’re closer to understanding the output of our two-way, repeated measures ANOVA table. The *F*-tests for our main effects and our interaction here are the mean squares associated with the factor divided by the mean squares of the interaction of that factor with subjects:

\(F = \dfrac{MS_{A}}{MS_{A:subj}}\)

\(F = \dfrac{MS_{B}}{MS_{B:subj}}\)

\(F = \dfrac{MS_{A:B}}{MS_{A:B:subj}}\)

`summary(my.anova)`

```
##
## Error: subject
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 2 114.5 57.25
##
## Error: subject:A
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 8.333 8.333 100 0.00985 **
## Residuals 2 0.167 0.083
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: subject:B
## Df Sum Sq Mean Sq F value Pr(>F)
## B 1 1.333 1.3333 2.286 0.27
## Residuals 2 1.167 0.5833
##
## Error: subject:A:B
## Df Sum Sq Mean Sq F value Pr(>F)
## A:B 1 0.3333 0.3333 4 0.184
## Residuals 2 0.1667 0.0833
```
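We can verify the *F* of 100 for factor A by hand. The level means for A are \(35/6\) (N) and \(25/6\) (Y), against a grand mean of 5, and each A level contains six observations, so

\(SS_A = 6\left[(\tfrac{35}{6}-5)^2 + (\tfrac{25}{6}-5)^2\right] = \tfrac{25}{3} \approx 8.33\)

\(MS_A = \tfrac{8.33}{2-1} = 8.33, \qquad MS_{A:subj} = \tfrac{0.167}{(3-1)(2-1)} \approx 0.083\)

\(F = \tfrac{8.33}{0.083} = 100\)

matching the `Error: subject:A` stratum in the table.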

We can use `ezANOVA` to do this for us as well:

`my.ANOVA <- ezANOVA(data,dv=.(value),within=.(A,B),wid=.(subject))`

And the results are printed in an ANOVA table that contains only the minimal information needed to report the ANOVA, along with the generalized \(\eta^2\) effect size.

We note that here we do not see any tests for sphericity; this is expected, because it is actually impossible to violate sphericity when there are only two levels in a factor (**Question**: Why is this so?).