Final Exam Solutions, Spring 1998
A study was conducted to evaluate the impact of
dietary/exercise interventions on subjects' weight over
time. There were three intervention groups, with each
subject randomly assigned to one of the three interventions.
Each subject was measured at four consecutive equally spaced
time points. The main objective of the study was to identify
how the different interventions affected weight over time.
The interventions corresponded to exercise (E) using the
NordicTrack, dietary changes (D), and a combination of diet
plus exercise (DE).
1. The first objective is to examine the covariance
structure for subjects in the Dietary Change group. We will
do this by fitting several models with different possible
correlation structures via the SAS program fe98p4.sas.
First we fit models based only on subjects with complete
data. Run the program, and summarize the results in several
tables. In the first table, let columns correspond to the
following:
- Model number
- Type of Variance
- Subject Component (estimate)
- Autocorrelation (estimate)
- Response Error (estimate)
- Number of variance parameters
- -2 Log(likelihood)
Table 1. Summary of Variance Structures and Estimates

| Model # | Variance Structure | Estimate of Subject Variance | Autocorr Estimate | Estimate of Response Error Var | # Variance Parameters | -2Log (Likelihood) |
|---|---|---|---|---|---|---|
| 1 | Compound Symmetry | 47.302 | na | 10.750 | 2 | 247.336 |
| 2 | Random Intercept | 47.302 | na | 10.750 | 2 | 247.336 |
| 3 | Unstructured | na | na | na | 10 | 225.423 |
| 4 | AR(1) | na | 0.899 | 67.616 | 2 | 242.001 |
| 5 | Independence | na | na | 58.052 | 1 | 285.560 |
| 6 | Compound Symmetry & AR(1) | 18.805 | 0.858 | 49.126 | 3 | 241.984 |
a. Using likelihood ratio tests, which variance structure
is best?
A straight comparison of the -2 log likelihoods shows that
the unstructured model (model 3) has the largest likelihood.
This is to be expected, since more parameters can always
explain more of the data. The drawbacks of this approach are
overfitting and a more complicated explanation. The
chi-square tests favor model 3 over each of the other models
(at alpha=0.05). Models 1 (and 2) and 4 are both
significantly better than model 5. Model 6 is also better
than model 2 and, in terms of log likelihood, equivalent to
model 4. However, the mean structure must be correctly
modeled before an autocorrelation structure is appropriate.
For this reason, among the simpler models, model 6 appears
to be best, and it is an improvement over model 1 (or 2).
Note that model 2 is equivalent to model 1.
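The comparisons above can be reproduced directly from Table 1. The following is a minimal sketch, not the course's own code: the function names `chi2_sf` and `lrt` are introduced here for illustration, and the p-values use the ordinary chi-square reference distribution, with degrees of freedom equal to the difference in the number of variance parameters. Tests that set a variance component to zero lie on the boundary of the parameter space, so these p-values are conservative in those cases.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    y = x / 2.0
    if df % 2 == 0:
        a = 1.0
        q = math.exp(-y)               # Q(1, y)
    else:
        a = 0.5
        q = math.erfc(math.sqrt(y))    # Q(1/2, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

# (-2 log likelihood, number of variance parameters), copied from Table 1
models = {
    1: (247.336, 2),   # compound symmetry
    2: (247.336, 2),   # random intercept
    3: (225.423, 10),  # unstructured
    4: (242.001, 2),   # AR(1)
    5: (285.560, 1),   # independence
    6: (241.984, 3),   # compound symmetry & AR(1)
}

def lrt(reduced, full):
    """Likelihood ratio test of a reduced model nested in a fuller model."""
    stat = models[reduced][0] - models[full][0]
    df = models[full][1] - models[reduced][1]
    return stat, df, chi2_sf(stat, df)

# Nested pairs discussed in the answer: (reduced, full)
for reduced, full in [(5, 1), (5, 4), (1, 6), (4, 6), (1, 3), (4, 3), (6, 3), (5, 3)]:
    stat, df, p = lrt(reduced, full)
    print(f"model {reduced} vs model {full}: chi2 = {stat:.3f}, df = {df}, p = {p:.4f}")
```

Models 1 and 4 have the same number of parameters and are not nested, so they are not compared with each other here.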
b. Obtain the variance matrix based on an unstructured
variance. This variance matrix corresponds to the variance
that would be estimated in a multi variate analysis. Examine
the diagonal elements in this matrix, and discuss these
estimates.
The diagonal elements in the matrix are not equal. It appears
that at 3 months and 12 months there is larger variance than
at the two intermediate times. The differences in
covariances may be attributed in part to these differences
in variances. Such a result may occur if there are separate
regression lines for subjects that cross in the middle time
points. With such crossing, the variance will be lower.
Further out on either end, the variance will be larger.
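The crossing-lines explanation can be made concrete with a worked equation. This is a sketch under an assumed model that the answer only alludes to: subject i has a random intercept a_i and a random slope b_i, with variances \sigma_a^2 and \sigma_b^2, covariance \sigma_{ab}, and independent response error with variance \sigma_e^2. All of these symbols are introduced here for illustration.

```latex
\begin{aligned}
y_{ij} &= (\beta_0 + a_i) + (\beta_1 + b_i)\,t_j + e_{ij}, \\
\operatorname{Var}(y_{ij}) &= \sigma_a^2 + 2\,t_j\,\sigma_{ab} + t_j^2\,\sigma_b^2 + \sigma_e^2, \\
t^{*} &= -\frac{\sigma_{ab}}{\sigma_b^2}
\quad\text{(time at which the variance is smallest).}
\end{aligned}
```

The variance is a quadratic in t_j that opens upward. If t^{*} falls between the second and third time points, the two intermediate times have the smallest variances and the first and last times have the largest, which is the pattern described above.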
c. Now summarize results on estimates of mean response at
each time point. Create a table that includes the treatment
group means at each time (Table 2) for each possible
variance structure.
Table 2. Summary of Estimates over Time

| Model # | Variance Structure | Estimate at Time 1 | Estimate at Time 2 | Estimate at Time 3 | Estimate at Time 4 |
|---|---|---|---|---|---|
| 1 | Compound Symmetry | 94.32 | 93.41 | 93.85 | 92.89 |
| 2 | Random Intercept | 94.32 | 93.41 | 93.85 | 92.89 |
| 3 | Unstructured | 94.32 | 93.41 | 93.85 | 92.89 |
| 4 | AR(1) | 94.32 | 93.41 | 93.85 | 92.89 |
| 5 | Independence | 94.32 | 93.41 | 93.85 | 92.89 |
| 6 | Compound Symmetry & AR(1) | 94.32 | 93.41 | 93.85 | 92.89 |
Note that all the estimates of the means are equal.
d. Create a similar table (Table 3) that summarizes the
standard errors for the treatment group means for the
different models. Discuss differences in results for the
standard errors, and how you would choose one to
present.
Table 3. Summary of SE of Estimates over Time

| Model # | Variance Structure | SE at Time 1 | SE at Time 2 | SE at Time 3 | SE at Time 4 |
|---|---|---|---|---|---|
| 1 | Compound Symmetry | 2.297 | 2.297 | 2.297 | 2.297 |
| 2 | Random Intercept | 2.297 | 2.297 | 2.297 | 2.297 |
| 3 | Unstructured | 2.624 | 2.059 | 1.937 | 2.496 |
| 4 | AR(1) | 2.479 | 2.479 | 2.479 | 2.479 |
| 5 | Independence | 2.297 | 2.297 | 2.297 | 2.297 |
| 6 | Compound Symmetry & AR(1) | 2.467 | 2.467 | 2.467 | 2.467 |
For three of the models (1, 2, and 5), the same SE occurs
at every time point and is identical across the three
models. It equals the square root of (58.052/11), the simple
estimate of the SE of the mean for a group. Note that
58.052=47.302+10.750, which is why the SEs are equal for
models 1 and 2 and model 5.
The SE under the unstructured model changes over time. This
reflects changes in variance in the data, something that
none of the other models capture.
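The arithmetic behind these standard errors can be checked with a few lines of code. This is a minimal sketch: it assumes 11 subjects (the divisor in the text's 58.052/11) and that the SE of a time-point mean is the square root of the total variance at that time divided by the number of subjects. Inverting the same relation for the unstructured model gives approximate diagonal variances; these are back-calculated from the rounded SEs in Table 3, not read from the SAS output.

```python
import math

N_SUBJECTS = 11  # assumption taken from the text's sqrt(58.052/11)

def se_of_mean(total_variance, n=N_SUBJECTS):
    """SE of a time-point mean given the total variance at that time."""
    return math.sqrt(total_variance / n)

# Models 1 and 2: subject variance + response error variance (Table 1)
se_cs = se_of_mean(47.302 + 10.750)
# Model 5: a single variance (Table 1)
se_ind = se_of_mean(58.052)
# Model 4: the AR(1) variance estimate (Table 1)
se_ar1 = se_of_mean(67.616)

print(round(se_cs, 3), round(se_ind, 3), round(se_ar1, 3))

# Model 3: back out the implied diagonal variances from the SEs in Table 3
un_se = [2.624, 2.059, 1.937, 2.496]
un_var = [N_SUBJECTS * se ** 2 for se in un_se]
print([round(v, 1) for v in un_var])  # roughly 75.7, 46.6, 41.3, 68.5
```

The implied variances are larger at the first and last time points than at the two intermediate ones, which agrees with the discussion of the diagonal elements in part b.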
2. We consider a compound symmetric variance structure in
more detail here. Fit the models in the program fe98p5.sas.
- A. Answer the following questions using the output
from Model #2.1.
- i. Calculate the average weight over all subjects
and times. Show how you can obtain this estimate from
the fixed effects estimates from the model.
- ii. Calculate the average weight for subject
ID=9704. Also calculate the average predicted weight
for subject ID=9704. Using the overall mean, the
average weight for ID=9704, and the variance
parameters estimated in Model #2.1, show how the
average predicted weight was obtained.
- B. Review the results of Model #2.2 which is
constructed from residuals from model #2.1. Obtain the
variance matrix for this unstructured model, and compare
it with the unstructured variance matrix obtained in
Question #1. What pattern does the variation by diagonal
elements suggest?
- C. Models #2.3 and 2.4 have different variance
structures, with model #2.4 including an autocorrelation
parameter. Use a likelihood ratio test to determine
whether the autocorrelation parameter should be
included in the model. Do you think a first-order
autocorrelation should be included in the
model?
- D. Compare the results in part C
with the results in Question #1 concerning the
appropriateness of adding a first-order autocorrelation.
Discuss any conclusions you may be able to draw from
these comparisons.
- E. Models #2.5 to 2.7 are constructed after first
subtracting the subject mean for each subject from the
observations. Compare the MSE of Model 2.3 and Model 2.6.
Which model has smaller MSE? Which model better fits the
data? Discuss your conclusions.
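For part A.ii, the relation between a subject's observed average and the average predicted weight can be sketched as a shrinkage calculation. This is an illustration under stated assumptions, not the solution from fe98p5.sas: it assumes balanced data with 4 measurements per subject, uses the compound symmetry estimates from Table 1 as stand-ins for the Model #2.1 variance parameters (which are not shown here), uses the Table 2 time means as a stand-in for the fixed effects, and uses a hypothetical subject mean of 100.0 rather than the actual value for ID=9704.

```python
# Stand-in variance components (Table 1, compound symmetry); Model #2.1's
# own estimates should be substituted when available.
SUBJECT_VAR = 47.302
ERROR_VAR = 10.750
N_TIMES = 4

# Stand-in fixed-effect estimates: time-point means from Table 2.
time_means = [94.32, 93.41, 93.85, 92.89]
overall_mean = sum(time_means) / len(time_means)

def shrinkage(subject_var, error_var, n):
    """Weight given to the subject's own mean in the predicted mean."""
    return n * subject_var / (n * subject_var + error_var)

def predicted_subject_mean(subject_mean, overall, k):
    """Shrink the observed subject mean toward the overall mean."""
    return overall + k * (subject_mean - overall)

k = shrinkage(SUBJECT_VAR, ERROR_VAR, N_TIMES)

# Hypothetical value for illustration only; NOT the observed mean for ID=9704.
hypothetical_subject_mean = 100.0
pred = predicted_subject_mean(hypothetical_subject_mean, overall_mean, k)

print(f"overall mean = {overall_mean:.4f}, k = {k:.4f}, predicted mean = {pred:.4f}")
```

Because the subject variance is large relative to the response error variance, k is close to 1 and the predicted mean stays close to the subject's observed mean.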
3. We consider another type of model for the dietary data
here. Since the dietary protocol was administered over time,
we might expect continual change in weight among subjects
under different protocols. Different subjects may have
different patterns over time. A simple model to represent
these patterns is a linear regression model, where the
intercept and slope are allowed to vary between subjects. We
fit such a model in fe98p6.sas.
- A. Run the program to fit a model with random effects
for the intercepts and slopes. For the first subject,
write out the model in terms of the mixed model matrix
notation, and express each matrix in detail.
- B. Compare different models with linear effects for
time. Use likelihood ratio tests to examine differences
between models with different variance structures. In
particular, compare a complete independence model with
the random intercept model, with the random intercept and
slope model, with the random intercept and slope model
allowing correlation of the intercept and slopes. Which
of these models appears to represent the data best?
- C. Obtain a plot of the predicted values from a
random intercept and slope model (using fe98p6.sas), and
write the regression equation predicted for subject
ID=9704.
- D. Run the program fe98p7.sas.
Develop a simple set of matrix equations that parallel
the results in the program, clearly defining each matrix
and vector. Present the results of these calculations for
ID=9704 for ordinary least squares and BLUP estimates of
intercepts and slopes.
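For part A, the general shape of the matrix form for one subject can be sketched as follows. This assumes a single common fixed intercept and slope and four measurement times t_1, ..., t_4. The actual fixed effects and time coding in fe98p6.sas are not shown here and may differ, and all symbols are introduced for illustration.

```latex
\mathbf{y}_1 = \mathbf{X}_1\boldsymbol{\beta} + \mathbf{Z}_1\mathbf{u}_1 + \mathbf{e}_1,
\qquad
\begin{pmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{14} \end{pmatrix}
=
\begin{pmatrix} 1 & t_1 \\ 1 & t_2 \\ 1 & t_3 \\ 1 & t_4 \end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}
+
\begin{pmatrix} 1 & t_1 \\ 1 & t_2 \\ 1 & t_3 \\ 1 & t_4 \end{pmatrix}
\begin{pmatrix} a_1 \\ b_1 \end{pmatrix}
+
\begin{pmatrix} e_{11} \\ e_{12} \\ e_{13} \\ e_{14} \end{pmatrix}

\operatorname{Var}(\mathbf{u}_1) = \mathbf{G} =
\begin{pmatrix} \sigma_a^2 & \sigma_{ab} \\ \sigma_{ab} & \sigma_b^2 \end{pmatrix},
\qquad
\operatorname{Var}(\mathbf{e}_1) = \sigma_e^2\,\mathbf{I}_4,
\qquad
\operatorname{Var}(\mathbf{y}_1) = \mathbf{Z}_1\mathbf{G}\mathbf{Z}_1' + \sigma_e^2\,\mathbf{I}_4
```

Setting \sigma_{ab} = 0 gives the random intercept and slope model without correlation, and additionally setting \sigma_b^2 = 0 gives the random intercept model compared in part B.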
4. Using the full data set, develop a model that you
consider to be best for representing the results of the
study, and comparing the three interventions. Describe the
model adequately, list the PROC MIXED code, and discuss why
you selected the model.