Dept. of Biostatistics and Epidemiology at the :

BioEpi 740: Mixed Models and Analysis of Repeated Measures/Longitudinal Data

Final Exam Solutions

Spring 1998


A study was conducted to evaluate the impact of dietary/exercise interventions on subjects' weight over time. There were three intervention groups, and each subject was randomly assigned to one of the three interventions. Each subject was measured at four consecutive, equally spaced time points. The main objective of the study was to identify how the different interventions affected weight over time. The interventions were exercise (E) using the NordicTrack, dietary changes (D), and a combination of diet plus exercise (DE).


1. The first objective is to examine the covariance structure for subjects in the Dietary Change group. We will do this by fitting several models with different possible correlation structures via the SAS program fe98p4.sas. First we fit models based only on subjects with complete data. Run the program, and summarize the results in several tables. In the first table, let columns correspond to the following:

  1. Model number
  2. Type of Variance
  3. Subject Component (estimate)
  4. Autocorrelation (estimate)
  5. Response Error (estimate)
  6. Number of variance parameters
  7. -2 Log(likelihood)

 

Table 1. Summary of Variance Structures and Estimates

Model #  Variance Structure         Estimate of       Autocorr  Estimate of         # Variance  -2 Log
                                    Subject Variance  Estimate  Response Error Var  Parameters  (Likelihood)
1        Compound Symmetry          47.302            na        10.750              2           247.336
2        Random Intercept           47.302            na        10.750              2           247.336
3        Unstructured               na                na        na                  10          225.423
4        AR(1)                      na                0.899     67.616              2           242.001
5        Independence               na                na        58.052              1           285.560
6        Compound Symmetry & AR(1)  18.805            0.858     49.126              3           241.984

a. Using likelihood ratio tests, which variance structure is best?

Based on a straight comparison of the likelihoods, the model with the largest likelihood (smallest -2 log likelihood) is the unstructured model. This is to be expected, since more parameters can always explain more of the data. The approach has its limitations: it risks overfitting and requires a more complicated explanation. The chi-square tests indicate that model 3 is favored (at alpha=0.05) over the other models. Model 1 (and 2) and model 4 are each significantly better than model 5. Model 6 also appears to be better than model 2 and, in terms of log likelihood, equivalent to model 4. However, the mean structure must be correctly modeled before an autocorrelation structure is appropriate. For this reason, among the simpler models, model 6 appears to be best, and it appears to be an improvement over model 1 (or 2). Note that model 2 is equivalent to model 1.
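The comparisons above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original SAS output: it takes the parameter counts and -2 log likelihoods from Table 1 and refers each difference to a chi-square distribution with degrees of freedom equal to the difference in the number of variance parameters. The names MODELS, chi2_sf, and lrt are introduced here for illustration only. The p-values are approximate, in particular when the reduced model places a variance parameter on the boundary of its parameter space (for example, model 4 versus model 6).

```python
import math

# (number of variance parameters, -2 log likelihood), copied from Table 1.
# Model 2 is omitted because its fit is identical to model 1.
MODELS = {
    1: (2, 247.336),   # compound symmetry
    3: (10, 225.423),  # unstructured
    4: (2, 242.001),   # AR(1)
    5: (1, 285.560),   # independence
    6: (3, 241.984),   # compound symmetry & AR(1)
}

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) times a finite sum of y**i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: start at df = 1 and step up with the incomplete-gamma recurrence
    q = math.erfc(math.sqrt(y))
    for i in range(df // 2):
        q += math.exp(-y) * y ** (i + 0.5) / math.gamma(i + 1.5)
    return q

def lrt(reduced, full):
    """Return (statistic, df, approximate p-value) for reduced vs. full model."""
    p_red, m2ll_red = MODELS[reduced]
    p_full, m2ll_full = MODELS[full]
    stat = m2ll_red - m2ll_full
    df = p_full - p_red
    return stat, df, chi2_sf(stat, df)

# Only nested pairs are compared; models 1 and 4 are not nested in each other.
for reduced, full in [(5, 1), (5, 4), (1, 6), (4, 6), (1, 3), (4, 3), (6, 3)]:
    stat, df, p = lrt(reduced, full)
    print(f"model {reduced} vs {full}: chi2 = {stat:.3f}, df = {df}, p = {p:.4f}")
```

Each printed line gives the test statistic, its degrees of freedom, and the approximate p-value for one pair of nested models.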

b. Obtain the variance matrix based on an unstructured variance. This variance matrix corresponds to the variance that would be estimated in a multivariate analysis. Examine the diagonal elements in this matrix, and discuss these estimates.

The diagonal elements in the matrix are not equal. The variance appears to be larger at 3 months and 12 months than at the two intermediate times. The differences in covariances may be attributed in part to these differences in variances. Such a result may occur if subjects have separate regression lines that cross near the middle time points. Where the lines cross, the variance will be lower; further out toward either end, the variance will be larger.

c. Now summarize results on estimates of mean response at each time point. Create a table that includes the treatment group means at each time (Table 2) for each possible variance structure.

Table 2. Summary of Estimates over Time

Model #  Variance Structure         Estimate at  Estimate at  Estimate at  Estimate at
                                    Time 1       Time 2       Time 3       Time 4
1        Compound Symmetry          94.32        93.41        93.85        92.89
2        Random Intercept           94.32        93.41        93.85        92.89
3        Unstructured               94.32        93.41        93.85        92.89
4        AR(1)                      94.32        93.41        93.85        92.89
5        Independence               94.32        93.41        93.85        92.89
6        Compound Symmetry & AR(1)  94.32        93.41        93.85        92.89


Note that the estimated means are identical across all six variance structures.

d. Create a similar table (Table 3) that summarizes the standard errors for the treatment group means for the different models. Discuss differences in results for the standard errors, and how you would choose one to present.

Table 3. Summary of SE of Estimates over Time

Model #  Variance Structure         SE at   SE at   SE at   SE at
                                    Time 1  Time 2  Time 3  Time 4
1        Compound Symmetry          2.297   2.297   2.297   2.297
2        Random Intercept           2.297   2.297   2.297   2.297
3        Unstructured               2.624   2.059   1.937   2.496
4        AR(1)                      2.479   2.479   2.479   2.479
5        Independence               2.297   2.297   2.297   2.297
6        Compound Symmetry & AR(1)  2.467   2.467   2.467   2.467



For three of the models (1, 2, and 5), the same SE occurs at every time point and in every model. This value equals the square root of (58.052/11), the simple estimate of the SE of the mean for a group. Note also that 58.05 = 47.30 + 10.75, which is why the SEs are equal for models 1 and 2 and for model 5.

The standard errors under the unstructured model differ across time points. This reflects changes in variance in the data over time, something that none of the other models captures.
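The arithmetic behind these standard errors can be sketched as follows. The variance estimates come from Table 1 and the divisor of 11 is the one used in the text. The AR(1) line is an extra check added here on the assumption that its marginal variance is the 67.616 reported in Table 1, and the variable names are illustrative.

```python
import math

n_subjects = 11  # divisor used in the text for the SE of a group mean

# Compound symmetry / random intercept (models 1 and 2):
# the marginal variance is subject variance plus response error variance.
subject_var, error_var = 47.302, 10.750
se_cs = math.sqrt((subject_var + error_var) / n_subjects)

# Independence (model 5): a single variance estimate.
se_indep = math.sqrt(58.052 / n_subjects)

# AR(1) (model 4): assumes 67.616 is the marginal variance at each time.
se_ar1 = math.sqrt(67.616 / n_subjects)

print(round(se_cs, 3), round(se_indep, 3), round(se_ar1, 3))
```

The first two values agree because the two variance components in models 1 and 2 sum to the single variance in model 5.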



2. We consider a compound symmetric variance structure in more detail here. Fit the models in the program fe98p5.sas.

  • A. Answer the following questions using the output from Model #2.1.
    • i. Calculate the average weight over all subjects and times. Show how you can obtain this estimate from the fixed effects estimates from the model.
    • ii. Calculate the average weight for subject ID=9704. Also calculate the average predicted weight for subject ID=9704. Using the overall mean, the average weight for ID=9704, and the variance parameters estimated in Model #2.1, show how the average predicted weight was obtained.
  • B. Review the results of Model #2.2 which is constructed from residuals from model #2.1. Obtain the variance matrix for this unstructured model, and compare it with the unstructured variance matrix obtained in Question #1. What pattern does the variation by diagonal elements suggest?
  • C. Models #2.3 and 2.4 have different variance structures, with model #2.4 including an autocorrelation parameter. Use a likelihood ratio test to determine whether the autocorrelation parameter should be included in the model. Do you think a first-order autocorrelation should be included in the model?
  • D. Compare your results in part C with the results in Question #1 concerning the appropriateness of adding a first-order autocorrelation. Discuss any conclusions you may be able to draw from these comparisons.
  • E. Models #2.5 to 2.7 are constructed after first subtracting the subject mean for each subject from the observations. Compare the MSE of Model 2.3 and Model 2.6. Which model has smaller MSE? Which model better fits the data? Discuss your conclusions.
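For part A(ii), the following is a minimal sketch of how a predicted subject mean arises under a random-intercept (compound symmetric) model with balanced data: the observed subject mean is shrunk toward the overall mean by a factor that depends on the two variance components and the number of time points. The variance values are borrowed from Table 1 purely for illustration, since the Model #2.1 output is not reproduced here, and the two means are hypothetical numbers, not results for ID=9704.

```python
def shrinkage_factor(subject_var, error_var, n_times):
    """Weight given to a subject's own mean under a random-intercept model."""
    return subject_var / (subject_var + error_var / n_times)

def predicted_subject_mean(overall_mean, subject_mean, k):
    """Shrink the observed subject mean toward the overall mean by factor k."""
    return overall_mean + k * (subject_mean - overall_mean)

# Illustrative variance components (Table 1, model 1), four time points.
subject_var = 47.302
error_var = 10.750
n_times = 4

# Hypothetical means, chosen only to show the direction of the shrinkage.
overall_mean = 93.6
subject_mean = 100.0

k = shrinkage_factor(subject_var, error_var, n_times)
pred = predicted_subject_mean(overall_mean, subject_mean, k)
print(f"k = {k:.4f}, predicted subject mean = {pred:.2f}")
```

Because the subject variance is large relative to the response error variance divided by the number of time points, k is close to 1 and the predicted mean stays close to the observed subject mean.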


3. We consider another type of model for the dietary data here. Since the dietary protocol was administered over time, we might expect continual change in weight among subjects under different protocols. Different subjects may have different patterns over time. A simple model to represent these patterns is a linear regression model, where the intercept and slope are allowed to vary between subjects. We fit such a model in fe98p6.sas.

  • A. Run the program to fit a model with random effects for the intercepts and slopes. For the first subject, write out the model in terms of the mixed model matrix notation, and express each matrix in detail.
  • B. Compare different models with linear effects for time. Use likelihood ratio tests to examine differences between models with different variance structures. In particular, compare a complete independence model with the random intercept model, with the random intercept and slope model, and with the random intercept and slope model that allows correlation between the intercepts and slopes. Which of these models appears to represent the data best?
  • C. Obtain a plot of the predicted values from a random intercept and slope model (using fe98p6.sas), and write the regression equation predicted for subject ID=9704.
  • D. Run the program fe98p7.sas. Develop a simple set of matrix equations that parallel the results in the program, clearly defining each matrix and vector. Present the results of these calculations for ID=9704 for ordinary least squares and BLUP estimates of intercepts and slopes.
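As an aid for part A, here is a minimal sketch of the matrices for one subject under a random intercept and slope model, y_i = X_i beta + Z_i b_i + e_i, with X_i = Z_i having rows [1, t]. It computes the implied marginal covariance V_i = Z_i G Z_i' + sigma2 I. The time coding 1 to 4 and every numeric value in G and sigma2 are hypothetical, chosen for illustration and not taken from fe98p6.sas.

```python
def marginal_covariance(times, G, sigma2):
    """Return (Z, V) where Z has rows [1, t] and V = Z G Z' + sigma2 * I."""
    Z = [[1.0, float(t)] for t in times]
    n = len(Z)
    V = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            # (Z G Z')[j][k] = sum over a, b of Z[j][a] * G[a][b] * Z[k][b]
            V[j][k] = sum(Z[j][a] * G[a][b] * Z[k][b]
                          for a in range(2) for b in range(2))
            if j == k:
                V[j][k] += sigma2  # response error on the diagonal only
    return Z, V

times = [1, 2, 3, 4]        # assumed coding of the four time points
G = [[40.0, -1.0],          # hypothetical var(intercept), cov(intercept, slope)
     [-1.0, 0.5]]           # hypothetical cov(intercept, slope), var(slope)
sigma2 = 8.0                # hypothetical response error variance

Z, V = marginal_covariance(times, G, sigma2)
variances = [V[j][j] for j in range(len(times))]
print(variances)
```

The diagonal of V_i is a quadratic function of time, so with a negative intercept-slope covariance the variance dips at the middle time points and rises toward the ends, the pattern discussed for the unstructured matrix in Question 1b.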


4. Using the full data set, develop the model that you consider best for representing the results of the study and for comparing the three interventions. Describe the model adequately, list the PROC MIXED code, and discuss why you selected the model.


Last Update: 5/5/99
Comments: Ed Stanek
Email: stanek@schoolph.umass.edu