Dept. of Biostatistics and Epidemiology at the :

BioEpi 740: Mixed Models and Analysis of Repeated Measures/Longitudinal Data

Homework Assignment #2

Reading:

1. Re-read Searle et al. (1992). Pages 1-9. For each example discussed, consider how you would describe the study in terms of a survey, an observational study, or an experiment. Can you define the population, the sampling, the randomization?


Problems:

2.1. Potentially observable data for two experiments (2A and 2B) are given in class. Each "cross" in the potentially observable population represents a subject. The subject ID is given by the number circled at the center of the cross. The numbers on the ends of the crosses correspond to the potentially observable responses for the subject when given different treatments. The treatments for experiment 2A are green, yellow, blue, and red. The treatments for experiment 2B are brown, black, orange, and purple.

In Class:

A. Conduct a completely randomized study where you randomly allocate one treatment to each of 3 subjects, and observe the response. Write a brief protocol that describes how you conduct the randomization, and record the data for your experiment.

After Class:

B. Analyze the data from the experiment that you conducted in A). Write up and interpret the results of your study.

C. Using a different experiment, do the following:

i. Conduct a completely randomized study where you randomly allocate one treatment to each of 3 subjects, and observe the response. Write a brief protocol that describes how you conduct the randomization, and record the data for your experiment. Analyze the results of this experiment, and interpret your results.

ii. Enumerate the potentially observable population.

iii. Evaluate parameters for each treatment group.

iv. Compute the power of a 1-Way ANOVA to detect a statistically significant treatment effect with the given design.

v. Be prepared to discuss how a study could be designed to have adequate power.


2.2. One-factor completely randomized experimental studies can be designed using PROC PLAN (see solutions to Homework 1) or by selecting a random permutation of study subjects. To study the long-run properties of a randomized study, we need to repeat the randomization and describe the properties of estimators and test statistics. This requires knowing the potentially observable population, since different realizations of the experiment will result in different treatment assignments to the subjects.

We develop a program that can simulate the conduct of many experiments for a one factor completely randomized design.

a. We first consider a simple problem where there is a population of M=5 subjects. We assume that there are A=2 treatments, and we want to conduct a completely randomized design that randomly assigns n=2 subjects to each treatment. We assume that the potentially observable responses are known. We develop a program that does the following:

  • reads the subject ids and the potentially observable responses for the population.
  • strings out the subjects and potentially observable responses into one long vector
  • reads in a vector that corresponds to the treatment assignment for a random permutation of subjects identified by position.
  • reads the vector of ids and responses into an array (a matrix), and randomly permutes the subjects creating a vector that corresponds to the realized response for the subject-treatment combination.

Run the program D40P6a.sas, review the logic and statements, and examine the output. Use SAS manuals to help clarify any statements you don't understand, and write out a list of unanswered questions to be discussed in class.
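The steps listed in part a) can also be sketched outside SAS. The following Python fragment is a minimal illustration, not a translation of D40P6a.sas: the response values, the `POPULATION` table, and the `run_trial` helper are all hypothetical, and treatments are coded 0 and 1 by position in the permutation.

```python
import random

# Hypothetical potentially observable responses: one entry per subject,
# one value per treatment (the numbers are made up for illustration).
POPULATION = {
    1: (10.0, 12.0),
    2: (11.0, 15.0),
    3: (9.0, 10.0),
    4: (14.0, 13.0),
    5: (12.0, 16.0),
}


def run_trial(population, n_per_treatment, n_treatments, rng):
    """Permute the subjects at random and return the realized
    (subject, treatment, response) triples for one trial."""
    ids = list(population)
    rng.shuffle(ids)  # random permutation of the M subjects
    realized = []
    assigned = ids[: n_per_treatment * n_treatments]
    for position, subject in enumerate(assigned):
        # The first n positions receive treatment 0, the next n treatment 1.
        treatment = position // n_per_treatment
        realized.append((subject, treatment, population[subject][treatment]))
    return realized


rng = random.Random(740)  # fixed seed so the trial can be reproduced
trial = run_trial(POPULATION, n_per_treatment=2, n_treatments=2, rng=rng)
for subject, treatment, response in trial:
    print(subject, treatment, response)
```

With M=5 and n=2 per treatment, one subject is left unassigned in every trial, which is why the full potentially observable population has to be known before the simulation starts.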

b. The results of part a) are contained in a file that has one row for each "trial", with the realized responses for the treatments in columns for the trial. Analysis procedures such as PROC ANOVA or PROC MIXED require data to be arranged with one row per response. We re-arrange the data in the program D40P6b.sas. Examine the output and the statements for the program.
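The re-arrangement from one row per trial to one row per response can be sketched as follows. This is an illustrative Python fragment only; the column names (`t1_r1`, and so on) and the data values are assumptions and do not come from D40P6b.sas.

```python
# Hypothetical wide-format results: one row per trial, one column per
# realized response (two treatments, two subjects per treatment).
wide = [
    {"trial": 1, "t1_r1": 10.0, "t1_r2": 9.0, "t2_r1": 15.0, "t2_r2": 16.0},
    {"trial": 2, "t1_r1": 14.0, "t1_r2": 11.0, "t2_r1": 10.0, "t2_r2": 12.0},
]


def wide_to_long(rows, n_treatments, n_reps):
    """Return one record per response, keyed by trial, treatment, and replicate."""
    long_rows = []
    for row in rows:
        for t in range(1, n_treatments + 1):
            for r in range(1, n_reps + 1):
                long_rows.append(
                    {
                        "trial": row["trial"],
                        "treatment": t,
                        "rep": r,
                        "y": row[f"t{t}_r{r}"],
                    }
                )
    return long_rows


long_rows = wide_to_long(wide, n_treatments=2, n_reps=2)
for record in long_rows:
    print(record)
```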

c. A typical one-way analysis of variance can be fit to the data from a trial using PROC GLM or PROC MIXED. We use both procedures to produce and compare output in D40P6c.sas. Review the output from these procedures, and identify the comparable pieces. Read the SAS manual for PROC MIXED and compare the output. Change the number of replications to 2, and re-run the program.
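When matching pieces of the two outputs, it can help to compute the one-way ANOVA quantities by hand for a single trial. The sketch below uses made-up responses and a hypothetical helper `one_way_anova`; it assumes the error sum of squares is not zero.

```python
def one_way_anova(groups):
    """Return (model SS, error SS, F) for a list of per-treatment response lists."""
    all_y = [y for g in groups for y in g]
    grand_mean = sum(all_y) / len(all_y)
    means = [sum(g) / len(g) for g in groups]
    # Between-treatment (model) and within-treatment (error) sums of squares.
    ss_model = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_model = len(groups) - 1
    df_error = len(all_y) - len(groups)
    f_stat = (ss_model / df_model) / (ss_error / df_error)
    return ss_model, ss_error, f_stat


# Hypothetical realized responses for one trial: two subjects per treatment.
ss_model, ss_error, f_stat = one_way_anova([[10.0, 9.0], [15.0, 16.0]])
print(ss_model, ss_error, f_stat)
```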


2.3. Studying The Long Run Properties of Statistics in a 1 Way ANOVA

To study long run properties we need to automate the simulation program. We automate the simulation here using a small example, developing a program that will conduct a simulation, and output various statistics from each trial in the simulation to a data set. This data set can later be examined. We follow several steps in this construction. First, using only one or two trials, we develop a program that will create a small data set. Next, we add statements that remove other output from the program, so as to have a more streamlined version. Finally, we conduct the simulation and describe the results.

a. We focus on output from PROC MIXED since the program is more general. Statements are added to save particular output in data sets, and arrange the data so that there is a single record for each trial in the simulation. See program D40P6d.sas and the output.

b. The format for the data set is fine, but there is much additional output. Running 1000 trials would create a very large output file. We add statements to minimize the output. The program is D40P6e.sas, with brief output.

c. Finally, we simulate the distribution of the F statistic for this study, and compare the distribution under two treatments to the distribution under the null hypothesis. We illustrate the distribution of the F statistic under the alternative with D40P6f.sas, and this output.

The null distribution is given in D40P6g.sas and this output, which is obtained by setting the values for the input data to be identical for the two treatment groups.
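The comparison of the two distributions can be sketched in Python as below. This is an illustration under stated assumptions, not a rendering of D40P6f.sas or D40P6g.sas: the population values are made up, the helpers `f_statistic` and `simulate_f` are hypothetical, and the null population is built by copying the first treatment's responses to the second treatment, as described above.

```python
import random

# Hypothetical potentially observable responses (treatment 0, treatment 1).
ALT = {1: (10.0, 12.0), 2: (11.0, 15.0), 3: (9.0, 10.0),
       4: (14.0, 13.0), 5: (12.0, 16.0)}
# Null population: identical responses under both treatments.
NULL = {s: (y[0], y[0]) for s, y in ALT.items()}


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of per-treatment response lists."""
    all_y = [y for g in groups for y in g]
    grand = sum(all_y) / len(all_y)
    means = [sum(g) / len(g) for g in groups]
    ss_model = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    ms_model = ss_model / (len(groups) - 1)
    ms_error = ss_error / (len(all_y) - len(groups))
    return ms_model / ms_error if ms_error > 0 else float("inf")


def simulate_f(population, n, trials, rng):
    """Repeat the randomization and collect the F statistic from each trial."""
    ids = list(population)
    n_treat = len(next(iter(population.values())))
    stats = []
    for _ in range(trials):
        rng.shuffle(ids)  # new random permutation for each trial
        groups = [[population[s][t] for s in ids[t * n:(t + 1) * n]]
                  for t in range(n_treat)]
        stats.append(f_statistic(groups))
    return stats


rng = random.Random(740)
f_alt = simulate_f(ALT, n=2, trials=1000, rng=rng)
f_null = simulate_f(NULL, n=2, trials=1000, rng=rng)

# Empirical 95th percentile of the null F, and how often the
# alternative F exceeds it.
critical = sorted(f_null)[int(0.95 * len(f_null))]
exceed = sum(f > critical for f in f_alt) / len(f_alt)
print(f"critical={critical:.2f} exceedance={exceed:.3f}")
```

Because the population is so small, both simulated distributions are discrete with many ties, so the empirical percentile is only a rough cut-off.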


2.4. Searle et al. (1992) describe a clinical trial in Example 2, p. 8, that was considered in Assignment #1, problem #1. There were four possible treatments, with the potentially observable data given in the file hw1a.txt.

a. We adapt the program D40p6e.sas to simulate the experiment, and then use the results of the simulation to evaluate the power of the study to detect a treatment effect (see D40P6h.sas and the resulting output). This program conducts only one simulation, but displays tables that illustrate the processing of the data.

b. We remove the extra "print" statements, and run an operational version of the simulation program here. Our simulation includes 2000 hypothetical experiments. The simulation program is D40P6j.sas, with the following output.

c. Run an additional program to generate the null distribution of the F statistic corresponding to D40P6j.sas. Also, calculate the power (assuming normality) based on the potentially observable population. Use the results in b) and c) to compare your two values of the power of the design.
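For the normal-theory calculation, the usual one-way ANOVA power expression for A treatments with n subjects each may be a useful starting point. The symbols below are assumptions introduced here, not notation from the course programs: \mu_a is the treatment mean, \bar{\mu} the average of the treatment means, and \sigma^2 the within-treatment variance, all evaluated from the potentially observable population.

```latex
\lambda = \frac{n \sum_{a=1}^{A} (\mu_a - \bar{\mu})^2}{\sigma^2},
\qquad
\text{power} = P\left\{ F'_{A-1,\,A(n-1)}(\lambda) > F_{1-\alpha;\,A-1,\,A(n-1)} \right\}
```

Here F' denotes a noncentral F random variable with noncentrality parameter \lambda, and F_{1-\alpha} is the central F critical value. How \sigma^2 should be defined for a finite potentially observable population is a modeling choice worth stating explicitly when comparing this value with the simulated power.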


2.5. Basics of Matrix Algebra

a. Read the handout on Introduction to Matrix Algebra, and work through the handout examples using PROC IML up to page 22.


2.6. Using simple matrix algebra to fit a model.

We use PROC IML to represent a simple model for the realized experiment that was obtained in D40P6b.sas (see output). This problem was discussed in 2.2b. A similar discussion is given in Searle et al (p47).

a. The program D40P6m1.sas creates a data vector and a design matrix for the model. We also use least squares to estimate the parameters for the treatment groups and the other terms in the ANOVA table (see output and output from D40P6c.sas).
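The least squares calculation can be sketched with plain matrix operations. The Python fragment below is an illustration only: the data vector, the cell-means design matrix, and the helper `solve` are assumptions and do not reproduce D40P6m1.sas. It solves the normal equations (X'X)b = X'y exactly using fractions.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination (a square and nonsingular)."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for i in range(n):
        pivot = next(r for r in range(i, n) if aug[r][i] != 0)
        aug[i], aug[pivot] = aug[pivot], aug[i]
        aug[i] = [v / aug[i][i] for v in aug[i]]
        for r in range(n):
            if r != i:
                factor = aug[r][i]
                aug[r] = [vr - factor * vi for vr, vi in zip(aug[r], aug[i])]
    return [row[-1] for row in aug]


# Hypothetical realized responses and cell-means design matrix
# (rows: subjects, columns: treatment indicators).
y = [10, 9, 15, 16]
X = [[1, 0], [1, 0], [0, 1], [0, 1]]

Xt = [list(col) for col in zip(*X)]                                   # X'
xtx = [[sum(a * b for a, b in zip(r, c)) for c in Xt] for r in Xt]    # X'X
xty = [sum(a * v for a, v in zip(r, y)) for r in Xt]                  # X'y
beta = solve(xtx, xty)                                                # treatment means

fitted = [sum(x * b for x, b in zip(row, beta)) for row in X]
grand_mean = Fraction(sum(y), len(y))
error_ss = sum((v - f) ** 2 for v, f in zip(y, fitted))
model_ss = sum((f - grand_mean) ** 2 for f in fitted)
print(beta, model_ss, error_ss)
```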

b. Modify the program D40P6m1.sas to fit the data for the second replication of the experiment from D40P6b.sas. Relate the results to the output from D40P6c.sas.

c. Use the results of the experiment conducted in D40P6h.sas and IML to estimate the LSMEANS, the Model SS, and the error SS. Check your results with PROC MIXED.

d. The solution to 2.6c is given in D40P6hb.sas, where the parameter estimates are constructed based on a cell-means design matrix. Along with the estimates, linear combinations are defined that estimate all possible pairwise differences between treatments, and their variances. Different parameterizations are possible. For example, x2 defines a deviations-from-means parameterization, and x3 defines a reference-cell parameterization. Define linear combinations of parameters from these parameterizations that correspond to all possible differences between treatment group means (by defining an appropriate C matrix), and evaluate the differences and their variances using PROC IML.
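As a point of reference for the cell-means case only, the general form of a linear combination and its variance can be written as below. The notation is an assumption introduced here (\beta for the parameter vector, X for the design matrix, \sigma^2 for the error variance); the C matrix shown applies to four treatment means and does not carry over unchanged to x2 or x3.

```latex
\widehat{C\beta} = C\hat{\beta},
\qquad
\operatorname{var}(C\hat{\beta}) = \sigma^2\, C (X'X)^{-1} C',
\qquad
C_{\text{cell means}} =
\begin{pmatrix}
1 & -1 & 0 & 0 \\
1 & 0 & -1 & 0 \\
1 & 0 & 0 & -1 \\
0 & 1 & -1 & 0 \\
0 & 1 & 0 & -1 \\
0 & 0 & 1 & -1
\end{pmatrix}
```

With n subjects per treatment in the cell-means model, X'X = nI, so each pairwise difference has variance 2\sigma^2/n. The same differences under another parameterization should give the same estimates and variances, which is a useful check on the C matrix chosen.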


2.7. Practice with Linear Combinations

a. Suppose dietary intake is independent between subjects, and between days on a given subject. Assume also that the standard deviation in intake (from day to day) is identical for all subjects, which we represent by v. Also suppose that the long run average intake for each subject is identical. Use vectors and linear combinations to derive the variance in intake of the average formed by measuring intake on 1 day each for a simple random sample of 3 subjects.

b. Suppose dietary intake is independent between subjects, and between days on a given subject. Assume also that the standard deviation in intake (from day to day) is identical for all subjects, which we represent by ve. It is unrealistic to assume that the long-run average intake is the same for each subject. In a population, let us assume the standard deviation in the long-run intake is vb. Now suppose that on a simple random sample of 3 subjects we have 3 days of reported intake on subject 1, 1 day of reported intake on subject 2, and 5 days of reported intake on subject 3. Use vectors and linear combinations to derive the variance in average intake formed by taking the simple average of each subject's average intake.
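Both parts rest on the rule that a linear combination c'Y of a random vector Y with variance matrix V has variance c'Vc. The Python sketch below checks a derivation numerically for the setting in part a); the value v = 2 and the helper `quadratic_form` are hypothetical, chosen only to make the arithmetic concrete.

```python
from fractions import Fraction


def quadratic_form(c, sigma):
    """Return c' * sigma * c, the variance of the linear combination c'Y."""
    k = len(c)
    return sum(c[i] * sigma[i][j] * c[j] for i in range(k) for j in range(k))


v = Fraction(2)  # hypothetical day-to-day standard deviation
# Independent intakes with common variance v**2: a diagonal variance matrix.
sigma = [[v ** 2 if i == j else Fraction(0) for j in range(3)] for i in range(3)]
c = [Fraction(1, 3)] * 3  # weights that form the simple average

var_mean = quadratic_form(c, sigma)
print(var_mean)
```

For part b), the same function applies once the variance matrix includes the between-subject component and c is built in two stages (days within subject, then subjects).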


2.8. Designing a Simple Randomized Study

We use data from the Seasons Study (see Research Problems) here to consider design of a simple randomized study. The data collected on a subset of subjects from this study are to be used to help design a completely randomized study to compare an intervention with a control. We assume the intervention (to be considered practically valuable) will have to change the variable by 10%. Design a study with adequate power to detect this effect, assuming only one 24-hour recall measure will be collected per subject.
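One common starting point for such a design is the normal-approximation sample size for comparing two group means. The sketch below is illustrative only: the mean of 2000 and standard deviation of 600 are made-up placeholders rather than Seasons Study estimates, and the two-sided 5% level with 80% power is an assumed design target.

```python
import math
from statistics import NormalDist


def n_per_group(sd, delta, alpha=0.05, power=0.80):
    """Subjects per group for a two-sided, two-sample comparison of means
    (normal approximation, equal group sizes, common standard deviation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (sd ** 2) * (z_alpha + z_beta) ** 2 / delta ** 2)


# Hypothetical inputs: mean 2000, between-subject SD 600 for a single
# 24-hour recall, and a 10% change as the effect of practical interest.
mean, sd = 2000.0, 600.0
delta = 0.10 * mean
n = n_per_group(sd, delta)
print(n)
```

With only one recall per subject, the standard deviation used here has to include both between-subject and day-to-day variation.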


2.9. A study designed to evaluate the effect of potting soil and fertilizers on growth of chrysanthemum plants is described by Searle et al. in Example 3, p. 9. The potentially observable responses (corresponding to height (in inches) of the plant after 1 month) are given below. This assignment focuses on this example. You may wish to make use of programs given in the Solution to homework assignment #1. The data were created by d40p7.sas and are saved as hw2a.txt.

  • A. Write a program using PROC PLAN (example: d40p3a.sas) to conduct the randomization described by Searle et al.
  • B. Use the data on potentially observable responses to obtain data for your experiment.
  • C. Use PROC ANOVA to analyze the experiment. Is there evidence of a significant fertilizer effect? Of a potting soil effect?
  • D. Use the random permutation program (d40p4a.sas) to conduct the randomization described by Searle et al.
  • E. Use the data on potentially observable responses to obtain data for your experiment.
  • F. Use PROC MIXED to analyze the experiment. Is there evidence of a significant fertilizer effect? Of a potting soil effect?


2.10. Simulating Variance Structures

  • Using data from the Seasons study, fit mixed models to Energy intake with the following variance structures:
    • Compound Symmetry
    • AR(1), where data for a subject are first sorted by date of recall
    • Compound Symmetry and AR(1)
  • Using the estimates of the variance components and mean as if they are parameters, simulate data with the variance structures indicated above (see examples).
  • Develop an expression (using the parameters) for the variance matrix that should result from the simulations. Compare this variance matrix with the simulated matrix to verify the simulation is done correctly.
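The check described in the last bullet can be sketched as follows. This Python fragment is an illustration under stated assumptions: the parameter values are made up rather than estimated from the Seasons study, the helper names are hypothetical, and the model is a random subject effect plus stationary AR(1) errors. Setting `rho=0` gives compound symmetry alone, and `var_between=0` gives AR(1) alone.

```python
import random


def cs_plus_ar1(n, var_between, var_error, rho):
    """Expected variance matrix: random subject effect plus AR(1) errors."""
    return [[var_between + var_error * rho ** abs(i - j) for j in range(n)]
            for i in range(n)]


def simulate_subject(n, mean, var_between, var_error, rho, rng):
    """Simulate n repeated measures on one subject."""
    b = rng.gauss(0.0, var_between ** 0.5)  # subject effect
    sd_e = var_error ** 0.5
    e = rng.gauss(0.0, sd_e)                # stationary starting error
    ys = [mean + b + e]
    for _ in range(n - 1):
        # Innovation variance (1 - rho^2) * var_error keeps var(e) constant.
        e = rho * e + rng.gauss(0.0, sd_e * (1.0 - rho ** 2) ** 0.5)
        ys.append(mean + b + e)
    return ys


def sample_covariance(data):
    """Empirical variance matrix across subjects (rows of data)."""
    m, n = len(data), len(data[0])
    means = [sum(row[j] for row in data) / m for j in range(n)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (m - 1)
             for j in range(n)] for i in range(n)]


rng = random.Random(740)
params = dict(var_between=1.0, var_error=4.0, rho=0.5)  # hypothetical values
data = [simulate_subject(4, 2000.0, rng=rng, **params) for _ in range(20000)]

expected = cs_plus_ar1(4, **params)
observed = sample_covariance(data)
max_gap = max(abs(observed[i][j] - expected[i][j])
              for i in range(4) for j in range(4))
print(f"largest absolute difference: {max_gap:.3f}")
```

A small largest difference relative to the variances suggests the simulation reproduces the intended structure; a large one usually points to an error in the innovation variance or in how the subject effect is added.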

 


2.11. Write a descriptive report of the results of the weight training study. Your report should be from 1-3 pages, and include figures and simple descriptive tables. The goal of the report is to provide the reader with the context and descriptive results of the study.


2.12. For the weight training study, fit models corresponding to compound symmetry, AR(1), a combination of compound symmetry and AR(1), and random coefficient models. After fitting these models, discuss the results in terms of a comparison of the programs. Choose a model that you feel best describes the experimental results, and highlight the results from this model. Discuss differences (if any) from other models that may be considered, and reasons why you prefer the model that you selected. Write a 1-4 page report describing these results.


Last Update: 4/27/99
Comments: Ed Stanek
Email: stanek@schoolph.umass.edu