Dept. of Biostatistics
and Epidemiology at the
Problems:
2.6. Using simple matrix algebra to fit a model.
c. Use the results of the experiment conducted in D40P6h.sas and IML to estimate the LSMEANS, the Model SS, and the error SS. Check your results with PROC MIXED.
Solution:
a. We run the program D40P6h.sas and save the data set using a COPY command, afterwards pasting it into the program D40P6ha.sas. First, we fit an ANOVA model using PROC MIXED. The results are given in D40P6ha.lst.
b. Next we read the data into PROC IML using the program D40P6hb.sas. We create design matrices for different parameterizations of the model. The design matrices correspond to the following:
The cell mean model is fit, with parameter estimates corresponding to the least squares means (see D40P6hb.lst). In addition, we define a matrix that estimates all possible pairwise differences and their variances.
c. Other Parameterizations. The key to constructing estimates based on other parameterizations (such as the deviations from means or the reference cell parameterization) is re-parameterizing the model to reflect the cell mean parameterization. With such a re-parameterization, all the other evaluations can be performed in an identical manner. We illustrate this re-parameterization in D40P6hc.sas.
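The cell mean fit described in part b can be sketched outside SAS as follows. This is a minimal illustration in Python, not a translation of D40P6hb.sas: the data values, the group labels, and the function names fit_cell_means and pairwise_differences are hypothetical, and the Model SS is assumed here to be corrected for the grand mean.

```python
from itertools import combinations

# Hypothetical responses for three treatment cells (NOT the course data).
cells = {
    "A": [10.0, 12.0, 11.0],
    "B": [14.0, 15.0, 16.0, 15.0],
    "C": [9.0, 8.0, 10.0],
}


def fit_cell_means(cells):
    """Fit y = X*mu + e, where X has one indicator column per cell.

    For this design X'X is diagonal with the cell counts n_i, and X'y holds
    the cell totals, so inv(X'X) X'y is simply the vector of cell means.
    """
    counts = {k: len(v) for k, v in cells.items()}
    means = {k: sum(v) / len(v) for k, v in cells.items()}
    n_total = sum(counts.values())
    grand_mean = sum(sum(v) for v in cells.values()) / n_total
    # Error SS: squared deviations of each response from its own cell mean.
    error_ss = sum((y - means[k]) ** 2 for k, v in cells.items() for y in v)
    # Model SS (assumed corrected for the grand mean).
    model_ss = sum(counts[k] * (means[k] - grand_mean) ** 2 for k in cells)
    mse = error_ss / (n_total - len(cells))
    return counts, means, model_ss, error_ss, mse


def pairwise_differences(counts, means, mse):
    """Each difference mu_a - mu_b with variance mse * (1/n_a + 1/n_b)."""
    return {
        (a, b): (means[a] - means[b], mse * (1 / counts[a] + 1 / counts[b]))
        for a, b in combinations(means, 2)
    }


counts, means, model_ss, error_ss, mse = fit_cell_means(cells)
diffs = pairwise_differences(counts, means, mse)
```

The cell means play the role of the least squares means in this one-factor layout, and each entry of diffs pairs an estimated difference with its estimated variance, which is what the contrast matrix in part b produces.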
2.7. Practice with Linear Combinations
a. Suppose dietary intake is independent between subjects, and between days on a given subject. Assume also that the standard deviation in intake (from day to day) is identical for all subjects, which we represent by v. Also suppose that the long-run average intake for each subject is identical. Use vectors and linear combinations to derive the variance of the average intake formed by measuring intake on 1 day each for a simple random sample of 3 subjects.
b. Suppose dietary intake is independent between subjects, and between days on a given subject. Assume also that the standard deviation in intake (from day to day) is identical for all subjects, which we represent by ve. It is unrealistic to assume that the long-run average intake is the same for each subject. In a population, let us assume the standard deviation in the long-run intake is vb. Now suppose that on a simple random sample of 3 subjects we have 3 days of reported intake on subject 1, 1 day of reported intake on subject 2, and 5 days of reported intake on subject 3. Use vectors and linear combinations to derive the variance of the average intake formed by taking the simple average of each subject's average intake.
2.8. Designing a Simple Randomized Study
We use data from the Seasons Study (see Research Problems) here to consider the design of a simple randomized study. The data collected on a subset of subjects from this study are to be used to help design a completely randomized study to compare an intervention with a control. We assume that the intervention, to be considered practically valuable, will have to change the variable by 10%. Design a study with adequate power to detect this effect, assuming only one 24-hour recall measure will be collected per subject.
Solution: We first save the data set seat1.sd2 and get the contents with the SAS program D40P8.SAS. As an example, we consider the design of a study to reduce caffeine consumption by 10%.
A simple description of caffeine consumption (in mg/day) is given by D40P9.SAS. Based on these results (see D40P9.LST), a 10% reduction in caffeine consumption is a reduction of 28 mg/d (using a simple mean), or 30 mg/d (using an average of subject averages, or using PROC MIXED). We fit a mixed model to these data to estimate the variance components between subjects and within subjects in D40P10.sas. The results are given in D40P10.lst. The variance between subjects (36054) is roughly twice the variance between days on a subject (18714). If the variance between days is the same for all subjects, then an estimate of the variance for a single subject-day is the sum of these variances, 36054 + 18714 = 54768. We design a study assuming that this is the variance. Assuming normality of caffeine intake (which is in doubt here), we evaluate the power with D40P11.sas (with the estimates given in D40P11.lst). These calculations are based on the non-central F distribution (see Kirk, p. 183).
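To get a rough feel for the sample size this variance implies, the calculation can be approximated in a few lines of Python. This sketch is not D40P11.sas: it swaps the non-central F calculation for a large-sample normal approximation to a two-group comparison, and the two-sided alpha of 0.05, the target power of 0.80, and the function names are assumptions made for illustration. Only the variance components and the 30 mg/d effect come from the text above.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Variance components reported in the text (mg/day squared).
BETWEEN_SUBJECT_VAR = 36054.0
WITHIN_SUBJECT_VAR = 18714.0
SINGLE_DAY_VAR = BETWEEN_SUBJECT_VAR + WITHIN_SUBJECT_VAR  # 54768

# 10% reduction in caffeine intake, using the average of subject averages.
EFFECT = 30.0

STD_NORMAL = NormalDist()


def power_two_group(n_per_group, effect, variance, alpha=0.05):
    """Approximate power of a two-sided two-group z test (normal approximation)."""
    se_diff = sqrt(2.0 * variance / n_per_group)
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = effect / se_diff
    # Probability of rejecting in either tail when the true difference is `effect`.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def n_per_group_for_power(effect, variance, power=0.80, alpha=0.05):
    """Smallest n per group from the usual closed-form approximation."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(2.0 * variance * (z_alpha + z_power) ** 2 / effect ** 2)


n_needed = n_per_group_for_power(EFFECT, SINGLE_DAY_VAR)
achieved_power = power_two_group(n_needed, EFFECT, SINGLE_DAY_VAR)
```

Under these assumed settings the approximation calls for on the order of 950 to 960 subjects per group, which reflects how large the single-day variance is relative to a 30 mg/d effect. The non-central F calculation should give a similar answer at this sample size, since the t and normal distributions nearly coincide there.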
2.9. A study designed to evaluate the effect of potting soil and fertilizers on growth of chrysanthemum plants is described by Searle et al. in Example 3, p. 9. The potentially observable responses (corresponding to the height, in inches, of the plant after 1 month) are given below. This assignment focuses on this example. You may wish to make use of programs given in the Solution to homework assignment #1. The data were created by d40p7.sas and are saved as hw2a.txt.
2.10. Simulating Variance Structures
Solution: A summary of the results is given in be740e38.pdf. The programs used to fit the models are given above. By default, REML estimates are computed for variance components. When comparing models, -2 log(likelihood) values should be compared using ML estimates (with an appropriate chi-square distribution). The program d40p27.sas notes one apparent specification of a CS model that gives puzzling results. In addition, note that the AR(1) model assumes equal spacing of dietary intake measures. This equal spacing does not hold here, since there are at most 3 measures made in a 3-month period (usually separated by one to three days).
Develop an expression (using the parameters) for the variance matrix that should result from the simulations. Compare this variance matrix with the simulated matrix to verify the simulation is done correctly.
Solution: We run simulations for each possible variance structure, and illustrate what the true variance matrix should be in be740e38.pdf. The simulation programs are given by:
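Separately from the course's SAS programs, the check described here (derive the variance matrix from the parameters, then compare it with the simulated one) can be sketched in Python for a single structure. The example assumes a compound symmetry (CS) structure built from a random subject effect plus independent day-to-day error. The parameter values, the number of subjects, and the function names are hypothetical choices for illustration.

```python
import random


def cs_covariance(var_between, var_within, n_times):
    """Theoretical CS matrix: var_between everywhere, plus var_within on the diagonal."""
    return [
        [var_between + (var_within if i == j else 0.0) for j in range(n_times)]
        for i in range(n_times)
    ]


def simulate_cs(n_subjects, n_times, var_between, var_within, seed=0):
    """Simulate y_ij = b_i + e_ij with independent normal b_i and e_ij."""
    rng = random.Random(seed)
    sd_between = var_between ** 0.5
    sd_within = var_within ** 0.5
    data = []
    for _ in range(n_subjects):
        subject_effect = rng.gauss(0.0, sd_between)
        data.append(
            [subject_effect + rng.gauss(0.0, sd_within) for _ in range(n_times)]
        )
    return data


def sample_covariance(data):
    """Unbiased sample covariance matrix across subjects (rows are subjects)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


# Hypothetical parameter values, chosen only for illustration.
VAR_BETWEEN, VAR_WITHIN, N_TIMES = 4.0, 1.0, 3

expected = cs_covariance(VAR_BETWEEN, VAR_WITHIN, N_TIMES)
observed = sample_covariance(simulate_cs(20000, N_TIMES, VAR_BETWEEN, VAR_WITHIN))
max_abs_diff = max(
    abs(observed[i][j] - expected[i][j])
    for i in range(N_TIMES)
    for j in range(N_TIMES)
)
```

With a large number of simulated subjects, max_abs_diff should be small relative to the entries of the expected matrix. A large discrepancy in one block of the matrix usually points to an error in how that part of the structure was simulated.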
2.11. Examples of descriptive analyses for weight training studies.
Last Update: 4/20/99
Comments: Ed Stanek
Email: stanek@schoolph.umass.edu
\ed\web\be740\webready\hw2a.html