Practical Data Management and Statistical Computing (BioEp691F)
Lecture 11
1. Reading Data with Multiple and Unequal Numbers of Lines per Subject (An Example)
Chronic Granulomatous Disease (CGD) is a group of rare inherited
disorders of immune function characterized by recurrent pyogenic
infections, which usually present early in life and may lead to death
in childhood. There is evidence establishing a role for gamma
interferon as an important macrophage-activating factor that could
restore superoxide anion production and bacterial killing by
phagocytes in CGD patients. To study the ability of gamma interferon
to reduce the rate of serious infections (that is, infections
requiring hospitalization for parenteral antibiotics), a double-blind
clinical trial was conducted in which patients were randomized to
placebo or gamma interferon. The UMASS data set contains the data
and a brief description.
The research question is: does the infection rate differ between the
two intervention groups?
Analysis plan: Create an infection-rate variable for each subject
over time. Separate the cross-sectional data from the longitudinal
data. Summarize infection rates for the longitudinal data. Compare
rates between the treatment groups, overall and by gender.
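The last step of the plan compares rates between groups, overall and by gender. The notes do not define the rate, so the Python sketch below (not the course's SAS code) assumes a pooled rate: total infections divided by total follow-up time within a group. The fields id, trt, sex, n_infections, and followup, and all values, are hypothetical illustration data, not the CGD data.

```python
# Python sketch (not SAS) of the "compare rates" step of the analysis plan.
# ASSUMPTION: rate = total infections / total follow-up time within a group.
# The fields id/trt/sex/n_infections/followup and all values are hypothetical.
from collections import defaultdict


def group_rates(subjects, key):
    """Pooled infection rate for each group defined by key(subject)."""
    events = defaultdict(int)
    followup = defaultdict(float)
    for s in subjects:
        g = key(s)
        events[g] += s["n_infections"]
        followup[g] += s["followup"]
    return {g: events[g] / followup[g] for g in events}


subjects = [
    {"id": 1, "trt": "placebo", "sex": "F", "n_infections": 2, "followup": 300.0},
    {"id": 2, "trt": "placebo", "sex": "M", "n_infections": 1, "followup": 200.0},
    {"id": 3, "trt": "interferon", "sex": "F", "n_infections": 0, "followup": 350.0},
    {"id": 4, "trt": "interferon", "sex": "M", "n_infections": 1, "followup": 150.0},
]

overall = group_rates(subjects, key=lambda s: s["trt"])
by_sex = group_rates(subjects, key=lambda s: (s["trt"], s["sex"]))
print(overall)  # placebo 3/500, interferon 1/500
print(by_sex)
```

Passing the grouping as a key function lets the same code cover both comparisons in the plan: treatment alone, and treatment crossed with gender.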
Inspection of the data shows that there are multiple infections per
subject. We read the data for a subset of the variables (ID, Z1, Z8,
T1, and D) in dmes99p19.sas. The resulting data set has one record per
infection. We focus on the variables ID, IDT, T1, and D.
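A minimal Python analogue (not dmes99p19.sas) of reading a file in which each line is one infection, so subjects contribute unequal numbers of lines. The whitespace-delimited layout "id trt time" and the values are hypothetical; they are not the layout of the UMASS CGD file.

```python
# Python analogue (not SAS) of reading one record per infection line.
# ASSUMPTION: hypothetical layout "id trt time"; not the UMASS CGD file.
from collections import Counter

RAW = """\
1 placebo 45
1 placebo 120
2 interferon 80
3 placebo 10
3 placebo 60
3 placebo 200
"""


def read_records(text):
    """Parse one record (dict) per non-blank line."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        id_, trt, time = line.split()
        records.append({"id": int(id_), "trt": trt, "time": float(time)})
    return records


records = read_records(RAW)
print(len(records))                       # 6 records, one per infection
print(Counter(r["id"] for r in records))  # lines per subject differ
```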
We separate the variables into a cross-sectional data set and a
longitudinal data set in dmes99p20.sas.
- Creating multiple data sets in one DATA step.
- The OUTPUT statement.
- Using the automatic variable FIRST.xxx with a BY xxx; statement,
  which requires the data to have been sorted first (PROC SORT; BY xxx;).
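The split above can be mirrored in Python to show what FIRST.xxx does: it is true on the first row of each BY group, so writing a row to the cross-sectional set only then gives one row per subject, while every row goes to the longitudinal set. This is a sketch, not dmes99p20.sas; the fields id, trt, and time are hypothetical.

```python
# Python analogue (not SAS) of one DATA step writing two data sets:
#   cross-sectional: one row per subject (output when FIRST.id is true)
#   longitudinal:    one row per infection (output for every row)
# ASSUMPTION: hypothetical fields id/trt/time; rows must already be sorted
# by id, just as BY id requires a prior PROC SORT.


def split_by_subject(rows):
    cross, longi = [], []
    prev_id = object()  # sentinel that equals no real id
    for row in rows:
        first_id = row["id"] != prev_id  # plays the role of FIRST.id
        if first_id:
            cross.append({"id": row["id"], "trt": row["trt"]})
        longi.append({"id": row["id"], "time": row["time"]})
        prev_id = row["id"]
    return cross, longi


rows = [
    {"id": 1, "trt": "placebo", "time": 45},
    {"id": 1, "trt": "placebo", "time": 120},
    {"id": 2, "trt": "interferon", "time": 80},
    {"id": 3, "trt": "placebo", "time": 10},
    {"id": 3, "trt": "placebo", "time": 60},
]
cross, longi = split_by_subject(sorted(rows, key=lambda r: r["id"]))
print(len(cross), len(longi))  # 3 5
```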
Add a record to the longitudinal data set for each patient's entry
time, and recode the events as 0 and 1 (dmes99p21.sas).
- Use SET to concatenate two data sets.
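A Python sketch of the concatenation step, under assumptions the notes do not state: each subject enters at time 0, the entry row is coded event=0, and each infection row is coded event=1. The data set names in the comment (entry, events, long) are hypothetical. Concatenation only stacks rows, so the sketch sorts afterwards to bring each subject's rows together in time order.

```python
# Python analogue (not SAS) of stacking two data sets with SET.
# ASSUMPTIONS (hypothetical): every subject enters the study at time 0,
# the entry row gets event=0, and each infection row gets event=1.


def add_entry_records(infections, ids):
    entry = [{"id": i, "time": 0, "event": 0} for i in ids]
    events = [{**r, "event": 1} for r in infections]
    stacked = entry + events  # like: DATA long; SET entry events; RUN;
    # Concatenation only stacks the rows; sort so that each subject's
    # rows are together and in time order (the role PROC SORT would play).
    return sorted(stacked, key=lambda r: (r["id"], r["time"]))


infections = [{"id": 1, "time": 45}, {"id": 1, "time": 120}, {"id": 2, "time": 80}]
for row in add_entry_records(infections, ids=[1, 2]):
    print(row)
```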
Create a variable that represents the cumulative number of infections
for each subject (dmes99p22.sas).
- Use the automatic variable FIRST.xxx with a BY xxx; statement,
  which requires the data to have been sorted first (PROC SORT; BY xxx;).
- Use a RETAIN statement in the DATA step to accumulate the infections
  for each subject.
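The combination of FIRST.xxx and RETAIN amounts to a running total that is reset at the start of each subject. A Python sketch of that logic (not dmes99p22.sas), with hypothetical fields id, time, and event and rows already sorted by id and time:

```python
# Python analogue (not SAS) of a DATA step that uses BY id, FIRST.id, and
# RETAIN to count infections cumulatively within each subject.
# ASSUMPTION: hypothetical fields id/time/event; rows sorted by id and time.


def cumulate(rows):
    out = []
    prev_id = object()  # sentinel: the first row always starts a new subject
    cum = 0             # like a RETAINed variable: kept from row to row
    for row in rows:
        if row["id"] != prev_id:  # plays the role of FIRST.id
            cum = 0               # reset the count for a new subject
        cum += row["event"]
        out.append({**row, "cum": cum})
        prev_id = row["id"]
    return out


rows = [
    {"id": 1, "time": 0, "event": 0},
    {"id": 1, "time": 45, "event": 1},
    {"id": 1, "time": 120, "event": 1},
    {"id": 2, "time": 0, "event": 0},
    {"id": 2, "time": 80, "event": 1},
]
print([r["cum"] for r in cumulate(rows)])  # [0, 1, 2, 0, 1]
```

Without the reset, the count would carry over from one subject to the next, which is why the BY group boundary matters.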
Add the cross-sectional variables to the longitudinal data
(dmes99p23.sas).
- Use a MERGE statement in the DATA step with a BY xxx; statement;
  both data sets must have been sorted first (PROC SORT; BY xxx;).
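Merging one cross-sectional row per subject onto many longitudinal rows is a one-to-many match-merge. The Python sketch below (not dmes99p23.sas) copies each subject's cross-sectional variables onto every one of its longitudinal rows. Field and data set names are hypothetical, and unlike SAS MERGE it keeps only longitudinal rows, which matches here only because every subject has at least an entry row.

```python
# Python analogue (not SAS) of a one-to-many match-merge:
#   DATA both; MERGE longi cross; BY id; RUN;
# ASSUMPTIONS: hypothetical fields; every id in cross also appears in longi.
# Unlike SAS MERGE, this sketch keeps only rows of longi, so a subject with
# no longitudinal rows would be dropped rather than kept.


def merge_by_id(longi, cross):
    lookup = {r["id"]: r for r in cross}  # one row per subject
    merged = []
    for row in longi:
        # copy the subject's cross-sectional variables onto each of its rows
        merged.append({**row, **lookup.get(row["id"], {})})
    return merged


longi = [
    {"id": 1, "time": 0, "cum": 0},
    {"id": 1, "time": 45, "cum": 1},
    {"id": 2, "time": 0, "cum": 0},
]
cross = [{"id": 1, "trt": "placebo"}, {"id": 2, "trt": "interferon"}]
for row in merge_by_id(longi, cross):
    print(row)
```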
Plot the number of infections over time for all subjects
simultaneously, by treatment (dmes99p24.sas).
- Use PROC GPLOT with these options:
  - SYMBOL statement options:
    - I=JOIN to connect the points
    - C=BLACK to select the color
    - R=100 to repeat the same specification for up to 100 plot
      lines, instead of writing a separate SYMBOLn statement for each
  - BY c to create separate plots for the levels of c
  - PLOT a*b=c to overlay the plots for the levels of c on one set
    of axes
  - the NOLEGEND option to suppress the legend for subjects.
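The plotting options above imply a particular shape for the data: BY trt gives one panel per treatment, and PLOT cum*time=id draws one joined line per subject inside each panel. The Python sketch below (not dmes99p24.sas) builds only that nested structure; the fields trt, id, time, and cum are hypothetical, and the drawing itself is omitted.

```python
# Python sketch (not SAS) of how the plotted data are organized:
#   BY trt           -> one panel per treatment
#   PLOT cum*time=id -> within a panel, one joined line per subject
# ASSUMPTIONS: hypothetical fields trt/id/time/cum; drawing is omitted.
from collections import defaultdict


def build_panels(rows):
    panels = defaultdict(lambda: defaultdict(list))
    # sort so each subject's points are in time order, as I=JOIN expects
    for r in sorted(rows, key=lambda r: (r["trt"], r["id"], r["time"])):
        panels[r["trt"]][r["id"]].append((r["time"], r["cum"]))
    return {trt: dict(lines) for trt, lines in panels.items()}


rows = [
    {"trt": "placebo", "id": 1, "time": 45, "cum": 1},
    {"trt": "placebo", "id": 1, "time": 0, "cum": 0},
    {"trt": "interferon", "id": 2, "time": 0, "cum": 0},
    {"trt": "interferon", "id": 2, "time": 80, "cum": 1},
    {"trt": "placebo", "id": 3, "time": 0, "cum": 0},
]
panels = build_panels(rows)
for trt, lines in panels.items():
    print(trt, lines)
```

Each inner list is one subject's step path; a plotting library would draw one line per list, and suppressing the legend avoids one entry per subject.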