Fall
2011

BioEpi 691F: Practical Data Management and Statistical Computing

SOLUTIONS - Assignment 8: Using SET and MERGE statements

 


 

1. Starting with the formatted, labeled version of the burner data from assignment #7, do the following:

For part 1a see: hw8p1a.sas

a1.  Create a new variable for height in inches, dropping the separate variables for the feet and inches parts of height from the data set.

2.  Recode any unexpected or unusual values you found for CLASS and HOWMANY to missing values.  For all players who indicated "no burners"  last year, recode the value of HOWMANY from missing to zero.

3.  Create an indicator variable to identify Freshman that takes the value 1 for freshman and zero for upperclassmen (sophmore/junior/senior/5th yr).

4.  Label the new variables you created. Assign formats as appropriate.
(Tip: add to your format program from the previous assignment, as needed, and rerun it).

5.  Print the first 20 observations in the file using the labels, and run contents on your dataset. Note:  the obs=20 option after the data set name will limit the number of observations printed.  DO NOT TURN IN A LIST OF ALL 700+ OBSERVATIONS!
(Use: PROC PRINT DATA=libn.dsn(obs=20) labels; )


For parts 1b and 1c: hw8p1bc.sas

b. SUBSETTING
Create two temporary datasets from the burner data, one with those who never had burners, and one with those who have ever had burners. Keep only subject id, freshman/upperclass indicator, and number of burners last season in these data sets.

For each data set, get frequency tables of freshman/upperclass status, and number of burners last year. Use appropriate titles.

c. CONCATENATING
Rejoin the never/ever burner data sets into a single data set. In the process of rejoining, re-create a variable to indicate ever/never had a burner in this new data set.

Print frequency tables of freshman/upperclass status and ever/never had burners using your rejoined data.


2. The file binj1.sas7bdat contains a small subset of variables from the injury report forms. The program for problem 2: hw8p2.sas

a. Look at contents of this file.
How many burners were reported?
This question can be answered from looking at the log: there are 176 observations, or 176 injury reports.

b. Merge the injury file with the initial survey file, matching by player id.
Look at the contents of the new merged file. How many observations are in the data set?
In the merged data set, there are 817 or 818 observations (Note: if you deleted the observation with all invalid or missing values from the burner file you will have 817 when merging, if you didn't delete this you'll have 818.).

There are many players with multiple burners, as well as some without any burners. Merging files where subjects have multiple observations in one or more of the files will result in multiple observations for the subject in the resulting file.

c. Create 2 subsets of the merged file, those with 'current' year injuries, and those without current year injuries.
How many players did not have any burners in the current year? How many players had at least one burner?
641 or 642 players are in the data file for those with no current year burners. Since we started with 755 players, 755-642 = 113 players had at least one burner in the 'current' study year.  (Or 754-641=113 if you got rid of that problem observation).

Note -- if you create a subset of records from the current year file using first.pid or last.pid to keep only 1 record per subject -- you will find 114 records.  In fact, there is a player who reported an injury who did not complete an initial survey.  So actually 114 players reported 176 burners.


 

Extra Credit:
The merged data file you created in step 2b has multiple observations on some players. Create a new file that has one observation per player, and contains a count of the number of burners reported during the study year: BCOUNT.  This should have value zero if the player did not report any burners.

See hw8p4.sas for one way to do this.  There are others! This tells me that 640 players did not report any burners during the season;  76 players reported 1 burner; 23 reported 2 burners; 9 players reported 3 burners, and 3 each reported 4 or 5 during the season.

PROGRAM: HW8p4.SAS
Extra Credit Problem
The FREQ Procedure
BCOUNT:*# of*burners
bcount Frequency Percent Cumulative
Frequency
Cumulative
Percent
0 640 84.88 640 84.88
1 76 10.08 716 94.96
2 23 3.05 739 98.01
3 9 1.19 748 99.20
4 3 0.40 751 99.60
5 3 0.40 754 100.00

Assignments Page


Last Update: 11/22/2011
Comments:Penny Pekow
Email:
ppekow@schoolph.umass.edu
assignments\sol8_2011.html