SOLUTIONS -- Assignment 7: Documenting a Data Set
1. Write a simple program to:
- Use PROC CONTENTS to list information on the initial survey data set.
- Create frequency tables for the variables CLASS, EVER, LAST, HOWMANY, NOTREPORT, EQUIP
Program: Hw7p1.sas
OPTIONS PAGESIZE=55 LINESIZE=78 NODATE NOCENTER NONUMBER;
***********************************************************;
*** ***;
*** PROJECT: BE691F ***;
*** FILE: HW7p1.SAS ***;
*** PROGRAMMER: PENNY PEKOW ***;
*** RE: documenting burner study ***;
*** data -- initial survey ***;
*** contents and frequency tables ***;
*** *************************************************** ***;
*** INPUT FILES: burner1.sas7bdat ***;
*** ***;
*** OUTPUT FILES: no data, only list ***;
*** ***;
***********************************************************;
title1 'PROGRAM: HW7p1.SAS';
libname hwdata 'e:\sasexamples\';
***********************************************;
** step 1: get dataset contents description **;
***********************************************;
proc contents data=hwdata.burner1 varnum;
title2 'HW 7, Problem 1';
run;
** get freq tables of data **;
proc freq data=hwdata.burner1;
tables class ever last howmany notreport equip;
run;
Note that the frequency tables show some unexpected values in the data, such as 0 for CLASS and 999 for HOWMANY.
In Assignment 8 you will need to decide how to handle these invalid values --
AND TO DOCUMENT THE CHANGES YOU MAKE TO THE DATA.
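One way to see which records carry these values is to list them with a WHERE statement. This is only a sketch: it uses the two example values mentioned above (0 for CLASS, 999 for HOWMANY) and assumes the hwdata libname from Hw7p1.sas is still assigned. The title text is illustrative.

```sas
** list observations with the unexpected codes noted above **;
proc print data=hwdata.burner1;
where class=0 or howmany=999;
var pid class howmany;
title2 'Sketch: records with unexpected CLASS or HOWMANY values';
run;
```

Other unexpected values seen in the frequency tables can be added to the WHERE condition in the same way.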
2. Write a program to create a format file for the data.
Program: Hw7p2.sas
OPTIONS PAGESIZE=55 LINESIZE=78 NODATE NOCENTER NONUMBER;
***********************************************************;
*** ***;
*** PROJECT: BE691F ***;
*** FILE: HW7p2.SAS ***;
*** PROGRAMMER: PENNY PEKOW ***;
*** RE: documenting burner study ***;
*** data -- initial survey ***;
*** create formats ***;
*** *************************************************** ***;
*** INPUT FILES: none ***;
*** ***;
*** OUTPUT FILES: burnfmt1.sas7bdat ***;
*** formats for initial survey data ***;
***********************************************************;
title1 'PROGRAM: HW7p2.SAS';
libname hwdata 'e:\sasexamples\';
** create and store formats **;
proc format cntlout=hwdata.burnfmt1;
value ynfmt 0='0.No'
1='1.Yes';
value classf 1='1.Freshman'
2='2.Sophomore'
3='3.Junior'
4='4.Senior'
5='5.5th year';
run;
** print a list of formats **;
data codes(keep=fmtname start label);
set hwdata.burnfmt1;
run;
proc sort data=codes;
by fmtname;
run;
proc print data=codes;
by fmtname;
id fmtname;
var start label;
title2 'List of Formats for Burner Initial Survey';
run;
Using CNTLOUT=libname.dsn on the PROC FORMAT statement writes (creates) a special type of SAS dataset, which contains format information. This dataset can be viewed like any other SAS dataset, using the VIEWTABLE window. It can also be read and used like any other SAS data set in DATA steps or PROCs. The 2nd part of the program reads this file, keeps three variables (FMTNAME, START, LABEL), and prints a list of the codes and formats using PROC PRINT.
When the formats are to be assigned to variables in subsequent programs, they must be made available as formats, rather than used as ordinary SAS datasets. To do this, the statements:
PROC FORMAT CNTLIN=libname.dsn;
run;
must be used. This makes the formats available as formats during your SAS session.
If they are already available this does no harm -- you just get a message in the log saying they are already available.
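As a minimal sketch of this step, the following reloads the formats saved by Hw7p2.sas and applies them within a single PROC, without changing the stored data. It assumes burnfmt1 and burner1 both sit in the hwdata library used in Hw7p2.sas; the title text is only illustrative.

```sas
libname hwdata 'e:\sasexamples\';
** make the stored formats available in this session **;
proc format cntlin=hwdata.burnfmt1;
run;
** apply the formats for this PROC only **;
proc freq data=hwdata.burner1;
tables class ever;
format class classf. ever ynfmt.;
title2 'Sketch: formats applied in PROC FREQ';
run;
```

A FORMAT statement inside a PROC affects only that PROC's output. Placing the FORMAT statement in a DATA step that writes a permanent data set stores the format assignment with the data instead.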
3. Write a program to create a new version of the data set that will:
Program: Hw7p3.sas
OPTIONS PAGESIZE=55 LINESIZE=78 NODATE NOCENTER NONUMBER;
***********************************************************;
*** ***;
*** PROJECT: BE691F ***;
*** DATE: 11 NOV 08 ***;
*** FILE: HW7p3.SAS ***;
*** PROGRAMMER: PENNY PEKOW ***;
*** RE: documenting burner study ***;
*** data -- initial survey ***;
*** recode missing values ***;
*** assign formats and labels and save ***;
*** *************************************************** ***;
*** INPUT FILES: burner1.sas7bdat (unedited) ***;
*** ***;
*** OUTPUT FILES: burner2.sas7bdat (formatted, labeled) ***;
*** ***;
***********************************************************;
title1 'PROGRAM: HW7p3.SAS';
libname hwdata 'c:\temp';
** read format file to make available for use **;
proc format cntlin=hwdata.burnfmt1;
run;
***************************************************;
** step 1: read data, assign labels and formats **;
** reset length for codes to 3 **;
***************************************************;
data hwdata.burner2(label='Burner Study Initial Survey');
length pid ever last howmany notreport class equip 3;
set hwdata.burner1;
** recode missing values **;
if pid=9999 then pid=.;
if pid=. then put pid= class= ;
if htft=9 then htft=.;
if htin=99 then htin=.;
if weight=999 then weight=.;
if class=9 then class=.;
if howmany=99 or howmany=999 then howmany=.;
** recode howmany last season to zero if last is zero **;
if last=0 then howmany=0;
** assign formats **;
format class classf.
ever last notreport equip ynfmt. ;
** assign labels **;
label pid='PID:*Player ID'
htft='HTFT:*Feet part* of height'
htin='HTIN:*Inch part* of height'
weight='Weight:*in lbs'
class='CLASS:yr in*college'
ever='EVER:*had*burner'
last='LAST:*year had*burner'
howmany='HOWMANY:*burners*last yr'
notreport='NOTREPORT:*burner*to trainer'
equip='EQUIP:*wear*protective' ;
run;
***************************************************;
*** look at contents and freq of documented data **;
***************************************************;
proc contents data=hwdata.burner2 varnum;
title2 'HW 7, Problem 3';
title3 'Formatted, Labeled Data';
run;
proc freq data=hwdata.burner2;
tables class ever last howmany notreport equip;
run;
Hand in copies of the log and edited output files.
Comment on differences you note between the results of the PROC CONTENTS from parts 1 and 3.
In the original data file, the variables all have length 8, none have labels, and the data set page size is 16348 bytes. Four (4) data pages are used to store the data, or 4 x 16348 = 65392 bytes to store the file. (Look in Windows Explorer to see this file size!)
In the 2nd version of the data, the variables are labeled and formatted, the lengths of the code variables have been changed to 3, and the total space used is 10 pages x 4096 bytes/page = 40960 bytes.
So even with additional information (labels, formats) considerable space has been saved.
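The comparison can be checked with a short DATA _NULL_ step. This sketch simply reuses the page counts and page sizes quoted above; the variable names are made up for illustration.

```sas
data _null_;
** bytes used by each version, from the PROC CONTENTS figures quoted above **;
orig_bytes = 4 * 16348;
new_bytes = 10 * 4096;
saved = orig_bytes - new_bytes;
pct_saved = 100 * saved / orig_bytes;
** write the results to the log **;
put orig_bytes= new_bytes= saved= pct_saved= 5.1;
run;
```

With these figures the saving is 24432 bytes, a little over a third of the original size.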
Note: If you try to change the length of variables that are already stored in a SAS data set, you cannot use the DEFAULT= option on the LENGTH statement. The default applies only to variables newly created in SAS in that DATA step -- i.e., variables read in using INFILE and INPUT statements or variables created by computation or other statements (if-then, ...) within the DATA step.
To change the length of a variable that is already saved in a SAS dataset, you must specify the variable name on the LENGTH statement.
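A small sketch of the difference, assuming the hwdata library from the programs above is assigned. The data set work.demo, the variable heavy, and the cutoff of 200 are hypothetical, used only for illustration.

```sas
data work.demo;
** default= affects only numeric variables newly created in this step **;
length default=3;
** class already exists in burner1, so it must be named explicitly **;
length class 3;
set hwdata.burner1;
** heavy is created here, so it gets the default length of 3 **;
heavy = (weight > 200);
run;
** check the stored lengths: class and heavy are 3, the rest stay 8 **;
proc contents data=work.demo varnum;
run;
```

The LENGTH statements are placed before SET here, following the layout used in Hw7p3.sas.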