Fall 2011

BioEpi 691F: Practical Data Management and Statistical Computing

SOLUTIONS -- Assignment 7: Documenting a Data Set


1. Write a simple program to:

  • Use PROC CONTENTS to list information on the initial survey data set.

  • Create frequency tables for the variables CLASS, EVER, LAST, HOWMANY, NOTREPORT, and EQUIP.

    Program: Hw7p1.sas

    OPTIONS PAGESIZE=55 LINESIZE=78 NODATE NOCENTER NONUMBER;
    ***********************************************************;
    ***                                                     ***;
    *** PROJECT: BE691F                                     ***;
    *** FILE: HW7p1.SAS                                     ***;
    *** PROGRAMMER: PENNY PEKOW                             ***;
    *** RE: documenting burner study                        ***;
    *** data -- initial survey                              ***;
    *** contents and frequency tables                       ***;
    *** *************************************************** ***;
    *** INPUT FILES: burner1.sas7bdat                       ***;
    ***                                                     ***;
    *** OUTPUT FILES: no data, only list                    ***;
    ***                                                     ***;
    ***********************************************************;
    title1 'PROGRAM: HW7p1.SAS';
    libname hwdata 'e:\sasexamples\';

    ***********************************************;
    ** step 1: get dataset contents description  **;
    ***********************************************;
    proc contents data=hwdata.burner1 varnum;
    title2 'HW 7, Problem 1';
    run;

     

    ** get freq tables of data **;
    proc freq data=hwdata.burner1;
         tables class ever last howmany notreport equip;
    run;

    The frequency tables show some unexpected values in the data, such as 0 for CLASS
    and 999 for HOWMANY.

    In Assignment 8 you will need to decide how to handle these invalid values
    AND TO DOCUMENT THE CHANGES YOU MAKE TO THE DATA.
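
One way to look more closely at these records before deciding how to handle them is to print only the observations with the unexpected codes. The following is a minimal sketch, not part of the assigned solution. It assumes the same HWDATA libname as in Hw7p1.sas and checks only the two values mentioned above (0 for CLASS, 999 for HOWMANY), so the WHERE condition would need to be extended for any other odd values the tables reveal.

```sas
** list observations with unexpected codes (illustrative only) **;
proc print data=hwdata.burner1;
     where class=0 or howmany=999;
     var pid class last howmany;
title2 'Records with unexpected values of CLASS or HOWMANY';
run;
```

Printing PID alongside the suspect values makes it easier to record exactly which observations are affected when the changes are documented later.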


2. Write a program to create a format file for the data.

Program: Hw7p2.sas

OPTIONS PAGESIZE=55 LINESIZE=78 NODATE NOCENTER NONUMBER;
***********************************************************;
***                                                     ***;
*** PROJECT: BE691F                                     ***;
*** FILE: HW7p2.SAS                                     ***;
*** PROGRAMMER: PENNY PEKOW                             ***;
*** RE: documenting burner study                        ***;
***     data -- initial survey                          ***;
***     create formats                                  ***;
*** *************************************************** ***;
*** INPUT FILES: none                                   ***;
***                                                     ***;
*** OUTPUT FILES: burnfmt1.sas7bdat                     ***;
***       formats for initial survey data               ***;
***********************************************************;
title1 'PROGRAM: HW7p2.SAS';
libname hwdata 'e:\sasexamples\';

** create and store formats **;
proc format cntlout=hwdata.burnfmt1;

     value ynfmt 0='0.No'
                 1='1.Yes';

 

     value classf 1='1.Freshman'
                  2='2.Sophomore'
                  3='3.Junior'
                  4='4.Senior'
                  5='5.5th year';
run;

** print a list of formats **;
data codes(keep=fmtname start label);
     set hwdata.burnfmt1;
run;

 

proc sort data=codes;
     by fmtname;
run;

proc print data=codes;
     by fmtname;
     id fmtname;
     var start label;
title2 'List of Formats for Burner Initial Survey';
run;

Using CNTLOUT=libname.dsn on the PROC FORMAT statement creates a special type of SAS
dataset that contains the format information. This dataset can be viewed like any other SAS
dataset, using VIEWTABLE. It can also be read and used like any other SAS data set
in DATA steps or PROCs. The 2nd part of the program reads this file, keeps three variables
(FMTNAME, START, and LABEL), sorts by format name, and prints a list of the codes and
formats using PROC PRINT.

When the formats are to be assigned to variables in subsequent programs, they must be
made available as formats, rather than used as ordinary SAS datasets. To do this, the
statements:

PROC FORMAT CNTLIN=libname.dsn;
run;

must be used. This makes the formats available as formats during your SAS session.
If they are already available this does no harm; you just get a message in the log saying
they are already available.
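
To confirm that the formats were actually loaded, the FMTLIB option of PROC FORMAT can print their contents. This is only a sketch: it assumes the formats were loaded into the default WORK catalog, which is where the CNTLIN= statements above put them when no LIBRARY= option is given, and it uses the two format names created in Hw7p2.sas.

```sas
** load formats, then list them to check that they are available **;
proc format cntlin=hwdata.burnfmt1;
run;

proc format library=work fmtlib;
     select ynfmt classf;
title2 'Formats loaded from burnfmt1';
run;
```

The SELECT statement limits the listing to the named formats, which keeps the output short if other formats are also present in the session.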

3. Write a program to create a new version of the data set that will:

  • Include a program header to describe the purpose of the program, along with documentation information such as input and output data files.

  • Add an options statement to control pagesize, linesize, no printing of page numbers, ...

  • Read in the formats you created in step 2, using the following statements:
    • PROC FORMAT CNTLIN=libname.filename;
      run;


      where libname is the libname for the directory where the format file is stored.

  • In the DATA step:
    • Add a LENGTH statement to control variable length.
      For numeric codes a length of 3 is adequate. You decide for other numeric variables.

    • Add a LABEL statement to label variables with descriptive labels.

    • Use a FORMAT statement in the data step to assign the formats you created in step 2 to the variables.

    • Recode missing values as appropriate, to SAS missing values, or to other appropriate values.

  • Create frequency tables for the same variables as in step 1.

  • Use PROC CONTENTS to look at the new dataset information.

  • Include TITLES and/or FOOTNOTES in your program.

  • Use comments throughout.

Program: Hw7p3.sas

OPTIONS PAGESIZE=55 LINESIZE=78 NODATE NOCENTER NONUMBER;
***********************************************************;
***                                                     ***;
*** PROJECT: BE691F                                     ***;
*** DATE: 11 NOV 08                                     ***;
*** FILE: HW7p3.SAS                                     ***;
*** PROGRAMMER: PENNY PEKOW                             ***;
*** RE: documenting burner study                        ***;
*** data -- initial survey                              ***;
*** recode missing values                               ***;
*** assign formats and labels and save                  ***;
*** *************************************************** ***;
*** INPUT FILES: burner1.sas7bdat  (unedited)           ***;
***              burnfmt1.sas7bdat (formats)            ***;
***                                                     ***;
*** OUTPUT FILES: burner2.sas7bdat (formatted, labeled) ***;
***                                                     ***;
***********************************************************;
title1 'PROGRAM: HW7p3.SAS';
libname hwdata 'e:\sasexamples\';

** read format file to make available for use **;
proc format cntlin=hwdata.burnfmt1;
run;

***************************************************;
** step 1: read data, assign labels and formats  **;
** reset length for codes to 3                   **;
***************************************************;
data hwdata.burner2(label='Burner Study Initial Survey');
     length pid ever last howmany notreport class equip 3;
     set hwdata.burner1;

** recode missing values **;
   if pid=9999 then pid=.;
   if pid=. then put pid= class= ;

   if htft=9 then htft=.;
   if htin=99 then htin=.;
   if weight=999 then weight=.;
   if class=9 then class=.;
   if howmany=99 or howmany=999 then howmany=.;

** recode howmany last season to zero if last is zero **;
   if last=0 then howmany=0;

** assign formats **;
   format class classf.
          ever last notreport equip ynfmt. ;

** assign labels **;
   label pid='PID:*Player ID'
        htft='HTFT:*Feet part* of height'
        htin='HTIN:*Inch part* of height'
      weight='Weight:*in lbs'
       class='CLASS:yr in*college'
        ever='EVER:*had*burner'
        last='LAST:*year had*burner'
     howmany='HOWMANY:*burners*last yr'
   notreport='NOTREPORT:*burner*to trainer'
       equip='EQUIP:*wear*protective' ;
run;

***************************************************;
*** look at contents and freq of documented data **;
***************************************************;
proc contents data=hwdata.burner2 varnum;
title2 'HW 7, Problem 3';
title3 'Formatted, Labeled Data';
run;

proc freq data=hwdata.burner2;
     tables class ever last howmany notreport equip;
run;

 

Hand in copies of the log and edited output files.
Comment on differences you note between the results of the PROC CONTENTS
from parts 1 and 3.

In the original data file, the variables all have length 8, none have labels, and the data set
page size is 16348 bytes. Four (4) data pages are used to store the data, so 4 x 16348 = 65392 bytes
are needed to store the file. (Look in Windows Explorer to see this file size!)

In the 2nd version of the data, the variables are labeled and formatted, the lengths of the code
variables have been changed to 3, and the total space used is 10 pages x 4096 bytes/page = 40960 bytes.
So even with the additional information (labels, formats), the file is about 37% smaller.

Note: If you try to change the length of variables that are already stored in a SAS data set, you
cannot use the DEFAULT= option on the LENGTH statement. The default applies only to variables newly
created in SAS in that DATA step, i.e., variables read in using INFILE and INPUT statements or
variables created by computation or other statements (if-then, ...) within the DATA step.

To change the length of a variable that is already saved in a SAS dataset, you must specify the
variable name on the LENGTH statement.
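
The difference can be sketched as follows. The input data set and variable names come from the programs above, but WORK.NOCHANGE and WORK.SHORT are hypothetical data set names used only for illustration. The LENGTH statement is placed before SET here simply to mirror Hw7p3.sas.

```sas
** DEFAULT= does not shorten variables that are read with SET **;
data work.nochange;
     length default=3;
     set hwdata.burner1;
run;

** naming the variables does change their stored length **;
data work.short;
     length ever last howmany notreport class equip 3;
     set hwdata.burner1;
run;

** compare the Len column in the two listings **;
proc contents data=work.nochange varnum;
run;

proc contents data=work.short varnum;
run;
```

Comparing the two PROC CONTENTS listings should show the code variables at length 8 in the first data set and length 3 in the second.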

 

 



Last Update: 11/15/2011
Comments: Penny Pekow
Email: ppekow@schoolph.umass.edu
assignments\sol7_2011.html