Fall 2011

BioEpi 691F: Practical Data Management and Statistical Computing

SOLUTIONS: Assignment 6: Getting Started in SAS


 

  1. From Windows Explorer or My Computer, right-click the SAS file Hw6p1.sas and select “Open with SAS 9.x” (x = 2 or 3).

    a. The file Hw6p1.sas will open in an Enhanced Editor window.

        Look the program over, and then run (submit) it.
        What happens? Write a response.

    As the program is run, information on the execution is written to the log window, and results to the output
    window. A temporary or work data set is produced with 19 observations and 5 variables. This data set can
    be found in the Explorer window within the WORK library. A listing or print of the data and a frequency table
    for sex are written to the output window.


    b. Move to the LOG window and go to the top of the page. Read the log.
        What information is given?

    The log reports on the execution of each step: it shows that the data set was written to the WORK library, the number of observations and variables written to it, and the time each step took to execute.

  2.   Return to the program editor window and make the following changes in the program:

    a. Add a line in the header to say MODIFIED BY:  
        followed by your name and the date.

    b. Add comments throughout the program to explain the purpose of each step.

    c. Add Titles that give your name and the assignment and problem number.

    d. Edit the program so that the data file will be saved when you exit from SAS.

    54   OPTIONS PAGESIZE=55 LINESIZE=78 NODATE NOCENTER NONUMBER;
    55 ***********************************************************;
    56 *** ***;
    57 *** PROJECT: BE691F ***;
    58 *** DATE: 11 OCT 00 ***;
    59 *** FILE: HW6p2.SAS ***;
    60 *** PROGRAMMER: PENNY PEKOW ***;
    61 *** Modified by: Penny Pekow ***;
    62 *** 31 OCT 2001 ***;
    63 *** RE: EXAMPLE PROGRAM ***;
    64 *** GETTING STARTED IN SAS ***;
    65 *** (example courtesy of Trina Hosmer) ***;
    66 *** *************************************************** ***;
    67 *** INPUT FILES: Instream data ***;
    68 *** ***;
    69 *** OUTPUT FILES: CLASS (temp file) ***;
    70 *** ***;
    71 ***********************************************************;
    72 TITLE1 'PROGRAM: HW6p2.SAS';
    73 Title2 ' Assignment 6 Problem 2';
    74 title3 'Penny Pekow';
    75
    76 ** define library to store data **;
    77 Libname save 'c:\temp';
    NOTE: Libref SAVE was successfully assigned as follows:
    Engine: V8
    Physical Name: c:\temp

    78
    79 ** read in data **;
    80 DATA save.CLASS;
    81 INPUT NAME $ SEX $ AGE HEIGHT WEIGHT;
    82 CARDS;
    NOTE: The data set SAVE.CLASS has 19 observations and 5 variables.
    NOTE: DATA statement used:
          real time 0.00 seconds
    102 ;
    103 RUN;
    104
    105 ** print a list of data **;
    106 PROC PRINT DATA=save.CLASS;
    107 TITLE2 'CLASS LIST';
    108 RUN;
    NOTE: There were 19 observations read from the data set SAVE.CLASS.
    NOTE: PROCEDURE PRINT used:
          real time 0.00 seconds
    109
    110 ** get frequency table of sex **;
    111 PROC FREQ DATA=save.CLASS;
    112 TABLES SEX;
    113 TITLE2 'NUMBER OF BOYS AND GIRLS IN CLASS';
    114 RUN;
    NOTE: There were 19 observations read from the data set SAVE.CLASS.
    NOTE: PROCEDURE FREQ used:
          real time 0.00 seconds

    PROGRAM: HW6p2.SAS
    CLASS LIST
    Obs  NAME   SEX  AGE HEIGHT  WEIGHT
    1   ALFRED   M   14   69.0   112.5
    2   ALICE    F   13   56.5    84.0
    3   BARBARA  F   13   65.3    98.0
    4   CAROL    F   14   62.8   102.5
    5   HENRY    M   14   63.5   102.5
    6   JAMES    M   12   57.3    83.0
    7   JANE     F   12   59.8    84.5
    8   JANET    F   15   62.5   112.5
    9   JEFFREY  M   13   62.5    84.0
    10  JOHN     M   12   59.0   995.0
    11  JOYCE    F   11   51.3    50.5
    12  JUDY     F   14   64.3    90.0
    13  LOUISE   F   12   56.3    77.0
    14  MARY     F   15   66.5   112.0
    15  PHILIP   M   16   72.0   150.0
    16  ROBERT   M   12   64.8   128.0
    17  RONALD   M   15   67.0   133.0
    18  THOMAS   M   11   57.5    85.0
    19  WILLIAM  M   15   66.5   112.0

    NUMBER OF BOYS AND GIRLS IN CLASS
    The FREQ Procedure
                               Cumulative   Cumulative
    SEX  Frequency    Percent   Frequency      Percent
    F           9       47.37           9        47.37
    M          10       52.63          19       100.00

      g. Return to SAS and clear the log and output windows.
        In the SAS Explorer, look at your SAS libraries.
        How many are there?  What are they called?

    There are 5 or 6 SAS libraries. Sashelp, Sasuser, Maps, and Work are the default libraries in V9.2;
    in V9.3 you should see Sashelp, Sasuser, Mapsgfk, Mapssas, and Work. The additional library is the
    one you named. In my case, it is called Save.
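The Explorer is the point-and-click way to see your libraries. As a hedged alternative, the same information can be produced from a program. This is a minimal sketch: LIBNAME _ALL_ LIST writes every assigned libref, with its engine and physical path, to the log, and PROC SQL can list the library names stored in the DICTIONARY.LIBNAMES table. The libraries reported will depend on your SAS version, your installation, and the LIBNAME statements you have already submitted.

```sas
/* Write every currently assigned libref, with its engine and
   physical path, to the LOG window. */
libname _all_ list;

/* List the same library names as a table in the output window. */
proc sql;
   title 'SAS libraries currently assigned';
   select distinct libname
      from dictionary.libnames;
quit;
```

If you have already submitted the LIBNAME statement for SAVE, it should appear in both lists next to the default libraries.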


    Note that if you have a data file open in VIEWTABLE mode, you may see an error message when you try to use that file in a program you submit, because the file is in use by VIEWTABLE.


  3. This problem requires the use of an INFILE statement to read a text data file into SAS.

    a. Open the file hw6p3.sas into the program editor and submit (run) it.
        Note: if you did not save files to c:\temp you need to change the file reference
        within the program to the drive and directory where you saved the data file.

    b. Look at the log and output. Save, edit into 1 document, and print.
        Also print the ASCII data file (you can do this from Notepad).

    c. What is the purpose of the & and $15. on the INPUT statement?
    The ampersand (&) is used with list input when:
    1. a character value contains embedded blanks and blanks are also used as the delimiter (with &, SAS ends the value at two consecutive blanks instead of one), or
    2. the values are of varying length, as the SUBJECT values are here.

      The $15. informat indicates that the variable is character and gives it a length of 15 columns, longer than
      the default of 8 columns.

         
      PROGRAM: HW6p3.SAS
      LIBRARY LIST

Obs NAME    DAY   SUBJECT

1   BARBARA   2   ENGLISH
2   CAROL     2   SCIENCE
3   CAROL     3   ENGLISH
4   CAROL     4   MATH
5   DONALD    1   ART
6   JAMES     4   MATH
7   JOYCE     5   HOME ECONOMICS
8   JOYCE     6   SCIENCE
9   MARY      2   MECHANICS
10  PHILIP    1   HOME ECONOMICS
11  PHILIP    3   MATH
12  WILLIAM   2   SCIENCE
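The effect of the & and $15. can be tried without the external file. A minimal sketch, assuming a few rows copied from the library list above are typed in-stream after DATALINES (a synonym for CARDS); the data set name LIBRARY_DEMO is hypothetical:

```sas
/* Modified list input: the & lets SUBJECT contain a single embedded
   blank, so the value ends only at two consecutive blanks or at the
   end of the line. The $15. informat makes SUBJECT a character
   variable 15 columns long instead of the default 8. */
data library_demo;
   input name $ day subject & $15.;
   datalines;
BARBARA 2 ENGLISH
JOYCE 5 HOME ECONOMICS
PHILIP 3 MATH
;
run;

proc print data=library_demo;
   title 'Modified list input: SUBJECT with embedded blanks';
run;
```

HOME ECONOMICS is read as one value because the two words are separated by a single blank.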


    Re-run the program without the &.
Without the &, $15. is read as formatted input: SAS tries to read exactly 15 columns for the character value SUBJECT.
When a line ends with fewer than 15 columns left to read, SAS goes to the next line to read 15 columns, and then
moves on to the following line for the next observation. So only 7 observations are created.
The ampersand is needed because the SUBJECT values are of varying length; not all subjects take 15 columns.

If you look at the log, you will see:
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.

This is not an error message but a NOTE. It serves as notification to you that there could be a problem (and in this case there is a problem!).

PROGRAM: HW6p3.SAS
LIBRARY LIST: No &

Obs   NAME   DAY  SUBJECT

1   BARBARA   2   CAROL 2 SCI
2   CAROL     3   CAROL 4 MAT
3   DONALD    1   JAMES 4 MAT
4   JOYCE     5   HOME ECONOMICS
5   JOYCE     6   MARY 2 MEC
6   PHILIP    1   HOME ECONOMICS
7   PHILIP    3   WILLIAM 2 SCI
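The flow-over behavior depends on the lines of the raw file having different lengths, so it is easiest to reproduce with INFILE rather than with in-stream data. A minimal sketch, assuming a temporary fileref named LIBDAT (a hypothetical name) that is filled with four lines from the library list; the data set name NO_AMP is also hypothetical:

```sas
/* Hypothetical temporary file holding four lines of different
   lengths, copied from the library list. */
filename libdat temp;

data _null_;
   file libdat;
   put 'BARBARA 2 ENGLISH';
   put 'CAROL 2 SCIENCE';
   put 'JOYCE 5 HOME ECONOMICS';
   put 'PHILIP 3 MATH';
run;

/* Without the & modifier, $15. is formatted input: SAS reads exactly
   15 columns. When a line ends first, the default FLOWOVER behavior
   moves the INPUT statement to the next line. */
data no_amp;
   infile libdat;
   input name $ day subject $15.;
run;

proc print data=no_amp;
   title 'No ampersand: INPUT flows over to the next line';
run;
```

Because INPUT keeps jumping to the next line, this run should produce fewer observations than there are lines, and the log should show the same NOTE about SAS going to a new line.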


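    Next, rerun the program with the &, but use $ in place of $15. What happened?
In this case SUBJECT gets the default length for character variables, 8 columns, so only the first 8 characters of each value were kept. Because the & was used, an embedded blank is read as part of the value and counts toward those 8 characters (HOME ECONOMICS became HOME ECO).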

PROGRAM: HW6p3.SAS
LIBRARY LIST No $15.

Obs NAME    DAY   SUBJECT

1   BARBARA   2   ENGLISH
2   CAROL     2   SCIENCE
3   CAROL     3   ENGLISH
4   CAROL     4   MATH
5   DONALD    1   ART
6   JAMES     4   MATH
7   JOYCE     5   HOME ECO
8   JOYCE     6   SCIENCE
9   MARY      2   MECHANIC
10  PHILIP    1   HOME ECO
11  PHILIP    3   MATH
12  WILLIAM   2   SCIENCE
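A minimal sketch of the $-only version, assuming two of the longer rows are typed in-stream (the data set name DEFAULT_LENGTH is hypothetical). The sketch writes the modifier after the $ (SUBJECT $ &), which is the list-input form of the same request:

```sas
/* No informat width: SUBJECT gets the default character length of 8,
   so longer values are truncated. The & still allows the embedded
   blank, which counts as one of the 8 characters. */
data default_length;
   input name $ day subject $ &;
   datalines;
JOYCE 5 HOME ECONOMICS
MARY 2 MECHANICS
;
run;

proc print data=default_length;
   title 'Default length of 8 for SUBJECT';
run;
```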


    d. Add the word MISSOVER to the INFILE statement before the semicolon,
        and try again without the &. What happened this time?
    Without the ampersand, SUBJECT again could not be read when fewer than 15 columns remained on the line
    (an end-of-line marker is reached). This time, MISSOVER prevented the program from reading data from
    the next line, so the value is set to missing (blank) instead of being read improperly from the next line.
    Only the subjects that fill a full 15 columns are read.


    PROGRAM: HW6p3.SAS
    LIBRARY LIST: Missover, no &

    Obs NAME     DAY   SUBJECT

    1   BARBARA   2
    2   CAROL     2
    3   CAROL     3
    4   CAROL     4
    5   DONALD    1
    6   JAMES     4
    7   JOYCE     5   HOME ECONOMICS
    8   JOYCE     6
    9   MARY      2
    10  PHILIP    1   HOME ECONOMICS
    11  PHILIP    3
    12  WILLIAM   2
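The MISSOVER behavior can also be reproduced with a small temporary file. A minimal sketch, assuming a temporary fileref named LIBDAT2 and a data set named WITH_MISSOVER (both hypothetical) built from three short lines of the library list; none of these SUBJECT values fills 15 columns, so all of them should come out missing:

```sas
/* Hypothetical temporary file with three short lines. */
filename libdat2 temp;

data _null_;
   file libdat2;
   put 'BARBARA 2 ENGLISH';
   put 'CAROL 2 SCIENCE';
   put 'PHILIP 3 MATH';
run;

/* MISSOVER: when $15. runs past the end of the line, SUBJECT is set
   to missing and INPUT does not move on to the next line, so each
   line still produces one observation. */
data with_missover;
   infile libdat2 missover;
   input name $ day subject $15.;
run;

proc print data=with_missover;
   title 'MISSOVER, no ampersand';
run;
```

If you wanted the partial values (ENGLISH, SCIENCE, MATH) instead of missing values, the INFILE option TRUNCOVER reads whatever columns remain on the line; with these data, the & modifier remains the simpler fix.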

    This is tricky data to read into SAS. The important lesson is that there are a variety of modifiers, informats, and options that can be used with INPUT and INFILE statements to read data, depending upon its format.

    You will often find yourself using trial and error to read data with unusual formatting.
    There are also line and column pointer controls that can be used with INPUT and PUT statements.

    $ indicates character data is to be read
    $15. indicates character data taking 15 columns is to be read
    & allows reading of embedded blanks and values of varying length (some SUBJECT values are shorter than 15 columns)
    MISSOVER prevents moving to a new line to read data


  4. In this problem you will read in data on a few more children, with their names, ages, sex, height, and weight, and then concatenate these data with the data set you saved in problem 2.
     
    OPTIONS PAGESIZE=55 LINESIZE=78 NODATE NOCENTER NONUMBER;
    ***********************************************************;
    *** ***;
    *** PROJECT: BE691F ***;
    *** DATE: 31 OCT 01 ***;
    *** FILE: HW6p4.SAS ***;
    *** PROGRAMMER: PENNY PEKOW ***;
    *** RE: EXAMPLE PROGRAM ***;
    *** GETTING STARTED IN SAS ***;
    *** concatenate files ***;
    *** (example courtesy of Trina Hosmer) ***;
    *** *************************************************** ***;
    *** INPUT FILES: Instream data ***;
    *** ***;
    *** OUTPUT FILES: CLASSA (temp file), SAVE.CLASS2 (permanent) ***;
    *** ***;
    ***********************************************************;

    TITLE1 'PROGRAM: HW6p4.SAS';
    Title2 ' Assignment 6 Problem 4';
    title3 'Penny Pekow';
    Libname save 'c:\temp';  ** define location to save data **;
    ** read in data **;
     data classa;
          infile 'c:\temp\hw6p4.dat';            ** define location of data to read in **;
          input name $ age sex $ height weight;  ** assign names, type of data *;
    run;
    ** print a list of data **;
    PROC PRINT DATA=CLASSA;
    TITLE4 'New CLASS LIST';
    RUN;
    ** concatenate and save data **;
    ** name both files to read on ONE set statement **;
    data save.class2;
         set save.class classa;
    run;
    proc print data=save.class2;
    Title4 'Concatenated Class list';
    run;
    proc contents position data=save.class2;
    run;
           
    ** NOTE:  concatenating by reading the 2 raw files with 2 INFILE statements  **;
    **        won't work: the variables are not in matching order and require   **;
    **        different INPUT statements                                        **;
    ** - with SET, variables are matched by name, so their order doesn't matter **;
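A self-contained sketch of the same idea, assuming a few rows copied from the class list are typed in-stream (the data set names CLASS_PART1, CLASS_PART2, and CLASS_COMBINED are hypothetical). The second data set is read with AGE before SEX, as in the INPUT statement used for hw6p4.dat:

```sas
/* First data set: NAME SEX AGE HEIGHT WEIGHT (rows from the class list). */
data class_part1;
   input name $ sex $ age height weight;
   datalines;
ALFRED M 14 69.0 112.5
ALICE F 13 56.5 84.0
;
run;

/* Second data set: same variables, but AGE comes before SEX. */
data class_part2;
   input name $ age sex $ height weight;
   datalines;
BARBARA 13 F 65.3 98.0
CAROL 14 F 62.8 102.5
;
run;

/* One SET statement naming both data sets stacks them: all rows of
   CLASS_PART1, then all rows of CLASS_PART2. Variables are matched
   by name, so the different input order does not matter. */
data class_combined;
   set class_part1 class_part2;
run;

proc print data=class_combined;
   title 'Concatenated with one SET statement';
run;
```

In CLASS_COMBINED the variables appear in the order they have in the first data set named on the SET statement.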


  5. Suggestions for printing the LOG and OUTPUT: I find the easiest thing to do is to copy and paste from the log or output window directly into a Word file, then edit the Word file's font and pagination to avoid word-wrap at the end of a line, breaking a table across 2 pages, and wasting paper by having a single small table or listing on each page.

Please remember to copy all the files you want to save to your own disk.

If you are working in a computer lab, please erase your files from the hard drive and log off when you leave the computer lab.

 




Last Update: 11/08/2011
Comments: Penny Pekow
Email: ppekowf@schoolph.umass.edu