Introduction
We illustrate how to read data into SAS, using data from
the Intensive Care Study (icu.dat) as an example.
These data are stored as an ASCII data file in a rectangular
format with columns representing variables and rows
representing subjects. Each column is separated by one or
more blanks. A code book (icu.txt)
describes the variables representing the columns. To follow
this example, you should
- i. Have a copy of SAS installed on your computer.
- ii. Print a copy of the code book (icu.txt).
- iii. Save a copy of the file icu.dat
on your computer. If you are using OIT or SPH&HS
computers, save this file in the "c:\temp" directory,
which is where the program below looks for it.
Inspect the data in the file
icu.dat, and record the
number of the first line that contains data.
- iv. Open the SAS program.
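Step iii asks for the number of the first line that contains data. A text editor works for this, but the check can also be done from within SAS. The following is a minimal sketch, not part of LISTP1.SAS, that assumes the file was saved as c:\temp\icu.dat (adjust the path to wherever you saved it). It writes the first 15 lines of the file to the LOG window, each preceded by its line number.

```sas
DATA _NULL_;
    /* OBS=15 stops reading after line 15 of the file */
    INFILE 'c:\temp\icu.dat' OBS=15;
    /* INPUT with no variable names loads the line without creating variables */
    INPUT;
    /* _N_ counts the lines read so far, _INFILE_ holds the current line */
    PUT _N_ _INFILE_;
RUN;
```

The name _NULL_ tells SAS not to create a data set, so this step only writes to the log.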
Reading Data Into SAS
The data file is stored as an ASCII file with rows
corresponding to subjects and columns corresponding to
variables. The values in columns are separated by one or
more blanks, which enables the data to be read using list input.
A batch program that reads the file c:\temp\icu.dat into SAS
is given by LISTP1.SAS. The names of all SAS
batch programs should end with the suffix .sas. Each
statement in the program ends with a semi-colon
(;).
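To see what list input does before reading the real file, it may help to try it on a few lines typed directly into a program. The sketch below is an illustration only: the data set name demo and the three rows of values are made up, and only the variable names id, age and sex are borrowed from the ICU example. The values are separated by differing numbers of blanks to show that the exact spacing does not matter.

```sas
DATA demo;
    /* List input: names are matched to values in the order they appear */
    INPUT id age sex;
    DATALINES;
1 27 0
2   59 1
3 77    0
;
PROC PRINT DATA=demo;
RUN;
```
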
- a. Copy the program LISTP1.SAS
from the web into the PROGRAM window in SAS.
- b. Run the program by clicking on the icon of the
running person.
- c. Move to the LOG window, and check for errors. If
the data file icu.dat is not found, make sure you have
copied it to the c:\temp directory.
- d. Move to the OUTPUT window, and inspect the output.
When the program runs correctly, you should view a list
of the data.
- e. Move to the PROGRAM window (which is now empty).
Recall your program by clicking on the LOCAL pull down
menu, and then clicking on RECALL TEXT. The most recent
version of your SAS program is brought back into the
PROGRAM window.
- f. Use the text editor to alter the name of the
program (alter a54p1.sas) (we suggest using simple names
indexed by numbers for consecutive programs). Make sure
your program name ends with .sas.
- g. Change the name of the programmer to your name. Be
careful not to delete the quotation marks.
- h. Save the program using the SAVE AS command and the
new program name.
- i. Run the program and review the annotated
discussion below.
Annotated
Discussion of Program LISTP1.SAS
OPTIONS LINESIZE=72 PAGESIZE=55
NODATE NONUMBER NOCENTER;
- The OPTIONS statement
sets values for the program environment. We will use the
same settings for all batch programs. The settings
correspond to parameters for the number of characters on
a line, the number of lines on the page, whether output
will include a date or page number, or be centered.
***************************************************************************;
- Comments in SAS help the user keep
track of what the program is doing. Any statement that
begins with an asterisk and ends with a semi-colon is
a comment. We start each program with a comment
indicating what the program is doing.
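As a side illustration (these lines are not part of LISTP1.SAS), SAS accepts two forms of comment. The first is the statement form described above. The second, delimited by /* and */, is useful when the comment text itself needs to contain a semi-colon, since a semi-colon would end a comment of the first form early.

```sas
* This comment statement starts with an asterisk and ends with a semi-colon;

/* This delimited comment may contain semi-colons; it can
   also span several lines. */
```
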
TITLE1 "Source: LISTP1.SAS 9/24/98 Ed Stanek" ;
- Titles can be specified that appear at the top of
each page of output. The first title line is
indicated by TITLE1. A
second title line can be indicated by the SAS keyword
TITLE2. Title text must be
enclosed in quotation marks. It is important that
the quotation marks be balanced.
- Since results are produced by programs, we include in
all programs a title that indicates the name of the program
that created the output, and the location where the
program is stored.
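As a brief illustration of a two-line title (not part of LISTP1.SAS), the statements below set a first and a second title line. The text of the second line is a made-up example.

```sas
TITLE1 "Source: LISTP1.SAS";
/* Hypothetical second title line, printed below the first on each page */
TITLE2 "Listing of the ICU data";
```
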
* DESCRIPTION: Read in ICU Data and create SAS system data set ;
***************************************************************************;
- Additional comments indicate what the program is
doing. Each comment begins with an asterisk (*) and
ends with a semi-colon (;).
DATA d;
    INFILE 'c:\temp\icu.dat' FIRSTOBS=11;
    INPUT id sta age sex race ser can crn inf cpr sys hra pre typ
          fra po2 ph pco bic cre loc;
- The SAS statements given above constitute a SAS
DATA STEP. A DATA STEP is used to read data into
the SAS system. We will always indent statements in a SAS
DATA STEP to help emphasize that these statements go
together. A DATA STEP always begins with the keyword DATA,
followed by the name given to the
created data set. In this example, the SAS system data
set is named d and is stored in the file "d.sd2". SAS
automatically adds the suffix
.sd2 to the files that hold SAS system data sets. The
DATA statement ends with a
semi-colon.
- In order to identify the data that will be contained in
the data set, the location of the file that contains the
data is specified using an
INFILE statement. The file
location is enclosed in quotation marks. It is
important that quotation marks are used in matching pairs
(two single quotation marks, or two double quotation
marks). Do not mix single and double
quotation marks. One option is
specified in the INFILE
statement: the keyword
FIRSTOBS indicates
that data begin on line 11 in the file. The
INFILE statement ends with a
semi-colon.
- The third statement in the SAS data step begins with
the SAS keyword INPUT.
Following this keyword, the names are listed in order
corresponding to values in columns in the data file.
These names match the code book icu.txt.
Note that the INPUT
statement can continue for more than one line. The end of
the statement is indicated by the semi-colon.
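When trying out a DATA STEP, it can be convenient to read only a few lines first. The sketch below is a variation on the statements above, not part of LISTP1.SAS. It adds the INFILE option OBS=, which gives the number of the last line to read, and uses the made-up data set name test. Assuming each line of the file holds the 21 values for one subject, lines 11 through 20 yield 10 observations.

```sas
DATA test;
    /* FIRSTOBS=11 starts reading at line 11 of the file.
       OBS=20 stops reading after line 20. */
    INFILE 'c:\temp\icu.dat' FIRSTOBS=11 OBS=20;
    INPUT id sta age sex race ser can crn inf cpr sys hra pre typ
          fra po2 ph pco bic cre loc;
PROC PRINT DATA=test;
RUN;
```
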
PROC PRINT;
- The statement PROC PRINT
invokes a SAS procedure that prints a listing of the most
recently created SAS data set. PROC
is short for procedure. The SAS program is
sequential, meaning that the order of statements in the
program determines the order of processing. A listing of
the data appears in the OUTPUT window. The
PROC PRINT statement ends
with a semi-colon.
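As a side note, PROC PRINT can also be told explicitly which data set and which variables to list. The sketch below is an optional variation, not part of LISTP1.SAS. It assumes the data set d created by the DATA STEP above, and the choice of four variables and ten observations is arbitrary.

```sas
/* DATA=d names the data set, and (OBS=10) limits the listing to 10 observations */
PROC PRINT DATA=d (OBS=10);
    /* VAR selects the variables to print and their order */
    VAR id sta age sex;
RUN;
```
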
PROC CONTENTS;
- The statement PROC
CONTENTS invokes a SAS procedure that lists the SAS
system information about a data set. The most recently
created SAS data set is used. The results are given in
the OUTPUT window.
RUN;
- The final SAS statement is
RUN. This statement requests
that the previous statements be executed by the SAS
program.
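For reference, the annotated statements above can be assembled into one program, roughly as follows. This is a reconstruction from the pieces discussed in this section, so the exact layout of LISTP1.SAS may differ. The path c:\temp\icu.dat is the one used in the INFILE statement above and should be changed if the file is stored elsewhere.

```sas
OPTIONS LINESIZE=72 PAGESIZE=55 NODATE NONUMBER NOCENTER;
***************************************************************************;
TITLE1 "Source: LISTP1.SAS 9/24/98 Ed Stanek";
* DESCRIPTION: Read in ICU Data and create SAS system data set ;
***************************************************************************;
DATA d;
    INFILE 'c:\temp\icu.dat' FIRSTOBS=11;
    INPUT id sta age sex race ser can crn inf cpr sys hra pre typ
          fra po2 ph pco bic cre loc;
PROC PRINT;
PROC CONTENTS;
RUN;
```
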