Practical Data Management
and Statistical Computing (BioEp691F)
Lecture 6
1. Review: Problems in Reading Data from ASCII Files
(from Assignment #4) Use column input to read the ASCII data set lec3a.dta
into SAS (as well as you can). Do not change anything in the ASCII data
set. [Note that it is not possible to read all data values in correctly.]
(lec3ap1.sas) or (lec3ap2.sas)
- Notice the jagged left border when selecting data from the Web. This
indicates that the records have different lengths.
- The MISSOVER option in the INFILE statement (lec3ap4.sas).
- Edit the data: add blank spaces in WORD or NOTEPAD to make the lines
of equal length (lec3ap5.sas, which uses the data set lec3a1.dta).
- Make sure the data can be read in the way it is being entered
(before it is all entered)!
- Other solutions: read the data in LIST input mode (lec3ap6.sas).
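The short-record problem above can be sketched as follows. This is a minimal illustration, not the contents of lec3ap4.sas: the file location, the variable names (id, age, wt) and the column positions are assumptions, since the layout of lec3a.dta is not given here. MISSOVER stops SAS from moving to the next record when a line ends early. With column input it also sets a value to missing when the record ends partway through the field; TRUNCOVER reads whatever part of the field is present instead.

```sas
/* Hypothetical layout: names, columns and path are illustrative only */
DATA lec3a_miss;
  INFILE 'c:\temp\lec3a.dta' MISSOVER;   /* short field -> missing value */
  INPUT id 1-3 age 5-6 wt 8-12;
RUN;

DATA lec3a_trunc;
  INFILE 'c:\temp\lec3a.dta' TRUNCOVER;  /* short field -> partial value is read */
  INPUT id 1-3 age 5-6 wt 8-12;
RUN;

/* Compare the two results to see which records were affected */
PROC PRINT DATA=lec3a_miss;
RUN;
PROC PRINT DATA=lec3a_trunc;
RUN;
```

Printing both data sets side by side is a quick way to see which records were short and which values were lost.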
2. Once a SAS data set has been created, you can use other PROCedures
to describe the data. We illustrate this with data from the Research
on the Intensive Care Unit study.
You can copy the output to a word processor and use the results in
a report.
2. Obtaining Charts with PROC CHART
Introduction
We show how to obtain bar charts and histograms using the SAS
procedures:
- PROC CHART;
- PROC FORMAT;
- PROC MEANS;
As an example, we use data from the Intensive Care Study (icu.dat).
A code book (icu.txt) describes the variables that the columns
represent, and is needed to interpret the results. We assume that you:
- i. have a copy of SAS installed on your computer
- ii. have a printed copy of the code book (icu.txt)
- iii. have previously saved a copy of the file icu.dat on the
"c:\temp\" directory on your computer.
- iv. have opened the SAS program.
- v. have copied the program CHARTP1.SAS
into the SAS program window.
Obtaining Charts and Histograms.
Vertical and horizontal bar charts can be obtained for variables
with discrete values, and histograms for continuous variables with
PROC CHART. The
OPTIONS statement controls the size of
the figure that is produced. Other options in the
PROC CHART procedure control chart
appearance.
Annotated Discussion of Program CHARTP1.SAS
OPTIONS LS=72 PS=55 NODATE NONUMBER
NOCENTER;
******************************************************;
*** Program Date Disk Programmer ;
TITLE1 "Source: CHARTP1.SAS 9/24/98 Ed Stanek ";
* DESCRIPTION: ;
* a. Obtaining charts ;
* b. Obtaining histograms ;
* c. Attaching formats to discrete values ;
* using ICU.SD2 created from icu.dat ;
* Examples of PROC CHART, PROC FORMAT ;
******************************************************;
*****************************;
*** Read ICU Study data ***;
*****************************;
DATA icu;
INFILE 'c:\temp\icu.dat'
FIRSTOBS=11;
INPUT id sta age sex race ser can crn inf cpr sys hra
pre typ fra po2 ph pco bic cre loc;
PROC FORMAT;
VALUE sexf
0="Male"
1="Female";
VALUE staf
0="Live"
1="Die";
DATA icu1;
SET icu;
FORMAT
sex sexf.
sta staf.;
- We substitute names for the values using PROC FORMAT. Each set of
names is assigned a format name in a VALUE statement. For the variable
sex, the format name is sexf. Each value is equated to a descriptive
name (for example, 0="Male"). Once all the values have been listed, the
VALUE statement ends with a semicolon. More than one VALUE statement
can be used in the PROC FORMAT procedure.
- Once the PROC FORMAT procedure has been run, other procedures in SAS
can use the formats that have been defined by including an optional
FORMAT statement. We include this statement in a DATA step. The DATA
step creates a new SAS data set named icu1 from the SAS data set icu.
In the new SAS data set, the FORMAT statement tells SAS to display the
names defined by the format (e.g. sexf) in place of the values of the
variable (e.g. sex). A period must follow the format name in a FORMAT
statement (sexf.) so that SAS can distinguish the format name from a
variable name.
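As a sketch of the alternative mentioned above, the FORMAT statement can be placed in a procedure instead of a DATA step, in which case the format applies only to that procedure's output. The example assumes that the data set icu and the formats sexf and staf have already been created as in CHARTP1.SAS. The use of PROC FREQ is our choice for illustration and is not part of the original program.

```sas
/* Attach the formats for this procedure only; icu itself is unchanged */
PROC FREQ DATA=icu;
  TABLES sex*sta;
  FORMAT sex sexf. sta staf.;   /* note the period after each format name */
RUN;
```

Putting the FORMAT statement in a DATA step, as the program does, stores the association in icu1, so that later procedures do not need to repeat it.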
PROC CHART DATA=icu1;
VBAR sex /DISCRETE;
TITLE2 "Figure 1. Frequency Distribution of Gender in Intensive Care Unit Study";
- These statements produce a simple vertical bar chart for the
variable sex. The VBAR statement requests the vertical bars. A similar
horizontal bar chart can be constructed using an HBAR statement in
place of the VBAR statement. The DISCRETE option indicates that the
values of the variable sex are discrete.
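For comparison, a minimal sketch of the horizontal version, assuming the data set icu1 created earlier in the program:

```sas
/* Same chart as Figure 1, drawn with horizontal bars */
PROC CHART DATA=icu1;
  HBAR sex / DISCRETE;
RUN;
```

By default, the horizontal chart also prints a small table of statistics beside the bars.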
OPTIONS PAGESIZE=30;
- The size of the chart can be controlled with an OPTIONS statement.
The PAGESIZE= option sets the number of lines on a page for the chart.
The width of the chart can be controlled with the LINESIZE= option in
the OPTIONS statement. Once an OPTIONS statement is given, the values
of its options remain in effect until a later OPTIONS statement changes
them.
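A short sketch of how the two size options interact. The values 72 and 30 are simply those already used in the program. PROC OPTIONS is used here only to confirm the current settings, and it writes them to the SAS log.

```sas
OPTIONS LINESIZE=72 PAGESIZE=55;  /* width and length set together */
OPTIONS PAGESIZE=30;              /* changes page length only; LINESIZE stays 72 */

/* Check the values currently in effect (written to the log) */
PROC OPTIONS OPTION=PAGESIZE;
RUN;
PROC OPTIONS OPTION=LINESIZE;
RUN;
```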
PROC CHART DATA=icu1;
VBAR sex /DISCRETE GROUP=sta;
TITLE2 "Figure 2. Frequency Distribution of Gender by Survival Status";
TITLE3 " for subjects in the Intensive Care Unit Study";
- These statements illustrate a vertical bar chart for the
variable sex, split by the variable
sta (survival status). The variable
used as a grouping variable is indicated by the
GROUP= option.
PROC CHART DATA=icu1;
VBAR sex /
DISCRETE
GROUP=sta
TYPE=PERCENT;
TITLE2 "Figure 3. Percent Frequency Distribution of Gender by Survival Status";
TITLE3 " for subjects in the Intensive Care Unit Study";
- These statements illustrate a vertical bar chart with the
vertical axis representing percent. Note that the percent is
calculated relative to the total number of subjects (in this
example 200 subjects), and not as a percent for each of the levels
of the GROUP variable
sta.
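If percents within each level of the GROUP variable are wanted instead, PROC CHART provides the G100 option, which makes the percents within each group sum to 100. The following is a sketch, assuming the data set icu1 from above. It is not part of CHARTP1.SAS.

```sas
/* Percent of males and females within each survival group */
PROC CHART DATA=icu1;
  VBAR sex / DISCRETE GROUP=sta TYPE=PERCENT G100;
RUN;
```

Without G100, the bars across both groups together sum to 100 percent, as in Figure 3.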
PROC CHART DATA=icu1;
VBAR age / SPACE=0;
TITLE2 "Figure 4. Frequency Distribution of Age for Subjects in the ";
TITLE3 " Intensive Care Unit Study";
- Histograms can be created for continuous variables such as
age using PROC
CHART. The option SPACE= sets
the amount of space between the bars. Equal width intervals for
age are automatically determined for
the variable. The values of age in
the histogram are the midpoints of the class intervals.
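When the automatic intervals are not suitable but exact midpoints are not needed, the number of bars can be requested with the LEVELS= option. This is a sketch using icu1, and the value 8 is an arbitrary choice for illustration.

```sas
/* Ask for 8 equal-width intervals; SAS still chooses the midpoints */
PROC CHART DATA=icu1;
  VBAR age / LEVELS=8 SPACE=0;
RUN;
```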
PROC MEANS DATA=icu1;
VAR age;
TITLE2 "Table 1. Summary Statistics for Age";
TITLE3 " Intensive Care Unit Study";
PROC CHART DATA=icu1;
VBAR age /
MIDPOINTS=10 TO 90 BY 10
SPACE=0;
TITLE2 "Figure 5. Frequency Distribution of Age for Subjects in the ";
TITLE3 " Intensive Care Unit Study";
- The interval midpoints can be set explicitly in a histogram using
the MIDPOINTS= option in PROC CHART. First, we determine the range of
values for the variable age using PROC MEANS. The variables to be
summarized are listed in the VAR statement. The results from this
procedure tell us that age ranges from 16 to 92 years for the subjects.
Knowing the range, we define age classes of 5-15, 15-25, etc., each
10 years wide and centered on its midpoint. These classes are specified
in PROC CHART using the MIDPOINTS= option.
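When only the range is needed to choose midpoints, PROC MEANS can be limited to the relevant statistics by naming them in the PROC statement. A minimal sketch, assuming the data set icu1:

```sas
/* Print only the count, minimum and maximum of age */
PROC MEANS DATA=icu1 N MIN MAX;
  VAR age;
RUN;
```

With midpoints 10 to 90 by 10, the lowest class starts at 5 and the highest ends at 95, which covers the observed range of 16 to 92.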
PROC CHART DATA=icu1;
VBAR age /
MIDPOINTS=10 TO 90 BY 10
SPACE=0
TYPE=PERCENT;
TITLE2 "Figure 6. Percent Relative Frequency Distribution of Age for Subjects in the ";
TITLE3 " Intensive Care Unit Study";
- The TYPE=PERCENT option in PROC CHART specifies that the vertical
axis represents percent rather than frequency.
PROC CHART DATA=icu1;
VBAR age /
MIDPOINTS=10 TO 90 BY 10
SPACE=0
TYPE=CPERCENT;
TITLE2 "Figure 7. Cumulative Percent Distribution of Age for Subjects in the";
TITLE3 " Intensive Care Unit Study";
RUN;
- The TYPE=CPERCENT option in PROC CHART specifies that the percent
shown for each interval is the cumulative percent of subjects whose age
is less than or equal to the ages in that interval.
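The same cumulative display is available on the frequency scale with TYPE=CFREQ. This is a sketch using the same data set and midpoints, and it is not part of CHARTP1.SAS.

```sas
/* Cumulative number of subjects up to and including each age class */
PROC CHART DATA=icu1;
  VBAR age /
    MIDPOINTS=10 TO 90 BY 10
    SPACE=0
    TYPE=CFREQ;
RUN;
```

The last bar should reach the total number of subjects, which is a useful check that no ages fall outside the chosen classes.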