Practical Data Management and Statistical Computing (BioEp691F)


Lecture 14


1. More on Using Informats for Dates and Commas with Column Input

Reading Dates

In order for SAS to recognize a date and convert it to a numeric value, the date must be entered following some set of rules. Different rules can be used, and a choice must be made as to which rule will be followed when entering the dates in an ASCII data set.

Data Entry    Informat      Data Entry         Informat
1/12/60       MMDDYY8.      12jan1960          DATE9.
1 12/60       MMDDYY8.      12 jan 60          DATE9.
1.12 60       MMDDYY8.      12jan 60           DATE8.
01/12/1960    MMDDYY10.     12jan60            DATE7.
18/1/60       DDMMYY8.      12 January 1960    DATE15.
31/12/59      DDMMYY8.      1959/12/30         YYMMDD10.
31/12/1959    DDMMYY10.     591230             YYMMDD6.
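As a quick check of a few table entries, the sketch below applies the listed informats with the INPUT function and writes the resulting values to the LOG. The DATA _NULL_ step and the variable names d1-d6 are illustrative assumptions, not part of the course programs. YEARCUTOFF=1920 is set so that two-digit years fall in 1920-2019.

```sas
options yearcutoff=1920;

data _null_;
  /* Each INPUT call converts text to a SAS date (days since 01JAN1960) */
  d1 = input('1/12/60',    mmddyy8.);
  d2 = input('12jan1960',  date9.);
  d3 = input('01/12/1960', mmddyy10.);
  d4 = input('31/12/59',   ddmmyy8.);
  d5 = input('1959/12/30', yymmdd10.);
  d6 = input('591230',     yymmdd6.);
  /* Unformatted values show how the dates are stored */
  put d1= d2= d3= d4= d5= d6=;
  /* The same values displayed with a date format */
  put d1= date9. d4= date9. d5= date9.;
run;
```

The first three values should all be 11 (12 January 1960 is eleven days after 1 January 1960), while the dates in late December 1959 are stored as small negative numbers.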


A Caution with Missing Dates:

If a date is missing, the informat behaves as if it were a character informat: SAS still reads as many columns as the informat specifies, starting where the missing date should be, so the values that follow are read from the wrong place, as in LEC15P1.SAS. Note that in the OUTPUT, the data for IDs 5, 6, and 10 are read incorrectly. These errors are identified in the LOG.


Using an INFORMAT Statement and a DATE8. Format

Date informats can be specified in an INFORMAT statement prior to an INPUT statement, as illustrated in LEC15P2.SAS. Notice that the dates in the OUTPUT match the dates entered, since the data were entered in COLUMN input. When reading data with an INFORMAT Statement:

Dates can be printed using a DATExx. format in a similar manner. Without a DATExx. format, dates will be displayed as simple numbers, corresponding to how they are stored in SAS.
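A minimal sketch of this pattern follows. It is not LEC15P2.SAS: the data set name VISITS, the variable VISIT_DT, and the two data lines are assumptions made for illustration, and the sketch uses list input rather than column input to keep it short.

```sas
options yearcutoff=1920;

data visits;
  /* Assign the informat before INPUT so SAS knows how to read the date */
  informat visit_dt mmddyy8.;
  /* Assign a display format; without it the date prints as a plain number */
  format visit_dt date9.;
  input id visit_dt;
  datalines;
1 01/12/60
2 12/31/59
;
run;

proc print data=visits;
run;
```

Removing the FORMAT statement and rerunning the step is an easy way to see the stored numeric values (11 and -1 here) instead of the formatted dates.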


Creating AGE from Subject's Birth Dates and Today's Date

Commonly, data on subjects will contain the subject's birth date. To determine a subject's age at a particular study start date, we divide the subject's age in days (the start date minus the birth date) by 365.25, and then drop the fractional part using the integer function INT( ), as shown by LEC15P3.SAS, with results given in the OUTPUT. This program uses:

When a person's birth date is reported as 11/21/01, we may presume the actual birth date is the 21st of November, 1901. Once we reach the year 2002, it will no longer be obvious whether this date represents a birth date in 1901 or in 2001. The YEARCUTOFF= option specifies how such two-digit years are read: it sets the first year of the 100-year span to which they are assigned, so YEARCUTOFF=1900 places them in 1900-1999. In SAS 6.12, the default value for YEARCUTOFF is 1900. In SAS 7.0, the default value for YEARCUTOFF is 1920. This means that if you are using SAS 7.0, a person born on Nov. 21, 2001, whose birth date is recorded as 11/21/01, will be read correctly in SAS. On the other hand, this same birth date read in SAS 6.12 will be read as Nov. 21, 1901.
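The effect of the option can be seen by reading the same text under both settings. This is an illustrative sketch; the variable name BIRTH is an assumption, and the results are written to the LOG.

```sas
/* Window 1900-1999: the two-digit year 01 is read as 1901 */
options yearcutoff=1900;
data _null_;
  birth = input('11/21/01', mmddyy8.);
  put 'YEARCUTOFF=1900: ' birth date9.;
run;

/* Window 1920-2019: the same text is read as 2001 */
options yearcutoff=1920;
data _null_;
  birth = input('11/21/01', mmddyy8.);
  put 'YEARCUTOFF=1920: ' birth date9.;
run;
```

The first step should write 21NOV1901 and the second 21NOV2001.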

To avoid future SAS version problems, include YEARCUTOFF=1920 in the OPTIONS statement.
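The age calculation described in this section can be sketched as follows. This is not LEC15P3.SAS: the data set AGES, the variable names, the study start date of 1 September 1999, and the two birth dates are all assumptions chosen for illustration.

```sas
options yearcutoff=1920;

data ages;
  /* The colon applies the informat to a list-input field */
  input id birth_dt : mmddyy10.;
  /* Hypothetical study start date, written as a SAS date constant */
  start_dt = '01SEP1999'd;
  /* Days lived divided by 365.25, with the fractional part dropped */
  age = int((start_dt - birth_dt) / 365.25);
  format birth_dt start_dt date9.;
  datalines;
1 11/21/1960
2 03/15/1945
;
run;

proc print data=ages;
run;
```

With these dates the ages should be 38 and 54. To compute age as of the day the program runs, the date constant could be replaced by the TODAY() function.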


Reading Dollar Amounts

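In order for SAS to recognize dollar amounts and convert them to numeric values, the variable with the dollar amounts is assigned a COMMAxx.x informat. The values of xx.x specify how the amounts will be read: xx is the total number of columns read, including the dollar sign, commas, and decimal point, and x is the number of decimal places assumed when the value contains no decimal point. We illustrate this in LEC15P4.SAS with the results given in the OUTPUT.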
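A short sketch of the COMMA informat, assuming a hypothetical data set COSTS with made-up amounts (this is not LEC15P4.SAS):

```sas
data costs;
  /* The colon lets list input apply the informat up to the next blank */
  input id cost : comma10.;
  /* Display the stored numbers with a dollar sign and two decimals */
  format cost dollar12.2;
  datalines;
1 $1,250.75
2 $980
3 12,000
;
run;

proc print data=costs;
run;
```

The informat strips the dollar signs and commas, so COST is stored as the numbers 1250.75, 980, and 12000.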


2. Reading Addresses and Phone Numbers 

A common aspect of many studies is maintaining a list of study participants and their contact information. Such data usually consist of names and addresses, phone numbers, and ID numbers for linking such data with other data. Manipulating such data requires manipulating character variables. We consider a simple example of such an application here, given by LEC15P5.SAS (with contact data taken from the SAS Language and Procedures Manual: Usage Ver 6, 1st Edition, p. 382) and the following OUTPUT.

Note that the program reads the data in LIST format, since names and addresses do not correspond to particular column fields. Fortunately, these data have been entered with each field separated by a "double blank". As a result, the "&" modifier can be used: it allows a value to contain single blanks and treats two consecutive blanks as the end of the field. Note in the OUTPUT that fields longer than 8 characters are truncated, because character variables read with LIST input have a default length of 8. Although no "ERROR" message is given in the SAS LOG, not all the information is read correctly.


To read these data correctly into SAS, we need to specify the length of the character fields. Setting the field length to the maximum length observed for a variable, we find that the program LEC15P6.SAS reads the data correctly (as shown in the OUTPUT). The contents of the SAS data set are also given.
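A minimal sketch of this approach follows. It is not LEC15P6.SAS: the data set CONTACTS, the lengths chosen, and the two made-up records are assumptions for illustration. Fields in the data lines are separated by two blanks.

```sas
data contacts;
  /* Declare lengths before INPUT; otherwise each character variable gets 8 */
  length name $ 20 street $ 25 citys $ 20;
  /* & lets a value contain single blanks; two blanks end the field */
  input id name $ & street $ & citys $ &;
  datalines;
1  Jonathan Smith  12 North Pleasant St  Amherst MA
2  Ann Lee  4 Elm Road  Northampton MA
;
run;

proc contents data=contacts;
run;

proc print data=contacts;
run;
```

Dropping the LENGTH statement and rerunning should reproduce the truncation problem, with NAME cut to its first 8 characters.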

If there is a large address list, we may not know how many columns to allocate to the name, address, and city fields. One solution to this problem is to allocate a generous number of columns to each field, and then use the LENGTH function. This function returns the length of a character value excluding trailing blanks (embedded blanks are counted), as illustrated in LEC15P7.SAS and the OUTPUT.
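One way to apply this idea is sketched below, under assumed names (data set CHECK, variable NAME_LEN) and a generous width of 40 that is only a guess for illustration.

```sas
data check;
  /* Generous field width chosen up front (an assumption for this sketch) */
  length name $ 40;
  input id name $ &;
  /* LENGTH returns the position of the last non-blank character */
  name_len = length(name);
  datalines;
1  Jonathan Smith
2  Ann Lee
;
run;

/* The maximum of name_len suggests how wide the field needs to be */
proc means data=check max;
  var name_len;
run;
```

Here NAME_LEN should be 14 and 7, so a length of 14 would be enough for these two records.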


Note that when using the COLUMN input format for CITYS, the "blank" spaces between the city name and the state remain in the variable. These spaces can be removed using the SAS function SCAN and character concatenation. The SCAN function will extract words (separated by a delimiter) from a character variable. The TRIM function deletes trailing blanks, and the LEFT function left justifies characters. Character variables can be pasted together (concatenated) using || . An example is given in LEC15P8.SAS with the OUTPUT. Notice that manipulating character variables can create very long variables. To prevent this with created variables, we use a LENGTH statement as in LEC15P9.SAS and the OUTPUT.
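These functions can be combined as in the following sketch. It is not LEC15P8.SAS or LEC15P9.SAS: the data set TIDY, the column positions, and the variable CITYSTATE are assumptions, and the sketch assumes single-word city names (a name such as "New York" would need different handling).

```sas
data tidy;
  /* Fix the length up front; otherwise CITYSTATE takes a much longer
     length from the pieces being concatenated */
  length citystate $ 25;
  /* Column input: CITYS keeps the blanks between city and state */
  input citys $ 1-22;
  /* SCAN pulls out the first and second blank-delimited words */
  city  = scan(citys, 1, ' ');
  state = scan(citys, 2, ' ');
  /* TRIM drops trailing blanks, LEFT left justifies, || concatenates */
  citystate = trim(left(city)) || ', ' || state;
  datalines;
Amherst             MA
Northampton         MA
;
run;

proc print data=tidy;
run;
```

CITYSTATE should contain "Amherst, MA" and "Northampton, MA" without the run of blanks.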

 


Using PUT and FILE to create Address Lists

We can create a list of names and addresses for subjects by writing the data to an ASCII file, as illustrated in LEC15P10.SAS, with the file given in ADDRESS1.TXT. There are several noteworthy features of this program:
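As a rough illustration of the general technique (not LEC15P10.SAS itself), the sketch below writes three-line address labels with FILE and PUT. The data set CONTACTS, its records, and the output file name address_sketch.txt are hypothetical.

```sas
data contacts;
  length name $ 20 street $ 25 citys $ 20;
  input id name $ & street $ & citys $ &;
  datalines;
1  Jonathan Smith  12 North Pleasant St  Amherst MA
2  Ann Lee  4 Elm Road  Northampton MA
;
run;

data _null_;
  set contacts;
  /* Hypothetical output file name, written to the current directory */
  file 'address_sketch.txt';
  /* Each slash starts a new line; the last one leaves a blank line
     between labels */
  put name / street / citys /;
run;
```

DATA _NULL_ is used because the step only writes the text file and does not need to create a SAS data set.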



Using a LENGTH statement to create Character Values in a DATA Step

On occasion, there may be interest in assigning character values to a variable in a DATA step. For example, in the previous example, suppose that over time, follow-up information is being recorded on each subject. Some subjects are "active", while other subjects have "dropped out". We may add a variable with the follow-up status to the listing of IDs. Suppose, for example, that the subjects with ID 1, 3, 5, and 6 are "active", while the other subjects have "dropped out". A list of the IDs, names, and follow-up status may be attempted with the program LEC15P11.SAS with OUTPUT.

We correct the problem of too narrow a length by adding a LENGTH statement prior to the INPUT statement in LEC15P12.SAS, resulting in the following OUTPUT.
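The fix can be sketched as follows, using an assumed data set FOLLOWUP with seven made-up IDs (this is not LEC15P12.SAS):

```sas
data followup;
  /* Declare the length first; otherwise the first assignment ('active')
     fixes the length at 6 and 'dropped out' is truncated */
  length status $ 11;
  input id;
  if id in (1, 3, 5, 6) then status = 'active';
  else status = 'dropped out';
  datalines;
1
2
3
4
5
6
7
;
run;

proc print data=followup;
run;
```

Without the LENGTH statement, the status for IDs 2, 4, and 7 would print as "droppe".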





Produced and maintained by the Dept of BioEpi at UMASS
Send comments or questions about this web site to Ed Stanek
Email: stanek@schoolph.umass.edu
\be691f\web\webready\lec14.html
Last Update: 11/2/99