On occasion, it is useful to assign character values to a variable in a data set. For example, suppose that in the previous example follow-up information is recorded on each subject over time. Some subjects are "active", while other subjects have "dropped out". We may add a variable with the follow-up status to the listing of IDs. Suppose, for example, that the subjects with IDs 1, 3, 5, and 6 are "active", while the other subjects have "dropped out". A listing of the IDs, names, and follow-up status may be attempted with the program LEC15P11.SAS, with OUTPUT.
The character variable in that listing is too narrow to hold its longest value. We correct the problem by adding a LENGTH statement prior to the INPUT statement in LEC15P12.SAS, which produces the corrected OUTPUT.
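The programs LEC15P11.SAS and LEC15P12.SAS are not reproduced here, so the following is only a minimal sketch of the idea, assuming the status is assigned with IF/THEN statements. The data set name, the variable names, and the subject names are all hypothetical. When no length is declared, SAS takes the length of a character variable from its first assignment, so a variable first assigned "active" (6 characters) would store "dropped out" as "droppe". Declaring the length first avoids the truncation.

```sas
/* Sketch only: data set, variable names, and subject names are invented. */
data followup;
   length status $ 11;        /* "dropped out" needs 11 characters */
   input id name $;
   if id in (1, 3, 5, 6) then status = "active";
   else status = "dropped out";
   datalines;
1 Ann
2 Bob
3 Carl
4 Dee
5 Ed
6 Fay
7 Gus
;
run;

proc print data=followup noobs;
   var id name status;        /* restore the listing order: ID, name, status */
run;
```

Because the LENGTH statement comes first, status becomes the first variable in the data set; the VAR statement in PROC PRINT puts the columns back in the intended order.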
Numeric variables are relatively straightforward to work with in SAS. The main difficulty that can occur results from attempts to save space in SAS data sets. First, note that since all computers ultimately use base 2 to perform calculations, there are no exact base 2 representations for numbers like 1/3 or 1/6, or for many other decimal fractions. Furthermore, there are limits in precision even for integers. These limits can cause problems when operating on numeric variables. The program LEC16P1.SAS illustrates some of these limits, as shown in its output.
Note that while the first statements produce a "MATCH" for variables A and B, the stored value of A is not exactly equal to the value 1/3 specified in the program. Furthermore, note that the values assigned to D in the program differ, while the values of D shown in the output are the same. Both results are due to the limits of precision with which the computer represents numeric values.
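Since LEC16P1.SAS is not listed here, the following is a hedged sketch of the kind of checks it describes, not the original program. The variable names d1 and d2 are hypothetical, and the behavior in the comments assumes a platform that stores SAS numerics as 8-byte IEEE doubles.

```sas
/* Sketch only: illustrates precision limits, not the original LEC16P1.SAS. */
data _null_;
   a = 1/3;
   b = 1/3;
   /* Both values go through the same rounding, so they compare equal. */
   if a = b then put "MATCH";
   else put "NO MATCH";
   /* Printing many decimal places shows that a is only an approximation of 1/3. */
   put a= 20.18;

   /* Beyond 2**53, consecutive integers can no longer all be stored exactly. */
   d1 = 2**53;
   d2 = d1 + 1;
   put d1= 20. d2= 20.;
   if d1 = d2 then put "d1 and d2 are stored as the same value";
   else put "d1 and d2 differ";
run;
```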
SAS uses 8 bytes to represent a numeric value (such as the value of D in LEC16P1.SAS). With 8 bytes, SAS can represent integers exactly up to 2 to the 53rd power = 9,007,199,254,740,992. Many integer values can be represented exactly with fewer bytes. Specifying a length of less than 8 bytes means that variables will occupy less space in SAS data sets. The table below (from SAS LANGUAGE GUIDE, Version 6.03, p. 198) gives the maximum integer that can be represented exactly for each number of bytes.
Table 1. Number of Bytes and Maximum Integer that Can Be Represented Exactly

| # of Bytes | Power of 2 | Maximum Integer |
| 3 | 13 | 8,192 |
| 4 | 21 | 2,097,152 |
| 5 | 29 | 536,870,912 |
| 6 | 37 | 137,438,953,472 |
| 7 | 45 | 35,184,372,088,832 |
| 8 | 53 | 9,007,199,254,740,992 |
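The powers of 2 in Table 1 follow a simple pattern: each is 8 times the number of bytes, minus 11. This matches the layout of an IEEE double, where 1 sign bit and 11 exponent bits are kept and the remaining bits (plus one implied leading bit) hold the integer. The short sketch below regenerates the table from that pattern. The formula is an observation about the table rather than something stated in the SAS guide, and the variable names are hypothetical.

```sas
/* Sketch only: rebuilds Table 1 from the pattern power = 8*bytes - 11. */
data _null_;
   do bytes = 3 to 8;
      power  = 8*bytes - 11;
      maxint = 2**power;
      put bytes= power= maxint= comma25.;
   end;
run;
```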
It is desirable to limit the length for integer variables so as to minimize data set size. For example, in the program LEC16P2.SAS, 1000 records with 100 variables in each record are created, where in one data set, all variables have length 8, while in the other data set, all variables have length 3. The data set with the smaller length is stored as a smaller data set on the C:\TEMP directory.
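The file D1.SD2 contains 817,000 bytes, while the file D2.SD2 contains 333,000 bytes. The smaller file is about 41% of the size of the larger one, close to the 3/8 = 37.5% ratio of the variable lengths. Thus the savings in storage space are roughly proportional to the percent reduction in the length of the variables.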
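A comparison of this kind might be set up as in the sketch below. This is not the original LEC16P2.SAS: the libref tmp, the variable names x1-x100, and the stored values are assumptions, chosen so that every value is an integer well below the 3-byte limit of 8,192.

```sas
/* Sketch only: libref and variable names are hypothetical. */
libname tmp "C:\TEMP";

data tmp.d1;                      /* all variables stored with length 8 */
   length x1-x100 8;
   array x{100} x1-x100;
   do rec = 1 to 1000;
      do j = 1 to 100;
         x{j} = j;                /* small integers, exact at any length */
      end;
      output;
   end;
   drop rec j;
run;

data tmp.d2;                      /* same values stored with length 3 */
   length x1-x100 3;
   array x{100} x1-x100;
   do rec = 1 to 1000;
      do j = 1 to 100;
         x{j} = j;
      end;
      output;
   end;
   drop rec j;
run;
```

With 1000 records of 100 variables, the data portion is 800,000 bytes at length 8 and 300,000 bytes at length 3, which is consistent with the file sizes reported above once file overhead is added.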
Problems occur when a variable is stored with a length (chosen to save storage space) that is inadequate for some of its values. An example is given in LEC16P3.SAS, where a variable, age, is reported in years for subjects, with a value of 99.9 used as the missing value code. The length of age has been set to 3, since integer ages are far below the largest integer that 3 bytes can represent exactly (8,192). However, since the missing value code is a decimal value, storing age with length 3 will misrepresent the code, as illustrated in the output.
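The effect can be sketched as follows. This is not the original LEC16P3.SAS; the data set names, the flag variable, and the ages are hypothetical. Note that truncation happens when the value is written to the data set, so the comparison has to be made in a later step that reads the stored value back.

```sas
/* Sketch only: data set names, flag variable, and ages are invented. */
data ages;
   length age 3;                  /* saves space, but keeps fewer significant bits */
   input age;
   datalines;
34
67
99.9
;
run;

data check;
   set ages;                      /* age is read back as stored, at length 3 */
   length flag $ 12;
   if age = 99.9 then flag = "missing code";
   else flag = "no match";
run;

proc print data=check noobs;
   format age 12.6;               /* extra decimals reveal the stored value */
run;
```

On a platform where the stored 99.9 has lost its low-order bytes, the test `age = 99.9` would be expected to fail for the third record, so the missing value code goes unrecognized. Keeping length 8 for such a variable, or using an integer code such as 99, avoids the problem.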
Produced and maintained by the Dept of BioEpi at UMASS