Introduction
Statistical programs allow calculations to be performed easily on sets of values. For example, when age is reported on 100 subjects, a statistical program can easily calculate the average age of the group. In order to perform the calculation, the statistical program needs to know where to find the values of age for the 100 subjects. Normally, these values are contained in a data file, which consists of an ordered array (table) of values. Rows in the array correspond to subjects, and columns correspond to variables, so each column holds the values of one variable across all subjects.
Before a statistical program can calculate an average (or any other statistic), the program needs to know where the data are located and how to refer to columns in the data file. Defining these characteristics of the rectangular array is the first step in using the program. The location of the data normally corresponds to a path on the computer, and the columns in the data correspond to variable names.
Often, after the data location and variable names are defined, a copy of the data file is saved as a "system" data set. Such a data set can only be used directly by the statistical program that created it. The advantage of saving a "system" data set is that variable names are automatically associated with columns of data, so the definitions do not have to be repeated.
Once a data file is defined for a statistical program, simple commands can perform calculations on the data, such as creating bar charts, forming frequency distributions, calculating the mean and standard deviation, and performing the calculations needed for various statistical analyses.
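The idea of rows as subjects and columns as named variables can be illustrated with a minimal Python sketch. It is not the syntax of any particular statistical package; the variable names, the four made-up subjects, and the helper `column` are assumptions chosen only for illustration.

```python
from statistics import mean

# Hypothetical data table: each row is one subject, each column one variable.
variable_names = ["id", "age", "sex"]
rows = [
    [1, 34, "F"],
    [2, 51, "M"],
    [3, 47, "F"],
    [4, 28, "M"],
]


def column(rows, names, name):
    """Return the values of the named variable, one per subject."""
    index = names.index(name)  # the variable name tells us which column to use
    return [row[index] for row in rows]


ages = column(rows, variable_names, "age")
average_age = mean(ages)
print(f"Average age of {len(ages)} subjects: {average_age}")
```

Once the variable names are attached to the columns, a calculation refers to "age" rather than to a column position, which is the convenience a "system" data set provides.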
Data Files
Data files are usually rectangular arrays (tables) with rows and columns. Normally, values of the same variable (such as age) are stored in a single column. Rows in the table correspond to subjects, and the values in the columns of a given row are the values for that subject. Often, several variables are measured on each subject, and hence there are several columns, one for each variable.
A common format for data files is the ASCII format. Such data files can be viewed in NOTEPAD (in Windows 95), and their names end with a suffix such as *.txt or *.dat. The data files can also be viewed using word processors (such as WORD or WordPerfect). However, when saving data files from a word processor, it is important that the file TYPE be specified as an ASCII file. When viewing files in a word processor, values will line up in columns when the font is set to Courier (or some other non-proportional font).
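As a rough illustration of what such an ASCII file contains, the sketch below parses a few lines of plain text into rows and columns. The three subjects and the layout (identifier, age, sex, separated by spaces) are made up for the example, and `parse_rows` is a hypothetical helper.

```python
# Hypothetical contents of an ASCII data file: one subject per line,
# with values lined up in columns when shown in a non-proportional font.
ascii_text = """\
  1  34  F
  2  51  M
  3  47  F
"""


def parse_rows(text):
    """Split each non-empty line into its whitespace-separated values."""
    return [line.split() for line in text.splitlines() if line.strip()]


rows = parse_rows(ascii_text)

# Every value is read as text; numeric variables must be converted explicitly.
ages = [int(row[1]) for row in rows]

print(rows)
print(ages)
```

Because the file is plain text, nothing in it says which column is age; that information has to be supplied separately.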
Initially Reading Data into Statistical Programs
We assume that data files are stored as ASCII files with rows corresponding to subjects and columns corresponding to variables. To read data into a statistical program, we specify in the statistical program where the data file is located and which variable each column contains.
The initial location of the data file is determined by you (such as C:\DATA\icu.dat), while information on the variables contained in each column is normally found in a code book. The specification in statistical software may be given either through menus or through command lines.
Specification through menus is easiest when one is not familiar with the software commands. It is also valuable for exploring variables in a data set, and is often used for descriptive analysis. However, for large statistical software programs, the menus themselves can become complex. Also, it may be difficult to retrace a sequence of menu selections afterwards. For these reasons, specification based on command lines (usually in "batch" programs) is preferable when doing more extensive statistical computing.
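A command-based specification might look something like the following Python sketch, which reads a fixed-column ASCII file using column positions taken from a code book. The code book entries, the sample lines, the file name example.dat, and the function `read_fixed_columns` are all assumptions for illustration. A temporary file stands in for a real data file so that the sketch runs on its own.

```python
import os
import tempfile

# Hypothetical code book: variable name -> (first column, last column),
# counting columns from 1, as code books usually do.
codebook = {"id": (1, 3), "age": (5, 6), "sex": (8, 8)}


def read_fixed_columns(path, codebook):
    """Read an ASCII file and return one dictionary of named values per subject."""
    records = []
    with open(path, "r", encoding="ascii") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            record = {
                name: line[first - 1:last].strip()
                for name, (first, last) in codebook.items()
            }
            records.append(record)
    return records


# Made-up sample data written to a temporary file for the demonstration.
sample = "001 34 F\n002 51 M\n003 47 F\n"
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.dat")
    with open(path, "w", encoding="ascii") as handle:
        handle.write(sample)
    records = read_fixed_columns(path, codebook)

print(records)
```

Because the location and the column definitions are written down as commands, the same steps can be rerun or checked later, which is hard to do with a sequence of menu selections.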
Describing Variables in Data Sets
Once data are stored as a permanent data set, they can be copied to a floppy disk, and used again by the statistical program. Different selections of procedures may be requested for variables, such as calculating the mean, the variance, or the frequency distribution. The distribution of variables may be displayed using bar charts, histograms or other visual displays.
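The procedures named above can be sketched with Python's standard library. The five subjects are made-up values, and the text bar chart is only a stand-in for the graphical displays a statistical program would produce.

```python
from collections import Counter
from statistics import mean, variance

# Hypothetical values for two variables measured on five subjects.
ages = [34, 51, 47, 28, 34]
sexes = ["F", "M", "F", "M", "F"]

age_mean = mean(ages)
age_variance = variance(ages)  # sample variance (divides by n - 1)
sex_counts = Counter(sexes)    # frequency distribution of a categorical variable

print(f"mean age     = {age_mean}")
print(f"variance age = {age_variance}")

# A crude text bar chart: one asterisk per subject in each category.
for category, count in sorted(sex_counts.items()):
    print(f"{category} {count:3d} {'*' * count}")
```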
Putting Statistical Results in a Word Processing Document
Rarely will results given directly by a statistical program be self-explanatory. Usually, the results will be more easily understood if placed in the body of a report. Writing such a report is easy using a word processor. The report should include the following ingredients:
Each section may be brief. In a multitasking environment (such as Windows 95), the results from the statistical program may be cut and pasted into the word processor. While proportional fonts are useful for written text, tables from programs usually must be set in a non-proportional font so that their columns stay aligned.
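The reason a non-proportional font is needed can be seen in a small sketch of how program output is typically laid out: each column is padded with spaces to a fixed number of characters, so the columns only line up when every character has the same width. The function `format_table` and the numbers in the table are hypothetical.

```python
def format_table(headers, rows):
    """Lay out a table as text, padding each column to a fixed character width."""
    widths = []
    for i, header in enumerate(headers):
        cells = [header] + [row[i] for row in rows]
        widths.append(max(len(str(cell)) for cell in cells))

    def format_line(values):
        # Right-justify each value within its column width.
        return "  ".join(str(v).rjust(w) for v, w in zip(values, widths))

    lines = [format_line(headers), "  ".join("-" * w for w in widths)]
    lines.extend(format_line(row) for row in rows)
    return "\n".join(lines)


# Made-up summary values, used only to show the layout.
table = format_table(
    ["Variable", "Mean", "SD"],
    [["age", "38.8", "9.73"], ["weight", "70.2", "11.40"]],
)
print(table)
```

Every line of the result has the same number of characters, so pasting it into a report and setting it in Courier keeps the columns aligned.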