Practical Data Management and Statistical Computing (BioEp691F)
Lecture 1
- Scope of Course
- Course Expectations
- Computers
- Background Survey
- Example 1. Weight Change Program
- Reading data with INPUT statement with mixed
format
- Printing SAS data set
- Using PROC TABULATE to get means by group
Lecture 2
- SAS STATEMENT keywords
- DATA
- INPUT
- CARDS or DATALINES
- RUN
- Variable Names in SAS
- Adding Comments: *
- Adding Buttons to control processing.
- Adding Titles
- TITLE1 "Source: es99p1.sas 9/14/99";
- TITLE2 "Mean weight at various times by
group";
- Saving programs and data
- Using a previously created SAS data set.
- Creating a permanent SAS data set.
- LIBNAME
- Example:
- LIBNAME new 'c:\temp\';
- DATA new.lec1;
- Documenting and Naming SAS programs with a Common
Header
- the OPTIONS statement
- the Project Name
- the TITLE1 statement (with program name,
programmer, date)
- the Description
- data files read
- data files created
- the LIBNAME statements
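
As an illustration of the Lecture 2 items, the sketch below puts a common header on a short program and saves a permanent SAS data set. The project name, program name, variable names, and data values are hypothetical; only the LIBNAME path and the data set name new.lec1 come from the outline.

```sas
OPTIONS LINESIZE=80 PAGESIZE=60 NODATE;
* Project:      Weight Change Program (hypothetical project name);
* Description:  Reads in-line weight data and saves it permanently;
* Files read:   none (data follow the DATALINES statement);
* Files made:   new.lec1;
TITLE1 "Source: lec2demo.sas  programmer  date";
LIBNAME new 'c:\temp\';

DATA new.lec1;
  INPUT id group $ wt0 wt1;      * hypothetical variables;
  DATALINES;
1 A 150 145
2 A 162 160
3 B 171 172
;
RUN;

PROC PRINT DATA=new.lec1;
  TITLE2 "Listing of new.lec1";
RUN;
```

Because the data set has a two-level name, it should remain in c:\temp after the SAS session ends.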
Lecture 3
- Review
- Using a SAS data set again in the same SAS Session
- Creating a Permanent SAS Data set with a LABEL.
- LIBNAME new 'c:\temp\';
- DATA new.lec1 (LABEL="desc");
- Getting the Contents of the data
- Reading Data from a Previously Saved ASCII Data File
- INFILE Statement
- Downloading ASCII data from the WEB and
stripping special characters
- Identifying special ASCII characters in an
ASCII data file.
- Reading Data in Column Input.
- Determining columns for variables in a data set
- INFILE Statement with options: CARDS OBS=
FIRSTOBS=
- LIST statement
Lecture 4
- Review
- Downloading ASCII data from the WEB
- Reading data from the ICU study.
- Identifying ASCII codes in a data set
- INFILE with the option MISSOVER (to go to a new
record)
- INPUT (v1-v5) ($1.) ; as a shorthand to input a
set of variables
- ARRAY v{5} ; to define an array of variables
- DO i=1 TO 5; ...... END; to perform an operation
on variables
- FILE PRINT; to route output from PUT statements to
the OUTPUT window
- PUT ..... to write out results directly as data
lines are processed.
- SAS Functions
- BYTE(v1) to convert an ASCII code to the
ASCII character.
- RANK(v1) to determine the ASCII code
equivalent of an ASCII character variable
- Using ANALYZE to create a description of data
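
The Lecture 4 statements can be combined in one small DATA step. This is a minimal sketch with made-up data lines, not the ICU study data: it reads five one-character variables, loops over them with an array, and prints the ASCII code of each character.

```sas
DATA _NULL_;
  INPUT (v1-v5) ($1.);         * shorthand for five one-column variables;
  ARRAY v{5} $ v1-v5;          * refer to them as v{1} to v{5};
  FILE PRINT;                  * route PUT output to the OUTPUT window;
  DO i = 1 TO 5;
    ch   = v{i};
    code = RANK(ch);           * ASCII code of the character;
    back = BYTE(code);         * character for that ASCII code;
    PUT _N_= i= ch= code= back=;
  END;
  DATALINES;
AB1 z
hello
;
RUN;
```

The blank in the fourth column of the first data line should show up as code 32, which is one way to spot special characters in an ASCII file.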
Lecture 5
- Using PROC PRINT
- Basic Printing Tasks
- Options to PROC PRINT DATA= (OBS= )
NOOBS
- Optional Statements:
- Special Printing Tasks
- Options to PROC PRINT SPLIT=" /"
ROWS=PAGE
- Optional Statements
- BY (with PROC SORT)
- SUM
- SUMBY
- Making Lists easy to read
- Options to PROC PRINT HEADING=V
WIDTH=U
- Naming Values for Categorical Variables (PROC FORMAT)
- Creating format names with
PROC FORMAT
- Linking format names to variables
(FORMAT)
- With PROC PRINT
- With PROC FREQ
- Using PROC FREQ
- Adding Format names
- Required TABLES Statement
- Optional Statements
- NOROW
- NOCOL
- NOCUM NOPERCENT
- MISSING
- MISSPRINT
- LIST
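
A short sketch of the Lecture 5 topics, assuming a hypothetical data set demo and a hypothetical format name grpfmt: PROC FORMAT names the values of a categorical variable, and a FORMAT statement links that name to the variable in PROC PRINT and PROC FREQ. In this sketch NOROW, NOCOL, NOPERCENT, and MISSING are written as options after the slash in the TABLES statement.

```sas
PROC FORMAT;
  VALUE grpfmt 1 = "Diet"
               2 = "Exercise";
RUN;

DATA demo;
  INPUT id group sex $;
  DATALINES;
1 1 F
2 1 M
3 2 F
4 2 F
5 . M
;
RUN;

PROC PRINT DATA=demo (OBS=3) NOOBS;
  FORMAT group grpfmt.;
RUN;

PROC FREQ DATA=demo;
  TABLES group*sex / NOROW NOCOL NOPERCENT MISSING;
  FORMAT group grpfmt.;
RUN;
```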
Lecture 6
- Using PROC CHART
- Using PROC MEANS
- Using PROC PLOT
- Using PROC UNIVARIATE
- Using PROC GCHART
Lecture 7
- Using Endnote 3.0 for Bibliography Management
- Construction of a reference list with Endnote
- Working with references
- Creating a bibliography for a research paper
(using MS Word)
- Searching remote citation data bases
Lecture 8
- Making a Web Site
- Using Claris HOMEPAGE
- Overview
- Basics
- Initial Upload at UMass
- Password Protection
Lecture 9
- Reading ASCII data with more than one line per
record
- use of / or #1
- check for matching duplicate variables
- check for properly ordered lines
- using a subset of variables with a KEEP option in
a DATA statement.
- More on PROC FORMAT
- Saving permanent Formats
- Using CNTLOUT and CNTLIN options.
- Defining formats for character variables
($fmtname)
Lecture 10
- Alternative Ways to store PROC FORMATS
- use of CNTLOUT and CNTLIN
- use of automatic LIBRARY="libref" option
- automatically generates a Format Catalogue
called FORMATS.SC2
- uses the same name for the catalogue for all
projects
- use FMTSEARCH=(libref) in OPTIONS statement to
use the formats
- use NOFMTERR in OPTIONS statement to override
unknown formats saved in a SAS data set.
- Reading data with unequal # lines per subject-
introduction
Lecture 11
Creating Multiple Data sets in a DATA step
- OUTPUT statement
- Automatic FIRST.xxx and LAST.xxx variables
- RETAIN to keep values to use in another record
Working with Data in a DATA step
- concatenating two SAS data sets with a SET
statement
- using a match MERGE statement with BY
variables
High Resolution Graphics
- SYMBOL# statement
- I=JOIN to connect points
- C=BLACK to select color
- R=100 to repeat for more SYMBOL# statements
- PROC GPLOT
- PLOT x*y=z to overlay plots by z
- option NOLEGEND to suppress the legend for
subjects
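
The DATA step items in Lecture 11 can be sketched as follows. The data set names (visits, firstrec, lastrec) and the variables are hypothetical. One DATA step creates two data sets, using FIRST. and LAST. variables to find the first and last record per subject and RETAIN to carry the baseline weight forward.

```sas
DATA visits;
  INPUT id time wt;
  DATALINES;
1 1 150
1 2 148
2 1 170
2 2 171
2 3 169
;
RUN;

PROC SORT DATA=visits;
  BY id time;
RUN;

DATA firstrec lastrec;
  SET visits;
  BY id;
  RETAIN basewt;                    * keep the value across records;
  IF FIRST.id THEN basewt = wt;     * baseline is the first record per id;
  change = wt - basewt;
  IF FIRST.id THEN OUTPUT firstrec;
  IF LAST.id  THEN OUTPUT lastrec;
RUN;
```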
Lecture 12
- Automating DOS and Windows Commands in SAS
- Reading older SAS data sets with Engines
- LIBNAME old4 V604 'c:\data';
- Locating ASCII data
- FILENAME statement
- INFILE statement
- More on LIST input
- delimiters
- reading variables with an embedded blank
Lecture 13
- List Input Problems
- Missing Data: INFILE options MISSOVER, LRECL
- Embedded blanks: &
- Long character Variables: $12.
- Column Input Options
- Pointer Control: +3
- Multiple lines per record: #1, #2, or /
- Control of Columns: @16
- INFILE Options
- INFORMATS
Lecture 14
- More on DATE INFORMATS
- DATE8. formats (with month names)
- MMDDYY8. formats (all numbers)
- Year 2000 Issues
- Setting the OPTION YEARCUTOFF=1920;
- Creating age from birthdates
- Other INFORMATs
- Reading dollar amounts: COMMAxx.x
- Reading addresses and phone numbers
- Using LENGTH statements
- Using Character functions
- LENGTH function (position of the last non-blank character)
- SCAN function (identify words)
- TRIM function (deletes trailing blanks)
- LEFT function (left justifies a string)
- Writing an Address List
- the FILE statement
- Special SAS KEYWORDS:
- _NULL_ a data set that contains nothing
- _N_ a counter for consecutive records in a
data step
- Using PUT statements
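
A minimal sketch tying several Lecture 14 items together, using made-up names and birthdates and a fixed reference date chosen only for illustration. It reads two-digit-year dates under YEARCUTOFF=1920, computes an approximate age, and writes a list with PUT from a _NULL_ data step.

```sas
OPTIONS YEARCUTOFF=1920;            * two-digit years fall in 1920 to 2019;

DATA _NULL_;
  LENGTH fname $ 12;                * set the length before first use;
  INPUT @1 name $12. @14 birth MMDDYY8.;
  refdate = '01JAN2000'D;           * hypothetical reference date;
  age = INT((refdate - birth) / 365.25);   * approximate age in years;
  fname = SCAN(name, 1);            * first word of the name;
  FILE PRINT;
  PUT _N_ 2. +1 fname $12. +1 birth DATE9. +1 age 3.;
  DATALINES;
Mary Smith   01/15/65
John Q Doe   12/31/25
;
RUN;
```

Dividing by 365.25 is a common approximation and can be off by a day near birthdays.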
Lecture 15
- Using LENGTH statements for character variables
- use to prevent the first occurrence of a new
character variable from setting too small a character
length
- Using LENGTH statements for numeric variables (and
dangers)
- generally do not set length for numeric variables;
allow them to have the default length of 8.
Lecture 16
- Reading DATA saved by other Data Base Systems
- Access files *.sa2
- View Files *.sv2
- Reading DBASE files with the IMPORT wizard
- Reading EXCEL files with the IMPORT wizard
- Using DBMS COPY to translate files
- Reading Pipe Delimited files through MS ACCESS to
EXCEL to SAS
Lecture 17
- Macro Variables
- Defining Numeric: %LET a=25;
- Using: TITLE2 "Study with &a subjects";
- INPUT statement with Trailing @ to hold the line for
next INPUT
- SELECT ; END; statements
- INPUT statement with Trailing @@ to hold line
for repeated input
- Options that make Debugging SAS programs Easier
- Using the LIBRARY ICON
- Using KEEP and DROP in DATA/SET
statements
- USING SPLIT="*" with PROC PRINT, and FORMATS
- More on SAS VIEWS
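
Two of the Lecture 17 items can be sketched together: a macro variable used in a title, and a trailing @@ that holds the data line so several observations can be read from it. The data set name scores and its values are hypothetical.

```sas
%LET a = 6;

DATA scores;
  INPUT id score @@;     * trailing @@ holds the line for repeated input;
  DATALINES;
1 82 2 75 3 91
4 68 5 88 6 79
;
RUN;

PROC PRINT DATA=scores NOOBS;
  TITLE2 "Study with &a subjects";
RUN;
```

The title must be in double quotes for &a to resolve; inside single quotes the text is printed as typed.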
Lecture 18
- Handling MISSING VALUES in SAS
- Recognizing Missing and Invalid Numeric Values
- Using OPTION INVALIDDATA="Z" to assign invalid
numeric data to the special missing value .Z
- Recognizing Missing and Invalid Character
Variables
- Converting Numeric and Character Variables
- Converting character values to numeric values
- Details on Storage of numeric variables on PCs and in
SAS
- Recoding missing value codes to SAS missing values
and other Special Missing Values
Lecture 19
- Using EPI INFO to enter and verify data
Lecture 20
Combining and Manipulating Data
- Overview
- DATA Step Options: SET, MERGE, UPDATE
- PROCedure Options: APPEND, COPY, DATASETS,
TRANSPOSE
- Example: 5 women measured at three times resulting
in 3 data sets
- Concatenating SAS data sets
- Problems with different variable names for
different times.
- Pre-sorting to InterLeave SAS data sets: BY
ID;
- Using RETAIN to keep a variable for subsequent
records
- Using FIRST. and LAST. variables with
BY statement.
- Using Automatic IN variables to control
processing
- Merging SAS Data Sets
- Sort and MATCH merge
- BY statement
- Avoid overlapping variables: last data set takes
priority (order matters)
- Duplicate records
- Merging by more than one variable
- Using IN variables in a merge
- Using FIRST. and LAST. variables in a MERGE
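
The sort and match-merge pattern from Lecture 20 might look like the sketch below. It uses two hypothetical data sets (time1, time2) rather than the three from the class example, and uses IN variables to keep only subjects present at both times.

```sas
DATA time1;
  INPUT id wt1;
  DATALINES;
1 150
2 162
3 171
;
RUN;

DATA time2;
  INPUT id wt2;
  DATALINES;
1 148
3 170
4 155
;
RUN;

PROC SORT DATA=time1; BY id; RUN;
PROC SORT DATA=time2; BY id; RUN;

DATA both;
  MERGE time1 (IN=in1) time2 (IN=in2);
  BY id;
  IF in1 AND in2;          * keep ids found in both data sets;
  change = wt2 - wt1;
RUN;
```

The weights are named wt1 and wt2 so that no variable other than id overlaps; with a shared name, the value from the last data set listed would replace the earlier one.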
Lecture 21
- Creating a Codebook
- Creating a WEB site to Document Data
Lecture 22
- Selecting a simple random sample with
replacement
- Using a Uniform Random Number Generator
- Selecting a simple random sample without replacement
- Using a RETAIN statement
- Using OUTPUT statements
- Using PUT statements to see how it works.
- Selecting a stratified simple random sample
Lecture 23
- Simulations (Illustrating the central limit theorem)
- SAS random number generators
- Controlling the random numbers with (SEED)
- Using a CALL statement to control several strings
of random numbers
- Using RANBIN to generate binomial random
numbers
- Using RANNOR to generate normal (0,1) random
variables
- Evaluating the Central Limit Theorem with n=7
- Evaluating the Central Limit Theorem with
n=30
- Using MACRO variables in the Simulation.
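
A minimal simulation sketch for Lecture 23, assuming Bernoulli trials with p = 0.3, 1000 replications, and an arbitrary seed. None of these values come from the outline except n = 7. Macro variables hold the sample size and the number of replications so the step can be rerun with n = 30.

```sas
%LET n = 7;
%LET reps = 1000;

DATA sim;
  DO rep = 1 TO &reps;
    total = 0;
    DO i = 1 TO &n;
      total = total + RANBIN(12345, 1, 0.3);   * one Bernoulli trial;
    END;
    xbar = total / &n;                          * sample mean for this replication;
    OUTPUT;
  END;
  KEEP rep xbar;
RUN;

PROC MEANS DATA=sim N MEAN STD;
  VAR xbar;
RUN;

PROC CHART DATA=sim;
  VBAR xbar;
RUN;
```

Under these assumptions the sample means should centre near 0.3 with a standard deviation near sqrt(0.21/7), about 0.173, and the chart should look more symmetric as n increases.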
Lecture 24
- Simulations (Evaluating how well the Relative
Odds (Odds Ratio) approximates the Risk Ratio in studies)
- The Problem
- The Cohort Study
- The Case Control Study
- Writing a MACRO program
- %MACRO name (a,b);
- %MEND name;
- %name(2,5)
- A simple example
- Designing the simulation
- Using Arrays
- Conducting the Simulation
- Using MACROs to help
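
A simple macro in the %MACRO / %MEND form listed above might look like this sketch. The macro name ratios, its four parameters (the cells of a 2x2 table), and the variable names are hypothetical. It writes the risk ratio and the odds ratio for one table to the LOG.

```sas
%MACRO ratios (a, b, c, d);
  /* a, b = exposed with and without disease      */
  /* c, d = unexposed with and without disease    */
  DATA _NULL_;
    riskr = (&a / (&a + &b)) / (&c / (&c + &d));
    oddsr = (&a * &d) / (&b * &c);
    PUT "Risk ratio = " riskr 6.3 "  Odds ratio = " oddsr 6.3;
  RUN;
%MEND ratios;

%ratios(20, 80, 10, 90)
```

For this call the risk ratio is (20/100)/(10/100) = 2 and the odds ratio is (20*90)/(80*10) = 2.25. Calling the macro with other cell counts shows how the gap between the two changes with disease frequency.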
Lecture 25