Practical Data Management and Statistical Computing (BioEp691F)




Outline


Lecture 1

  • Scope of Course
  • Course Expectations
  • Computers
  • Background Survey
  • Example 1. Weight Change Program
    • Reading data with INPUT statement with mixed format
    • Printing SAS data set
    • Using PROC TABULATE to get means by group


Lecture 2

  • SAS STATEMENT keywords
    • DATA
    • INPUT
    • CARDS or DATALINES
    • RUN
  • Variable Names in SAS
  • Adding Comments: *
  • Adding Buttons to control processing.
  • Adding Titles
    • TITLE1 "Source: es99p1.sas 9/14/99";
    • TITLE2 "Mean weight at various times by group";
  • Saving programs and data
  • Using a previously created SAS data set.
    • SET
  • Creating a permanent SAS data set.
    • LIBNAME
    • Example:
      • LIBNAME new 'c:\temp\';
      • DATA new.lec1;
  • Documenting and Naming SAS programs with a Common Header
    • the OPTIONS statement
    • the Project Name
    • the TITLE1 statement (with program name, programmer, date)
    • the Description
    • data files read
    • data files created
    • the LIBNAME statements
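
The pieces listed for Lecture 2 can be sketched as one short program. This is only an illustration: the program name, option values, variable names, and data values are hypothetical, and the path c:\temp\ is taken from the LIBNAME example above.

```sas
* Project: weight change program (hypothetical header and data);
OPTIONS LINESIZE=80 NODATE;
TITLE1 "Source: example.sas (hypothetical program name, programmer, date)";
LIBNAME new 'c:\temp\';

* The two-level name new.lec1 makes the data set permanent;
DATA new.lec1;
   INPUT id group $ wt0 wt1;
   DATALINES;
1 A 150 146
2 A 162 160
3 B 171 172
4 B 144 139
;
RUN;

PROC PRINT DATA=new.lec1;
RUN;
```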


Lecture 3

  • Review
    • Using a SAS data set again in the same SAS Session
      • SET
    • Creating a Permanent SAS Data set with a LABEL.
      • LIBNAME new 'c:\temp\';
      • DATA new.lec1 (LABEL="desc");
    • Getting the Contents of the data
      • PROC CONTENTS
  • Reading Data from a Previously Saved ASCII Data File
    • INFILE Statement
    • Downloading ASCII data from the WEB and stripping special characters
    • Identifying special ASCII characters in an ASCII data file.
  • Reading Data in Column Input.
    • Determining columns for variables in a data set
      • INFILE Statement with options: CARDS OBS= FIRSTOBS=
      • LIST statement


Lecture 4

  • Review
    • Downloading ASCII data from the WEB
    • Reading data from the ICU study.
  • Identifying ASCII codes in a data set
    • INFILE with the option MISSOVER (to keep INPUT from going to a new record when a line is short)
    • INPUT (v1-v5) ($1.) ; as a shorthand to input a set of variables
    • ARRAY v{5} ; to define an array of variables
    • DO i=1 TO 5; ...... END; to perform an operation on variables
    • FILE PRINT; to route output from PUT statements to the OUTPUT window
    • PUT ..... to write out results directly as data lines are processed.
    • SAS Functions
      • BYTE(v1) to convert an ASCII code to the ASCII character.
      • RANK(v1) to determine the ASCII code equivalent of an ASCII character variable
  • Using ANALYZE to create a description of data
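
A minimal sketch of how the Lecture 4 statements might fit together, assuming a single made-up data line. The variable names code and back are hypothetical, not taken from the course programs.

```sas
DATA _NULL_;
   INFILE DATALINES MISSOVER;   * do not move to the next record if a line is short;
   INPUT (v1-v5) ($1.);         * shorthand to read five one-character variables;
   ARRAY v{5} $ v1-v5;
   FILE PRINT;                  * route PUT output to the OUTPUT window;
   DO i = 1 TO 5;
      code = RANK(v{i});        * ASCII code of the character;
      back = BYTE(code);        * character that belongs to that code;
      PUT i= code= back=;
   END;
   DATALINES;
AB1 z
;
RUN;
```

On an ASCII system the letter A has code 65 and a blank has code 32, so a listing like this makes blanks and other special characters in a data line easy to spot.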


Lecture 5

  • Using PROC PRINT
    • Basic Printing Tasks
      • Options to PROC PRINT DATA= (OBS= ) NOOBS
      • Optional Statements:
        • VAR
        • WHERE
        • ID
    • Special Printing Tasks
      • Options to PROC PRINT SPLIT=" /" ROWS=PAGE
      • Optional Statements
        • BY (with PROC SORT)
        • SUM
        • SUMBY
    • Making Lists easy to read
      • Options to PROC PRINT HEADING=V WIDTH=U
  • Naming Values for Categorical Variables (PROC FORMAT)
    • Creating format names with PROC FORMAT
      • Optional Statements
        • VALUE
    • Linking format names to variables (FORMAT)
      • With PROC PRINT
      • With PROC FREQ
  • Using PROC FREQ
    • Adding Format names
    • Required TABLES Statement
      • Optional Statements
        • NOROW
        • NOCOL
        • NOCUM NOPERCENT
        • MISSING
        • MISSPRINT
        • LIST
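
The Lecture 5 topics can be illustrated with a small sketch. The data set demo, the format name sexf, and all values are assumptions made up for this example.

```sas
PROC FORMAT;
   VALUE sexf 1 = "Male"
              2 = "Female";
RUN;

DATA demo;
   INPUT id sex age;
   DATALINES;
1 1 34
2 2 41
3 2 29
4 . 52
;
RUN;

* List selected variables for a subset of records, with value labels;
PROC PRINT DATA=demo NOOBS;
   VAR id sex age;
   WHERE age > 30;
   FORMAT sex sexf.;
RUN;

* MISSING counts the missing value of sex as its own category;
PROC FREQ DATA=demo;
   TABLES sex / NOCUM MISSING;
   FORMAT sex sexf.;
RUN;
```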

       


Lecture 6

  • Using PROC CHART
  • Using PROC MEANS
  • Using PROC PLOT
  • Using PROC UNIVARIATE
  • Using PROC GCHART


Lecture 7

  • Using Endnote 3.0 for Bibliography Management
    • Construction of a reference list with Endnote
    • Working with references
    • Creating a bibliography for a research paper (using MS Word)
    • Searching remote citation data bases


Lecture 8

  • Making a Web Site
  • Using Claris HOMEPAGE
    • Overview
    • Basics
    • Initial Upload at UMass
    • Password Protection


Lecture 9

  • Reading ASCII data with more than one line per record
    • use of / or #1
    • check for matching duplicate variables
    • check for properly ordered lines
    • using a subset of variables with a KEEP option in a DATA statement.
  • More on PROC FORMAT
    • Saving permanent Formats
    • Using CNTLOUT and CNTLIN options.
    • Defining formats for character variables ($fmtname)


Lecture 10

  • Alternative Ways to store PROC FORMATS
    • use of CNTLOUT and CNTLIN
    • use of automatic LIBRARY="libref" option
      • automatically generates a Format Catalogue called FORMATS.SC2
      • uses the same name for the catalogue for all projects
      • use FMTSEARCH=(libref) in OPTIONS statement to use the formats
    • use NOFMTERR in OPTIONS statement to override unknown formats saved in a SAS data set.
  • Reading data with an unequal number of lines per subject: introduction


Lecture 11

Creating Multiple Data sets in a DATA step

  • OUTPUT statement
  • Automatic FIRST.xxx and LAST.xxx variables
  • RETAIN to keep values to use in another record

Working with Data in a DATA step

  • concatenating two SAS data sets with a SET statement
  • using a match MERGE statement with BY variables

High Resolution Graphics

  • SYMBOL# statement
    • I=JOIN to connect points
    • C=BLACK to select color
    • R=100 to repeat for more SYMBOL# statements
  • PROC GPLOT
    • PLOT x*y=z to overlay plots by z
    • option NOLEGEND to suppress the legend for subjects


Lecture 12

  • Automating DOS and Windows Commands in SAS
    • Use X, return with EXIT
  • Reading older SAS data sets with Engines
    • LIBNAME old4 V604 'c:\data';
  • Locating ASCII data
    • FILENAME statement
    • INFILE statement
  • More on LIST input
    • delimiters
    • reading variables with an embedded blank


Lecture 13

  • List Input Problems
    • Missing Data: INFILE options MISSOVER, LRECL
    • Embedded blanks: &
    • Long character Variables: $12.
  • Column Input Options
    • Pointer Control: +3
    • Multiple lines per record: #1, #2, or /
    • Control of Columns: @16
  • INFILE Options
    • DELIMITER=",";
  • INFORMATS
    • Dates


Lecture 14

  • More on DATE INFORMATS
    • DATE8. formats (with month names)
    • MMDDYY8. formats (all numbers)
  • Year 2000 Issues
    • Setting the OPTION YEARCUTOFF=1920;
  • Creating age from birthdates
  • Other INFORMATs
    • Reading dollar amounts: COMMAxx.x
    • Reading addresses and phone numbers
  • Using LENGTH statements
  • Using Character functions
    • LENGTH function (returns the length of a string, excluding trailing blanks)
    • SCAN function (identify words)
    • TRIM function (deletes trailing blanks)
    • LEFT function (left justifies a string)
  • Writing an Address List
    • the FILE statement
    • Special SAS KEYWORDS:
      • _NULL_ a data set that contains nothing
      • _N_ a counter for each consecutive record in a DATA step
  • Using PUT statements
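
Several Lecture 14 items can be combined in one sketch: a date informat, YEARCUTOFF, an approximate age, a character function, and a report written with FILE and PUT. The names and birthdates are made up, and dividing by 365.25 is only one simple way to approximate age in years.

```sas
OPTIONS YEARCUTOFF=1920;        * two-digit years fall in 1920 to 2019;

DATA _NULL_;
   LENGTH name $ 20 first $ 10;
   INPUT name $ 1-20 @22 bdate MMDDYY8.;
   age = INT((TODAY() - bdate) / 365.25);   * approximate age in years;
   first = SCAN(name, 1, " ");              * first word of the name;
   FILE PRINT;
   PUT _N_ 3. +1 first $10. bdate DATE9. +1 age 3.;
   DATALINES;
Ann Lee              03/15/62
Robert Jones         11/02/19
;
RUN;
```

With this YEARCUTOFF value the year 62 is read as 1962 and the year 19 as 2019.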


Lecture 15

  • Using LENGTH statements for character variables
    • use to prevent the first occurrence of a new character variable from setting too small a character length
  • Using LENGTH statements for numeric variables (and dangers)
    • generally do not set length for numeric variables; allow them to keep the default length of 8.


Lecture 16

  • Reading DATA saved by other Data Base Systems
    • Access files *.sa2
    • View Files *.sv2
  • Reading DBASE files with the IMPORT wizard
  • Reading EXCEL files with the IMPORT wizard
  • Using DBMS COPY to translate files
  • Reading Pipe Delimited files through MS ACCESS to EXCEL to SAS


Lecture 17

  • Macro Variables
    • Defining Numeric: %LET a=25;
    • Using: TITLE2 "Study with &a subjects";
  • INPUT statement with Trailing @ to hold the line for next INPUT
  • SELECT ; END; statements
  • INPUT statement with Trailing @@ to hold line for repeated input
  • Options that make Debugging SAS programs Easier
    • Using the LIBRARY ICON
    • Using KEEP and DROP in DATA/SET statements
    • USING SPLIT="*" with PROC PRINT, and FORMATS
  • More on SAS VIEWS
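
A small sketch of a macro variable used in a title, together with a trailing @@ that reads several observations from one line. The data set name scores and the values are hypothetical, and the macro variable is set to 4 here only to match the made-up data.

```sas
%LET a=4;

DATA scores;
   INPUT id score @@;     * trailing @@ holds the line across DATA step iterations;
   DATALINES;
1 82 2 75 3 91 4 68
;
RUN;

* Double quotes are needed so that &a is resolved in the title;
TITLE2 "Study with &a subjects";
PROC PRINT DATA=scores NOOBS;
RUN;
```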


Lecture 18

  • Handling MISSING VALUES in SAS
    • Recognizing Missing and Invalid Numeric Values
      • Using OPTION INVALIDDATA="Z" to assign invalid numeric data to the special missing value .Z
    • Recognizing Missing and Invalid Character Variables
    • Converting Numeric and Character Variables
    • Converting character values to numeric values
  • Details on Storage of numeric variables on PCs and in SAS
  • Recoding missing value codes to SAS missing values and other Special Missing Values


Lecture 19 

  • Using EPI INFO to enter and verify data


Lecture 20

Combining and Manipulating Data

  • Overview
    • DATA Step Options: SET, MERGE, UPDATE
    • PROCedure Options: APPEND, COPY, DATASETS, TRANSPOSE
    • Example: 5 women measured at three times resulting in 3 data sets
  • Concatenating SAS data sets
    • Problems with different variable names for different times.
    • Pre-sorting to InterLeave SAS data sets: BY ID;
    • Using RETAIN to keep a variable for subsequent records
    • Using FIRST. and LAST. variables with BY statement.
    • Using Automatic IN variables to control processing
  • Merging SAS Data Sets
    • Sort and MATCH merge
    • BY statement
    • Avoid overlapping variables: last data set takes priority (order matters)
    • Duplicate records
    • Merging by more than one variable
    • Using IN variables in a merge
    • Using FIRST. and LAST. variables in a MERGE
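
A minimal sketch of a match merge that uses IN variables, assuming two small made-up data sets (time1 and time2) rather than the course example.

```sas
DATA time1;
   INPUT id wt1;
   DATALINES;
1 150
2 162
3 171
;
RUN;

DATA time2;
   INPUT id wt2;
   DATALINES;
1 148
3 169
4 140
;
RUN;

PROC SORT DATA=time1; BY id; RUN;
PROC SORT DATA=time2; BY id; RUN;

DATA both;
   MERGE time1 (IN=in1) time2 (IN=in2);
   BY id;
   IF in1 AND in2;        * keep only subjects present at both times;
   change = wt2 - wt1;
RUN;
```

Giving the weights different names (wt1, wt2) avoids the overlapping-variable problem noted above, where the value from the last data set would replace the earlier one.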


Lecture 21

  • Creating a Codebook
  • Creating a WEB site to Document Data


Lecture 22

  • Selecting a simple random sample with replacement 
    • Using a Uniform Random Number Generator
  • Selecting a simple random sample without replacement
    • Using a RETAIN statement
    • Using OUTPUT statements
    • Using PUT statements to see how it works.
  • Selecting a stratified simple random sample
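
One common way to draw a simple random sample without replacement in a single pass is sketched below. It is not necessarily the method used in the lecture. The population of 20 ids, the sample size of 5, the seed, and the variable names needed and remaining are all assumptions for illustration.

```sas
DATA pop;
   DO id = 1 TO 20;
      OUTPUT;
   END;
RUN;

DATA sample;
   RETAIN needed 5 remaining 20;    * counters carried from record to record;
   SET pop;
   * Select this record with probability needed / remaining;
   IF RANUNI(12345) < needed / remaining THEN DO;
      OUTPUT;
      needed = needed - 1;
   END;
   remaining = remaining - 1;
   DROP needed remaining;
RUN;
```

Adding a PUT statement such as PUT id= needed= remaining=; inside the step writes the counters to the log, which helps to see how the selection proceeds.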


Lecture 23

  • Simulations (Illustrating the central limit theorem)
    • SAS random number generators
    • Controlling the random numbers with (SEED)
    • Using a CALL statement to control several strings of random numbers
    • Using RANBIN to generate binomial random numbers
    • Using RANNOR to generate normal (0,1) random variables
    • Evaluating the Central Limit Theorem with n=7
    • Evaluating the Central Limit Theorem with n=30
    • Using MACRO variables in the Simulation.
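
A sketch of a simulation of this kind, assuming Bernoulli draws with probability 0.3, 1000 replications, and an arbitrary seed. These settings and the names (means, xbar, reps) are hypothetical and are not taken from the lecture.

```sas
%LET n=7;            * sample size, change to 30 to compare;
%LET reps=1000;      * number of simulated samples;

DATA means;
   DO rep = 1 TO &reps;
      total = 0;
      DO i = 1 TO &n;
         total = total + RANBIN(2468, 1, 0.3);   * one Bernoulli(0.3) draw;
      END;
      xbar = total / &n;     * mean of one sample of size n;
      OUTPUT;
   END;
   KEEP rep xbar;
RUN;

* Look at the distribution of the sample means;
PROC UNIVARIATE DATA=means;
   VAR xbar;
RUN;
```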


Lecture 24

  • Simulations (Evaluating how well the Relative Odds ratio approximates the Risk Ratio in studies)
    • The Problem
    • The Cohort Study
    • The Case Control Study
    • Writing a MACRO program
      • %MACRO name (a,b);
      • %MEND name;
      • %name(2,5)
    • A simple example
    • Designing the simulation
    • Using Arrays
    • Conducting the Simulation
    • Using MACROs to help
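
The %MACRO, %MEND, and call pattern above can be sketched with a small macro that compares the two measures. The macro name compare, its parameters p1 and p0 (risk in the exposed and unexposed groups), and the variable names are hypothetical.

```sas
%MACRO compare (p1, p0);
   DATA _NULL_;
      riskr = &p1 / &p0;
      oddsr = (&p1 / (1 - &p1)) / (&p0 / (1 - &p0));
      PUT "Risk ratio: " riskr 6.2 "  Odds ratio: " oddsr 6.2;
   RUN;
%MEND compare;

%compare(0.10, 0.05)
%compare(0.60, 0.30)
```

Both calls have a risk ratio of 2. The odds ratio is about 2.11 when the risks are small (0.10 and 0.05) but 3.5 when they are large (0.60 and 0.30).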


Lecture 25

 


Produced and maintained by the Dept of BioEpi at UMASS
Send comments or questions about this web site to Ed Stanek
Email: stanek@schoolph.umass.edu
\be691f\web\webready\outline.html
Last Update: 10/27/99