Practical Data Management
and Statistical Computing (BioEp691F)
Outline: Lec21, Lec22, Lec23, Lec24, Lec25
Lecture 21
Creating a Codebook
A coding manual or codebook completely describes the data and how
it was processed. It is the last step in data management in a study,
and serves as a reference for analyses. The codebook includes:
- Brief summary of data collection protocol, pre-processing
protocol, and questionnaire with variable names.
- Data entry protocol for the study, including special files for
entry and verification.
- Special variable creation and coding decisions.
- Contents and names of ASCII and SAS data sets.
- Optional: PROC FREQ (or PROC UNIVARIATE) output for all variables.
- Optional: printed listing of the data.
- ASCII data set and its listing.
- Names and locations of all data sets.
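The optional frequency listing can also be produced outside SAS. The Python sketch below is only a rough analogue of running PROC FREQ on every variable, not a replacement for it. It assumes the records have already been read into a list of dictionaries keyed by variable name, and it counts blank values as missing, following the rule of leaving missing values blank. The function names `frequency_tables` and `print_codebook_freqs` are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def frequency_tables(records: List[Dict[str, str]]) -> Dict[str, Counter]:
    """One-way frequency table for every variable, in first-seen order.

    Blank values are tallied as "(missing)", because the entry protocol
    leaves missing values blank.
    """
    variables: List[str] = []
    for rec in records:
        for name in rec:
            if name not in variables:
                variables.append(name)

    tables: Dict[str, Counter] = {}
    for name in variables:
        counts: Counter = Counter()
        for rec in records:
            value = rec.get(name, "")
            counts[value if value != "" else "(missing)"] += 1
        tables[name] = counts
    return tables


def print_codebook_freqs(tables: Dict[str, Counter]) -> None:
    """Print each table with counts and percentages, roughly like PROC FREQ."""
    for name, counts in tables.items():
        total = sum(counts.values())
        print(f"Variable: {name}")
        for value, n in sorted(counts.items()):
            print(f"  {value:>10}  {n:4d}  {100 * n / total:5.1f}%")


if __name__ == "__main__":
    records = [
        {"id": "01", "sex": "m", "age": "34"},
        {"id": "02", "sex": "f", "age": ""},
        {"id": "03", "sex": "m", "age": "34"},
    ]
    print_codebook_freqs(frequency_tables(records))
```

The printed tables can be pasted into the codebook next to the variable descriptions.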
Example: Epi-Info Smoking Study (Homework #14)
Pre-processing
Review the questionnaires. If an ID number is missing, enter it when
the adjacent forms match on sex and birthdate. Verify that the
response circles are blackened whenever a response is given.
Data Entry Protocol
- Order questionnaires by ID number from smallest to
largest.
- Enter data beginning with ID, study date, date of birth,
gender, age, and then other questions.
- Enter a 2-digit ID, with a leading "0" if necessary.
- Use a single character for sex (m or f).
- Enter dates without "/" separators, replacing leading blanks with
"0", so that every date occupies exactly 6 columns.
- Enter a 2-digit age, including a leading "0" if necessary.
- Leave missing values blank.
- Verify all data entry (with Epi-Info).
(See LEC22P1.SAS for an example.)
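The entry rules above fix the order and width of each field. A minimal Python sketch of the resulting fixed-column record follows. It assumes MMDDYY date order and the column positions noted in the comments; both are illustrative assumptions and are not taken from LEC22P1.SAS. The helper `compare_entries` is hypothetical and illustrates double-entry verification, one common way to verify entry. It does not describe how Epi-Info itself performs verification.

```python
from datetime import date
from typing import List, Optional, Tuple


def fmt_date(d: Optional[date]) -> str:
    """Six columns, leading zeros, no "/"; blank if missing.

    MMDDYY order is an assumption; the protocol only fixes the width.
    """
    return d.strftime("%m%d%y") if d is not None else " " * 6


def format_record(
    id_: int,
    study_date: Optional[date],
    birth_date: Optional[date],
    sex: Optional[str],
    age: Optional[int],
) -> str:
    """Lay out one record in the protocol's field order (assumed columns 1-17)."""
    if not 0 <= id_ <= 99:
        raise ValueError("ID must fit in 2 digits")
    if sex is not None and sex not in ("m", "f"):
        raise ValueError("sex must be 'm' or 'f'")
    if age is not None and not 0 <= age <= 99:
        raise ValueError("age must fit in 2 digits")
    return (
        f"{id_:02d}"                                    # cols 1-2: ID, leading 0
        + fmt_date(study_date)                          # cols 3-8: study date
        + fmt_date(birth_date)                          # cols 9-14: date of birth
        + (sex if sex is not None else " ")             # col 15: sex, blank if missing
        + (f"{age:02d}" if age is not None else "  ")   # cols 16-17: age, blank if missing
    )


def compare_entries(first: List[str], second: List[str]) -> List[Tuple[int, int]]:
    """Double-entry check: return (line, column) pairs where the two entries differ."""
    if len(first) != len(second):
        raise ValueError("the two entry files have different numbers of records")
    mismatches = []
    for line_no, (a, b) in enumerate(zip(first, second), start=1):
        width = max(len(a), len(b))
        a, b = a.ljust(width), b.ljust(width)
        for col in range(width):
            if a[col] != b[col]:
                mismatches.append((line_no, col + 1))
    return mismatches
```

For example, `format_record(7, date(1999, 3, 15), date(1965, 1, 2), "f", 34)` gives `"07031599010265f34"`. Comparing that line against a second entry that keyed `m` for sex reports column 15 of line 1.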
Special Variable Creation/Coding Decisions
- Check the age calculation: compute age from the study date and
birthdate, and replace the entered age with the calculated age if
they differ. Create a list of the records where the ages mismatch,
and examine them manually.
- Check that NA responses to the smoking questions are consistent
(see LEC22P2.SAS and its output).
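Both checks are mechanical enough to sketch in a few lines. The Python below assumes dates are already parsed into `datetime.date` objects and computes age in completed years. It also uses hypothetical smoking variables (`ever_smoked`, `cigs_per_day`, `years_smoked`) and treats a blank or `"NA"` value as not answered. It is an illustration only and does not reproduce the logic in the course SAS programs.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical smoking variables: "ever_smoked" is "y" or "n", and these
# follow-up questions should be NA (blank or "NA") for never-smokers.
FOLLOW_UPS = ("cigs_per_day", "years_smoked")


def age_at(birth: date, on: date) -> int:
    """Age in completed years on date `on`."""
    years = on.year - birth.year
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1  # birthday not yet reached in that year
    return years


def check_ages(records: List[Dict]) -> List[Tuple[int, Optional[int], int]]:
    """Replace the entered age with the calculated age when they differ.

    Returns (id, entered age, calculated age) for each mismatch, so the
    list can be examined manually.
    """
    mismatches = []
    for rec in records:
        if rec.get("birth_date") is None or rec.get("study_date") is None:
            continue  # cannot calculate; leave the entered age alone
        calc = age_at(rec["birth_date"], rec["study_date"])
        if rec.get("age") != calc:
            mismatches.append((rec["id"], rec.get("age"), calc))
            rec["age"] = calc
    return mismatches


def smoking_na_problems(records: List[Dict]) -> List[Tuple[int, str, str]]:
    """Flag follow-up answers that disagree with the ever_smoked response."""
    problems = []
    for rec in records:
        smoked = rec.get("ever_smoked", "")
        for var in FOLLOW_UPS:
            answered = rec.get(var, "") not in ("", "NA")
            if smoked == "n" and answered:
                problems.append((rec["id"], var, "answered by a never-smoker"))
            elif smoked == "y" and not answered:
                problems.append((rec["id"], var, "NA or missing for a smoker"))
    return problems
```

The mismatch list keeps the originally entered age. The records themselves can therefore be corrected while the manual review still sees what was keyed.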
Creating a Study Data Documentation Web Site
A study Web site is a valuable way to document data. Such a site is:
- permanent
- organized
- easy to find
The Web site need not be elaborate. A simple site consists of the
following:
- Site home page
  - Title
  - Brief (3-line) description
  - Links to:
    - Instrument (PDF)
    - Codebook page
      - Variable definitions
      - Special considerations
    - Data page
      - Diagrams of data set creation
      - PROC CONTENTS of data set
      - Links to:
        - SAS programs
        - Entry programs
        - More Details page
- Diagram of the site
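A simple site like this is mostly static pages of links, so the home page can be generated rather than written by hand. The following is a minimal Python sketch. The function `home_page` and the file names `instrument.pdf`, `codebook.html`, and `data.html` are hypothetical.

```python
from html import escape
from typing import Dict


def home_page(title: str, description: str, links: Dict[str, str]) -> str:
    """Build a minimal study home page: title, short description, and links."""
    items = "\n".join(
        f'  <li><a href="{escape(href)}">{escape(label)}</a></li>'
        for label, href in links.items()
    )
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        f"<h1>{escape(title)}</h1>\n"
        f"<p>{escape(description)}</p>\n"
        "<ul>\n"
        f"{items}\n"
        "</ul>\n"
        "</body></html>\n"
    )


if __name__ == "__main__":
    print(home_page(
        "Smoking Study",
        "Questionnaire data on smoking history, entered with Epi-Info.",
        {
            "Instrument (PDF)": "instrument.pdf",
            "Codebook": "codebook.html",
            "Data": "data.html",
        },
    ))
```

Escaping the title, description, and link text keeps characters such as `&` or `<` in study names from breaking the page.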
Example