Practical Data Management and Statistical Computing (BioEp691F)



Homework 16

Due: 12/9/99


Selecting Samples


1. Rats in a lab are housed in numbered cages. You are asked to select a simple random sample with replacement of n=15 rat cages. The cages containing rats are numbered as follows:

1 2 4 6 8 9 10 12 16 19 22 23 24 25 26 27 30 33 38 40 55 56 58 61 72

Write a SAS program to select your sample. In a half-page write-up, list the sample cages selected, and include a copy of your program in an appendix.
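The selection logic can be sketched as follows. This is a minimal illustration in Python rather than SAS, meant only to show what sampling with replacement means for this list of cages; the seed value and the variable names are assumptions and are not part of the assignment.

```python
import random

# Cage numbers that currently hold rats (copied from the list in the problem).
cages = [1, 2, 4, 6, 8, 9, 10, 12, 16, 19, 22, 23, 24, 25, 26, 27,
         30, 33, 38, 40, 55, 56, 58, 61, 72]

n = 15                      # sample size requested in the problem
rng = random.Random(1999)   # arbitrary fixed seed so the draw can be reproduced

# Sampling WITH replacement: every draw picks from all 25 cages,
# so the same cage may appear more than once in the sample.
sample = rng.choices(cages, k=n)

print(sorted(sample))
```

Because the draws are independent, repeated cage numbers in the output are expected and are not an error.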


2. The computer code to select a simple random sample without replacement appears to give different chances of selection to different subjects. We consider here a simple setting where the population consists of N=5 subjects (A, B, C, D, and E), and we wish to select a simple random sample of size 2. Using the logic of the computer program, write out all the possible ways in which the samples could occur, and the probability of each one occurring. (You may use flow diagrams to help with this.) Turn in your diagrams. [The diagrams should verify that each pair of subjects is equally likely to be selected.]
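The course program itself is not reproduced on this page, so the sketch below rests on an assumption: that the program uses the common sequential scheme in which each subject is considered in turn and selected with probability (number still needed) / (number not yet considered). Under that assumption, a short Python enumeration (Python rather than SAS, for illustration only) walks every branch of the flow diagram with exact fractions. The function name is hypothetical.

```python
from fractions import Fraction

def pair_probabilities(subjects, n):
    """Enumerate every path of the assumed sequential selection scheme.

    Returns a dict mapping each possible sample (a tuple) to its exact probability.
    """
    results = {}

    def walk(i, chosen, prob):
        if i == len(subjects):
            key = tuple(chosen)
            results[key] = results.get(key, Fraction(0)) + prob
            return
        needed = n - len(chosen)          # selections still required
        remaining = len(subjects) - i     # subjects not yet considered
        p_select = Fraction(needed, remaining)
        if p_select > 0:                  # branch: subject i is selected
            walk(i + 1, chosen + [subjects[i]], prob * p_select)
        if p_select < 1:                  # branch: subject i is skipped
            walk(i + 1, chosen, prob * (1 - p_select))

    walk(0, [], Fraction(1))
    return results

probs = pair_probabilities(["A", "B", "C", "D", "E"], 2)
for pair, p in sorted(probs.items()):
    print(pair, p)
```

For example, the branch leading to the pair (A, C) has probability 2/5 * 3/4 * 1/3 = 1/10, even though the selection probability changes at each step.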

3. After reviewing the sample that was selected in Exercise #1, the researcher decided that a simple random sample without replacement should have been selected. Write a SAS program to perform the selection, and summarize your results in a brief write-up, with the program in an appendix.

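A minimal sketch of selection without replacement, again in Python rather than SAS and for illustration only. The seed and the variable names are assumptions; the sample size of 15 is carried over from Exercise #1.

```python
import random

# Cage numbers that currently hold rats (copied from Exercise #1).
cages = [1, 2, 4, 6, 8, 9, 10, 12, 16, 19, 22, 23, 24, 25, 26, 27,
         30, 33, 38, 40, 55, 56, 58, 61, 72]

n = 15                      # same sample size as in Exercise #1
rng = random.Random(1999)   # arbitrary fixed seed so the draw can be reproduced

# Sampling WITHOUT replacement: once a cage is drawn it cannot be drawn again,
# so the 15 selected cages are all distinct.
sample = rng.sample(cages, n)

print(sorted(sample))
```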

Additional Optional Problems

4. In a study of soil ingestion at Anaconda, not all children or families were eligible for inclusion in the study. Although 258 families were initially identified, numerous families were excluded for a variety of reasons.

Families were excluded if they had moved out of the area (n=3); if children were under split custody (n=9); if there had been problems with keeping appointments (n=7); if the family had completely refused a previous urine collection survey (n=2); or if families refused to submit urine samples (n=2). A total of 23 families were excluded based on these criteria (8.9%), with 25 children excluded. (Reference: ANC3.sas)

A total of 283 children between the ages of 1 and 4 were identified in the 235 remaining families. In addition to the family-level exclusions, children were excluded from participation if they had a disability (n=2); if the child refused to give urine (n=4); or if the child attended day care (n=57). This resulted in 63 additional children excluded (63/283*100 = 22.3%). The most common cause for exclusion was day care attendance. These children were excluded primarily for feasibility reasons, since children attended numerous small day care centers in Anaconda. Exclusion of these children left no eligible children in an additional 47 families. As a result, a total of 220 children in 188 families served as the population for sample selection in this study. Of these 220 children, 29 were in an "at home" day care setting. Thirteen of the families had not yet completed the urine collection survey.

A stratified simple random sample was selected after dividing the community into six geographic areas. Families served as the unit of sample selection, with a simple random sample of 64 families selected from the 188 in the eligible population (34%). When more than one child was eligible in a selected family, a constrained random selection was used to pick the child so as to most closely balance the sample size in the age groupings 12-23 months, 24-35 months and 36-47 months. The same sampling fraction was used in each stratum, so as to result in a self-weighted sample. Suppose these 188 children are distributed by stratum as shown below. Select a stratified simple random sample without replacement of 64 children.

Stratum          1    2    3    4    5    6
# of Children   20   40   15   19   80   14

Write a report listing your sample, and include the program in an appendix.
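One way to turn a common sampling fraction of 64/188 into whole-number stratum sample sizes is sketched below, in Python rather than SAS and for illustration only. Two things here are assumptions and are not stated in the problem: the largest-remainder rule used to round the allocation so that it sums to 64, and the labeling of children within each stratum as 1 through the stratum size. The function name and the seed are hypothetical as well.

```python
import random

# Stratum sizes from the table in the problem.
stratum_sizes = {1: 20, 2: 40, 3: 15, 4: 19, 5: 80, 6: 14}
total_sample = 64

def proportional_allocation(sizes, n):
    """Largest-remainder allocation so that stratum sample sizes sum to n."""
    N = sum(sizes.values())
    # Exact quota for stratum h is sizes[h] * n / N; keep floor and remainder as integers.
    alloc = {h: (size * n) // N for h, size in sizes.items()}
    remainders = {h: (size * n) % N for h, size in sizes.items()}
    leftover = n - sum(alloc.values())
    # Give the leftover units to the strata with the largest remainders.
    for h in sorted(sizes, key=lambda h: remainders[h], reverse=True)[:leftover]:
        alloc[h] += 1
    return alloc

allocation = proportional_allocation(stratum_sizes, total_sample)

rng = random.Random(1999)  # arbitrary fixed seed so the draw can be reproduced
# Simple random sample without replacement inside each stratum,
# using hypothetical child labels 1..N_h.
sample = {
    h: sorted(rng.sample(range(1, stratum_sizes[h] + 1), allocation[h]))
    for h in stratum_sizes
}

for h in stratum_sizes:
    print(h, allocation[h], sample[h])
```

With these stratum sizes the allocation comes out to 7, 14, 5, 6, 27 and 5 children. Because the sizes must be whole numbers, the sampling fraction is only approximately equal across strata.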


5. Selecting a Sample from the Hispanic Community in Springfield using a Geographic Field Listing

[File needed: IT1.DTA, with a listing of DUs by census tract and block]

The Hispanic population in Springfield is concentrated in a relatively small area. There is a neighborhood community health center (Brightwood) that serves the community. A coalition of health and neighborhood leaders is interested in the extent to which the health care needs of Hispanic women are being met in the community. They are also interested in identifying any possible barriers to receiving health care. As a result, a survey was planned for the community.

In part to satisfy the interested group, the survey was planned to cover all blocks in a geographically defined area. The area is defined by census tract and block boundaries, with households enumerated in the community. Although census counts are available for the number of dwelling units (DUs) in each block, the census counts are not necessarily accurate. For this reason, the DUs in each block were enumerated by street canvassing of the area.

In each block, each DU on the block was listed. An example of the listing process is included with this question. The listing sheets describe the DU's in the census tract and blocks as follows:

Listing of Columns

Census Tract   Block   Census tract abbreviation   Block   Notes
8008           201     080                         201     3 pages, total of 55 DUs
8008           203     080                         203     3 pages, total of 57 DUs
8008           204     080                         204     4 pages (only the 4th page is included), 78 DUs
etc.

A total of 20 DU's can be listed on a single listing page. The line numbers and page numbers serve as the ID numbers for DU's within a block. When more than one page is needed for a block, the line numbers on the second page correspond to numbers 21-40, etc., up to the total number of lines needed for the DU's on the block.
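The page and line numbering described above amounts to simple integer arithmetic. The sketch below (in Python, for illustration only; the function names are hypothetical) computes how many listing pages a block needs and where a given DU number falls.

```python
LINES_PER_PAGE = 20  # stated above: 20 DUs fit on one listing page

def pages_needed(total_dus):
    """Number of listing pages required for a block with total_dus DUs."""
    return -(-total_dus // LINES_PER_PAGE)  # ceiling division

def locate(du_number):
    """Return (page, position on that page) for a DU's running line number."""
    page = (du_number - 1) // LINES_PER_PAGE + 1
    position = (du_number - 1) % LINES_PER_PAGE + 1
    return page, position

print(pages_needed(55), pages_needed(57), pages_needed(78))  # 3 3 4
print(locate(21))  # (2, 1): first line of the second page
print(locate(78))  # (4, 18): 18th line of the fourth page
```

The page counts agree with the notes in the listing table for blocks 201, 203 and 204.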

An accompanying sheet summarizes the number of DUs in each census tract and block that is to be included in the study. Data from this sheet have been entered in a file called IT1.DTA. A listing of the data follows.

1. Using the listing data, select a simple random sample without replacement of 500 dwelling units.

2. Using the block in a tract as a stratum, select a stratified simple random sample of size 500.

3. List the sample that you selected in #2, and write a set of instructions that the field workers can use to identify the sample dwelling units.
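For the simple random sample in part 1, one possible approach is to number all DUs consecutively across blocks, draw the sample from that single list, and then translate each selected number back to a tract, block and line number. The sketch below is in Python rather than SAS and is for illustration only. Since IT1.DTA is not reproduced here, it uses only the three example blocks from the listing table and a hypothetical sample size of 10 instead of 500; the seed and the function name are assumptions too.

```python
import random
from bisect import bisect_left
from itertools import accumulate

# (census tract, block, number of DUs): the three example blocks from the listing table.
blocks = [("8008", "201", 55), ("8008", "203", 57), ("8008", "204", 78)]

# Cumulative DU counts: block i covers frame numbers cum[i-1]+1 through cum[i].
cum = list(accumulate(count for _, _, count in blocks))
total = cum[-1]

def to_block_line(frame_number):
    """Translate a frame number (1..total) into (tract, block, line number in block)."""
    i = bisect_left(cum, frame_number)
    start = cum[i - 1] if i > 0 else 0
    tract, block, _ = blocks[i]
    return tract, block, frame_number - start

rng = random.Random(1999)   # arbitrary fixed seed so the draw can be reproduced
n = 10                      # hypothetical sample size for this small illustration
srs = sorted(rng.sample(range(1, total + 1), n))

for k in srs:
    print(k, to_block_line(k))
```

The line number returned for each selected DU is the running line number within its block, which is the ID a field worker would look up on the listing pages.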



Last Update: 12/7/99
Comments: Ed Stanek
Email: stanek@schoolph.umass.edu