Sampling Exercises


Simple Random Sampling With Replacement of a Population

This assignment explores samples and populations, and the basic connection between the two that results from probability sampling. We use computer-selected samples to understand the ideas. The exercises require running example SAS programs.

Introduction

Populations are rarely studied in their entirety. Instead, a sample of subjects is selected, and the results on the sample are used to estimate a parameter in the population. We describe how the results from a sample relate to the population by repeating the sampling many times, and comparing the estimates from the sample with the population parameter. The sample estimate is rarely equal to the population parameter. We consider two characteristics of sample estimates to describe how they relate to a population parameter.

We can draw many samples in the same manner using the computer. This exercise uses the computer to help understand what the distribution of estimates looks like when we sample repeatedly from a population.

This exercise develops ideas about:


Selecting Simple Random Samples with Replacement

We use a small example to illustrate the ideas. Suppose a population of N=6 fragile diabetics has been identified in a clinic. We wish to know the average number of hypoglycemic episodes reported by the patients in the past month. We assume that the number of hypoglycemic episodes recorded in each patient's record for the past month is as follows:

Subject ID    # Hypoglycemic Episodes
1             7
2             6
3             1
4             2
5             16
6             4
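The population parameters for these six patients can be checked by hand or with a few lines of code. The following is a minimal Python sketch, not the course's SAS program; the variable names are our own, and both common divisors for the variance (N and N-1) are shown because the text does not say which one the SAS output reports.

```python
from fractions import Fraction

# Episode counts for subject IDs 1..6, copied from the values listed above.
episodes = {1: 7, 2: 6, 3: 1, 4: 2, 5: 16, 6: 4}

values = list(episodes.values())
N = len(values)

# Population mean: the parameter that the samples will try to estimate.
pop_mean = Fraction(sum(values), N)

# Sum of squared deviations from the population mean.
ss = sum((v - pop_mean) ** 2 for v in values)

# Population variance with divisor N, and the alternative with divisor N - 1.
pop_var_N = ss / N
pop_var_N1 = ss / (N - 1)

print("mean:", float(pop_mean))
print("variance (divisor N):", float(pop_var_N))
print("variance (divisor N-1):", float(pop_var_N1))
```

Exact fractions are used so that later comparisons with sample-based results are not blurred by rounding.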

The first program will do the following:

1. Do the following:

Use the SAS output to answer the following questions.


Rather than reviewing all records, suppose we wish to save time by selecting a sample of patient records (with replacement) for n=3 patients and use the sample to estimate the number of hypoglycemic episodes.
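To make the idea of one sample selected with replacement concrete, here is a small Python sketch (again, not the SAS program used in the exercise). The seed value is a hypothetical choice made only so that the run is repeatable; a different seed gives a different sample.

```python
import random

values = [7, 6, 1, 2, 16, 4]   # episode counts for subject IDs 1..6
ids = list(range(1, 7))

rng = random.Random(2024)       # hypothetical seed, only for repeatability
n = 3

# Sampling with replacement: each draw picks one of the 6 IDs independently,
# so the same patient can be selected more than once.
sample_ids = rng.choices(ids, k=n)
sample_values = [values[i - 1] for i in sample_ids]

# The sample mean is the estimate of the population mean.
sample_mean = sum(sample_values) / n

print("selected IDs:", sample_ids)
print("values:", sample_values)
print("sample mean:", sample_mean)
```

Rerunning with other seeds shows how much the estimate moves from one sample to the next.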

The next program will:

2. Do the following:

Answer the following questions:


We can use the computer program to select many samples and evaluate the results of the sample selections. We use this technique to understand the properties of statistics calculated from samples. The properties we develop concern the average value of the statistic (its expected value) and the variance of the statistic.
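Because the population is so small, the expected value and variance of the sample mean can also be obtained exactly by listing every possible sample instead of simulating many of them. The Python sketch below takes that route as an alternative to the repeated random selection done by the SAS program; it assumes samples of size n=3 with replacement, so all 6^3 = 216 ordered samples are equally likely.

```python
from fractions import Fraction
from itertools import product

values = [7, 6, 1, 2, 16, 4]   # episode counts for subject IDs 1..6
n = 3

# Every ordered sample of size 3 drawn with replacement (216 in total).
means = [Fraction(sum(s), n) for s in product(values, repeat=n)]

# Expected value of the sample mean: the average over all equally likely samples.
expected_mean = sum(means) / len(means)

# Variance of the sample mean: average squared deviation from its expected value.
var_of_mean = sum((m - expected_mean) ** 2 for m in means) / len(means)

print("number of samples:", len(means))
print("expected value of sample mean:", expected_mean)
print("variance of sample mean:", var_of_mean, "=", float(var_of_mean))
```

A simulation with many random samples should give values close to these exact ones, with the gap shrinking as the number of simulated samples grows.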

3. Do the following:

The next program will:


Answer the following questions:

In the program, the sample mean is calculated for each sample.

Histograms are included in Figures 2-6 for the percent frequency distribution of estimates (based on the sample) of the variance, the standard deviation, the minimum, the maximum, and the range. Inspect each of these distributions. Distributions that have a long tail to the right are called right skewed, while those with long tails to the left are called left skewed.

When the expected value of a statistic is equal to the value in the population, the statistic is called "unbiased".
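The definition of unbiasedness can be checked directly on this population. The Python sketch below is an illustration under our own assumptions: it uses samples of size n=3 with replacement, takes the sample variance with divisor n-1, and compares it with the population variance computed with divisor N. It also looks at the sample maximum as an example of a statistic whose expected value does not equal the population value.

```python
from fractions import Fraction
from itertools import product

values = [7, 6, 1, 2, 16, 4]   # episode counts for subject IDs 1..6
N, n = len(values), 3

pop_mean = Fraction(sum(values), N)
pop_var = sum((v - pop_mean) ** 2 for v in values) / N   # divisor N
pop_max = max(values)

def sample_variance(sample):
    """Sample variance with divisor n - 1."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)

# All equally likely ordered samples of size 3 drawn with replacement.
samples = list(product(values, repeat=n))

# Expected values: averages of each statistic over all possible samples.
exp_var = sum(sample_variance(s) for s in samples) / len(samples)
exp_max = Fraction(sum(max(s) for s in samples), len(samples))

print("E[sample variance] =", exp_var, " population variance =", pop_var)
print("E[sample maximum]  =", float(exp_max), " population maximum =", pop_max)
```

In this setting the sample variance is unbiased for the population variance, while the sample maximum falls below the population maximum on average.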


Computer programs can generate many simple random samples effortlessly. We can use the programs to study the properties of the sample mean as a function of the number of subjects selected in the sample. Here you will study how the mean and variance of the sample mean vary with the sample size.
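As a cross-check on what the simulations should show, the exact mean and variance of the sample mean can be computed for several sample sizes by enumerating all samples. This Python sketch assumes sampling with replacement and sample sizes 1 through 4 (our choice for illustration); the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

values = [7, 6, 1, 2, 16, 4]   # episode counts for subject IDs 1..6
N = len(values)
pop_mean = Fraction(sum(values), N)
pop_var = sum((v - pop_mean) ** 2 for v in values) / N   # divisor N

def mean_and_variance_of_sample_mean(n):
    """Exact expected value and variance of the sample mean for size n."""
    means = [Fraction(sum(s), n) for s in product(values, repeat=n)]
    e = sum(means) / len(means)
    v = sum((m - e) ** 2 for m in means) / len(means)
    return e, v

results = {n: mean_and_variance_of_sample_mean(n) for n in range(1, 5)}

for n, (e, v) in results.items():
    # Last column: population variance divided by n, for comparison.
    print(n, e, float(v), float(pop_var / n))
```

The expected value stays at the population mean for every n, while the variance of the sample mean equals the population variance divided by n.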

The next program will:


4. Do the following:

Complete the following:

Modify the program a54p12.sas to select simple random samples without replacement of size n=4 by changing the line to:

%LET tsamp=4;

Rerun the program.
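Taking the wording "without replacement" at face value, the exact behavior of the sample mean for n=4 can be sketched in Python as follows. This is our own illustration, not the contents of a54p12.sas. It enumerates the 15 possible sets of 4 distinct patients and compares the variance of the sample mean with the standard formula (1 - n/N) S^2 / n, where S^2 is the population variance with divisor N-1.

```python
from fractions import Fraction
from itertools import combinations

values = [7, 6, 1, 2, 16, 4]   # episode counts for subject IDs 1..6
N, n = len(values), 4

pop_mean = Fraction(sum(values), N)
S2 = sum((v - pop_mean) ** 2 for v in values) / (N - 1)   # divisor N - 1

# Without replacement: each set of 4 distinct subjects is equally likely.
# Combinations are taken over positions so that subjects stay distinct.
means = [
    Fraction(sum(values[i] for i in idx), n)
    for idx in combinations(range(N), n)
]

expected_mean = sum(means) / len(means)
var_of_mean = sum((m - expected_mean) ** 2 for m in means) / len(means)

# Textbook variance of the mean under sampling without replacement.
formula = (1 - Fraction(n, N)) * S2 / n

print("number of samples:", len(means))
print("expected value of sample mean:", expected_mean)
print("variance of sample mean:", float(var_of_mean), "formula:", float(formula))
```

For the same n, this variance is smaller than the with-replacement value, because a sample of distinct patients covers more of the population.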

Use your results to summarize the relationship between the sample size and the mean and variance of the sample mean. Please do the following: