The Data Analysis Plan
I. Check the data
II. Summarize the data
A. Descriptive Statistics
1) Purpose
2) Measures of Central Tendency (Mean, Median, Mode)
3) Measures of Variability (Range, variance, standard deviation)
B. Graphical Summaries
III. Confirm what the data reveal: Inferential statistics
A. Purpose
B. Five Critical Terms
1) null hypothesis
2) alternative hypothesis
3) Type I error (alpha)
4) Type II error (beta)
5) Power
C. Three steps to a statistical decision
1) Assume the null hypothesis
2) Calculate the probability, under the null hypothesis, of results as extreme as or more extreme than those obtained
3) Decide whether you are willing to accept this risk of error: reject or fail to reject (retain) the null hypothesis
I. Checking the data
A. Checking for outliers and errors: Do the numbers make sense? For example, an imported data file from your survey should have a column for each variable (one for the responses to each Likert scale item and one for each demographic variable). Each row represents the data from one participant:
| Item1 | Item2 | Item3 | … | Item n | Sex | Religious fervor | Political affiliation |
|---|---|---|---|---|---|---|---|
| 3 | 2 | 1 | … | 3 | 0 | 1 | 1 |
| 1 | 4 | 1 | … | 5 | 0 | 3 | 4 |
| 5 | 4 | 1 | … | 2 | 0 | 2 | 3 |
| 3 | 1 | 1 | … | 4 | 1 | 2 | 3 |
| 3 | 5 | 1 | … | 9 | 1 | 3 | 1 |
| 5 | 2 | 1 | … | 3 | 0 | 2 | 7 |
| 1 | 4 | 1 | … | 2 | 5 | 1 | 1 |
Looking at these data, I would notice several peculiarities. The columns with Likert responses should never contain numbers outside the range of the scale. For example, I would not expect to see the “9” under Item n. I would also notice the uniformity of the responses to Item3. This is not necessarily an error, but it may indicate a problem with that particular item. Within the demographic section, a “5” under sex leads me to suspect a coding error, as would a “7” under political affiliation. These are all most likely errors on the part of the participant or in coding the data.
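
The checks described above can be sketched in a few lines of Python. This is only an illustration, not a prescribed procedure: the valid codes (1 to 5 for Likert items, 0/1 for sex, 1 to 4 for political affiliation) are assumptions, since the notes do not state the coding scheme, and the function names are hypothetical. The elided “…” items are left out.

```python
# Rows from the example table above (one list per participant).
COLUMNS = ["Item1", "Item2", "Item3", "Item n", "Sex",
           "Religious fervor", "Political affiliation"]
ROWS = [
    [3, 2, 1, 3, 0, 1, 1],
    [1, 4, 1, 5, 0, 3, 4],
    [5, 4, 1, 2, 0, 2, 3],
    [3, 1, 1, 4, 1, 2, 3],
    [3, 5, 1, 9, 1, 3, 1],
    [5, 2, 1, 3, 0, 2, 7],
    [1, 4, 1, 2, 5, 1, 1],
]

# ASSUMED valid codes; replace with the codes from your own codebook.
# "Religious fervor" is left unchecked because no range is assumed for it.
VALID = {
    "Item1": range(1, 6),
    "Item2": range(1, 6),
    "Item3": range(1, 6),
    "Item n": range(1, 6),
    "Sex": (0, 1),
    "Political affiliation": range(1, 5),
}


def find_out_of_range(rows, columns, valid):
    """Return (participant number, column, value) for every invalid code."""
    problems = []
    for row_number, row in enumerate(rows, start=1):
        for column, value in zip(columns, row):
            if column in valid and value not in valid[column]:
                problems.append((row_number, column, value))
    return problems


def find_constant_columns(rows, columns):
    """Return the columns in which every participant gave the same value."""
    return [column for index, column in enumerate(columns)
            if len({row[index] for row in rows}) == 1]


for row_number, column, value in find_out_of_range(ROWS, COLUMNS, VALID):
    print(f"Participant {row_number}: {column} = {value} is out of range")
print("No variation in:", find_constant_columns(ROWS, COLUMNS))
```

Under these assumed codes, the sketch flags the same three values noted in the text (the 9, the 7, and the 5) and reports Item3 as the only column with no variation.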
Another type of anomaly is called an “outlier”. Outliers are extreme values on the dependent variable (DV). They are generally not coding errors; rather, they reflect a real tendency of the participant to give extreme scores on occasion. The participant may not have followed instructions, may have been distracted, or may truly be an “unusual” participant. For example, suppose your DV were a reaction time (RT) measure. You could have a participant who, overall, gives extremely fast (or slow) RTs. In the Hadden study, for example, one family chosen for the study was extremely hard to contact. Rather than a few days between observations, this family often had weeks or months between observations. The data from this family were unusual, so she dropped that family from her analysis.
You can use a histogram or a stem-and-leaf plot to check for outliers. This will also allow you to look for unusual response distributions, such as in Item3 above, where you will see a “skewed” response distribution.
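
As a rough illustration of the histogram check, the sketch below prints a text histogram for two columns of the example table. The function name `text_histogram` is hypothetical, and the plotting range is an assumption chosen so that the out-of-range value remains visible. In practice a statistics package would draw the plot for you.

```python
from collections import Counter

# Responses taken from the example table above.
ITEM_3 = [1, 1, 1, 1, 1, 1, 1]
ITEM_N = [3, 5, 2, 4, 9, 3, 2]


def text_histogram(values, low, high):
    """Return one line per possible value, with one '*' per response."""
    counts = Counter(values)  # a missing value counts as 0
    return [f"{value:>2} | {'*' * counts[value]}".rstrip()
            for value in range(low, high + 1)]


print("Item n")
print("\n".join(text_histogram(ITEM_N, 1, max(ITEM_N))))
print("Item3")
print("\n".join(text_histogram(ITEM_3, 1, 5)))
```

In the Item n plot, the single response at 9 sits apart from the rest, separated by empty rows. In the Item3 plot, every response piles up on one value.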
Whatever you do about your outliers or mistakes, you must formalize your decision rule and report it in the research report. Indicate how much data was omitted and why. You should not drop data without informing the reader.
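
One way to keep the decision rule explicit is to write it as code and have that code produce the sentence you will report. The sketch below is only one possible rule, not a recommendation: it drops any participant with a Likert response outside an assumed 1 to 5 scale. The function name `apply_exclusion_rule` and the wording of the report are hypothetical.

```python
# Likert columns (Item1, Item2, Item3, Item n) from the example table above.
LIKERT_ROWS = [
    [3, 2, 1, 3],
    [1, 4, 1, 5],
    [5, 4, 1, 2],
    [3, 1, 1, 4],
    [3, 5, 1, 9],
    [5, 2, 1, 3],
    [1, 4, 1, 2],
]


def apply_exclusion_rule(rows, valid_codes, rule_text):
    """Keep rows whose values are all valid; also return a report sentence."""
    kept = [row for row in rows if all(value in valid_codes for value in row)]
    n_dropped = len(rows) - len(kept)
    percent = 100 * n_dropped / len(rows)
    report = (f"Excluded {n_dropped} of {len(rows)} participants "
              f"({percent:.1f}%): {rule_text}")
    return kept, report


# ASSUMED scale range of 1-5; state the actual rule you used in your report.
kept_rows, report = apply_exclusion_rule(
    LIKERT_ROWS, range(1, 6), "a Likert response outside 1-5.")
print(report)
```

Because the rule and the count come from the same function, the number of omitted participants in the report always matches what was actually dropped.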