The Data Analysis Plan

I. Check the data

II. Summarize the data

   A. Descriptive Statistics

      1) Purpose

      2) Measures of Central Tendency (Mean, Median, Mode)

      3) Measures of Variability (Range, Variance, Standard Deviation)

   B. Graphical Summaries
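As a preview of the descriptive statistics listed above, here is a minimal sketch using Python's standard-library statistics module; the scores are invented for illustration.

```python
# Descriptive statistics for a small, invented set of Likert-style scores.
import statistics

scores = [3, 2, 5, 3, 4, 2, 3, 5, 1, 3]

# Measures of central tendency
mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value when sorted
mode = statistics.mode(scores)      # most frequent value

# Measures of variability
value_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance (n - 1 denominator)
sd = statistics.stdev(scores)           # sample standard deviation

print(mean, median, mode, value_range, round(sd, 2))
```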

III. Confirm what the data reveal: Inferential statistics

   A. Purpose

   B. Five Critical Terms

      1) Null hypothesis

      2) Alternative hypothesis

      3) Type I error (alpha)

      4) Type II error (beta)

      5) Power
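A hypothetical coin-flip example can make the five terms concrete. Suppose the null hypothesis is a fair coin (p = 0.5), the alternative hypothesis is a coin that favors heads (p = 0.7), and we decide, purely for illustration, to reject the null if 15 or more heads appear in 20 flips. All of these numbers are assumptions, not part of the outline itself.

```python
# Exact alpha, beta, and power for the hypothetical coin-flip decision rule:
# reject H0 (p = 0.5) if we observe 15 or more heads in 20 flips.
from math import comb

def binom_tail(n, p, k_min):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

n, cutoff = 20, 15
alpha = binom_tail(n, 0.5, cutoff)  # Type I error: reject H0 when H0 is true
power = binom_tail(n, 0.7, cutoff)  # power: reject H0 when H1 (p = 0.7) is true
beta = 1 - power                    # Type II error: retain H0 when H1 is true

print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, power = {power:.3f}")
```

Notice that alpha and beta trade off: lowering the cutoff raises alpha but also raises power.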

   C. Three steps to a statistical decision

      1) Assume the null hypothesis

      2) Calculate the probability of results as or more extreme than those obtained, under the null hypothesis

      3) Decide whether you are willing to accept this risk of error: reject or fail to reject (retain) the null hypothesis
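The three steps above can be sketched as a permutation test on two invented groups of scores; the data, group sizes, and alpha level are illustrative assumptions.

```python
# A permutation-test sketch of the three steps to a statistical decision.
import random
from statistics import mean

group_a = [12, 15, 14, 16, 13, 17]  # invented scores
group_b = [10, 11, 13, 9, 12, 11]
observed = mean(group_a) - mean(group_b)

# Step 1: assume the null hypothesis -- under H0, group labels are arbitrary.
pooled = group_a + group_b
random.seed(0)

# Step 2: estimate the probability of a difference as or more extreme
# than the one observed, by repeatedly shuffling the labels.
n_extreme, n_reps = 0, 10_000
for _ in range(n_reps):
    random.shuffle(pooled)
    diff = mean(pooled[:6]) - mean(pooled[6:])
    if abs(diff) >= abs(observed):
        n_extreme += 1
p_value = n_extreme / n_reps

# Step 3: decide whether you are willing to accept this risk of error.
alpha = 0.05
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"p = {p_value:.4f}: {decision}")
```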

I. Checking the data

   A. Checking for outliers and errors: Do the numbers make sense? For example, a data file imported from your survey should have a column for each variable (one for the responses to each Likert-scale item and then one for each demographic variable). Each row represents the data from one participant:

Item1  Item2  Item3  ...  Item n  Sex  Religious fervor  Political affiliation
  3      2      1    ...    3      0          1                    1
  1      4      1    ...    5      0          3                    4
  5      4      1    ...    2      0          2                    3
  3      1      1    ...    4      1          2                    3
  3      5      1    ...    9      1          3                    1
  5      2      1    ...    3      0          2                    7
  1      4      1    ...    2      5          1                    1

Looking at these data, I would notice several peculiarities. The columns with Likert responses should never contain numbers outside the range of the scale. For example, I would not expect to see the "9" under Item n. I would also note the uniformity of the responses to Item3. This is not necessarily an error, but it may indicate a problem with that particular item. Within the demographic section, a "5" under Sex leads me to suspect a coding error, as would a "7" under Political affiliation. These are all most likely errors on the part of the participant or in coding the data.
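Checks like these can be automated. The sketch below assumes 5-point Likert items and hypothetical demographic codings (Sex coded 0/1, political affiliation coded 1-4); the column names and rows are invented to mirror the problems described above.

```python
# Scan each participant's row for values outside the valid set for
# each column. Codings and column names are illustrative assumptions.
rows = [
    {"item1": 3, "item3": 1, "sex": 0, "pol": 1},  # clean row
    {"item1": 3, "item3": 9, "sex": 1, "pol": 1},  # a "9" on a 1-5 Likert item
    {"item1": 5, "item3": 1, "sex": 0, "pol": 7},  # a "7" under political affiliation
    {"item1": 1, "item3": 1, "sex": 5, "pol": 1},  # a "5" under sex
]

valid = {
    "item1": set(range(1, 6)),  # 5-point Likert scale
    "item3": set(range(1, 6)),
    "sex": {0, 1},              # assumed coding
    "pol": {1, 2, 3, 4},        # assumed four coded affiliations
}

problems = []
for i, row in enumerate(rows, start=1):
    for column, value in row.items():
        if value not in valid[column]:
            problems.append((i, column, value))
            print(f"participant {i}: suspicious value {value} in {column}")
```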

Another type of anomaly is called an "outlier". Outliers are extreme values on the DV; they are generally not coding errors but rather genuine tendencies of a participant to give an occasional extreme score. The participant may not have followed instructions, may have been distracted, or may truly be an "unusual" participant. For example, suppose your DV were a reaction time measure. You could have a participant who, overall, gives extremely fast (or slow) RTs. In the Hadden study, for example, one family chosen for the study was extremely hard to contact. Rather than a few days between observations, this family often had weeks or months between observations. The data from this family were unusual, and so she dropped that family from her analysis.
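One common decision rule is to flag any score more than a fixed number of standard deviations from the mean. The reaction times and the 2-SD cut-off below are illustrative assumptions, not a universal standard.

```python
# Flag reaction times (in ms) far from the mean; the data are invented,
# and the 2-SD cut-off is a judgment call, not a fixed rule.
from statistics import mean, stdev

rts = [412, 388, 455, 430, 401, 2950, 397, 420]  # one extreme RT

m, s = mean(rts), stdev(rts)
outliers = [rt for rt in rts if abs(rt - m) > 2 * s]
print(outliers)
```

Note that a single extreme value inflates the standard deviation itself, which is one reason graphical checks (next section) are a useful complement to numeric cut-offs.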

You can use a histogram or a stem-and-leaf plot to detect the occurrence of outliers. These displays will also let you spot unusual response distributions, such as the one for Item3 above, where you will see a "skewed" response distribution.
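A stem-and-leaf plot is simple to produce by hand or in code. In the sketch below, with invented scores, a gap in the stems makes the extreme value stand out.

```python
# A text stem-and-leaf plot: tens digit is the stem, ones digit the leaf.
from collections import defaultdict

scores = [23, 25, 28, 31, 31, 34, 36, 42, 44, 71]  # 71 is the extreme value

stems = defaultdict(list)
for score in sorted(scores):
    stems[score // 10].append(score % 10)

for stem in range(min(stems), max(stems) + 1):
    leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
    print(f"{stem:2d} | {leaves}")
```

The empty rows for stems 5 and 6 show the gap between the bulk of the scores and the outlier at 71.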

Whatever you do about your outliers or mistakes, you must formalize your decision rule and report it in the research report. Indicate how much data was omitted and why. You should not drop data without informing the reader.