LINGUIST 609: Quantitative analysis of linguistic data (Formal Foundations)

Who: Brian Dillon

When: Mondays & Wednesdays, 10:45AM-12:05PM (live coding sessions)

Where: Our Zoom room (password required; request it from Brian)

Course overview

The goal of this course is to provide an intensive introduction to the quantitative analysis of linguistic data. There are three main course objectives:

  • Learn to use the statistical programming environment R, and RStudio.
  • Introduce the fundamentals of the quantitative analysis of linguistic data.
  • Introduce the fundamentals of inferential statistics.

Across nearly all fields of linguistics, linguists use quantitative methods to summarize and communicate key features of their data. This course develops the basics of those quantitative tools. The tools we will develop are intended to be general enough to be useful whatever form your research takes, from corpus work to experimental psycholinguistics.

Our primary tools will be the R programming language and RStudio. We will cover descriptive statistics, which help you explore and understand the structure of a data set. We will also spend a good amount of time on inferential statistics: formal techniques that allow you to draw inferences that go beyond the data you've collected.

Course structure

This course is entirely remote and is structured as a 'flipped' classroom. This means that I will prepare short lecture videos for you to watch, write lecture notes to accompany them, and identify material for you to read each week. I will post videos and lecture notes by Thursday night each week. I will keep videos brief and focused: no more than 10 minutes apiece.

We have two meetings each week, beginning at 10:45AM Eastern Time. We will use these meeting times as 'flipped' course sessions, meaning that we will meet live during class time to work on course assignments together. I will be present during these meetings to answer any questions you have about the readings and lectures, and to discuss the course material with you. You will have one or two workbooks each week, which together will take about 2-4 hours of work. The goal of these workbooks is to give you worked, hands-on assignments to practice and master the material.

Course materials

Our official textbook in this course is Danielle Navarro’s excellent Learning Statistics with R, which has an accompanying website that contains datasets, errata, and other helpful tidbits.

Otherwise, course videos and lecture notes will be linked below.

Course replication experiment

The capstone project in the course is a replication experiment. We will select one experiment to replicate in class.

Course requirements

The current situation isn't normal for anyone, and I assume that we are all struggling in various ways this semester. I have therefore designed this course so that you can attend when you can and go at your own pace.

There are two formal requirements of this course this semester:

  • That you complete the assigned R coding workbooks by the end of the semester, 12/4/2020. You may skip two workbooks without any excuse, although I do ask that you let me know if you are choosing to skip a workbook. If you need to skip more than that, please let me know as soon as you can, and we can work together to find a satisfactory set of coursework to meet your current needs.
  • That you participate in developing, running, and writing up the in-class replication experiment.

R workbooks should be turned in to me via email.

Beyond these requirements, I do not expect you to attend every class session. Come when you can, but please feel free to work on your own, at your own pace, as you see fit. If you are having a hard time keeping up with the course, please get in touch with me as soon as possible so we can find a way to help you navigate this semester.

Course schedule

When | Topic | Workbook | Reading
M 8/24 | Introduction to RStudio | Workbook 1 | Navarro, Chapter 3
W 8/26 | Basic descriptive statistics | Workbook 2 | Navarro, Chapter 5, 5.1-5.5
M 8/31 | Correlation and standardization | Workbook 3 | Navarro, Chapter 5, 5.6-5.9
W 9/2 | Basic probability theory: Binomial distribution | Workbook 4 | Navarro, Chapter 9
M 9/7 | Samples, populations, standard error | Workbook 5 | Navarro, Chapter 10, 10-10.3
W 9/9 | The 95% confidence interval | Workbook 6 | Navarro, Chapter 10, 10.3-10.6
M 9/14 | 95% CIs and hypothesis testing | Workbook 7 | Navarro, Chapter 11
W 9/16 | Hypothesis testing | Workbook 8 | Navarro, Chapter 11
M 9/21 | Hypothesis testing: t-tests | | Navarro, Chapter 13, 13.5-13.11
W 9/23 | Experimental sample: Xiang et al (2019) | | Paper, Slides
M 9/28 | Hypothesis testing: Two-sample t-tests | Workbook 9 | Navarro, Chapter 13
W 9/30 | Hypothesis testing: Paired-sample t-tests and non-parametric tests | Workbook 10; turn in sample PCIbex experiment by this day | Navarro, Chapter 13
M 10/5 | Effect size, power and reliability | Workbook 11 | Navarro, Chapter 11, specifically 11.8
W 10/7 | One-way ANOVA | Mini Quiz | Navarro, Chapter 14, 14.0-14.4
M 10/12 | One-way ANOVA & post-hoc comparisons | Workbook 12 | Navarro, Chapter 14, 14.5-end
W 10/14 | Linear regression | Workbook 13 | Navarro, Chapter 15, 15.1-15.5
M 10/19 | Getting Xiang et al running! | No workbook; group coding session | No reading
W 10/21 | More on linear regression | Workbook 14 | Navarro, Chapter 15, 15.5-end
M 10/26 | Brian away | Work on Workbook 14 | Navarro, Chapter 16, 16.1-16.6
W 10/28 | Factorial designs | Workbook 15 | Navarro, Chapter 16, 16.1-16.6; optional: Sprouse et al. (2012)
M 11/2 | Logistic regression | Workbook 16 | Sonderegger et al (2018), Ch. 5
W 11/4 | Multilevel linear regression | | Sonderegger et al (2018), Ch. 7.1-7.4
M 11/9 | Multilevel linear regression | | Sonderegger et al (2018), Ch. 7.5-7.10
W 11/11 | Multilevel linear regression: Hands-on example | Workbook 17 | Sonderegger et al (2018), Ch. 8
M 11/16 | Multilevel logistic regression | | Sonderegger et al (2018), Ch. 8
W 11/18 | Experimental results presentation | Workbook 18 is available in Box |
Brian Dillon
Associate Professor

I am a psycholinguist who studies syntax, semantics, working memory, and sentence comprehension.