Updated on:
1/10/24

This syllabus is subject to change. The latest version on this website is the binding syllabus.

Schedule

English 491DH

See the STRUCTURE OF THE COURSE.

 

WEEK 1

WED 1.26 Introduction to course and books:

  • Jodie Archer & Matthew Jockers, The Bestseller Code. St. Martin's, 2016. Amazon. $10.
  • Janelle Shane, You Look Like a Thing and I Love You. Voracious, 2019. Amazon. $20.
  • Bird, Klein, Loper. The Natural Language Tool Kit (free here)
  • Allen B. Downey, Think Python (free here) (full text pdf here)
  • Sarah Boslaugh, Statistics in a Nutshell (free here)
  • Jurafsky & Martin, Speech and Language Processing (free here)

For extra help, try another introduction to Python: Tim Hall et al., Python 3 for Absolute Beginners (free here)


FRI 1.28 Introduction to Computers

What are computers? What are they good at? How do they process data?

  • Bits and bytes, memory and processors, input and output.
  • READ this handout for a start.
  • Aside: What do you think about Replika? About. Website.
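To make "bits and bytes" concrete, here is a minimal sketch of how a computer might store a short piece of text. The sample string and variable names are only illustrative, and UTF-8 is assumed as the encoding.

```python
# A short piece of text to inspect (illustrative sample).
text = "Hi!"

# Encode the string as UTF-8 bytes; plain ASCII characters take one byte each.
data = text.encode("utf-8")

# Each byte is a whole number from 0 to 255.
byte_values = list(data)

# Each byte is eight bits; "08b" formats a number as eight binary digits.
bit_strings = [format(b, "08b") for b in data]

print(byte_values)   # [72, 105, 33]
print(bit_strings)   # ['01001000', '01101001', '00100001']
```

Reading the output from left to right shows the same three characters at three levels: letters, numbers, and bits.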

 

 

WEEK 2

M 1.31 Setting Up:

We will use Google's Colab. It saves to your Google Drive. (Don't forget to sign in.)

If you want Python on your computer to use when you're offline:

  • WATCH Corey Schafer, setting up Python for Mac and Windows.
  • If you have Windows 10, you can install Linux without compromising your OS. UMass recommends Ubuntu. Python comes standard with Linux.
  • You can also get PyCharm or Anaconda for free. Use your UMass id to get the professional version free!

 

WED 2.2 From flowchart to program

What are the basic commands for a computer?

  • READ Downey, Think Python, chapters 1 and 2.
  • In class: Machine language overview
  • In-class exercise: make flowcharts.

 

FRI 2.4 Metaphors of Computing

  • IN CLASS A window is a metaphor for a computer interface. Desktop is a metaphor. So are dragging, swiping, texting, and so on. They are useful until they inhibit our understanding. What other metaphors inhibit our understanding not only of machines, but of people around us? Before class, make a list of how you personally evaluate behavior, expression (verbal and non-verbal communication), attitude, and other people's values or interests. We are looking for data points, for metrics, and asking what kinds of measurement are possible.

 

WEEK 3

MON 2.7 Data

  • Browse the Preface and Chapter 1 of Statistics in a Nutshell. (Free here). What is data? What are its kinds? Also good is Khan Academy on statistics (linked on the right, below the calendar).
  • DATA handout
  • A decent, short video on Big Data

 

WED 2.9 HTML

We will discuss HTML and text data (raw and pdf).
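As a rough illustration of the difference between HTML and raw text, the sketch below strips the tags from a small, made-up HTML string using Python's built-in html.parser module. The class name TextExtractor and the sample markup are assumptions for this example, not course materials.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text found between tags and ignore the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for each run of text between tags.
        stripped = data.strip()
        if stripped:
            self.chunks.append(stripped)


# A made-up page: the tags describe structure, the rest is text data.
sample_html = (
    "<html><body><h1>Moby-Dick</h1>"
    "<p>Call me <em>Ishmael</em>.</p></body></html>"
)

parser = TextExtractor()
parser.feed(sample_html)
parser.close()

# Join the pieces into one raw-text string.
raw_text = " ".join(parser.chunks)
print(raw_text)
```

Real web pages are messier (scripts, menus, footers), so cleaning usually takes more than one pass.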

 

FRI 2.11 Data structures & HTML

  • IN CLASS Structuring and cleaning your data is an essential first step. What kind of data do you extract from literary works? Historical works? Cultural critiques? Also consider the various methods you have mastered in order to read and analyze literature, history, politics, math, chemistry, and road signs. How do these methods produce structured data?

 

 

WEEK 4

MON 2.14 Functions:

We will look at functions and how to distribute your workload.

 

WED 2.16 Strings, Lists, and Dictionaries

  • READ Downey, Think Python, chapters 8 and 9.
  • In class: practice programming

We will look at data types and how to manipulate strings. Then we will write some practice programs by hand, and put them into the computer.
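A small warm-up along these lines, assuming nothing beyond built-in Python: split a string into a list of words, then count the words in a dictionary. The sample sentence is invented for the exercise.

```python
# A string: a sequence of characters.
line = "the quick brown fox jumps over the lazy dog the end"

# String methods return new strings; the original is unchanged.
shout = line.upper()
first_word = line[:3]          # slicing: characters 0, 1, and 2

# A list: split the string on whitespace to get the words in order.
words = line.split()

# A dictionary: map each word to the number of times it appears.
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

print(first_word)      # the
print(len(words))      # 11
print(counts["the"])   # 3
```

Try writing the loop by hand on paper first, tracing how counts changes after each word.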

 

FRI 2.18 Strings, Lists, and Dictionaries

  • READ Downey, Think Python, chapters 10 and 11.
  • In class: practice programming
  • Assignment #1 due. LINK

 

 

WEEK 5

MON 2.21 No Class:

  • No Class

 

TUES 2.22 UMass "Monday"

  • Lists and dictionaries practice
  • JSON files and platform-specific data (in class)
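For a sense of how JSON relates to lists and dictionaries, here is a minimal sketch using Python's built-in json module. The book record is an invented example; real platform data will have its own keys.

```python
import json

# A Python dictionary holding strings, a number, and a list.
book = {
    "title": "Frankenstein",
    "author": "Mary Shelley",
    "year": 1818,
    "tags": ["gothic", "novel"],
}

# dumps() turns the dictionary into a JSON-formatted string.
text = json.dumps(book, indent=2)
print(text)

# loads() turns a JSON string back into Python data.
restored = json.loads(text)
print(restored["tags"][0])   # gothic
```

To work with a file instead of a string, json.dump() and json.load() take an open file object.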

 

WED 2.23 Words, phrases, sentences, sounds:

  • What are sentences?
  • REVIEW Downey, Think Python, chapters 8 – 11.

Command Lines and BASH scripts:

  • Windows Users: here are directions for using the command line.
  • Windows Users: how to install and use Python. You don't need Visual Studio, but if you want to use it, here is Corey Schafer on how to install and set up VS.
  • Mac Users: Unix and the command line are already on your computer.

How is text structured into data? What is a letter, and how does it relate to a sound? What is a word? What is a string and how does it relate to phrases and sentences?

 

FRI 2.25 Python packages

Write a practice program by hand. If you want to learn how to make an interface like a window, read Downey, Think Python, chapter 4. If you want some pointers on strings, review Downey, Think Python, chapters 8 and 9.

 

 

WEEK 6

MON 2.28 NLTK:

What methods do humans use to process and understand texts? If you're looking in a text for information, what precisely are you looking for?

 

WED 3.2 NLTK

 

FRI 3.4 NLTK

  • Even more of The Natural Language Tool Kit (free)
  • We will read in a text from Project Gutenberg and tokenize it. If you are preparing a midterm or final paper, consider using some of the texts you are reading. We will discuss some of the questions one asks of a literary text, deriving those questions from particular schools of literary criticism.
  • NLTK Cheat Sheet
  • Assignment #2 due. LINK
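In class we will tokenize with NLTK. The sketch below is only a stand-in that uses a regular expression, so it runs without installing anything or downloading a book. The sample sentence is the opening of Pride and Prejudice, and the pattern is an assumption about what counts as a word.

```python
import re
from collections import Counter

# Stand-in for a text read from Project Gutenberg.
sample = (
    "It is a truth universally acknowledged, that a single man in "
    "possession of a good fortune, must be in want of a wife."
)

# Lowercase the text, then keep runs of letters and apostrophes as tokens.
tokens = re.findall(r"[a-z']+", sample.lower())

# Count how often each token appears.
freq = Counter(tokens)

print(tokens[:4])      # ['it', 'is', 'a', 'truth']
print(len(tokens))     # 23
print(freq["a"])       # 4
```

A tokenizer built for the purpose also handles abbreviations, hyphens, and sentence boundaries, which this pattern ignores.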

 

 

WEEK 7

MON 3.7 Priors:

How does one research relationships between literary works, whether by the same author or different authors? What kinds of comparisons does one make, and what are the assumptions (priors) behind those comparisons? Consider our systematic errors in judgment as described by Tversky and Kahneman (READ).

 

WED 3.9 NLTK

  • In class: practice using NLTK
  • How to install NLTK on Windows (YouTube)
  • For our rhyming program, here is the CMU_dictionary in JSON format. Caution: 6 MB!

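To call it:


import json

# Load the dictionary once. The filename below is an assumption;
# use the name of the JSON file you downloaded.
with open('cmu_dictionary.json') as f:
    cmu = json.load(f)

def pronounce(word):
    # Keys are assumed to be uppercase, as in the CMU dictionary.
    return cmu[word.upper()]

print(pronounce('fish'))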
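One possible way to turn pronunciations into a rhyme test is sketched below. It assumes each entry is a list of phoneme codes with stress digits on the vowels, which is how the CMU dictionary is usually distributed; the JSON file may be shaped differently. The three-word sample dictionary and the function names rhyme_part and rhymes are hypothetical.

```python
# A tiny hand-made sample so the sketch runs without the 6 MB file.
sample = {
    "FISH": ["F", "IH1", "SH"],
    "DISH": ["D", "IH1", "SH"],
    "CAT": ["K", "AE1", "T"],
}


def rhyme_part(phonemes):
    """Return the phonemes from the last stressed vowel to the end."""
    for i in range(len(phonemes) - 1, -1, -1):
        # Stressed vowels end in 1 (primary) or 2 (secondary).
        if phonemes[i][-1] in "12":
            return phonemes[i:]
    return phonemes


def rhymes(a, b, pron=sample):
    """Two different words rhyme if their rhyme parts match."""
    a, b = a.upper(), b.upper()
    return a != b and rhyme_part(pron[a]) == rhyme_part(pron[b])


print(rhymes("fish", "dish"))   # True
print(rhymes("fish", "cat"))    # False
```

Passing the full dictionary as pron would let the same functions work on any word it contains.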

 

FRI 3.11 Priors, Fallacies, and Passionate Thinking

 

WEEK 8

MON 3.14 No Class

WED 3.16 No Class

FRI 3.18 No Class

 

WEEK 9

MON 3.21 Literary Analysis, an example:

  • READ Archer & Jockers, Bestseller Code. Chapters 1 and 2.
  • Here are the bestsellers this week: Publishers Weekly (click on BESTSELLERS)
  • Book Industry Fiction Codes

What can we measure in books of fiction and what do we learn? What surprised you about best-selling fiction? How does it differ from what is commonly considered literature?

 

WED 3.23 Literary Analysis:

  • READ Archer & Jockers, Bestseller Code. Chapters 3 and 4.
  • Suggestions for final project are here.
  • Don't forget the NLTK page linked on the right!

 

FRI 3.25 Literary Analysis 2:

  • READ Archer & Jockers, Bestseller Code. Chapters 3 and 4.
  • Assignment #3 due. LINK
  • What is literary style? Why does it matter? How do we quantify it?
  • ASSIGNMENT: Decide 1) team or solo final project, 2) general topic for final project. Tell me.

 

 

WEEK 9

MON 3.28 Literary Style:

  • READ Archer & Jockers, Bestseller Code. Chapters 5 and 6.
  • NLTK code for reuse can be found at https://pythonspot.com/category/nltk/.
  • In class: hand-analyze poems by Frost, Dickinson, and e e cummings.
  • Suggestions for final project are here.

 

WED 3.30 To Quantify:

  • In class: Turning texts into numbers
  • WEBSITE for nltk.
  • READ Boslaugh, Statistics in a Nutshell, chapter 2.
  • WATCH YouTube video on Bayes Theorem.

How do we turn texts into numerical data that can be measured? We will explore 4 methods: linear regression, k-nearest-neighbor, k-means, and Naive Bayes.
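As a preview of one of the four methods, here is a minimal sketch of k-nearest-neighbor on word counts, written with built-in Python only. The four training sentences, their labels, and the function names are invented for illustration; real work would use far more data and probably a library.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a text into numbers: a count for each word (bag of words)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0 = no overlap)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented toy training data: (text, label) pairs.
training = [
    ("the whale and the sea and the ship", "sea story"),
    ("the captain sailed the ship to sea", "sea story"),
    ("she danced at the ball with the duke", "romance"),
    ("the duke proposed marriage at the ball", "romance"),
]


def classify(text, k=3):
    """Label a text by majority vote among its k most similar neighbors."""
    query = vectorize(text)
    ranked = sorted(
        training,
        key=lambda pair: cosine(query, vectorize(pair[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


print(classify("a ship lost at sea"))            # sea story
print(classify("the duke danced at the ball"))   # romance
```

Notice how much weight the word "the" carries in these counts; that is one reason analysts often remove or down-weight very common words.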

 

FRI 4.1 To Quantify 2:

  • In class: finish unfinished business

 

 

WEEK 10

MON 4.4 Sentiment Analysis:

 

WED 4.6 Sentiment Analysis 2:

Let's see what we can discover together about some texts from the web. We will try to quantify their features. Here are the Fall 2022 courses for English.

 

FRI 4.8 AI in its current state:

  • Putting words into a "friend" map: a word's closest friends, friends of friends, and so forth. Words are said to be embedded in a context. To work with word-embeddings, let's look briefly at word2vec. Here is a tutorial.
  • Assignment #4 due. LINK
  • Suggestions for final project are here.
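The "friend" map can be approximated by simply counting which words appear near each other. The sketch below does that with built-in Python; it is a count-based stand-in and not word2vec itself, which learns dense vectors with a small neural network. The function name friend_map, the window size, and the sample sentence are assumptions.

```python
from collections import Counter, defaultdict


def friend_map(tokens, window=2):
    """For each word, count the words within `window` positions of it."""
    friends = defaultdict(Counter)
    for i, word in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                friends[word][tokens[j]] += 1
    return friends


# Invented sample text.
text = "the cat sat on the mat and the dog sat on the rug"
friends = friend_map(text.split())

# The closest friends of "sat", most frequent first.
print(friends["sat"].most_common(2))   # [('the', 4), ('on', 2)]
```

Friends of friends can be found by looking up each neighbor in the same map.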

Some articles for your consideration. Read one or all of them.

  • READ Coldwell, "What Happens when an AI Knows How You Feel?" Wired.
  • READ Faggella, "AI Applications for Lending and Loan Management," Emerj
  • READ Campbell, China and Social Credit Scores, Time
  • READ Daly, AI and Consumer Prices, AI Business

 

WEEK 11

MON 4.11 Artificial Intelligence behind the curtain:

What are the conceptual shifts in AI 3.0? (Continued from Friday)

 

WED 4.13 SEMANTICS:

  • READ Janelle Shane, You Look Like a Thing (chapters 3, 4, and 5)
  • In class: make a Markov chain text generator
  • Check out Markovify
  • What does it mean to mean? What kind of intelligence does it take to discover meaning (semantics)? Consider pragmatics (body language, facial expression, and so on), conventional symbols (emojis, an open hand, an arrow, and so on), and different kinds of signs (smoke means fire, etc.).
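A Markov chain text generator can be sketched in a few lines: record which words follow each word, then walk from word to word at random. This is one simple version (first-order, word-level) and not necessarily the one we will build in class; the function names and the sample corpus are invented.

```python
import random
from collections import defaultdict


def build_chain(tokens):
    """Map each word to the list of words that follow it in the text."""
    chain = defaultdict(list)
    for current, following in zip(tokens, tokens[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length=10, seed=None):
    """Walk the chain from `start`, picking each next word at random."""
    rng = random.Random(seed)   # a seed makes the walk repeatable
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:       # dead end: nothing ever followed this word
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


# Invented sample corpus.
corpus = "the cat sat on the mat and the dog sat on the rug".split()
chain = build_chain(corpus)

print(chain["sat"])   # ['on', 'on']
print(generate(chain, "the", length=8, seed=1))
```

The generator knows only which word follows which. Whether its output "means" anything is the question for discussion.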

 

FRI 4.15 Around the HFA:

  • READ Janelle Shane, You Look Like a Thing (chapter 6 and then as far as you can get)
  • How to Build your own classifier: NLTK Book, chapter 6
  • Discussion: what are the central issues in humanities departments? Here is a list of HFA. What kind of data would one need to answer relevant questions? PREPARE: Use a final project in one of your other classes, list a few issues and data points.
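Chapter 6 of the NLTK Book builds classifiers with NLTK's own classes. To show the idea underneath, here is a hand-written Naive Bayes sketch in built-in Python with add-one smoothing. The training sentences, labels, and function names are invented, and this is not the NLTK implementation.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count labels and the words seen under each label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(text, model):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        # Start with the prior: how common is this label overall?
        score = math.log(count / total)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            seen = word_counts[label][word] + 1
            score += math.log(seen / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data: (text, label) pairs.
examples = [
    ("a wonderful moving story", "positive"),
    ("wonderful characters and a moving ending", "positive"),
    ("a dull tedious plot", "negative"),
    ("tedious characters and a dull ending", "negative"),
]
model = train(examples)

print(predict("a moving story", model))   # positive
print(predict("a dull plot", model))      # negative
```

For the class discussion, the interesting step is the first one: deciding which texts count as examples and who assigns the labels.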

 

WEEK 12

MON 4.18 No Class

  • No class

 

WED 4.20 Around the HFA 2:

Can we answer major questions in the humanities with keyword analysis? What are some of the methods of humanist inquiry? What social, political, or economic effect does computerization have—in other words, how do the limits of computers change the questions we ask?

 

FRI 4.22 Multiple-language study:

How specific to a language are the questions we ask? Are there assumptions that we make about connections between concepts that derive solely from synonymy or connotation?

 

 

WEEK 13

MON 4.25 Graphic Design

 

WED 4.27 Interface Design

  • Apple Interface Guidelines: iOS overview
  • Explore CSS Grids (check out the Golden Ratio)
  • Find a few websites you think are well designed and websites you don't like. Be prepared to share these with the class and to offer an explanation (not just "It sucks" or "It's cool"!)

 

FRI 4.29 Book Design

 

WEEK 14

MON 5.2 Around the Table

  • Describe your project briefly. 1) What question did you want to answer? 2) What was your data set? 3) What assumptions did you make about the data, for example the relation between features? 4) What was your method? 5) What were your results?

 

WED 5.4 Around the Table and Wrap-Up

  • Describe your project briefly. 1) What question did you want to answer? 2) What was your data set? 3) What assumptions did you make about the data, for example the relation between features? 4) What was your method? 5) What were your results?
  • Summary of course, any thoughts or ideas to share?

FINAL Your choice. Here are suggested topics. You can also make up your own topic. Please make sure to clear the topic with me first.

For a research paper, send me an 8- to 10-page pdf, not including data and program.

For a working Python program, send me a heavily commented Python script and a 3- to 5-page paper describing data, assumptions, methods, and conclusions.

Here is a template for your cover page.

 

 

WEEK 15

THURS 5.12 Project Due

  • FINAL PROJECT DUE. By email, CD-ROM, USB flash drive, Google Docs, or paper. Please include your data.

 

LINKS.

Academic Schedule

Sites

COLAB at Google
Python.org
Python Packages.

Data Science in general

UMass Library
NLTK
NLTK Book

Topic Modelling (LDA)

Videos

VIDEO: Corey Schafer Python
VIDEO: Socratica: Python
VIDEO: Khan Academy Statistics

Corpora

Oxford Text Archive
Project Gutenberg
Corpus of Western Lit

Kaggle Data Sets
US Government data
Mass Data sets
MA Attorney General Data
Boston Data
National Weather and more
US Census
UMass Amherst Data
Amherst MA Data
Books & Publishing Data

A million headlines (AUS)