NLTK Introduction. Part 2: Python

This section shows you some Python examples. For more information and direction on Python, log into the UMass Library and search for "introduction" and "python." We subscribe to several databases where you can find up-to-date instructional manuals to read online.

 

1. Python.

Python is one of the world's most popular programming languages. Here is its main page. You will find a beginner's guide there. Another is here. My favorite YouTube tutorials are by Corey Schafer and Socratica. Remember to use Python 3. Anaconda 3.x comes with Python 3. (Python 2 came standard on older Macintosh computers, but it has different syntax from Python 3.)

Python is a combination of a programming language (like C or Fortran) and a scripting language (like bash or AppleScript). Imagine Python as a building contractor who calls electricians, plumbers, HVAC specialists, and so forth to come to a job site and do the physical work. Likewise, Python calls up blocks of ready-made code, called modules or libraries (many of them downloaded from the internet), to come into your program and do the heavy lifting. You only have to learn the phone numbers!

Let's dive in. Traditionally, the first program for any programmer is called "Hello, World!" Your new Python 3 Jupyter notebook should look something like this:

Type the following into the textbox found after "In [1]:" or copy and paste the text from this page:

print("Hello, World!")

Then, click the run button, which looks like a play button on an old CD player. Voilà.

Your input ("In") was an instruction to the Python interpreter, which translated your instruction and ran it right away. The output was written to your screen inside the Anaconda environment. You can also write to a file, to an email, to a website, to a printer, and so on.

 

2. Functions and variables

Print is a function. Functions that belong to a particular kind of object are called methods, and in casual use the two words are often interchanged. Other functions (and methods) allow you to count letters, reverse text, capitalize text, search for patterns, import e-books, get the contents of a website, compare texts, and much, much more. You can also write your own functions.
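Here is a minimal sketch of what writing your own function can look like. The name greet and its parameter name are invented for this example; they are not part of Python itself.

```python
# "greet" is a made-up name for this example; you can call your function anything.
def greet(name):
    """Build a greeting for whatever name is passed in."""
    return "Hello, " + name + "!"

# Call the new function, then hand its result to print.
message = greet("World")
print(message)  # Hello, World!
```

The keyword def starts the definition, and return hands the result back to whoever called the function.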

 

Besides functions (which are the verbs of Python), there are variables (the nouns of Python). A variable is an empty box, like a mailbox. You can put letters into it, numbers, pictures, lists, whatever you like. And like a mailbox, it has an address (its address is a physical location in computer memory).

To put something into your variable, you use an equal sign (=). In Python, the equal sign does not mean "equals." It means "assign." So the following assigns the value five to a variable we will call x (the name is arbitrary):

x = 5

That says, "Assign five to the variable x." We can put anything into the variable. Here, we put some text into a variable. Enclose the text in double or single quotes.

x = "The quick brown fox jumped over the lazy dogs"

Then, you can print the variable:

print(x)

The result is the text inside the quotation marks. So, you send a variable to a function. You send x to the print function. Or, the print function takes the variable x. The function called print takes your variable, x, which holds the address of your mailbox. Print then finds the mailbox and displays its contents on the screen. In this case, the contents of the mailbox at x is a sentence about lazy dogs.
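The steps above can be run together as one short sketch. It reuses the same variable x, first for a number and then for text, and uses the built-in type function to show what kind of thing is in the mailbox at each point.

```python
# First, put a number into the variable x.
x = 5
print(x)        # 5
print(type(x))  # <class 'int'>

# Now put text into the same variable; the old contents are replaced.
x = "The quick brown fox jumped over the lazy dogs"
print(x)        # The quick brown fox jumped over the lazy dogs
print(type(x))  # <class 'str'>
```

Assigning a second time does not add to the mailbox. It empties it and puts the new contents in.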

Any time you use a string (that is, text), you can do stuff to it. The syntax for that is the name of your variable, followed by a dot, followed by the method (methods are distinguished from variables by putting two parentheses after them). No spaces. It looks like this:

x.upper()

In this case, we read backwards: apply the method upper() to the data found in variable x. Strings can be sent to dozens of methods. Search Google for "python string methods" to see.

 

Try it: How? In one of the input boxes in Anaconda, assign a text to a variable called x, just as above. Then click play (or press SHIFT-RETURN). You will get nothing as output because all you did was put text into a variable, so there's nothing to show. In the next input box, type x.upper() and click play. Your result should look like this:

Out [2]: 'THE QUICK BROWN FOX JUMPED OVER THE LAZY DOGS'

Here is a list of string methods. Some do things, some test things (like testing if the string is in lower case). Try some!
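As a starting point, here is a small sketch that tries a handful of built-in string methods on the same sentence. The first group does things to the text; the second group tests things and answers True or False. The variable names other than x are invented for this example.

```python
x = "The quick brown fox jumped over the lazy dogs"

# Methods that do things: each one returns a new string or a number.
lowered = x.lower()
swapped = x.replace("fox", "cat")
o_count = x.count("o")
print(lowered)  # the quick brown fox jumped over the lazy dogs
print(swapped)  # The quick brown cat jumped over the lazy dogs
print(o_count)  # 4

# Methods that test things: each one returns True or False.
print(x.islower())           # False, because of the capital T
print(x.startswith("The"))   # True
print(x.endswith("cats"))    # False
```

None of these methods changes x itself. Each one hands back a new result, which is why the original sentence is still intact at the end.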

 

Jargon: Texts are a class of objects. In python they are called strings. They can be as short as one letter or as long as Gravity's Rainbow. Functions that operate within classes are called methods. So, the string class contains string objects and string methods that you can use to manipulate strings.
    You can also put texts into lists. You might want a list of all the sentences in a book (perhaps to compare their length), or a list of all words in a sentence (perhaps to get their grammatical function). Python comes with about a dozen list methods. And you can put texts into pairs. For example, you might want to have "name" and "Jane" paired up for a database. A collection of such pairs is called a dictionary, and Python comes with dictionary methods.
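A short sketch of both ideas follows. It splits the fox sentence into a list of words, then builds a dictionary that pairs "name" with "Jane". The variable names and the extra "city" entry are invented for illustration.

```python
sentence = "The quick brown fox jumped over the lazy dogs"

# A list: split() cuts the string at each space and returns the words.
words = sentence.split()
print(words)       # ['The', 'quick', 'brown', 'fox', 'jumped', 'over', 'the', 'lazy', 'dogs']
print(len(words))  # 9

# Find the longest word by comparing lengths.
longest = max(words, key=len)
print(longest)     # jumped

# A dictionary: each entry pairs a key (like "name") with a value (like "Jane").
record = {"name": "Jane"}
record["city"] = "Amherst"  # a made-up second pair, added after the fact
print(record["name"])       # Jane
print(list(record.keys()))  # ['name', 'city']
```

You look things up in a list by position (words[0] is the first word) and in a dictionary by key (record["name"]).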

 

Why am I doing this?

Once you can manipulate strings, you can write a complex search engine for any text. With NLTK, you can access millions of books, documents, and webpages. With python, you can search them for whatever you want.
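To give a taste of what such a search can look like, here is a very small sketch that uses only plain Python string methods, not NLTK. The function name find_lines and the three-line sample text are invented for this example, and the matching is deliberately naive: it looks for the letters anywhere in a line, so searching for "fox" would also match "foxes".

```python
# A tiny made-up text to search; a real one might be a whole book.
text = """The quick brown fox jumped over the lazy dogs.
The dogs did not notice.
A fox is quick when it wants to be."""

def find_lines(text, word):
    """Return the lines of text that contain word, ignoring capitalization."""
    matches = []
    for line in text.splitlines():
        if word.lower() in line.lower():
            matches.append(line)
    return matches

hits = find_lines(text, "fox")
for line in hits:
    print(line)
# The quick brown fox jumped over the lazy dogs.
# A fox is quick when it wants to be.
```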

 
