The most basic dimension that organizes speech sounds has to do with the
presence of some sort of constriction in the mouth. Some sounds are made
with the mouth fully open in a way that allows air to flow freely out. The
vowel sounds we produce in the middle of words like keep, cop and
coop are like this. Notice that if you leave off the consonants at the
beginnings and ends of these words you can sing or sustain the vowel sounds
by themselves for as long as you have enough breath to continue. Other sounds,
however, cannot be sustained at all. The k sound at the beginning
of these three words is not sustainable (notice it is the same sound
despite its being represented in writing by k in one word and by
c in the other two). You also cannot sustain p and t
sounds. This fact is illustrated in Figure 1, which shows a recording of
the word apple. Notice that there is a silent interval in the middle
that coincides with articulation of the (single) p sound in the middle
of this word.
Thus, the most basic way to organize speech sounds is to separate them
into two groups according to whether or not they involve significant constriction
of the vocal tract. Vowels are those sounds that have little or no constriction,
while consonants are all those that involve some degree of constriction,
from total to moderate.
If you consider some of the other consonant sounds, you will quickly see that constriction is a matter of degree, not either/or. Producing the s in aspire involves much more constriction of the vocal tract than is found in vowels, but less than occurs in sounds like p, t, and k. Even though the tongue mostly blocks the flow of air while producing an s sound, still the sound can be sustained, which shows that the blockage is not total. Sounds such as the r in raw involve still less constriction, even though this degree of constriction is still greater than that for vowels.
It is not enough, however, just to be able to distinguish consonants from
vowels; we must also be able to capture the differences in articulation
from one consonant to another and differences from vowel to vowel.
Within the two categories we've established so far, consonants and vowels, the differences among sounds are described in quite different terms. Among consonants we will rely on three major dimensions or parameters: 1) where the constriction is made in the mouth (e.g., at the lips, against the roof of the mouth, etc.), 2) how much the flow of air is constricted (e.g., a complete blockage of the flow of air, as in p, or only a partial blockage, as in s), and 3) whether or not the sound involves "voicing".
Since there is little or no constriction for vowels, and vowels are (almost) always voiced, the dimensions that are useful for consonants will be of little help. We will see below that most differences among vowels can be specified in terms of tongue position and degree of lip rounding.
However, before we review a system for describing differences among consonants and vowels, we need to attend to two other matters.
Unfortunately, the standard spellings of English words are often rather
indirect and inconsistent indicators of the typical sounds of those words.
For example, the ph in phonetic denotes the same sound as
the f in foam and the gh in enough. The ti
in national is associated with the same sound as the sh in
show. The letter e may indicate no sound at all, as in indicate,
and the letter a in this same word may represent the same sound as
the ei in freight (which happens to rhyme with bait)
or the very different vowel in cat. In fact, the playwright George
Bernard Shaw once noticed that there is a very common English word in which
o denotes the same vowel sound as appears in bit, which led
him to suggest that a possible spelling of fish in English might
These issues obviously arise in English, but there are similar problems that bear on writing systems generally. The most important difficulty is simply that different symbols are often used to represent the same sound across different languages, and there are also many cases where the same symbol represents two or more different sounds across different languages. There are also many languages that have no official writing system at all.
In order for scholars and researchers to study and compare sounds in the world's languages it is essential that there be a single standardized way of representing sounds. The most widely used system is called the International Phonetic Alphabet. We will use this system, the IPA, in this course, though we will make some minor amendments that are commonly used to make the system more convenient for representing English sounds.
By far the most important requirement for the IPA is that there is a one-for-one relationship between sounds and symbols. Thus, if we were to decide that the symbol for the vowel sound in the name Kate were to be , then we would use this same symbol wherever a word used that sound. Following this strategy, we might write the words Kate, freight, and bait as Kt, frt and bt. (We drop the e from Kate because there is no such thing as a "silent" symbol when we are insisting on a one-for-one relation between sounds and symbols. No sound, no symbol.)
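The one-for-one requirement can be illustrated with a small check. What follows is only a minimal sketch, not part of the IPA itself: the table `SPELLINGS` is a hypothetical toy data set built from the English spelling examples given earlier (ph, f, gh, ti, sh), the symbols f and ʃ are the IPA symbols for the two sounds involved, and the function names are invented for illustration. The check covers one direction only (one spelling per sound).

```python
from collections import defaultdict

# Hypothetical toy data: (spelling, example word) -> IPA symbol for the sound.
SPELLINGS = {
    ("ph", "phonetic"): "f",
    ("f", "foam"): "f",
    ("gh", "enough"): "f",
    ("ti", "national"): "ʃ",
    ("sh", "show"): "ʃ",
}


def spellings_per_sound(table):
    """Group the spellings in the table by the sound they represent."""
    grouped = defaultdict(set)
    for (spelling, _word), sound in table.items():
        grouped[sound].add(spelling)
    return dict(grouped)


def is_one_for_one(table):
    """True only if every sound in the table has exactly one spelling."""
    return all(len(s) == 1 for s in spellings_per_sound(table).values())


by_sound = spellings_per_sound(SPELLINGS)
print(sorted(by_sound["f"]))      # the spellings used for the f sound
print(is_one_for_one(SPELLINGS))  # ordinary English spelling fails the check
```

A transcription system that satisfies the requirement would produce a table in which every sound is grouped with a single symbol, so `is_one_for_one` would return `True`.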
For the purposes of a student trying to learn the IPA, it might have been better if the creators of the system had gone on to make up a completely original set of symbols for all the sounds used in the world's languages. Then when you learned a symbol you could associate that symbol with the one and only sound it was meant to represent. The dark side of this plan, however, would be that anyone learning the IPA would have to learn to recognize and write dozens of new symbols.
Instead, the IPA uses a somewhat eclectic mixture of pre-existing symbols, including many common "letters" from the Roman and Greek alphabets, special variations on these, plus various diacritical and accent marks that are used in various languages or that were made up specially. This invites confusion. Some of the IPA symbols have essentially the same uses in the IPA that they do in ordinary English orthography (ORTHOGRAPHY = system of writing), but some familiar symbols from the English alphabet are used very differently in the IPA. This is especially true for symbols used to represent vowels. While the IPA uses the symbol i, it stands for the vowel sound that appears in the word peek, not for any of the various sounds it is usually associated with in English.
The human vocal tract, within which speech sounds are produced, is made
up of a number of structures in the head and neck, extending from the lips
and nostrils down to the larynx at the top of the trachea. A cross-section
of the vocal tract (at the mid-line of the head) is shown in Figure 2. Though
all of the structures in the human vocal tract also appear in the vocal
tracts of chimpanzees, other apes, and monkeys, the overall layout and arrangement
of these structures, especially at the back of the throat, is strikingly
different in humans than it is in other primates. These differences appear
to be related to the uniquely human capacity for speech.
The structures that are used to form speech sounds (principally the tongue, teeth and lips) are called articulators. Some of the more important structures in the vocal tract are described below.
The larynx (or voice box) is made mostly of cartilage and sits at the top of the trachea (the "windpipe" that connects the nose and mouth with the lungs). The larynx provides a rigid framework within which two bands of muscle, the vocal folds (sometimes called "vocal cords"), are stretched across the top of the airway to the lungs. When fully tensed and drawn together, the vocal folds can effectively block the flow of air out of the lungs (or provide a last-ditch barrier against food or water that threatens to get into the lungs). In a somewhat more relaxed state, the vocal folds vibrate as air from the lungs is forced between them. This process is characteristic of the production of vowel sounds in all the world's languages. The vocal folds can be positioned in a variety of ways that are used to produce different vowel qualities in various languages and sometimes are also used in forming consonant sounds. The vocal folds are drawn fully apart when breathing, especially during heavy exertion. The human larynx, however, can only open to about half the cross-sectional area of the trachea and so always somewhat resists the flow of air into and out of the lungs.
The tongue, as indicated above, plays a decisive role in forming the constrictions for many consonants and in distinguishing vowels. The tongue is, by far, the most mobile and flexible structure in the vocal tract. It is able to assume a wide variety of complex three-dimensional shapes and to touch all the other structures in the mouth from the lips to the back wall of the pharynx. In forming many consonant sounds the tongue plays a key role in making the constriction in the vocal tract that characterizes the consonant. Differences in vowel quality are determined largely by shapes the tongue assumes without significantly constricting the vocal tract.
The pharynx is the open space at the back of the throat that runs from the back of the nasal cavity down to the larynx. A crucial distinguishing feature of this cavity in humans is that the front wall of the oral pharynx (below the velum) is formed by the back (or root) of the tongue. Mostly because of the flexibility of the tongue this means that the shape and size of the pharynx can vary greatly.
The velum is the back part of the soft palate, the fleshy part of the roof of your mouth that you can feel with your tongue or finger about half to two-thirds of the way back from your teeth. The velum is a moveable structure that, when pressed up and back, closes the airway from the mouth into the nasal cavity.
The epiglottis is the small structure that projects backward into the airway
just above the larynx and vocal folds. It helps to keep food and water out
of the larynx. The human epiglottis cannot touch the velum, but in other
mammals the epiglottis and larynx can make a tight closure with the opening
into the nasal cavity. This makes it possible for them to drink and breathe
at the same time because water (or food) can pass around the larynx into
the esophagus without risk of getting into the airway. Adult humans cannot
match this feat, though infants can.
Consonants can be differentiated in any language by reference to three parameters: place of articulation, manner of articulation, and voicing. Other parameters will also be relevant in some languages. We will apply this principle here to the description and differentiation of English consonants.
The place of articulation for a consonant is the point in the vocal tract where the constriction for that consonant is formed. For each of the places of articulation listed below, consider what other consonants there might be (other than those used as examples below) that use the same place of articulation.
A bilabial place of articulation is used for the first sound in words like pin and bin. Notice that in saying these words you begin by bringing your lips together.
Words like fin begin with a labiodental articulation in which the upper teeth contact or approach the lower lip.
Dental articulations are those like the first consonant in thin that involve the tongue touching or approaching the back of the teeth.
The front of the tongue touches or approaches the alveolar ridge in forming consonants such as those at the beginning of tin and den.
Notice that the first sounds in chump and jump also involve the front of the tongue touching the roof of the mouth, but a bit further back than with the alveolar examples above. This point of contact, further back in the mouth, is the (hard) palate. Though most palatal sounds use the front of the tongue, there is one in English that uses the back of the tongue; this is the first sound in yet.
In the first sounds in cow and gout, the back of the tongue rises high enough to touch the velum, making a closure there.
Sometimes the vocal folds are drawn close enough together to produce a slight hissing or whispering sound. This is called a glottal place of articulation and occurs in the first sound of words like how and who in English.
Obviously, there must be some further way to differentiate consonants because in English there are two or more consonants that are produced at each of the places of articulation described above (except for glottals). The next basic distinction has to do with how much the flow of air is constricted in the vocal tract. Tack and sack both begin with alveolar sounds, but they are not identical. What distinguishes them is the extent to which a constriction is made at the alveolar ridge in these two cases.
Tack begins with what is known as a stop consonant. Stop consonants are those where there is a momentary complete closure of the vocal tract. Notice that while making the first sound in tack you cannot hum or breathe. If you were to start to say tack very slowly and a little loudly (as though you were trying very hard to be clear in a noisy environment), and you then were to freeze at the moment when the tongue touches the alveolar ridge, your vocal tract would be completely closed, with no air able to enter or leave through your mouth or nose. You can't hum through stop consonants because humming requires moving air through the vocal folds, which you can't do when the vocal tract is completely blocked higher up. Such a complete blockage is characteristic of consonants that have the stop manner of articulation. The constriction that characterizes the consonant is made by briefly completely stopping the flow of air. In normal fast speech, however, this interruption of the flow of air can be extremely brief, sometimes only a few milliseconds (thousandths of a second).
Another way to interrupt the flow of air out of the mouth occurs in the first sound in sack. Here the tongue approaches the alveolar ridge, but allows a small channel to form between the tongue and the roof of the mouth. Air rushing through this small channel becomes very turbulent and produces the hissing sound that is characteristic of this sort of consonant, known as a fricative. Notice that the first sound in sack can be sustained. You can take a deep breath and make the s in ssssssssack last as long as your air holds out.
Affricates combine the stop and fricative manners of articulation into a single new type. In words like chat the first sound begins with a palatal stop, but then very quickly moves into a fricative at the same point of articulation.
The first sound in Macintosh is a nasal, a sound where the flow of air is blocked in the mouth but allowed to flow freely through the nasal cavity. Nasals involve an articulation inside the oral cavity that corresponds to some stop. Thus, the first sounds in Mack and back are both stop consonants insofar as the activity of the lips is concerned (closing off the airstream altogether). However, you'll notice that you can hum through the first sound in Mack, but not the first sound in back. The reason for this is that we produce nasals by lowering the velum to allow air to pass from the pharynx into the nasal cavity and out the nose.
Liquids are somewhat vowel-like articulations that allow quite free passage of air around an obstruction. The air may flow freely around the sides of the tongue, as in the first sound in lake, or it may flow over a curled back tongue, as in the first sound in rake.
The first sounds in we and yes are called glides, which are the most vowel-like of the consonants. In these sounds the air flow is quite free. Notice that the first sound in we is very similar to the first sound in oops, and the first sound in yes is quite similar to the first sound in eat.
Overlaid on top of the two dimensions of place of articulation and manner
of articulation there is a third dimension, that of voicing. As we'll
see, there are pairs of consonants that have the same place and manner of
articulation, but different voicing properties.
If you were to watch a slow motion video of someone saying sap and zap it would be difficult or impossible to tell which was which without the sound because the motions in the mouth for these two consonants are identical. Nevertheless, you can not only hear but also feel a difference between these two. To make the difference clear, place your fingers on your Adam's apple and produce a long hissing sound that alternates between being an s or z sound, like this: ssssszzzzzssssszzzzzssssszzzzzssssszzzzz. You should feel a slight buzzing sensation in your fingers on the z sounds (but not on the s sounds). The source of this buzzing sensation is vibration of the vocal folds. During the z articulation, the vocal folds are drawn close together and air is forced between them, which causes them to vibrate. During the s articulations, the folds are held apart and air flows freely through the glottis (the opening between the vocal folds). Thus, we say that zap begins with a voiced consonant while sap begins with a voiceless consonant.
This contrast is used widely in English. In each of the following pairs there are two consonants of the same place and manner of articulation that are distinguished in terms of voicing: pat and bat, fat and vat, thin and then, and cot and got.
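The three-parameter description of consonants lends itself to a simple data structure. The following is a minimal sketch under stated assumptions: the `Consonant` type, the `CONSONANTS` table, and the `voicing_pairs` function are hypothetical names introduced here for illustration, the table covers only a handful of the consonants mentioned in the text, and the keys are ordinary letters rather than a full set of IPA symbols.

```python
from typing import NamedTuple


class Consonant(NamedTuple):
    place: str    # where the constriction is made
    manner: str   # how much the flow of air is constricted
    voiced: bool  # whether the vocal folds vibrate


# Illustrative subset only, keyed by the letters used in the text.
CONSONANTS = {
    "p": Consonant("bilabial", "stop", False),
    "b": Consonant("bilabial", "stop", True),
    "f": Consonant("labiodental", "fricative", False),
    "v": Consonant("labiodental", "fricative", True),
    "t": Consonant("alveolar", "stop", False),
    "d": Consonant("alveolar", "stop", True),
    "s": Consonant("alveolar", "fricative", False),
    "z": Consonant("alveolar", "fricative", True),
    "k": Consonant("velar", "stop", False),
    "g": Consonant("velar", "stop", True),
    "m": Consonant("bilabial", "nasal", True),
}


def voicing_pairs(inventory):
    """Return (voiceless, voiced) pairs that share place and manner."""
    pairs = []
    for a, fa in inventory.items():
        for b, fb in inventory.items():
            same_articulation = (fa.place, fa.manner) == (fb.place, fb.manner)
            if same_articulation and not fa.voiced and fb.voiced:
                pairs.append((a, b))
    return pairs


print(voicing_pairs(CONSONANTS))
```

In this toy inventory the nasal m has no voiceless partner, so it appears in no pair, while t and s share a place of articulation and differ only in manner.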
Vowels are voiced and vowel articulations involve little constriction of
the vocal tract. Thus, vowels are distinguished by way of different timbres
or qualities in the sound that are produced by giving the inside of the
mouth different shapes. You may have noticed that if you speak or sing into
a large barrel or a length of large-diameter pipe, your voice suddenly sounds
very different. In fact, it will sound noticeably different in different
diameters and lengths of pipe. The vocal tract takes advantage of the same
acoustical principles that produce these differences to produce the acoustical
qualities of different vowels.
This is achieved largely by shifting the tongue into different postures. By raising the tongue high into the forward part of the mouth (and enlarging the spaces at the back of the mouth) we produce the vowel quality in words like bee and key. By pulling the tongue down and somewhat back toward the back wall of the pharynx we produce sounds like the vowels in cot and pot. The vowels in loot and coot are produced by raising the back of the tongue toward the velum, but not getting it close enough to produce any constriction or noise.
These differences in tongue posture can be described in terms of two parameters, those of tongue height and backness. Thus, the vowel in key is a high front vowel, the first one in father is a low back vowel and the one in coot is a high back vowel.
Another important factor in the differentiation of vowels is lip rounding. The vowels in keep and coop are different in two respects. The first is high front and the second high back, but you will notice that if you switch back and forth between these vowels, you will purse your lips somewhat on the vowel in coop, but not on the vowel in keep. This drawing together of the lips is called rounding and it plays a role in a number of back vowels in English.
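The same kind of description works for vowels, using height, backness, and rounding in place of the consonant parameters. This is only a rough sketch: the `Vowel` type, the `VOWELS` table, and the `differences` function are hypothetical names, the table holds just the three vowels characterized above (keyed by example word rather than by IPA symbol), and the unrounded value for the vowel in father is supplied here rather than stated in the text.

```python
from typing import NamedTuple


class Vowel(NamedTuple):
    height: str    # how high the tongue is raised
    backness: str  # how far forward or back the tongue is
    rounded: bool  # whether the lips are drawn together


# Only the three vowels characterized in the text, keyed by example word.
VOWELS = {
    "keep": Vowel("high", "front", False),
    "coop": Vowel("high", "back", True),
    # Rounding for this vowel is an assumption added for the sketch.
    "father": Vowel("low", "back", False),
}


def differences(a, b):
    """List the parameters on which the vowels of two example words differ."""
    return [
        field
        for field in Vowel._fields
        if getattr(VOWELS[a], field) != getattr(VOWELS[b], field)
    ]


print(differences("keep", "coop"))    # ['backness', 'rounded']
print(differences("coop", "father"))  # ['height', 'rounded']
```

The first comparison matches the observation above that the vowels in keep and coop differ in two respects, backness and lip rounding.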
The various ways of distinguishing consonants and vowels that we have discussed above are used in the two charts shown below. These charts illustrate how the consonants and vowels of English can be distinguished by reference to the several parameters we have discussed above. An IPA symbol is given for each sound in each table, along with a common English word that uses the sound.