Parochial and Universal Semantics:

Semantic Typology and Little Studied Languages

Emmon Bach, SOAS, UMass (Amherst)
pre-ALT Workshop on
Language Documentation and Linguistic Typology
Paris 25 September 2007

...the true difference between languages is not in what may or may not be expressed but in what must or must not be conveyed by the speakers.

Roman Jakobson, 1959


  1. Prelude: Quantifiers in the New Yorker?!
  2. What is linguistic typology?
  3. Little-studied languages and typology?
  4. Semantic typology?
  5. Exhibits from Some Languages
          Haisla: Number and Deixis
          Kiowa Number
          Pirahã: typology of linguists
          English: gaps in language and culture
  6. Conclusions

  2. It is not often that issues of syntax and semantics make the headlines. A notable exception in the recent past was provided by the flurry of excitement about Dan Everett and his claims about the Amazonian language Pirahã. Probably the most prominent of these journalistic reports was in the April 16, 2007 edition of The New Yorker. A bit of Googling will turn up a whole spate of reactions and spin-offs from this and related stories.

    The sometimes heated interchanges that followed Everett's article, starting with the commentaries by a number of researchers published with the paper in Current Anthropology and Everett's response (see References below for Everett 2005 and Nevins et al. 2007), show that questions about Universal Grammar, the uniqueness of languages, Humboldt-Cassirer-Sapir-Whorfian ruminations, and the like find a vivid resonance among specialists and laypeople alike.

    In this presentation, I will take up a number of questions about semantics across languages, semantic typology, and semantic universals. First, however, a few general remarks.

    Everett makes a number of claims about Pirahã. Among them, these, which I will use to guide my discussion here (in part):

    • No quantification
    • No counting
    • No number
    • No perfect tenses
    • No recursion
    • Cultural roots for the above and other characteristics
  3. What is linguistic typology?
  4. Languages do not differ randomly across the space of possible distinctions. There is a long history of attempts to correlate various properties of languages with each other. In modern times, the work of Joseph Greenberg has been of fundamental importance. An older typology was based on morphological characteristics of languages, using such terms as "analytic, synthetic - polysynthetic," and "isolating, fusional, inflectional, agglutinative." Sapir devoted major parts of his popular book Language (1921) to questions of linguistic form and content along the lines of this kind of typology.

    Sapir's schema is based on two kinds of criteria: the formal difference between independent roots and three kinds of operations on them, with varying degrees of fusion or internal modification, and the semantic differences among root, pure relational, and mixed relational elements of content or meaning. We will return to a classification inspired by Sapir's discussion below. But let it be noted here that Sapir recognized that languages displayed these characteristics in varying degrees so that the schema was more like a set of ideal types than a ready-made and absolute set of categories. We will note some of this below as well.

    Sapir's typology revisited.

    In Language (p. 138 in Harvest book edition), Sapir proposes this general scheme:
    I. Pure-relational Languages
        A. Simple
        B. Complex
    II. Mixed-relational Languages
        C. Simple
        D. Complex

    We can take "relational" here as referring to purely "abstract" grammatical functions.

    Sapir's schema, which is elaborated quite a bit beyond this global summary, is based on formal as well as semantic criteria. The formal side is the distinction between more or less free-standing items (word, stem, root) on the one side and processes of affixation and internal modification on the other.

  5. What can little-studied languages tell us about typology?
  6. Before going into the special topics of my talk, I would like to make a few remarks about the general enterprise of working on local, often endangered languages in the context of investigating language typology, or indeed any other topic of general theoretical interest.

    Does the investigation of such languages have a special role or importance for typology? Many would answer Yes. But why? Here are some things to think about.

    Just for extending the empirical base, any language will do. Its importance will be a function of the mass of knowledge that we already have about languages, and of the extent to which the little-studied language exhibits new, hitherto unknown properties. So being endangered plays no special role here, except in the sense that the language may be lost without making its possibly unique contribution to the fund of knowledge we need for making hypotheses about Language.

    Sapir's advice

    There is hardly a classificatory peculiarity which does not receive a wealth of illumination from American Indian languages. It is safe to say that no sound general treatment of language is possible without constant recourse to these materials.

    Edward Sapir, Collected Works, V: 145 (from an Encyclopaedia Britannica article [14th edition, 1929, Vol. 5, 138-141])

    Excursus: reasons for studying and documenting endangered languages are not just scientific.

  7. Universals and universals
  8. Semantic typology?
  9. There seem to be two common attitudes toward questions about semantics for different languages:

    1. The basic semantic building blocks for different languages are obviously the same
    2. The basic semantic building blocks for different languages are obviously different

    One of these views seems to be based on an impulse to attribute a universal conceptual space to humans and the belief that semantics must connect up to this universal conceptatorium.

    The other view is fed by Whorfian and semi-Whorfian beliefs about the special space of meanings that go with different languages and cultures. It is also quite in line with the judgments of multilingual speakers that various words "just don't translate" from one language to another.

    In several recent papers I have tried to address the seeming contradiction between these two stances by an appeal to a difference between structure and texture, which is analogous to the differences we attribute to those between grammar and style (Bach [2003], 2004).

    Obviously, discussions like this cannot be carried on sensibly if they don't take into account the widest variety of languages. And in the last decade the number of encounters between "little-studied" languages and model-theoretic semantics has been increasing.

    Therefore, it would seem to be analytic that studying little-studied languages cannot help but contribute to linguistic typology, indeed, to linguistic theory, period.

  10. Semantics and semantics
  11. Basic aspects of model-theoretic semantics: one aspect of meaning is provided by setting up a model structure to be understood as the space of denotations that are associated with expressions of the language. Take these English sentences for a start:

    1. Sam ate an orange
    2. Horses are mammals.
    3. I am hungry.

    To interpret sentences like these we need at least the following ingredients: a set of individuals for expressions like Sam, and subsets of the set of individuals such as the sets of oranges, horses, mammals, and things that are hungry. We need something like worlds with respect to which we evaluate the truth of sentences. Importantly, for examples like (3) and for the temporal side of these sentences, we also need access to the context of evaluation, so that we can understand I as the speaker, whoever says the sentence, and the time as whatever time the evaluating context supplies. So if I say (3) right now here, then the sentence denotes the True in this world and at this time just in case Emmon is hungry at 9:35 (say) in Paris (say). There is much more to be said about these sentences, but I won't say it yet.
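    As a rough illustration of how the context of evaluation enters into the truth conditions of (3), here is a minimal Python sketch. The names Context, HUNGRY_FACTS, and i_am_hungry are illustrative assumptions, not part of any established formalism, and the "facts" are invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Context:
    speaker: str   # whoever utters the sentence
    time: str      # time of utterance
    place: str     # place of utterance (recorded, but not used below)

# Toy world: the (individual, time) pairs at which that individual is hungry.
# These facts are made up purely for illustration.
HUNGRY_FACTS = {("Emmon", "9:35")}

def i_am_hungry(ctx, hungry=HUNGRY_FACTS):
    """Truth value of 'I am hungry' in ctx: 'I' resolves to the speaker
    of the context, the present tense to the time of the context."""
    return (ctx.speaker, ctx.time) in hungry

paris = Context(speaker="Emmon", time="9:35", place="Paris")
print(i_am_hungry(paris))                            # True
print(i_am_hungry(Context("Sam", "9:35", "Paris")))  # False
```

    The same sentence gets different truth values in different contexts because the indexical is resolved by the context, not by the sentence itself.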

    1. model structures
    2. One part of studying natural language semantics is to set up a model structure which provides the basic building blocks for an interpretation. Setting aside the context box for now, we might start with this more or less standard setup, call it M1:

        M1 =
      i. BOOL = {1, 0}: truth values (the True, the False)
      ii. E: set of individuals
      iii. S: set of worlds or situations
      iv. F: set of all functions built out of i-iii
      In addition to the sets just enumerated for M1, the model structure needs to specify some inherent relations among some of the elements, for example, relations of accessibility, inclusion and precedence among situations in S. I want to take these up separately below.

      Note: the model structure of Montague's best-known paper on natural language (1973: PTQ) differs from M1 in that our S here is split into two: a set I of worlds and a set J of times, with an antisymmetric ordering on the set of times.

      This much of a model structure is surely common to the semantics of any language, since it is little more than a schema for spelling out possible denotations.
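      To make the idea of M1 as "a schema for spelling out possible denotations" concrete, here is a small Python sketch. The particular individuals and situations are hypothetical, and only the first layer of F (functions from E or S into BOOL) is enumerated; the full F would also contain functions over these functions.

```python
from itertools import product

BOOL = (1, 0)                      # truth values: the True, the False
E = ("sam", "orange1", "horse1")   # hypothetical individuals
S = ("s1", "s2")                   # hypothetical worlds or situations

def all_functions(domain, codomain):
    """Enumerate every total function from domain to codomain as a dict."""
    return [dict(zip(domain, values))
            for values in product(codomain, repeat=len(domain))]

# One-place predicates: characteristic functions of subsets of E.
predicates = all_functions(E, BOOL)
# Propositions: functions from situations to truth values.
propositions = all_functions(S, BOOL)

print(len(predicates))    # 8, i.e. 2 ** 3
print(len(propositions))  # 4, i.e. 2 ** 2
```

      Nothing in this setup depends on the language being described, which is the sense in which such a structure can be taken as common ground.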

    3. Time
    4. Semantics and Grammar
    5. Semantics and Lexicon

  12. Three languages plus a bit of some others
    1. Haisla: Number
    2. About Haisla: a Northern Wakashan language (along with Heiltsuk, Ooweky'ala and a number of languages lumped together as Kwakw'ala -- Franz Boas's Kwakiutl). Haisla is spoken in Kitamaat Village in northern British Columbia in the far west of Canada.

      Some Haisla examples (e = ə):

      4. begʷánem `person, people'
      5. bíbegʷanem `people'
      6. t̓íxʷa `black bear(s)'
      7. t̓ít̓exʷa `black bears'
      8. ketá `shoot'
      9. kiketá `shoot "plural"'
      10. ketátlnugʷa `I am going to shoot.'
      11. ketátlnis / kiketátlnis `we (inclusive) are going to shoot'
      12. kiketátlnugʷa. `I am going to shoot repeatedly / several times'
      13. kiketátlnugʷaʼi. `I am going to shoot them.'
      14. kiketá begʷánemax̄i t̓íxʷix̄i. `The man/men shot the bear/bears (repeatedly).'
      Note: there are a number of different reduplicative and ablaut forms, a number of which are used for marking plurality etc.

      As in many North American languages (and in fact probably languages around the world), plurality is an optional category in Haisla and other Wakashan languages. As one might expect, not every word has a definite plural form, and the common hierarchy applies: nouns near the high end of an animacy scale tend to have plurals; near the low end, they don't.

      Here's how I would model a system like this: a plain noun like t̓íxʷa denotes sets from *[[t̓íxʷa]], the big domain covering atoms and sets all the way up.
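      A minimal Python sketch of such a domain follows. The three atomic bears are hypothetical, atoms are represented as singleton sets, and the restriction of the reduplicated form to groups of more than one is an assumption made only for illustration; the text above commits only to the denotation of the plain form.

```python
from itertools import combinations

def star(atoms):
    """All non-empty groups (as frozensets) that can be formed from the atoms."""
    items = sorted(atoms)
    return {frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)}

bears = {"b1", "b2", "b3"}   # hypothetical atomic bears
plain = star(bears)          # plain noun: singletons and larger groups alike
# Assumption for illustration only: the reduplicated form excludes singletons.
plural = {g for g in plain if len(g) > 1}

print(len(plain))   # 7
print(len(plural))  # 4
```

      On this picture the plain form is simply uninformative about number, since its denotation includes both single bears and groups of bears.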

      You can tell a straightforward Gricean story about the interpretations of plain and plural forms in such a language, as opposed to a language in which plurality is enforced.

      Obligatory choice: The interlocutor had to make a choice so I will interpret the choice as telling me something important!

      Optional choice: There may be no particular reason for making or not making this choice, so I can't really conclude anything. If it's important, context will probably tell me.

      Number enters into Haisla and a number of neighboring languages in a different way at the level of lexical choice. Some predicates are specialized as to shape or other characteristics of the subject or object, among them number. So for example Haisla hená means `to be located (somewhere): of a long cylindrical object.' Similarly, Coast Tsimshian has baa `run singular' and k̓oł `run plural.'

    3. Kiowa Number
    4. The Tanoan languages offer a fascinating system of number in the nominal and pronominal systems. You can find a short introduction in Mithun, 1999: 81-82 (data from Jemez). Watkins (1984) gives an extended description of Kiowa (not to be confused with Kiowa Apache, an Athapaskan language). I will cite Kiowa here from Laurel Watkins' work (Watkins 1984, see also now Harbour 2003, 2007).

      The most notable feature of the system is inverse number. Nouns are divided into four classes. There are three number categories: singular, dual, plural. According to its class membership, each noun has basic or inherent number or numbers.

      I. singular/dual; inverse: plural; primarily animate
      II. dual/plural; inverse: singular
      III. dual; inverse: singular/plural
      IV. (nouns in this class do not use the inverse suffix)
      (I omit discussion of the Class IV nouns.) As you can see, the inverse picks the complement meaning with respect to plurality. Disambiguation of the choices (e.g. dual/plural) comes about by combinatorics with number marking in pronominal affixes on verbs and other elements.


      15. tógúl `young man, two young men'
      16. tógúlgɔ̀ `(more than two) young men'

      But in combination with an Intransitive Prefix èͅ- on a predicate, the first word must be interpreted as dual. This prefix marks intransitives as predicates over pairs of entities.

      17. gú: `ribs (dual/plural)'
      18. gú:gò `rib'

      In combination with the Intransitive Prefix èͅ-, the first word must be interpreted as dual, the second as singular, while to get the plural the intransitive prefix gyà- is required. You are invited to consult the sources for more details of this complex system.

      Now a question: is there any way to assign a uniform semantic value to the inverse suffix?

      Here's a try. Let's adopt the kind of structured domain proposed by Link and others. For any common noun we have the set of all groups formed from the atoms or basic atoms of the domain. For languages like the Tanoan languages, for any noun N we have D(N) = the union of the singletons (or atoms) I(N), the pairs II(N), and the pluralities III+(N). The denotation of an uninflected Class I noun is just the union of the singletons and the pairs, for Class II just the union of the pairs and the plurals, for Class III the pairs. Now the inverse can be interpreted as an operation that takes the denotation of the bare noun and delivers the complement of that denotation within the whole domain of the common noun. This treatment requires that we have available a denotation that is not directly associated with any of the various forms of the noun itself. The effect of the various inflections on other elements that disambiguate the expressions then can be achieved by intersecting the denotation of the nominal expression with cardinality sets, as in the example above.
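      The proposal can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four atoms are hypothetical, groups are modeled as frozensets, and the function names (domain, basic, inverse, dual) are invented here.

```python
from itertools import combinations

def domain(atoms):
    """D(N): all non-empty groups formed from the atoms of noun N."""
    items = sorted(atoms)
    return {frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)}

def by_size(groups, pred):
    """Keep the groups whose cardinality satisfies pred."""
    return {g for g in groups if pred(len(g))}

def basic(groups, noun_class):
    """Denotation of the uninflected noun, by noun class (I, II, III)."""
    sizes = {
        "I": lambda n: n <= 2,    # singletons and pairs
        "II": lambda n: n >= 2,   # pairs and pluralities
        "III": lambda n: n == 2,  # pairs only
    }
    return by_size(groups, sizes[noun_class])

def inverse(groups, bare):
    """Inverse suffix: complement of the bare denotation within D(N)."""
    return groups - bare

def dual(denotation):
    """Intersect with the cardinality set of pairs (a dual-marking prefix)."""
    return by_size(denotation, lambda n: n == 2)

D = domain({"a", "b", "c", "d"})   # hypothetical young men
bare_noun = basic(D, "I")          # Class I bare noun: singular or dual
inverse_noun = inverse(D, bare_noun)

print(sorted({len(g) for g in bare_noun}))        # [1, 2]
print(sorted({len(g) for g in inverse_noun}))     # [3, 4]
print(sorted({len(g) for g in dual(bare_noun)}))  # [2]
```

      Note that inverse needs the whole domain D as an argument, which reflects the point made above: the complement is taken within a denotation that no single form of the noun expresses directly.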

    5. Pirahã disputes
    6. I focus here not on Pirahã itself, but rather on the controversies that have attended the publication of Daniel Everett's paper. Thus this section may be thought of as a contribution to the sociology or anthropology of linguistics.

    7. English: Gaps in Language and Culture
    8. English of course is not a little studied language, but recent research has shown that it has a number of astonishing gaps. We focus here on kinship terminology. Most speakers of English are limited to the following terms:

      19. father, mother, son, daughter, child, parent, grandfather, grandmother, cousin, uncle, aunt, nephew, niece, husband, wife, grand[child/son/daughter], spouse, partner
      20. [first four items in (19) + in-law]
      21. step + [some items in (19)]

      Recursion is limited to three small portions of this vocabulary:

      22. great*grand[father/mother]
      23. ex*[wife/husband/spouse]

      Some speakers use the item step- recursively as well.

      Gaps: there is no primary way to distinguish sexes in items like cousin nor to distinguish paternal and maternal versions of such items as grandfather, nor is there any standard way to distinguish first, second, ... wives/husbands/spouses/partners.

      Some older speakers can use further terminology, such as first cousin once removed, but there is much confusion about these items among most speakers.

      It has been conjectured that the severe gaps in kinship terminology noted here result from cultural constraints such as this one:

      Don't remember more than one former spouse/partner!

      It is possible, of course, to use paraphrases to fill some of these gaps:

      24. My mother's father... my maternal grandfather

      But most speakers reject attempts to push such expressions beyond a limit of two:

      25. ??John's mother's father's spouse's daughter's son...

      Repeated attempts to teach adults the more elaborate older system have failed. In fact, attempts to teach children have failed as well, as they quickly tire of the task and turn to listening to their inevitable MP3 players. We do not believe that this situation represents a cognitive deficit; it is no doubt purely cultural in genesis.

        Another astonishing gap in English occurs in its impoverished deictic system. Phrases like his pencil are completely unspecified as to whether the person referred to by his as well as the possessum in question is near the speaker or the hearer or in some other place, real and visible or invisible and perhaps non-existent, or whether either person or pencil was here recently and is now gone (visible or invisible) (cf. Boas, 1947, and Bach 2006).

    9. Conclusions and Outlook
  13. References
