Impetus toward this theory has come also from a quite different quarter: Hale, Jeanne, and Platero (1977) and others have suggested that not all languages are phrase-structure languages. It is my hope to develop a theory in which phrase-structure grammars are seen as a special case of a more general type of grammar, one which allows only a very small and restricted set of transformation-like operations. In the present paper, I will concentrate on the problem of "discontinuous constituents." ^{[3]}
[2] I take it as a necessary part of our job to provide a theory of syntax and semantics (and phonology) for natural languages. It's worth noting that the primary motivation behind Ajdukiewicz's construction, in so far as I can see it, was to give a theory with a clear and consistent relation between syntax and semantics.
In Section 1, I will review briefly the basic ideas of categorial grammar; in
Section 2, I will discuss the problem of discontinuous constituents; in
Section 3 we will look at a generalization of categorial grammars, and in
Section 4 I will present a sample grammar for a small part of English syntax.
1. Categorial grammars. The basic idea of categorial grammar is this:
think of a phrase as composed of a seequence of words, each assigned to a
category which acts either as an argument or as a function taking arguments of
a certain category to make a certain resultant category. A sequence of words,
each assigned to a category, is a phrase of category a, if it is
possible to find a way of assigning functions to arguments repeatedly in such
a way as to end up with just the category a by exhausting the elements
of the string. The system has obvious analogies to ordinary multiplication and
my first example is couched in these terms.
We first define an infinite set of categories (CAT) as follows:
Elements of CAT may be used as indices on sets of expressions. Let's use the categories for the sets of expressions consisting of the categories themselves.
- 2, 3, 5 ∈ CAT
- If α ∈, and β ∈ CAT, then (α|β) ∈ CAT
- (iii) Nothing else is in CAT
B_{A} = {A}
(B_{A} denotes the set of basic, or lexical, elements of category A.) So:
B_{2} = {2}(I will usually suppress outermost parentheses.)
B_{2|3} = {2|3}, etc.
We can now define the language induced by the set of categories as the union of the sets of phrases in each category A, P_{A}, where we can make explicit the ideas sketched above as follows:
[3]
We can specify a semantics for this "language" quite simply by letting our domain of individuals be the positive rational numbers and interpreting each basic expression as the number ordinarily denoted by it, "2" denotes two, (2|3) denotes two-thirds, etc., and letting concatenation be interpreted as multiplication.Thus, for example, (2|3) ∈ P_{2|3}, 3 ∈ P_{3} so (2|3)3 and 3(2|3) ∈ P_{2}
- If α ∈ B_{A}, then α ∈ P_{A}
- If α ∈ P_{A|B}, β ∈ P_{B}, then αβ and βα ∈ P_{A}
Let's now consider a linguistically more interesting example. Let 2 stand for the category of sentences (with no basic members). Members of other categories might be:
B_{3} = {John, Mary, Bill}Then the language induced by the set of categories and lists of basic expression includes these sorts of phrases:
B_{2|3} = {walks, runs, talks}
B_{(2|3)|3} = {sees, kisses, loves}
B_{(2|3)|(2|3)} = {slowly, enjoyably}
P_{2}: John walks, Mary slowly walks, walks Bill, sees John Mary,...Some expressions are excluded ^{[3a]}:
P_{2|3}: sees Mary, walks, Mary kisses enjoyably,...
This follows from the fact that functions must be immediately contiguous to their arguments.*slowly John Mary sees slowly John Mary sees -------_{(2|3)|(2|3)} --------------_{2} John Mary sees ----_{3} ---------_{2|3} Mary sees ----_{3} ----_{(2|3)|3}
[4] The kind of language we have with a system of categories of the above type is what I have called an "M-language" (M for "mobile"): all and only permutations of siblings are allowed. (Every M-language is obviously context-free but not all CF languages are M-languages, e.g. {x | x = a^{n}b^{n}} is not, as Stanley Peters pointed out to me, cf. Bach, 1975.)
We can introduce ordering restrictions into our system by adopting a notation introduced by Bar-Hillel (1953) and Lambek (1961). Let a/b (b\a) index the set of expressions "looking for" arguments of category b to their "right"("left") to make expressions of category a. To approximate English we can replace the above categories by these:
B_{3\2}, B_{(3\2)/3}, B_{(2|3)|(2|3)}
The system now generates these well-formed members of P_{2} (= S):
John enjoyably kisses Mary.
Mary walks slowly.
The system is obviously still context-free (now completely equivalent to CF
grammars in weak generative capacity). [Exercise 1: make a set of categories
for the language {x|x = a^{n}b^{n}.] But we are still unable
to do discontinuous constituents.
2. Discontinuous constituents. If all natural languages could be
described by plain vanilla two-way categorial grammars of the sort we've
looked at, we'd be home free. But natural languages notoriously display, or
appear to display, discontinuous constructions. Some examples from English
are these:
1. (a) John looked up the number. (b) John looked the number up.
2. (a) I persuaded to leave all the men in the room. (b) I persuaded him to leave.
3. (a) A package from India is here. (b) A package is here from India.
A more extreme case is provided by languages which allow a lot of freedom in ordering. Here linguists have typically set up grammars which specify a fixed base order and then appealed to operations of scrambling (but see Hale, Jeanne, and Platero, 1977; Lapointe, 1980). Hale (cf. above reference and Hale, 1979) has claimed that Walbiri allows for permutations of all words around a fixed, second-place element in each clause. And there are many examples of free constituent order in relatively fixed-order languages like English.
Imagine a language in which any order of constituents is possible, i.e. for any sentence all "scrambles" or words (or morphemes) is also a sentence. We could represent functions that are looking for arguments "anywhere" as a||b. [Exercise 2: show that the free "scramble" of a context-free language is not necessarily context-free.] Similarly, we could introduce the notation a//b (b\\a) for functions looking for arguments indefinitely far to their "right" ("left").
The introduction of such categories entails that we will not be able to construct phrase-markers for all possible permutations. For example, consider a sentence that is disallowed for "contiguous English" but allowed in "scrambled English." We can analyze the sentence after the fashion of a Montague analysis tree, but the analysis can't be mapped onto a phrase-marker:
(This is only one of two possible analyses; in the other John is the subject.)4. sees slowly John Mary ---(2|3)||3 ---(2|3)||(2|3) --3 --3 sees slowly John Mary slowly sees...John sees John
Of course, no language has such complete freedom of order; typically, certain
constituents act as "boundary domains" for scrambling. (I conjecture that
such bounding as the effect of keep them all "almost context-free" (if not
context-free) i.e [6] that such grammars with bounding on recursive nodes do
not generate the full class of context-sensitive languages.) We can achieve
this result by stating general conditions, e.g. that S's be coherent.
3. Generalized Categorial Grammars. Let me sum up what I've suggested
as a generalization of the syntactic categories of categorial grammar in the
form of a recursive definition schema (for universal grammar):
I assume that this basic schema will be supplemented by a system of features, so that we can include e.g. a/_{1}b to index a function that requires an argument immediately to its right with the additional stipulation that the argument have the case value 1 (e.g. accusative, more on this in Bach ms [1982]).(i) a, b ∈ CAT (a. b primitive categories) (ii) If a, b ∈ CAT, so are (a) a/b (b) b\a (c) a|b (d) a//b (e) b\\a (f) a||b (iii) Nothing else is in CAT.
So much for syntax. The more difficult question is the semantics. For a first step. I'll assume (along the lines of Montague's PTQ) that there is a uniform mapping from categories to types and that every function argument rule (cancellation rule) carries with it automatically a rule of interperetation which says that the interpretation of the resultant category is the result of applying the interpretation of the function expression to the intension of the interpretation of the argument expression.
I've also not said very precisely how we stipulate how to analyze (or
construct) sentences with these "long-distance" functions. I consider it
still and open problem to try to give a general characterization of such
syntactic operations. In the next section I'll show how to accomplish a
syntactic/semantic for several particular cases in English. The basic idea is
this: we let functions expressions of category a//b act syntactically
as expressions of category a, and supply a free variable of the type of
intensions of the type corresponding to b in the translation. This
variable remains free until we find an expression of category b, then
we bind the variable with a lambda and let it loose. [7]
4. Some English Constructions. I now want to show how a generalized
categorial grammar can be used to capture certain facts about English. I will
exhibit a grammar which includes persuade and promise and
so-called heavy NP-shifts.
The primitive categories (I ignore further analysis) are S (sentence),
T (term-phrase, NP [=DP], CN (common noun), IV (tenseless
VP's), ĪV̅ (to VP's). The basic categories we need are
these, with some typical members.
B_{CN} = {woman, fish, elephant,...}
B_{T} = {Bill, Mary, John,...}
B_{T/CN} = {every, the, a,...}
B_{T\S} = {walks, runs,...}
B_{(T\S)/T} = {hits, sees, loves,...}
B_{(T\S/ĪV̅} = {tries,...}
B_{((T\S/T))//ĪV̅} = {persuades,...}
B_{((T\S)/ĪV̅)/T} = { ,...}
B_{(T\S)\(T\S)} = {slowly, today,...}
B_{ĪV̅/IV} = {to,...}
We need the following Rule Schemata (from Universal Grammar). The rules may be
read as follows: A:= B C: If α ∈ B, β ∈ C, then
α β ∈ A, and if α translates as α' (i.e. 1') and
β as β' (2') then α β translates as (0'=) whatever follows =
under the rule. I ignore completely, for simplicity, conditions on feature
values for constituents, except for two; VST ("variable store") is a function
which tells us what free variables have been put into the translation and
remain to be bound on at appropriate points, while [-A] is a feature telling
us that there is "missing" phrase of category A (comparable to Gazdar's slash
categories, Gazdar, forthcoming [1982]). Note further that the rules are
schemata, and they (as well as the list of categories^{[4]}) must be further specified for particular languages.
Here are two examples of derivations, one involving heavy NP-shift (using an instantiation of Rule Schema 3) and one using Rule Schema 2. (This corresponds to sub-operation Right-wrap of Bach, 1979, misleadingly called a subfunction there.
0. P_{A} := B_{A} 0'= 1' 1a. P_{A} := P_{A/B} P_{B} 1b. P_{A} := P_{B} P_{B\A} P_{A} := P_{A//B} P_{B} P_{A} := P_{B} P_{B\\A} 0' = 1'(^2') 0' = 2'(^1') [8] 2. P_{A/C} := P_{(A/B)//C} P_{B} 0' = λX_{i, <s,C>}[1'(X_{i, <s,C>})(^2')] (RIGHT-WRAP) 3. P_{A} := P_{A/B} [-B] 0' = 1'(X_{i,<s,B>) } Add X_{i,<s,B>} to VST(0) "variable store" 4. P_{A} := P_{A} [-B] 0' = λX_{i,<s,B>[1'](^2')} Remove X_{i,<s,B> from VST(0)}
Mary sees today the elephant --- ---- ---- --------- T (T\S)/T (T\S)\(T\S) T ---- T\S VST:℘_{3} ---------------- T\S T\S VST:℘_{3} [-T] ------------------------------------------- T\S (VST:empty) -----------------------T\S ---------------------------------------------------------S Translation: λ℘_{3}[today'(see'(℘_{3}))](^the(^elephant))[^Mary') ==> see'(the_elephant')(Mary')
Mary persuaded Bill to go ---- --------- ----- -- -- T ((T\S)/T)//ĪV̅ T ĪV̅/IV IV ---------------- -------- (T\S)/ĪV̅ ĪV̅ ---------------------------------- T\S ----------------------------------------------- S Translation: λP_{4}[persuade'(P_{4})(^Bill')](^go')(^Mary') ==> persuade'(^go')(^Bill')(^Mary') [9]
Finally, let' consider phrases like give Bill a fish (not in our fragment). I've argued (Bach, 1980) that give (in this usage) belongs to the category B_{(IV/T)//T}. The standard derivation follows exactly that of persuade Bill to go.
give Bill ∈ P_{IV/T} (by R2)and is translated as λ℘[give'(℘)(^Bill')], so by 1a we have
give Bill a fish ∈Note that the inner argument of give, in this usage, is the thing given and that Bill is the direct object of the TVP give a fish.
λ℘[give'(℘)(^Bill')](^a_fish')
==> give'(^a_fish)(^Bill')
This same standard derivation will assign to the phrase give a fish Bill an interpretation that reduces to this: give'(^Bill)(^a_fish') and means the same as give Bill to a fish.
What remains to be explained is why the option of R3 and R4 is not available to yield "heavy NP-shift" interpretations of phrases like give a fish the elephant (that is heavy). Note that R1a allows this derivation˙
give ∈ P_{(IV/T)//T} give'(Compare above: this is identical with the the result of the standard derivation of give Bill a fish.) We don't want to block the option of letting double slash categories take immediately following arguments ("anywhere to the right" must include "immediately following"). I propose that what is crucial here is the identity of the inner and outer arguments, so we can amend the rule schema (second option) as follows:
give a fish ∈ P_{IV/T} give'(^a_fish')
give a fish the elephant ∈ P_{IV} give'(^a_fish')(^the_elephant')
P_{A} := P_{A//B} P_{B} unless A = (X/B) for some X ^{[5]}, ^{[6]}.
B_{S/S'} = {a}
B_{S/b} = {b}
B_{S\S'} = {b}
B_{b} = {b}
Exercise 2: Let L = {X|X = (abc)^{n}. L is CF (in fact regular). But L'' = Scramble(L) is not CF. For let L' = {X|X = a^{n}b^{m}c^{k} then L'' ∩ L' = {X|X = a^{n}b^{n}c^{n}} is not CF, but since the intersection of a CF language and a regular language is CF, L'' (= Scramble(L)) can't be CF.