# Day 1: Harmonic Grammar vs. Optimality Theory

Today:

• A brief history of the origins of Optimality Theory (ranked constraints) and Harmonic Grammar (weighted constraints)

• An in-depth look at how ranked and weighted constraints do, and do not, differ in the types of pattern they can produce.

## History

Generative linguistics – Chomsky (1957):

Neural networks – Rosenblatt (1957):

The two traditions clashed in the "great past tense debate" of the 1980s. Prince and Smolensky were on opposite sides of that debate, but began to collaborate shortly after meeting in 1988.

They offered a 1991 Linguistic Institute course entitled “Connectionism and Harmony Theory in Linguistics”. Shortly thereafter, they abandoned weighted constraints in favor of the ranked constraints of Optimality Theory (OT; Prince and Smolensky 1993/2004).

This became a very successful framework for generative linguistics; at this point there are nearly 1300 papers in the Rutgers Optimality Archive.

My work in this area began when I started looking critically at the arguments for OT over the weighted constraints of Harmonic Grammar (HG: Legendre, Miyata and Smolensky 1990, Smolensky and Legendre 2006). This culminated in a 2009 Cognitive Science paper, an extended version of which was the first reading for this course.

We'll start by reminding ourselves in what ways ranked constraints can be seen as an improvement over principles and parameters theory (Chomsky 1981).

## HG revisited

A weighted constraint tableau from the reading – violations are negative integers, Harmony score (weighted sum) is shown to the right, and the optimum, shown with an arrow, has the highest Harmony / lowest penalty.
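The computation behind such a tableau can be sketched in a few lines of Python. The constraint names, weights, and violation profiles below are illustrative stand-ins, not the tableau from the reading:

```python
# Harmony of a candidate: the weighted sum of its (negative) violation counts.
# The optimum is the candidate with the highest Harmony (lowest penalty).
# Constraint names, weights, and violations here are illustrative placeholders.

def harmony(violations, weights):
    """Weighted sum of violation counts (violations are negative integers)."""
    return sum(weights[c] * v for c, v in violations.items())

def optimum(candidates, weights):
    """Return the candidate with the highest Harmony."""
    return max(candidates, key=lambda name: harmony(candidates[name], weights))

weights = {"*VoicedObstruentCoda": 2.0, "Ident-voice": 1.0}
candidates = {
    "bat": {"*VoicedObstruentCoda": 0, "Ident-voice": -1},
    "bad": {"*VoicedObstruentCoda": -1, "Ident-voice": 0},
}
print(optimum(candidates, weights))  # → bat
```

With the markedness constraint weighted above faithfulness, [bat] wins; reversing the weights makes [bad] optimal, which previews the weighting-conditions question below.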

Discussion points:

• Weighting conditions: which weights will make [bat] optimal? Which weights will make [bad] optimal? What about the other candidates?
• What would happen if we let weights go negative?

We can express the weighting conditions as comparative vectors (like OT's Elementary Ranking Conditions). The dot product of the weight vector and the comparative vector must be greater than zero.
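In code, a comparative vector holds, per constraint, the winner's violation count minus the loser's (with violations as negative integers), and a weighting condition is met when the dot product with the weight vector is strictly positive. A minimal sketch with illustrative numbers:

```python
# Comparative (winner-loser) vectors: per-constraint difference of violation
# counts, winner minus loser, with violations as negative integers.
# A weighting condition holds when the dot product with the weights is > 0.

def comparative_vector(winner, loser):
    """Per-constraint difference favoring the winner."""
    return [wv - lv for wv, lv in zip(winner, loser)]

def satisfied(weights, vec):
    """True when the weighting condition (dot product > 0) holds."""
    return sum(w * d for w, d in zip(weights, vec)) > 0

# Illustrative: the winner violates constraint 2 once, the loser constraint 1.
winner = [0, -1]
loser = [-1, 0]
vec = comparative_vector(winner, loser)  # [1, -1]: weight 1 must exceed weight 2
print(satisfied([2.0, 1.0], vec))  # → True
print(satisfied([1.0, 2.0], vec))  # → False
```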

A gang effect in Japanese loanword devoicing.

Discussion points:

• Statement of weighting conditions.
• Why this pattern yields an inconsistent set of ranking conditions in OT.
• Variation in outcome in (9).

## OT-Help and more OT-HG differences

With bigger problems, it can be difficult to find a set of correct weights by hand. It can be even more difficult to determine if no weights can make the desired winners optimal (consistency check), and to determine the set of languages produced by a given set of constraints.

One great advantage of weighted constraints is that they permit us to use existing computational methods developed for neural networks and other models. For example, as we will see soon, Rosenblatt's Perceptron convergence procedure can be used for HG learning.
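As a preview, here is a minimal sketch of that idea: perceptron-style updates over winner-loser comparative vectors, nudging the weights whenever a condition is unmet. The learning rate, data, and stopping criterion are illustrative assumptions, not the procedure from the reading:

```python
# Perceptron-style HG learning sketch: adjust weights toward satisfying every
# winner-loser comparative vector (dot product > 0). Note that weights can
# dip negative during learning in this bare version; real HG learners add
# details such as non-negativity. Data and learning rate are illustrative.

def perceptron_weights(vectors, rate=1.0, max_epochs=100):
    w = [0.0] * len(vectors[0])
    for _ in range(max_epochs):
        errors = 0
        for v in vectors:
            if sum(wi * vi for wi, vi in zip(w, v)) <= 0:  # condition unmet
                w = [wi + rate * vi for wi, vi in zip(w, v)]
                errors += 1
        if errors == 0:
            return w  # all weighting conditions satisfied
    return None  # no separating weights found within the epoch limit

# Two conditions: weight 1 must exceed weight 2, and twice weight 2 must
# exceed weight 1 (an asymmetric trade-off).
print(perceptron_weights([[1, -1], [-1, 2]]))  # → [3.0, 2.0]
```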

Because the weighting conditions are a set of linear inequalities, we can use Linear Programming's Simplex Algorithm to determine if a correct set of weights exists, and if so, to provide one. This approach is presented in the Potts et al. reading, and is implemented in Staubs et al.'s OT-Help.
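As a sketch of the idea (not OT-Help's actual implementation), the feasibility check can be posed as a linear program: require each weighting condition to clear a small margin, keep weights non-negative, and ask an LP solver for a solution. Here using SciPy's `linprog` — the choice of solver, margin, and objective are all assumptions:

```python
# Sketch: find non-negative weights satisfying every weighting condition.
# A margin of 1 stands in for strict positivity (dot product > 0), and we
# minimize the sum of weights just to pick a canonical solution.
import numpy as np
from scipy.optimize import linprog

def find_weights(vectors, margin=1.0):
    A = np.asarray(vectors, dtype=float)
    n = A.shape[1]
    # linprog expects A_ub @ w <= b_ub, so flip signs: -A @ w <= -margin.
    res = linprog(c=np.ones(n),
                  A_ub=-A, b_ub=-margin * np.ones(A.shape[0]),
                  bounds=[(0, None)] * n, method="highs")
    return res.x if res.success else None  # None: no consistent weighting

print(find_weights([[1, -1], [-1, 2]]))  # feasible: an asymmetric trade-off
print(find_weights([[1, -1], [-1, 1]]))  # None: w1 > w2 and w2 > w1 conflict
```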

Demonstration 1:

1. Load the file for Japanese into OT-Help
2. Click "HG Solution"
3. Click "OT Solution"

The typology is calculated by checking which sets of candidates across tableaux can be made jointly optimal. This is also done for OT using Tesar's Recursive Constraint Demotion, as in Hayes et al.'s OT-Soft.
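For OT, the typology over a small constraint set can be brute-forced by trying every total ranking and collecting each ranking's joint winners across tableaux. A sketch with two illustrative tableaux — plain enumeration here stands in for Recursive Constraint Demotion, which is what the tools actually use:

```python
# OT typology sketch: enumerate every total ranking of the constraints,
# find each tableau's winner under that ranking, and collect the distinct
# joint-winner profiles. Tableaux are illustrative.
from itertools import permutations

def ot_winner(tableau, ranking):
    # Lexicographic comparison, highest-ranked constraint first; violations
    # are negative integers, so larger (closer to zero) is better.
    return max(tableau, key=lambda cand: [tableau[cand][c] for c in ranking])

def ot_typology(tableaux, constraints):
    return {tuple(ot_winner(t, r) for t in tableaux)
            for r in permutations(constraints)}

tableaux = [{"A": {"C1": -1, "C2": 0}, "B": {"C1": 0, "C2": -1}},
            {"C": {"C1": 0, "C2": -1}, "D": {"C1": -1, "C2": 0}}]
print(ot_typology(tableaux, ["C1", "C2"]))  # two languages: (B, C) and (A, D)
```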

Typology demonstration 1:

1. Click "Calculate typology (HG & OT)"
2. How can we describe the languages generated by both OT and HG?

Typology demonstration 2:

1. Load the weight-to-stress + alignment file.
2. Calculate the typology. What are the languages generated by both OT and HG? Only HG?

This case exemplifies another kind of gang effect, where multiple violations of a single lower weighted constraint gang up. A simple abstract case:

|   | Con 1 | Con 2 |
|---|-------|-------|
| A | -1    |       |
| B |       | -1    |

|   | Con 1 | Con 2 |
|---|-------|-------|
| C | -1    |       |
| D |       | -2    |

Discussion point: What is the typology predicted by OT for this case? By HG? That is, which of the following sets can be made jointly optimal? What are the ranking / weighting conditions for each one?

{A,C}, {A,D}, {B,C}, {B,D}
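One way to check your answers on the HG side: under the reading where A and C each violate Con 1 once, while B and D violate Con 2 once and twice respectively, a brute-force search over small positive integer weights finds every jointly optimizable pair. A sketch:

```python
# Brute-force HG check for the abstract case: which winner pairs can be made
# jointly optimal? Assumed violation profiles: A and C violate Con 1 once;
# B violates Con 2 once; D violates Con 2 twice.
from itertools import product

tableau1 = {"A": (-1, 0), "B": (0, -1)}   # (Con 1, Con 2) violations
tableau2 = {"C": (-1, 0), "D": (0, -2)}

def hg_winner(tableau, w1, w2):
    return max(tableau, key=lambda c: w1 * tableau[c][0] + w2 * tableau[c][1])

joint = {(hg_winner(tableau1, w1, w2), hg_winner(tableau2, w1, w2))
         for w1, w2 in product(range(1, 11), repeat=2)}
print(sorted(joint))  # → [('A', 'C'), ('B', 'C'), ('B', 'D')]
```

The missing set corresponds to inconsistent weighting conditions (it would need both w2 > w1 and w1 > 2·w2).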

We will likely return to the large stress windows later in the course (see sec. 4 "Unbounded trade-offs and locality" of Pater 2016).

Exercise: Construct OT-Help tableaux for the following constraints, inputs, and candidates. This will build on our discussion of the analysis of allophony yesterday, as well as provide another example of OT-HG similarities and differences.

• Constraints:

*ʃ
Assign a violation mark to every alveopalatal fricative.

*si
Assign a violation mark to every alveolar fricative followed by a high front vowel.

Ident-place
Assign a violation mark to corresponding segments that differ in place of articulation.

• Inputs

/sa/, /si/, /ʃa/, /ʃi/

• Candidates

Forms with both [s] and [ʃ], e.g. for Input /sa/ the candidates will be [sa] and [ʃa].

Update: Here is a file that meets the OT-Help requirements and is constructed as described above.

In an unpublished ms., Carroll (2012: UCSD) points out that the HG-only pattern is attested in Gujarati, citing this paper:

Pandit, P. 1954. “Indo-Aryan sibilants in Gujarati”. Indian Linguistics 14, pp. 503-511.

## Why not weight?

So why did Prince and Smolensky abandon weighted constraints? My take:

• A combination of genuine concerns about overgeneration and some overestimates of the power of weighted constraints.

Both of these are shown in the passage from their 1997 Science paper, discussed on p. 8 of Pater (2016).

HG is less powerful than P&S (1997) claim because it requires "asymmetric trade-offs" to produce a cumulative interaction. Compare the symmetric trade-off in (10) to the asymmetric trade-off in the Japanese loanword case.

OT with local constraint conjunction is an alternative theory of cumulative constraint interaction that does not require asymmetric trade-offs. It generates many patterns that seem very problematic – see p. 9 ff. of Pater (2016).

An example of a genuine concern for HG – see p. 25 of Pater (2016).

As discussed there, one way to limit some aspects of HG overgeneration is to move to a serial version of HG (see also Shakuntala Mahanta's dissertation). We'll be looking at serial HG in some depth.

## Where are we now?

It is particularly difficult to decide between HG and OT because HG can use a different constraint set than OT does to analyze attested phenomena. When the two theories are allowed different constraint sets, HG is not necessarily the more powerful one.

An extended case study is found on p. 15 ff. of Pater (2016).

Choosing between OT and HG is also difficult because the relative power of the two theories of constraint interaction is affected by the choice of a serial vs. parallel model of candidate generation.

## Another OT-HG comparison

In Potts et al. (2010), there is an extended discussion of Lango ATR harmony, which draws on the greater power of HG to use a smaller constraint set. See p. 92 ff.

Demonstration / discussion:

• Load the Lango OT-Help file. Can we eliminate some of the winner-loser pairs and still have the necessary weighting conditions? That is, are some of these redundant?

Exercise: Find the smallest set of winner-loser pairs that will yield the same grammar when run in OT-Help. That is, the minimal set of weighting conditions.

Note: an excellent extension of OT-Help would be to have the ability to do this automatically.
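A sketch of how such a feature might work: a winner-loser pair is redundant if every weight vector satisfying the remaining conditions also satisfies it. This can be probed (imperfectly) by searching small positive weights for a counterexample — a real tool would settle it with linear programming. All data here are illustrative:

```python
# Sketch: probe whether weighting condition i is redundant given the others
# by searching small positive integer weights for a vector that satisfies
# every other condition but violates condition i. Finding one proves the
# condition does real work; finding none is only suggestive, since the
# search range is finite.
from itertools import product

def dot(w, v):
    return sum(wi * vi for wi, vi in zip(w, v))

def maybe_redundant(conditions, i, max_weight=10):
    others = conditions[:i] + conditions[i + 1:]
    for w in product(range(1, max_weight + 1), repeat=len(conditions[0])):
        if all(dot(w, v) > 0 for v in others) and dot(w, conditions[i]) <= 0:
            return False  # counterexample found: condition i is not redundant
    return True  # no counterexample in the search range

# Illustrative: the third condition (2*w1 > w2) follows from the first (w1 > w2).
conds = [[1, -1], [-1, 2], [2, -1]]
print(maybe_redundant(conds, 2))  # → True
print(maybe_redundant(conds, 0))  # → False
```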

For an excellent overview of approaches to Vowel and Consonant Harmony in OT, see:

Rose, Sharon, and Rachel Walker. 2011. Harmony systems. In The handbook of phonological theory, ed. John Goldsmith, Jason Riggle, and Alan C. L. Yu, 2nd ed., 240–290. Malden, Massachusetts: Blackwell.

For an extension of Potts et al.'s general approach to Lango in Serial HG, including a critique of some of its problems, see:

Mullin, Kevin. 2011. Strength in harmony systems: Trigger and directional asymmetries. Unpublished manuscript, University of Massachusetts Amherst. [Available on the Rutgers Optimality Archive, ROA-1134.]