Linear Programming and Harmonic Grammar: A Linguistic Application

Harmonic Grammar is a sub-field of linguistics that seeks to explain the system of constraints leading to languages’ disparate preferences for certain forms. For example, the sound produced by the combination of letters “ng” cannot appear at the beginning of a word in English, but does appear at the beginning of words in other languages, particularly those of the Bantu family. From such contrasts in phonological preferences linguists deduce that speakers of English and speakers of Bantu apply different systems of constraints when determining what constitutes a valid word in their language.

Harmonic Grammar approaches this genre of question through a mathematical framework that encodes each linguistic preference as a constraint. Given an input structure I, the candidate output structures form a set A_I, called the candidate set. We optimize over a set of constraints c_1, c_2, \ldots, c_n and a weighting vector W = w_1, w_2, \ldots, w_n. A candidate linguistic form’s ‘score’ is given by the sum of the weights of the constraints it violates, that is,

H_{C_n, W_n}(A) = w_1(c_1(A)) + w_2(c_2(A)) + \ldots + w_n(c_n(A))

Thus, a candidate pair A = <I, O> is considered optimal if:

H_{C_n, W_n}(A) \ge H_{C_n, W_n}(A') \enspace \forall \enspace A' \in A_I

Unlike in the typical linear programming framework, however, Harmonic Grammar does not seek to identify the optimal form; the optimal form is already known by observation. Instead, Harmonic Grammar seeks to define the weights W_n that produce the linguistic forms known to be optimal. For example, in English, most sentences follow a subject-verb-object structure. This structure is violated in interrogative clauses like “what did he give you?,” where “what,” the grammatical object of the clause, precedes “you,” the grammatical subject. Examples of this nature suggest an implicit ranking of constraints, in this case that introducing interrogative phrases with a question word pre-empts the conventional word ordering.

Such phenomena both pose the central conundrum in Harmonic Grammar — the existence of an unknown ranking of linguistic constraints — and allow us to define the constraints on the optimal weights. Call our constraints C_1 and C_2 and our candidate forms A_1 and A_2. We can tabulate each candidate’s constraint violations as follows:

Input C1 C2
A1 0 a
A2 b 0

Suppose we know by observation that A_1 is the preferred form. Then, since each violation subtracts points from the candidate’s score, we can write the constraint of our LP as:

-a\cdot W(C_2) \ge -b\cdot W(C_1)
-a\cdot W(C_2) + b\cdot W(C_1) \ge 0

Combining such comparisons for all observed candidate sets for a given input produces a set of linear constraints with coefficients encoded in a matrix C. The LP formulation then becomes clear:

min \sum_{i=1}^n w_i \mbox{ such that}

C\mathbf{w} \ge 0

\mathbf{w} \ge 0

To apply stricter inequalities, that is, to ensure that an observed optimum outranks all other candidates by a fixed amount z, we merely reformulate the linear constraints as C\mathbf{w} \ge z, \mathbf{w} \ge z. Once presented in this format, the LP can be solved by the Simplex Method or any other linear programming algorithm.

This application is intriguing not only because it evinces the scope of linear programming through its extension to conventionally qualitative fields but also because it reveals a new slant on the procedure’s objective. In linguistics, the optimal grammatical or phonological structure is known; it’s the most preferred, that is, the most frequently observed, of the candidate structures. In fact, the system of linear constraints hinges on the availability of information about the relative acceptability of each candidate in the candidate set, striking a sharp contrast with the formulation of most LPs. Harmonic Grammarians pose a different question: what rankings produce the observed data? LPs give linguists insight into the unconscious processes we apply whenever we use language.

References:

Potts, Christopher, Joe Pater, Karen Jesney, Rajesh Bhatt, and Michael Becker, 2010. “Harmonic Grammar with linear programming: from linear systems to linguistic typology.” Phonology 27: 77-117.

Pater, Joe, Rajesh Bhatt, and Christopher Potts, 2007.  “Linguistic Optimization.”
http://cogprints.org/5691/

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s