# Linear Programming and Harmonic Grammar: A Linguistic Application

Harmonic Grammar is a sub-field of linguistics that seeks to explain the system of constraints leading to languages’ disparate preferences for certain forms. For example, the sound produced by the combination of letters “ng” cannot appear at the beginning of a word in English, but does appear at the beginning of words in other languages, particularly those of the Bantu family. From such contrasts in phonological preferences linguists deduce that speakers of English and speakers of Bantu apply different systems of constraints when determining what constitutes a valid word in their language.

Harmonic Grammar approaches this genre of question through a mathematical framework that encodes each linguistic preference as a constraint. Given an input structure $I$, the candidate output structures form a set $A_I$, called the candidate set. We optimize over a set of constraints $c_1, c_2, \ldots, c_n$ and a weighting vector $W = w_1, w_2, \ldots, w_n$. A candidate linguistic form’s ‘score’ is given by the sum of the weights of the constraints it violates, that is,

$H_{C_n, W_n}(A) = w_1(c_1(A)) + w_2(c_2(A)) + \ldots + w_n(c_n(A))$

Thus, a candidate pair $A = $ is considered optimal if:

$H_{C_n, W_n}(A) \ge H_{C_n, W_n}(A') \enspace \forall \enspace A' \in A_I$

Unlike in the typical linear programming framework, however, Harmonic Grammar does not seek to identify the optimal form; the optimal form is already known by observation. Instead, Harmonic Grammar seeks to define the weights $W_n$ that produce the linguistic forms known to be optimal. For example, in English, most sentences follow a subject-verb-object structure. This structure is violated in interrogative clauses like “what did he give you?,” where “what,” the grammatical object of the clause, precedes “you,” the grammatical subject. Examples of this nature suggest an implicit ranking of constraints, in this case that introducing interrogative phrases with a question word pre-empts the conventional word ordering.

Such phenomena both pose the central conundrum in Harmonic Grammar — the existence of an unknown ranking of linguistic constraints — and allow us to define the constraints on the optimal weights. Call our constraints $C_1$ and $C_2$ and our candidate forms $A_1$ and $A_2$. We can tabulate each candidate’s constraint violations as follows:

 Input C1 C2 A1 0 a A2 b 0

Suppose we know by observation that $A_1$ is the preferred form. Then, since each violation subtracts points from the candidate’s score, we can write the constraint of our LP as:

$-a\cdot W(C_2) \ge -b\cdot W(C_1)$
$-a\cdot W(C_2) + b\cdot W(C_1) \ge 0$

Combining such comparisons for all observed candidate sets for a given input produces a set of linear constraints with coefficients encoded in a matrix $C$. The LP formulation then becomes clear:

$min \sum_{i=1}^n w_i \mbox{ such that}$

$C\mathbf{w} \ge 0$

$\mathbf{w} \ge 0$

To apply stricter inequalities, that is, to ensure that an observed optimum outranks all other candidates by a fixed amount $z$, we merely reformulate the linear constraints as $C\mathbf{w} \ge z, \mathbf{w} \ge z$. Once presented in this format, the LP can be solved by the Simplex Method or any other linear programming algorithm.

This application is intriguing not only because it evinces the scope of linear programming through its extension to conventionally qualitative fields but also because it reveals a new slant on the procedure’s objective. In linguistics, the optimal grammatical or phonological structure is known; it’s the most preferred, that is, the most frequently observed, of the candidate structures. In fact, the system of linear constraints hinges on the availability of information about the relative acceptability of each candidate in the candidate set, striking a sharp contrast with the formulation of most LPs. Harmonic Grammarians pose a different question: what rankings produce the observed data? LPs give linguists insight into the unconscious processes we apply whenever we use language.

References:

Potts, Christopher, Joe Pater, Karen Jesney, Rajesh Bhatt, and Michael Becker, 2010. “Harmonic Grammar with linear programming: from linear systems to linguistic typology.” Phonology 27: 77-117.

Pater, Joe, Rajesh Bhatt, and Christopher Potts, 2007.  “Linguistic Optimization.”
http://cogprints.org/5691/