# 8. Tree grammars

1 Review: Stringsets and string grammars

The kind of thing we’ve done with strings many times now follows this pattern:

(1) a. Identify an alphabet of symbols; call it Σ.

b. This determines a certain set of strings over this alphabet; usually written Σ∗.

c. Identify some subset of Σ∗ as the stringset of interest; call this L, so L ⊆ Σ∗.

d. Ask what (string) grammar(s) can generate exactly that set of strings L.

Remember that step (1b) involves an important recursive definition:

(2) For any set Σ, we define Σ∗ as the smallest set such that:

• ε ∈ Σ∗, and

• if x ∈ Σ and u ∈ Σ∗ then (x:u) ∈ Σ∗.

So if Σ = {a, b}, then Σ∗ contains things like a:(a:(b:ε)), which we abbreviate as ‘aab’.
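In Haskell terms, Σ∗ is just the list type over a symbol type: ε is the empty list and the colon of (2) is the cons constructor. A minimal sketch (the names `Sym`, `SymA`, `SymB` are invented here for illustration):

```haskell
-- A two-symbol alphabet Σ = {a, b}; the constructor names are my own.
data Sym = SymA | SymB deriving (Eq, Show)

-- Σ∗ is exactly [Sym]: ε is [], and the colon of (2) is Haskell's (:).
aab :: [Sym]
aab = SymA : (SymA : (SymB : []))   -- the string abbreviated as 'aab'
```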

Then, in step (1c), we identify some stringsets that we might be interested in:

(3) a. L1 = {w | w ∈ Σ∗ and every 'a' is immediately followed by a 'b'}

b. L2 = {w | w ∈ Σ∗ and w ≠ ε and the first and last symbols of w are the same}

c. L3 = {w | w ∈ Σ∗ and the number of occurrences of 'a' in w is even}

d. L4 = {w | w ∈ Σ∗ and w contains an 'a' that is followed (not necessarily immediately) by a 'b'}

e. L5 = {aⁿbⁿ | n ∈ ℕ and n ≥ 0}

f. L6 = {w + wᴿ | w ∈ Σ∗} (where wᴿ is the reverse of the string w)

g. L7 = {w + w | w ∈ Σ∗}

And for each such stringset L, we can ask (step (1d)) what kinds of grammars can generate exactly L.

2 Generalizing: Treesets and tree grammars

Things will follow an analogous pattern here:

(4) a. Identify an alphabet of symbols; call it Σ.

b. This determines a certain set of trees over this alphabet; usually written TΣ.

c. Identify some subset of TΣ as the treeset of interest; call this L, so L ⊆ TΣ.

d. Ask what (tree) grammar(s) can generate exactly that set of trees L.

Ling185A, Winter 2021 — Tim Hunter, UCLA

2.1 The set of trees over an alphabet

(5) For any set Σ, we define TΣ as the smallest set such that:

• if x ∈ Σ, then x[] ∈ TΣ, and

• if x ∈ Σ and t1, t2, …, tk ∈ TΣ, then x[t1, t2, …, tk] ∈ TΣ.

Those square brackets in this definition are analogous to the colon in the definition of Σ∗. The colon makes strings out of symbols, and the square brackets make trees out of symbols. (These pieces of punctuation correspond to constructors in Haskell.)
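The constructor in question can be given as a rose-tree datatype. This particular declaration is my own sketch, not necessarily the one used in the course:

```haskell
-- One constructor plays the role of the square brackets:
-- Node x [t1, ..., tk] corresponds to x[t1, ..., tk], and Node x [] to x[].
data Tree sy = Node sy [Tree sy] deriving (Eq, Show)

-- The tree b[c[a[]], a[b[], b[]]] over Σ = {a, b, c}:
example :: Tree Char
example = Node 'b' [ Node 'c' [Node 'a' []]
                   , Node 'a' [Node 'b' [], Node 'b' []] ]
```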

So for example, if Σ = {a, b, c}, then the set TΣ looks something like this:

(6) TΣ = { a[], b[], c[], a[a[]], …, a[b[], b[], c[]], …, b[c[a[]], a[b[], b[]]], … }

But just as we allow ourselves to write a:(a:(b:ε)) more conveniently as ‘aab’, we allow ourselves to write b[c[a[]], a[b[], b[]]] more conveniently as:

(7)      b
        / \
       c   a
       |  / \
       a  b  b

Also it’s sometimes convenient to leave off empty pairs of brackets, so instead of b[c[a[]],a[b[],b[]]] we sometimes write b[c[a], a[b, b]].

One more definition is useful:

(8) For any set Σ and any natural number n, we define TΣn as the set of all trees in TΣ in which every node has at most n daughters.

So the tree in (7), for example, is a member of TΣ2 and is also a member of TΣ3, but is not a member of TΣ1.

The largest number of daughters of any node in a tree is sometimes called the tree’s branching degree. So TΣn is the set of all trees in TΣ with branching degree less than or equal to n. The branching degree of the tree in (7) is 2.
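The branching degree is easy to compute recursively over the tree type. A small sketch (the function name is mine, and the `Tree` declaration is repeated so the snippet stands alone):

```haskell
-- Rose trees, as before: Node x ts corresponds to x[t1, ..., tk].
data Tree sy = Node sy [Tree sy]

-- The largest number of daughters of any node in the tree.
branchingDegree :: Tree sy -> Int
branchingDegree (Node _ ts) = maximum (length ts : map branchingDegree ts)
```

Applied to the tree in (7), `branchingDegree` returns 2, matching the text.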

2.2 Subsets of TΣ (“treesets”)

Using the alphabet Σ = {a, b}, here are some treesets we might be interested in:

(9) a. L1 = {t ∈ TΣ2 | the number of occurrences of 'a' in t is even}

b. L2 = {t ∈ TΣ2 | every 'b' in t dominates a binary-branching 'a'}

c. L3 = {t ∈ TΣ2 | t contains a binary-branching 'a' whose left daughter subtree contains an 'a' and whose right daughter subtree contains a 'b'}

d. L4 = {t ∈ TΣ2 | t contains equal numbers of occurrences of 'a' and 'b'}

2.3 One kind of tree grammar

(10) A (bottom-up) finite-state tree automaton (FSTA) is a four-tuple (Q, Σ, F, ∆) where:

• Q is a finite set of states;

• Σ, the alphabet, is a finite set of symbols;

• F ⊆ Q is the set of ending states; and

• ∆ ⊆ Q∗ × Σ × Q is the set of transitions, which must be finite.


For any FSTA G = (Q, Σ, F, ∆), underG is a function from TΣ × Q to booleans:

(11) underG(x[])(q) = ∆([], x, q)

     underG(x[t1, …, tk])(q) = ⋁_{q1 ∈ Q} ··· ⋁_{qk ∈ Q} ( ∆([q1, …, qk], x, q) ∧ underG(t1)(q1) ∧ ··· ∧ underG(tk)(qk) )

And L(G) is a subset of TΣ:

(12) t ∈ L(G) ⟺ ⋁_{q ∈ Q} ( underG(t)(q) ∧ F(q) )

As usual, slot in your favourite semiring as desired!
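Here is a Haskell sketch of (11) and (12) with the boolean semiring hard-coded. The type and function names (`Fsta`, `under`, `generates`) and the toy grammar `allA` are my own, not an official course API:

```haskell
data Tree sy = Node sy [Tree sy]

data Fsta st sy = Fsta
  { states :: [st]                      -- Q
  , finals :: st -> Bool                -- F, as a predicate on states
  , delta  :: [st] -> sy -> st -> Bool  -- ∆, as a predicate on transitions
  }

-- (11): can tree t be parsed bottom-up so that it ends up under state q?
-- The comprehension is the big disjunction over all choices of q1 ... qk.
-- When ts = [], the only choice is qs = [], so this reduces to the base
-- case underG(x[])(q) = ∆([], x, q).
under :: Fsta st sy -> Tree sy -> st -> Bool
under g (Node x ts) q =
  or [ delta g qs x q && and (zipWith (under g) ts qs)
     | qs <- sequence (replicate (length ts) (states g)) ]

-- (12): t is generated iff it sits under some ending state.
generates :: Fsta st sy -> Tree sy -> Bool
generates g t = or [ under g t q && finals g q | q <- states g ]

-- A toy instance: accepts exactly the trees in which every node is 'a'.
allA :: Fsta () Char
allA = Fsta { states = [()]
            , finals = const True
            , delta  = \_ x _ -> x == 'a' }
```

Replacing the `Bool` results (and `or`/`and`) with values and operations from another semiring gives the weighted variants alluded to above.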

2.4 Examples

2.4.1 Even/odd

The FSTA G1 in (13) generates the treeset L1 from (9) above (requiring an even number of ‘a’s).

(13) G1 = ({even, odd}, {a, b}, {even}, ∆) where ∆ = {

     ([even, even], a, odd),    ([even, even], b, even),
     ([even, odd],  a, even),   ([even, odd],  b, odd),
     ([odd, even],  a, even),   ([odd, even],  b, odd),
     ([odd, odd],   a, odd),    ([odd, odd],   b, even),
     ([even],       a, odd),    ([even],       b, even),
     ([odd],        a, even),   ([odd],        b, odd),
     ([],           a, odd),    ([],           b, even)

     }

(14) [diagram: an example tree over {a, b}, with each node annotated by the state (even or odd) that G1 assigns to it]

This grammar is bottom-up deterministic: given a sequence of “child states” and a symbol, there’s at most one applicable transition. This reflects the fact that there’s a function that determines whether a tree contains an even or odd number of ‘a’s. But this grammar is not top-down deterministic: note the “choices” one has to make at binary-branching nodes when working top-down.
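Bottom-up determinism means that ∆ of G1 can be packaged as a function yielding at most one state per (daughter states, symbol) pair. This tabulation of the transitions in (13) is my own illustration:

```haskell
data Tree = Node Char [Tree]

data Parity = Even | Odd deriving (Eq, Show)

-- Parity addition: add Odd Odd = Even, add Even Odd = Odd, etc.
add :: Parity -> Parity -> Parity
add p q = if p == q then Even else Odd

-- At most one result state for each (daughter states, symbol) pair.
step :: [Parity] -> Char -> Maybe Parity
step qs x
  | length qs > 2 = Nothing                -- (13) only covers 0-2 daughters
  | x == 'a'      = Just (add Odd total)   -- an 'a' node adds one 'a'
  | x == 'b'      = Just total
  | otherwise     = Nothing
  where total = foldr add Even qs          -- combined parity of the daughters

-- Run the automaton bottom-up; Nothing means no transition applies.
run :: Tree -> Maybe Parity
run (Node x ts) = traverse run ts >>= \qs -> step qs x
```

For example, `run (Node 'a' [Node 'a' [], Node 'b' []])` is `Just Even`: the tree contains two 'a's.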


2.4.2 Another abstract example

The grammar in (15) generates the treeset L2 from (9) above (requiring that every 'b' dominates a binary-branching 'a').

(15) G2 = ({0, 1}, {a, b}, {0, 1}, ∆) where ∆ = {

     ([0, 0], a, 1),
     ([0, 1], a, 1),   ([0, 1], b, 1),
     ([1, 0], a, 1),   ([1, 0], b, 1),
     ([1, 1], a, 1),   ([1, 1], b, 1),
     ([0],    a, 0),
     ([1],    a, 1),   ([1],    b, 1),
     ([],     a, 0)

     }

(16) [diagrams: two example trees in L2, with each node annotated by the state (0 or 1) that G2 assigns to it; state 1 marks subtrees that contain a binary-branching 'a']

2.4.3 A more linguistic example

Now let's suppose that the alphabet Σ is the set of English words, plus the additional symbol ∗.

(17) Σ = {∗, the, cat, dog, anybody, ever, not, nobody, …}

Then the FSTA in (19) encodes a simple version of the NPI-licensing constraint: an NPI such as ‘anybody’ or ‘ever’ must be c-commanded by a licensor such as ‘not’ or ‘nobody’.

(18) a. Nobody met anybody

b. * John met anybody

c. Nobody thinks that John met anybody

d. The fact that nobody met anybody surprised John

e. * The fact that nobody met John surprised anybody

(19) G3 = ({0, lic, neg}, Σ, {0, lic}, ∆) where ∆ = {

     ([neg, neg], ∗, neg),    ([lic, neg], ∗, 0),
     ([0, neg],   ∗, neg),    ([lic, 0],   ∗, 0),
     ([neg, 0],   ∗, neg),    ([0, lic],   ∗, 0),
     ([0, 0],     ∗, 0),      ([lic, lic], ∗, 0),

     ([], anybody, neg),
     ([], ever,    neg),
     ([], not,     lic),
     ([], nobody,  lic),
     ([], s,       0)    for any other s ∈ Σ − {∗}

     }


(20) [diagram: the tree for example (18d), with each node annotated by the state G3 assigns to it: 'anybody' starts out in state neg, 'nobody' in state lic, a ([lic, neg], ∗, 0) transition discharges the NPI, and the root ends up in the ending state 0]

Notice that the pattern of two-daughter transitions and zero-daughter transitions in this grammar ensures that the generated trees will contain only (i) binary nodes with the symbol ∗, and (ii) leaf nodes with other symbols.

2.5 So what do FSTAs gain for us?

But wait a minute — if the goal is just to account for the facts about strings in (18), then we can do the same thing with a plain old CFG.

(21) [diagram: a CFG derivation tree for the same sentence in which the internal node labels are just the states 0, lic, and neg of G3]

Of course we're used to seeing other things as the labels for those internal nodes, and using those labels to enforce certain other requirements (e.g. the requirement that an S is made up of an NP and a VP). But we can just bundle all that information together.¹

(22) [diagram: the same tree with conventional category labels bundled together with the states, giving node labels such as S0, CP0, C0, NP0, VP0, V0, NPlic, VPneg, and NPneg]

We can even use a similar trick for “movement”!

¹ Because the cartesian product of finite sets is necessarily finite.


(23) [diagram: a tree for a relative clause such as 'those who we met', in which wh-information is bundled into the labels: the moved element is labelled NP+wh, and labels like S=wh, VP=wh, and NP=wh pass the wh-dependency down to an ε at the gap position]

So while FSTAs are a useful tool for conceptualizing long-distance dependencies (such as NPI-licensing or wh-movement), it turns out that if a stringset can be derived by “reading along the bottom” of all the trees generated by an FSTA, then that stringset can also be generated by a CFG.

3 Stringsets beyond context-free

Some examples of stringsets that cannot be generated by a CFG:

(24) a. {w + w | w ∈ {a, b}∗}

b. {aⁿbⁿcⁿ | n ≥ 0}

c. {aⁿbⁿcⁿdⁿ | n ≥ 0}

d. {aⁱbʲcⁱdʲ | i ≥ 0, j ≥ 0}

These all exhibit crossing dependencies, rather than nesting dependencies of the sort that CFGs can handle. (Imagine trying to recognize these stringsets by moving through strings from left to right, with an unbounded stack as your available memory.)
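To be clear, these stringsets are easy to recognize with arbitrary computation; the point is only that no CFG generates them. A direct membership test for (24b), as a sketch of my own:

```haskell
-- Membership in {a^n b^n c^n | n >= 0}: guess n from the leading a's and
-- check that the whole string is exactly a^n b^n c^n.
isABC :: String -> Bool
isABC w = w == replicate n 'a' ++ replicate n 'b' ++ replicate n 'c'
  where n = length (takeWhile (== 'a') w)
```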

3.1 Non-context-free string patterns in natural language

To start, consider the following kinds of sentences in English:

(25) a. we [paint houses]

b. we [help [SC John paint houses]]

c. we [let [SC children help [SC John paint houses]]]

The subject of each “small clause” (SC) gets its case from the verb just above it. In many languages this would be shown overtly on the noun phrases somehow. And in many languages, the choice of verb would affect exactly which case (e.g. accusative or dative) gets assigned to each small clause subject. So we can imagine that the surface strings in fact look like this:

(26) a. we [paint houses]

b. we [help-dat [SC John-dat paint houses]]

c. we [let-acc [SC children-acc help-dat [SC John-dat paint houses]]]

Let's restrict attention to cases where the accusative-subject small clauses are all "outside" the dative-subject small clauses. Then the English word order pattern can be generated by an FSA: each accusative-assigning verb needs an accusative NP immediately after it, and likewise for each dative-assigning verb. The pattern is analogous to {(V1N1)ⁱ(V2N2)ʲ | i ≥ 0, j ≥ 0}.

In a head-final language, we might expect to see a word-order like this:



(27) a. we [houses paint]

b. we [[SC John-dat houses paint] help-dat]

c. we [[SC children-acc [SC John-dat houses paint] help-dat] let-acc]

This is beyond an FSA, but possible with a CFG: the very first NP is associated with the very last verb. This is analogous to {N1ⁱN2ʲV2ʲV1ⁱ | i ≥ 0, j ≥ 0}.

Now the amazing fact: in (at least a certain dialect of) Swiss German, we find the following word order:

(28) a. we [houses paint]

b. we [John-dat houses help-dat paint]

c. we [children-acc John-dat houses let-acc help-dat paint]

This pattern is analogous to {N1ⁱN2ʲV1ⁱV2ʲ | i ≥ 0, j ≥ 0}, which no CFG can generate. For the record, this is what the relevant parts of the actual sentences look like.

(29) a. daß mer em Hans es huus hälfe aastrüche
        that we Hans.dat the house.acc helped paint
        "that we helped Hans paint the house"

b. daß mer d'chind em Hans es huus lond hälfe aastrüche
   that we the children.acc Hans.dat the house.acc let help paint
   "that we let the children help Hans paint the house"

Papers presenting this argument were published in the mid-1980s by Riny Huybregts and by Stuart Shieber. Geoff Pullum’s short article entitled “Footloose and context-free” (1986, NLLT) is a very amusing little (six-page) account of the historical development of the ideas.
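To make the nested/crossing contrast concrete, here are direct membership tests for the two abstract patterns, using a single-character encoding of my own ('1' for N1, '2' for N2, 'x' for V1, 'y' for V2):

```haskell
-- Count occurrences of a character in a string.
count :: Char -> String -> Int
count c = length . filter (== c)

-- Nested pattern N1^i N2^j V2^j V1^i (the head-final order of (27)),
-- which a CFG can generate.
nested :: String -> Bool
nested w = w == replicate i '1' ++ replicate j '2'
             ++ replicate j 'y' ++ replicate i 'x'
  where i = count '1' w; j = count '2' w

-- Crossing pattern N1^i N2^j V1^i V2^j (the Swiss German order of (28)),
-- which no CFG can generate, though it is easy to check directly.
crossing :: String -> Bool
crossing w = w == replicate i '1' ++ replicate j '2'
              ++ replicate i 'x' ++ replicate j 'y'
  where i = count '1' w; j = count '2' w
```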

3.2 A more powerful grammar formalism

Here’s the big idea, building on FSTAs:

• We've seen that in order to allow a left-to-right string-processing automaton to recognize patterns like aⁿbⁿ, we need memory in the form of an unbounded stack, rather than just a single state.

• Let's introduce the same kind of unbounded stack memory into a bottom-to-top tree-processing automaton.

• We'll restrict/simplify the use of the stack slightly: stack information can only flow between a parent and one of its daughters.

This means that we can generate patterns like aⁿbⁿ along the leaves of a strictly left-branching (or strictly right-branching) tree. It's sort of like CFG parsing in disguise.

(30) [diagram: a strictly left-branching tree generating aⁿbⁿ along its leaves; the X and Y nodes on the spine carry stacks ([], [A], [A, A], …) recording, bottom-up, how many symbols remain to be matched]


But when we combine this stack-based memory with center-embedding structures, magic happens! Here's the rough idea for how we can generate {aⁿbⁿcⁿdⁿ | n ≥ 0}.

(31) [diagram: a center-embedded tree generating 'aabbccdd': the X and Y nodes on the spine carry stacks ([], [A], [A, A]); working bottom-up, each (b, c) pair pushes an 'A' onto the stack and each (a, d) pair pops one off]

Notice that the highest/outermost (a, d) pair is "matched up with" the lowest/innermost (b, c) pair: thinking bottom-up, the lowest (b, c) pair pushed the deepest 'A' onto the stack, and the highest (a, d) pair popped off that 'A'.

This means that the first ‘a’ is in a dependency with the first ‘c’ — so we have generated crossing dependencies!

Here’s the rough idea for how we can generate {w + w | w ∈ {a, b}∗}.

(32) [diagram: a tree generating a string of the form w + w: working bottom-up along one spine, each symbol of the first copy of w pushes a stack symbol ('A' for 'a', 'B' for 'b'), and the other spine pops those symbols off to produce the second copy; stacks like [A], [A, A], [A, A, B] record the w seen so far]

In a similar way we can generate {aⁱbʲcⁱdʲ | i ≥ 0, j ≥ 0}, which corresponds to the Swiss German case-marking pattern.
