# CS代考计算机代写 algorithm 3. Introducing finite-state automata

3. Introducing finite-state automata

First some standard stage-setting definitions:

(1) For any set Σ, we define Σ∗ as the smallest set such that: • ε∈Σ∗,and

• ifx∈Σandu∈Σ∗ then(x:u)∈Σ∗.

We often call Σ an alphabet, call the members of Σ symbols, and call the members of Σ∗ strings.

(2) Foranytwostringsu∈Σ∗ andv∈Σ∗,wedefineu+vasfollows: • ε+v=v

• (x:w)+v=x:(w+v)

Although these definitions provide the “official” notation, I’ll sometimes be slightly lazy and abbreviate ‘x:ε’ as ‘x’, and abbreviate both ‘s:t’ and ‘s + t’ as just ‘st’ in cases where it should be clear what’s intended.

I’ll generally use x, y and z for individual symbols of an alphabet Σ, and use u, v and w for strings in Σ∗. This should help to clarify whether a ‘:’ or a ‘+’ has been left out.

1 Finite-state automata, informally

Below, in (3) and (4), are graphical representations of two finite-state automata (FSAs). The circles represent states. The initial states are indicated by an “arrow from nowhere”; the final or accepting states are indicated by an “arrow to nowhere”.

The FSA in (3) generates the subset of {C, V}∗ consisting of all and only strings that have at least one occurrence of ‘V’.

CC

(3)

30

V

31

V

The FSA in (4) generates the subset of {C, V}∗ consisting of all and only strings that contain either two adjacent ‘C’s or two adjacent ‘V’s (or both).

Ling185A, Winter 2021 — Tim Hunter, UCLA

41

CC CC

(4)

40

43

VV

VV 42

The FSA in (5) generates the subset of {C, V}∗ consisting of all and only strings which end in ‘VC’. C

VC 123

V

If we think of state 1 as indicating syllable boundaries, then FSA in (6) generates sequences of syllables of the form ‘(C)V(C)’. The string ‘VCV’, for example, can be generated via two different paths, 1-1-2-1 and 1-3-1-1, corresponding to different syllabifications.

(5)

(6)

2

(7)

C 12 V

VCVV 3

Formal definition of an FSA

A finite-state automaton (FSA) is a five-tuple (Q, Σ, I, F, ∆) where:

• Q is a finite set of states;

• Σ, the alphabet, is a finite set of symbols;

• I ⊆ Q is the set of initial states;

• F ⊆ Q is the set of ending states; and

• ∆⊆Q×Σ×Qisthesetoftransitions.

So strictly speaking, (4) is a picture of the following mathematical object: (8) {40, 41, 42, 43}, {C, V}, {40}, {43},

{(40, C, 40), (40, C, 41), (40, V, 40), (40, V, 42), (41, C, 43), (42, V, 43), (43, C, 43), (43, V, 43)} You should convince yourself that (4) and (8) really do contain the same information.

Now let’s try to say more precisely what it means for an automaton M = (Q, Σ, I, F, ∆) to generate/accept a string.

(9) For M to generate a string of three symbols, say x1x2x3, there must be four states q0, q1, q2, and q3 such that

Ling185A, Winter 2021 — Tim Hunter, UCLA

• q0∈I,and

• (q0, x1, q1) ∈ ∆, and

• (q1, x2, q2) ∈ ∆, and

• (q2, x3, q3) ∈ ∆, and

• q3∈F.

.

(10) More generally, M generates a string of n symbols, say x1x2 . . . xn, iff: there are n + 1 states q0, q1, q2, …qn such that

• q0∈I,and

• for every i ∈ {1,2,…,n}, (qi−1,xi,qi) ∈ ∆, and

• qn∈F.

To take a concrete example:

(11) The automaton in (4)/(8) generates the string ‘VCCVC’ because we can choose q0, q1, q2, q3, q4 and q5 to be the states 40, 40, 41, 43, 43 and 43 (respectively), and then it’s true that:

• 40∈I,and

• (40,V,40)∈∆,and

• (40,C,41)∈∆,and

• (41,C,43)∈∆,and

• (43,V,43)∈∆,and

• (43,C,43)∈∆,and

• 43∈F.

Side remark: Note that abstractly, (10) is not all that different from:

(12) A tree-based grammar will generate a string x1x2 . . . xn iff: there is some collection of nonter- minal symbols that we can choose such that

• those nonterminal symbols and the symbols x1, x2, etc. can all be clicked together into a tree structure in ways that the grammar allows, and

• the nonterminal “at the top” is the start symbol. (Much more on this in a few weeks!)

We’ll write L(M) for the set of strings generated by an FSA M. So stated roughly, the important idea is: (13) w∈L(M)

⇐⇒

all possible paths p

⇐⇒

all possible paths p

string w can be generated by path p

step s is allowed and generates the appropriate part of w

all steps s in p

It’s handy to write I(q0) in place of q0 ∈ I, and likewise for F and ∆. Then one way to make (13) precise is:

(14) x1x2…xn∈L(M)

⇐⇒ ··· I(q0)∧∆(q0,x1,q1)∧···∧∆(qn−1,xn,qn)∧F(qn)

q0∈Q q1∈Q qn−1∈Q qn∈Q

But it’s practical and enlightening to break this down in a couple of different ways.

Ling185A, Winter 2021 — Tim Hunter, UCLA

2.1 Forward values

For any FSA M there’s a two-place predicate fwdM , relating states to strings in an important way:

(15) fwdM (w)(q) is true iff there’s a path through M from some initial state to the state q, emitting the

string w

Given a way to work out fwdM(w)(q) for any string and any state, we can easily use this to check for

membership in L(M):

(16) w∈L(M)⇐⇒ fwdM(w)(qn)∧F(qn)

qn ∈Q

We can represent the predicate fwdM in a table. Each column shows fwdM values for the entire prefix

consisting of the header symbols to its left. The first column shows values for the empty string. (17) Here’s the table of fwdM values for prefixes of the string ‘CVCCVVC’ for the FSA in (5).

State CVCCVVC

1 11111111 2 00100110 3 00010001

Notice that filling in the values in the leftmost column is easy: this column just says which states are initial states. And with a little bit of thought you should be able to convince yourself that, in order to fill in a column of this table, you only need to know:

• the values in the column immediately to its left, and

• the symbol immediately to its left. More generally, this means that:

(18) The fwdM values for a non-empty string x1 . . . xn depend only on

• the fwdM values for the string x1 . . . xn−1, and

• the symbol xn.

This means that we can give a recursive definition of fwdM : (19) fwdM (ε)(q) = I(q)

fwdM(x1…xn)(q)= fwdM(x1…xn−1)(qn−1)∧∆(qn−1,xn,q) qn−1 ∈Q

This suggests a natural and efficient algorithm for calculating these values: write out the table, start by filling in the leftmost column, and then fill in other columns from left to right. This is where the name “forward” comes from.

2.2 Backward values

We can do all the same things, flipped around in the other direction.

For any FSA M there’s a two-place predicate bwdM , relating states to strings in an important way:

(20) bwdM (w)(q) is true iff there’s a path through M from the state q to some ending state, emitting the string w

Ling185A, Winter 2021 — Tim Hunter, UCLA

Given a way to work out bwdM(w)(q) for any string and any state, we can easily use this to check for membership in L(M):

(21) w∈L(M)⇐⇒ I(q0)∧bwdM(w)(q0) q0 ∈Q

We can represent the predicate bwdM in a table. Each column shows bwdM values for the entire suffix consisting of the header symbols to its right. The last column shows values for the empty string.

(22) Here’s the table of bwdM values for suffixes of the string ‘CVCCVVC’ for the FSA in (5). State CVCCVVC

1 11111100 2 00000010 3 00000001

In this case, filling in the last column is easy, and each other column can be filled in simply by looking at the values immediately to its right.

(23) The bwdM values for a non-empty string x1 . . . xn depend only on

• the bwdM values for the string x2 . . . xn, and

• the symbol x1.

So bwdM can also be defined recursively.

(24)

2.3

Now (25)

And (26)

(27)

bwdM (ε)(q) = F (q)

bwdM(x1…xn)(q)= ∆(q,x1,q1)∧bwdM(x2…xn)(q1)

q1 ∈Q

Forward values and backward values together

we can say something beautiful:

uv∈L(M) ⇐⇒ fwdM(u)(q)∧bwdM(v)(q)

q∈Q

in fact (16) and (21) are just special cases of (25), with u or v chosen to be the empty string: w∈L(M) ⇐⇒ fwdM(w)(q)∧bwdM(ε)(q) ⇐⇒ fwdM(w)(q)∧F(q)

q∈Q q∈Q

w∈L(M) ⇐⇒ fwdM(ε)(q)∧bwdM(w)(q) ⇐⇒ I(q)∧bwdM(w)(q) q∈Q q∈Q

Ling185A, Winter 2021 — Tim Hunter, UCLA