CS代考程序代写 computational biology algorithm Lecture 6:

Lecture 6:
The University of Sydney
Page 1
Fast Fourier Transform

Dynamic Programming Summary
– 1Ddynamicprogramming
– Weightedintervalscheduling
– SegmentedLeastSquares
– Maximum-sumcontiguoussubarray – Longestincreasingsubsequence
– 2Ddynamicprogramming – Knapsack
– Sequencealignment
– Dynamicprogrammingoverintervals
– RNASecondaryStructure
– Dynamicprogrammingoversubsets – TSP
– k-path – Playlist
The University of Sydney
Page 2

How does Google know what I want?
The University of Sydney Page 3

How does Google know what I want?
The University of Sydney Page 4

Edit distance
– How similar are two strings? – S = “ocurrance”
– T = “occurrence”
o
c

u
r
r
a
n
c
e
o
c
c
u
r
r
e
n
c
e
1 mismatch, 1 gap
Modify S to get T by adding c, swapping a-e The University of Sydney
Page 5

Edit distance
– How similar are two strings? – S = “ocurrance”
– T = “occurrence”
o
c

u
r
r

a
n
c
e
o
c
c
u
r
r
e

n
c
e
0 mismatches, 3 gaps
Modify S to get T by adding c, e, deleting e The University of Sydney
Page 6

Edit distance
– How similar are two strings? – S = “ocurrance”
– T = “occurrence”
o
c
u
r
r
a
n
c
e

o
c
c
u
r
r
e
n
c
e
5 mismatches, 1 gap
Modify S to get T by swapping u-c, r-u, a-r, n-e, c-n, e-c, adding e
The University of Sydney Page 7

Which edit is better?
o
c

u
r
r
a
n
c
e
o
c
c
u
r
r
e
n
c
e
1 mismatch, 1 gap
o
c

u
r
r

a
n
c
e
o
c
c
u
r
r
e

n
c
e
0 mismatches, 3 gaps
o
c
u
r
r
a
n
c
e

o
c
c
u
r
r
e
n
c
e
The University of Sydney
5 mismatches, 1 gap
Page 8

Edit Distance
– Applications.
– Basis for Unix diff.
– Speech recognition.
– Computational biology.
– Edit distance. [Levenshtein 1966, Needleman-Wunsch 1970] – Gap penalty d and mismatch penalty apq.
– Cost = sum of gap and mismatch penalties.
C
T
G
A
C
C
T
A
C
C
T

C
T
G
A
C
C
T
A
C
C
T
C
C
T
G
A
C
T
A
C
A
T
C
C
T
G
A
C

T
A
C
A
T
aTC + aGT + aAG+ 2aCA
The University of Sydney Page 9
2d + aCA

Sequence Alignment
– Goal: Given two strings X = x1 x2 . . . xm and Y = y1 y2 . . . yn find alignment of minimum cost.
– Definition: An alignment M is a set of ordered pairs xi-yj such that each item occurs in at most one pair and no crossings.
– Definition: The pair xi-yj and xa-yb cross if i < a, but j > b.
Example: CTACCG vs. TACATG.
x1 x2 x3 x4 x5 x6
C
T
A
C
C

G

T
A
C
A
T
G
The University of Sydney M = x2-y1, x3-y2, x4-y3, x5-y4, x6-y6. Page 10
y1 y2 y3 y4 y5 y6

Sequence Alignment
cost(M) = Saxi,yj + Sd + Sd (xi,yj)ÎM i: xi unmatched j: yj unmatched
mismatch Example: CTACCG vs. TACATG.
x1 x2 x3 x4 x5 x6
C
T
A
C
C

G

T
A
C
A
T
G
The University of Sydney
Page 11
y1 y2 y3 y4 y5 y6
M = x2-y1, x3-y2, x4-y3, x5-y4, x6-y6.

Key steps: Dynamic programming
1. Define subproblems 2. Find recurrences
3. Solve the base cases
4. Transform recurrence into an efficient algorithm
The University of Sydney Page 12

Sequence Alignment: Problem Structure
Step 1: Define subproblems ???
x1 x2 x3 x4 x5 xi
··· ···
C
T
A
C
C

G

T
A
C
A
T
G
The University of Sydney
Page 13
y1 y2 y3 y4 y5 yj

Sequence Alignment: Problem Structure Step 1: Define subproblems
OPT(i, j) =
min cost of aligning strings x1 x2 . . . xi and y1 y2 …yj.
x1 x2 x3 x4 x5 xi
··· ···
C
T
A
C
C

G

T
A
C
A
T
G
The University of Sydney
Page 14
y1 y2 y3 y4 y5 yj

Sequence Alignment: Problem Structure
Definition: OPT(i, j) = Step 2: Find recurrences
min cost of aligning strings x1 x2 . . . xi and y1 y2 …yj.
x1 x2 x3 x4 x5 xi
C
T
A
C
C
G
T
A
C
A
T
G
The University of Sydney
Page 15
y1 y2 y3 y4 y5 yj

Sequence Alignment: Problem Structure
Definition: OPT(i, j) = min cost of aligning strings x1 x2 . . . xi and y1 y2 …yj.
x1 x2 x3 x4 x5 xi
C
T
A
C
C
G
Step 2: Find recurrences
– Case 1: OPT matches xi-yj.
• pay mismatch for xi-yj + min cost of aligning twostringsx1 x2 …xi-1 andy1 y2 …yj-1
y1 y2 y3 y4
y5 yj
T
A
C
A
T
G
The University of Sydney
Page 16

Sequence Alignment: Problem Structure
Definition: OPT(i, j) = min cost of aligning strings x1 x2 . . . xi and y1 y2 …yj.
x1 x2 x3 x4 x5 xi
Step 2: Find recurrences
– Case 1: OPT matches xi-yj. y1 y2 y3 y4 y5 yj • pay mismatch for xi-yj + min cost of aligning
twostringsx1 x2 …xi-1 andy1 y2 …yj-1 – Case 2a: OPT leaves xi unmatched.
• paygapforxi andmincostofaligningx1 x2 …xi-1 andy1 y2 …yj
C
T
A
C
C
G
T
A
C
A
T
G

The University of Sydney Page 17

Sequence Alignment: Problem Structure
Definition: OPT(i, j) = min cost of aligning strings x1 x2 . . . xi and y1 y2 …yj.
x1 x2 x3 x4 x5 xi
Step 2: Find recurrences
– Case 1: OPT matches xi-yj. y1 y2 y3 y4 y5 yj • pay mismatch for xi-yj + min cost of aligning
twostringsx1 x2 …xi-1 andy1 y2 …yj-1 – Case 2a: OPT leaves xi unmatched.
• paygapforxi andmincostofaligningx1 x2 …xi-1 andy1 y2 …yj – Case 2b: OPT leaves yj unmatched.
• paygapforyj andmincostofaligningx1 x2 …xi andy1 y2 …yj-1 The University of Sydney Page 18
C
T
A
C
C
G

T
A
C
A
T
G

Sequence Alignment: Problem Structure
– Definition: OPT(i, j) = min cost of aligning strings x1 x2 . . . xi and y1 y2 …yj.
– Case 1: OPT matches xi-yj.
• pay mismatch for xi-yj + min cost of aligning
twostringsx1 x2 …xi-1 andy1 y2 …yj-1 – Case 2a: OPT leaves xi unmatched.
• paygapforxi andmincostofaligningx1 x2 …xi-1 andy1 y2 …yj – Case 2b: OPT leaves yj unmatched.
• paygapforyj andmincostofaligningx1 x2 …xi andy1 y2 …yj-1
OPT(i, j) = min
axiyj + OPT(i-1,j-1) d + OPT(i-1, j)
d + OPT(i, j-1)
The University of Sydney Page 19

Sequence Alignment: Problem Structure Step 3: Solve the base cases
OPT(0,j) = j·d and OPT(i,0) = i·d
OPT(i,j) = j·d if i=0 OPI(i, j) = OPT(i,j) = i·d if j=0
min{axiyj + OPT(i-1, j-1),d + OPT(i-1, j),d + OPT(i, j-1)} otherwise
The University of Sydney
Page 20

Sequence Alignment: Algorithm
Sequence-Alignment(m, n, x1x2…xm, y1y2…yn, d, a) { for i = 0 to m
M[0, i] = id for j = 0 to n
M[j, 0] = jd
for i = 1 to m
for j = 1 to n
M[i, j] = min{a[xi, yj] + M[i-1, j-1], d + M[i-1, j],
return M[m, n]
}
d + M[i, j-1]}
– Analysis. Q(mn) time and space.
– English words or sentences: m, n £ 100.
– Computational biology: m = n = 100,000. 10 billions ops OK, but 10GB array?
The University of Sydney
Page 21

Sequence Alignment: Algorithm
Sequence-Alignment(m, n, x1x2…xm, y1y2…yn, d, a) { for i = 0 to m
M[0, i] = id for j = 0 to n
M[j, 0] = jd
for i = 1 to m
for j = 1 to n
M[i, j] = min{a[xi, yj] + M[i-1, j-1], d + M[i-1, j],
return M[m, n]
}
d + M[i, j-1]}
– To get the alignment itself we can trace back through the array M.
The University of Sydney Page 22

Sequence Alignment: Linear Space
Question: Can we avoid using quadratic space?
The University of Sydney Page 23

Sequence Alignment: Linear Space
Question: Can we avoid using quadratic space?
Easy. Optimal value in O(m + n) space and O(mn) time.
– Compute OPT(i, j) from OPT(i-1, j), OPT(i-1, j-1) and OPT(i, j-1). – BUT! No longer a simple way to recover alignment itself.
The University of Sydney
Page 24

Sequence Alignment: Linear Space
Question: Can we avoid using quadratic space?
Easy. Optimal value in O(m + n) space and O(mn) time.
– Compute OPT(i, j) from OPT(i-1, j), OPT(i-1, j-1) and OPT(i, j-1). – BUT! No longer a simple way to recover alignment itself.
Theorem: [Hirschberg 1975] Optimal alignment in O(m + n) space and O(mn) time.
– Clever combination of divide-and-conquer and dynamic programming. – Inspired by idea of Savitch from complexity theory.
The University of Sydney Page 25

Sequence Alignment: Linear Space
– Edit distance graph.
– m ́n grid graph GXY (as shown in the figure)
– Horizontal/vertical edges have cost d (gap)
– Diagonal edges from (i-1, j-1) to (i,j) have cost axiyj. (match)
e y1 y2 y3 y4 y5 y6 0-0
axiyj d
d i-j
e
x1 x2
x3
m-n
The University of Sydney
Page 26

Sequence Alignment: Linear Space
– Edit distance graph.
– Let f(i, j) be cheapest path from (0,0) to (i, j). – Observation: f(i, j) = OPT(i, j).
e y1 y2 y3 y4 y5 y6 0-0
axiyj d
d i-j
e
x1 x2
x3
m-n
The University of Sydney
Page 27

Sequence Alignment: Linear Space
min{axiyj + OPT(i-1, j-1),d + OPT(i-1, j),d + OPT(i, j-1)}
e
x1 x2
x3
e y1 y2 y3 y4 y5 y6 0-0
axiyj d
d i-j
m-n
The University of Sydney
Page 28

Sequence Alignment: Linear Space
– Edit distance graph.
– Let f(i, j) be cheapest path from (0,0) to (i, j).
– Can compute f(m,n) in O(mn) time and O(mn) space.
j
y4
i-j
e y1 y2 y3 0-0
x1 x2
y5 y6
e
x3
m-n
The University of Sydney
Page 29

Sequence Alignment: Linear Space
– Edit distance graph.
– Let f(i, j) be cheapest path from (0,0) to (i, j).
– If only interested in the value of the optimal alignment we do it in O(mn)
time and O(m + n) space.
e y1 y2 y3 0-0
x1 x2
j
y4
i-j
y5 y6
e
x3
m-n
The University of Sydney
Page 30