Homology and pairwise alignment CompSci 369, 2022

School of Computer Science, University of Auckland


Homologous traits are ones that share a common ancestry.

Homologous sequences or regions are ones that share a common ancestry.

Homologous traits/regions are called homologs.

Similarity of homologous traits/regions depends on distance to most recent common ancestor.

Can get similarity without homology

Traits that are similar but not homologs are analogs.

So homology does not imply similarity and similarity does not imply homology.

Pairwise alignment

Given two homologous sequences, how do they align with each other?

That is, exactly which sites in the sequence are homologous with each other?

Example: x = GAATTC and y = GATTA could align as

For 2 sequences of length $n$, there are

$\binom{2n}{n} \approx \frac{2^{2n}}{\sqrt{\pi n}}$

different ways of aligning them.
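A quick numerical check of how fast the binomial count $\binom{2n}{n}$ grows, and how close the approximation $2^{2n}/\sqrt{\pi n}$ is, can be sketched in Python. The function names `num_alignments` and `stirling_approx` are illustrative choices, not part of the course material.

```python
import math

def num_alignments(n: int) -> int:
    """Exact binomial coefficient C(2n, n)."""
    return math.comb(2 * n, n)

def stirling_approx(n: int) -> float:
    """Approximation 2^(2n) / sqrt(pi * n)."""
    return 2 ** (2 * n) / math.sqrt(math.pi * n)

# Compare the exact count with the approximation for a few lengths.
for n in (5, 10, 50, 100):
    exact = num_alignments(n)
    approx = stirling_approx(n)
    print(f"n={n:>3}  exact={exact:.3e}  approx={approx:.3e}  ratio={approx / exact:.4f}")
```

The ratio approaches 1 as $n$ grows, and the count itself is far too large to enumerate even for short sequences.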

Have sequences $x$ and $y$ of length $m$ and $n$. $x_i$ is the $i$th symbol of $x$. So

$x = x_1 x_2 x_3 \ldots x_m$

Symbols could be 4 DNA bases, 4 RNA bases or 20 amino acids.

Put gaps in sequences to allow them to align better with each other. Gaps correspond to insertions and deletions.

Gaps exist in sequence $x$ only relative to sequence $y$; they are not an intrinsic part of sequence $x$.
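One common way to write a gapped alignment is as two equal-length strings in which `-` marks a gap. The sketch below checks that a candidate pair of gapped strings really is an alignment of the original sequences. The helper `is_valid_alignment` and the candidate alignments are illustrative assumptions, not taken from the slides.

```python
GAP = "-"

def is_valid_alignment(x: str, y: str, ax: str, ay: str) -> bool:
    """Check that (ax, ay) is a gapped alignment of the sequences x and y."""
    return (
        len(ax) == len(ay)                    # every column has two entries
        and ax.replace(GAP, "") == x          # removing gaps recovers x
        and ay.replace(GAP, "") == y          # removing gaps recovers y
        and all(not (a == GAP and b == GAP)   # no column is gap against gap
                for a, b in zip(ax, ay))
    )

x, y = "GAATTC", "GATTA"
# Three hypothetical candidate alignments of x and y.
for ax, ay in [("GAATTC", "GA-TTA"), ("GAATTC", "G-ATTA"), ("GAATTC-", "GA-TT-A")]:
    print(ax, ay, is_valid_alignment(x, y, ax, ay), sep="\n", end="\n\n")
```

Because removing the gaps recovers the original sequence, the gaps carry information only about how one sequence lines up against the other.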

Scoring alignments

To decide on best alignment, want a way of scoring them.

Easy to come up with ad-hoc scores, e.g. for each site, score 2 when residues agree, -1 when they don’t

Example: Scoring 2 when residues agree, -1 when they do not agree. If $x$ = GAATTC and $y$ = GGATTA are aligned as

GAATTC
GGATTA

the score is $2 - 1 + 2 + 2 + 2 - 1 = 6$.
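The ad-hoc scheme (2 for a match, -1 for a mismatch) is easy to sketch in Python for an ungapped alignment. The function name `adhoc_score` and its default parameters are illustrative assumptions.

```python
def adhoc_score(x: str, y: str, match: int = 2, mismatch: int = -1) -> int:
    """Score an ungapped alignment of two equal-length sequences site by site."""
    if len(x) != len(y):
        raise ValueError("an ungapped alignment needs sequences of equal length")
    return sum(match if a == b else mismatch for a, b in zip(x, y))

print(adhoc_score("GAATTC", "GGATTA"))  # 2 - 1 + 2 + 2 + 2 - 1 = 6
```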

Model of random sequences

Want a model-based method of scoring.

$x$ and $y$ are random sequences that are independent of each other.

$x$ and $y$ both length $n$.

Residue $a$ appears with frequency $q_a$.

The probability (or likelihood) of sequence $x$ is

$P(x) = \prod_{i=1}^{n} q_{x_i}$

The likelihood of an alignment of $x$ and $y$ is just the joint probability,

$P(x, y) = P(x)P(y) = \prod_{i=1}^{n} q_{x_i} \prod_{i=1}^{n} q_{y_i} = \prod_{i=1}^{n} q_{x_i} q_{y_i}.$
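Under the random model, the likelihood of an alignment is the product of the residue frequencies over both sequences. A minimal sketch follows, assuming uniform DNA base frequencies. The values in `q` are hypothetical; in practice they would be estimated from data.

```python
import math

# Hypothetical residue frequencies q_a (uniform over the 4 DNA bases).
q = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

def seq_prob(x: str, q: dict) -> float:
    """P(x): product over sites of the frequency of the residue at that site."""
    return math.prod(q[a] for a in x)

def random_model_likelihood(x: str, y: str, q: dict) -> float:
    """P(x, y) = P(x) P(y) for independent sequences of equal length."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return seq_prob(x, q) * seq_prob(y, q)

print(random_model_likelihood("GAATTC", "GGATTA", q))
```

With uniform frequencies every pair of length-6 sequences gets the same likelihood, $0.25^{12}$, which is why this model serves as a baseline and not as a way of ranking alignments on its own.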

Model of homologous sequences

If $x$ and $y$ are related, they are not independent so don’t get $P(x, y) = P(x)P(y)$.

Instead, let the probability of observing $a$ (from $x$) and $b$ (from $y$) aligned at a locus be $p_{ab}$.

Probability (likelihood) of the alignment is the product

$P(x, y) = \prod_{i=1}^{n} p_{x_i y_i}$

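Now compare models by taking ratio of likelihoods

$\frac{\prod_{i=1}^{n} p_{x_i y_i}}{\prod_{i=1}^{n} q_{x_i} q_{y_i}} = \prod_{i=1}^{n} \frac{p_{x_i y_i}}{q_{x_i} q_{y_i}}.$

Easier to work with log-likelihoods so take logs and define the score for an alignment

$S = \sum_{i} s(x_i, y_i)$, where $s(a, b) = \log\left(\frac{p_{ab}}{q_a q_b}\right).$

$s$ is the score matrix or substitution matrix. It is a square matrix with one row and one column per possible residue (for example 4 × 4 for DNA bases, 20 × 20 for amino acids).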
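To make the log-odds score concrete, the sketch below builds a toy substitution matrix $s(a, b) = \log(p_{ab} / (q_a q_b))$ for DNA and scores an ungapped alignment. The frequencies `q`, the joint probabilities `p`, and the use of the natural log are all assumptions chosen for illustration; they are not values from the course or from any published matrix.

```python
import math
from itertools import product

BASES = "ACGT"

# Hypothetical background frequencies q_a (uniform).
q = {a: 0.25 for a in BASES}

# Hypothetical joint probabilities p_ab for aligned residues in homologous
# sequences: identical pairs are more likely than under independence.
# 4 * 0.175 + 12 * 0.025 = 1, so p is a valid joint distribution.
p = {(a, b): (0.175 if a == b else 0.025) for a, b in product(BASES, repeat=2)}

# Log-odds substitution matrix s(a, b) = log(p_ab / (q_a * q_b)).
s = {(a, b): math.log(p[a, b] / (q[a] * q[b])) for a, b in p}

def alignment_score(x: str, y: str) -> float:
    """S = sum over sites of s(x_i, y_i) for an ungapped alignment."""
    return sum(s[a, b] for a, b in zip(x, y))

def likelihood_ratio(x: str, y: str) -> float:
    """Ratio of homologous-model likelihood to random-model likelihood."""
    return math.prod(p[a, b] / (q[a] * q[b]) for a, b in zip(x, y))

x, y = "GAATTC", "GGATTA"
print(alignment_score(x, y), math.log(likelihood_ratio(x, y)))
```

The two printed numbers agree up to floating-point rounding, since the score is just the log of the likelihood ratio. A positive score favours the homologous model, a negative one the random model.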

BLOSUM45 matrix

Scoring gaps

Gaps are rare and want to penalise their use. Also want to encourage gaps to fall into clumps.

For gap of length $k$, let $\gamma(k)$ be the penalty. Two types of penalty:

Linear penalty: $\gamma(k) = -dk$, $d > 0$

Affine penalty: $\gamma(k) = -d - (k - 1)e$, $d > e > 0$

$d$ is open gap penalty, $e$ is extension penalty

Affine penalty pushes nearby gaps together: extending an open gap (cost $e$) is cheaper than opening a new one (cost $d$)

More complex penalties can be used but come at computational cost
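The difference between the linear penalty $\gamma(k) = -dk$ and the affine penalty $\gamma(k) = -d - (k - 1)e$ shows up when comparing one long gap with several short ones. A minimal sketch follows; the values $d = 8$ and $e = 2$ are hypothetical and chosen only to satisfy $d > e > 0$.

```python
def linear_gap(k: int, d: float) -> float:
    """Linear penalty: -d * k, with d > 0."""
    return -d * k

def affine_gap(k: int, d: float, e: float) -> float:
    """Affine penalty: -d - (k - 1) * e, with d > e > 0."""
    return -d - (k - 1) * e

d, e = 8.0, 2.0  # hypothetical open and extension penalties

# Four gap positions in total, as one gap of length 4 or two gaps of length 2.
print("linear, one gap :", linear_gap(4, d))          # -32.0
print("linear, two gaps:", 2 * linear_gap(2, d))      # -32.0
print("affine, one gap :", affine_gap(4, d, e))       # -14.0
print("affine, two gaps:", 2 * affine_gap(2, d, e))   # -20.0
```

The linear penalty does not distinguish the two arrangements, while the affine penalty scores the single long gap higher, which is what encourages gaps to clump.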
