
Homology and pairwise alignment CompSci 369, 2022

School of Computer Science, University of Auckland

Homologous traits are ones that share a common ancestry.
Homologous sequences or regions are ones that share a common ancestry.
Homologous traits/regions are called homologs.
Similarity of homologous traits/regions depends on the distance to their most recent common ancestor.
Can get similarity without homology.
Traits that are similar but not homologs are analogs.
So homology does not imply similarity and similarity does not imply homology.

Pairwise alignment
Given two homologous sequences, how do they align with each other?
That is, exactly which sites in the sequence are homologous with each other?

Example: x = GAATTC and y = GATTA could align in many different ways.
For 2 sequences of length $n$, there are
$$\binom{2n}{n} \approx \frac{2^{2n}}{\sqrt{\pi n}}$$
different ways of aligning them.
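How fast this count grows can be checked numerically. The sketch below assumes, as stated in the text, that the number of alignments of two length-$n$ sequences is the central binomial coefficient $\binom{2n}{n}$, and compares it with the approximation $2^{2n}/\sqrt{\pi n}$. The function names are illustrative and not part of the course material.

```python
import math


def num_alignments(n: int) -> int:
    """Exact count C(2n, n) for two sequences of length n."""
    return math.comb(2 * n, n)


def approx_alignments(n: int) -> float:
    """Approximation 2^(2n) / sqrt(pi * n)."""
    return 2 ** (2 * n) / math.sqrt(math.pi * n)


if __name__ == "__main__":
    for n in (5, 10, 50):
        exact = num_alignments(n)
        approx = approx_alignments(n)
        # The ratio approaches 1 as n grows.
        print(n, exact, f"{approx:.3e}", f"ratio={approx / exact:.4f}")
```

Even for sequences of length 10 there are already 184756 alignments, so enumerating them all is not practical for realistic sequence lengths.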

Have sequences $x$ and $y$ of length $m$ and $n$. $x_i$ is the $i$th symbol of $x$. So $x = x_1 x_2 x_3 \dots x_m$.
Symbols could be 4 DNA bases, 4 RNA bases or 20 amino acids.
Put gaps in sequences to allow them to align better with each other. Gaps correspond to insertions and deletions.
Gaps exist in sequence $x$ only relative to sequence $y$; they are not an intrinsic part of sequence $x$.

Scoring alignments
To decide on best alignment, want a way of scoring them.
Easy to come up with ad-hoc scores, e.g. for each site, score 2 when residues agree, -1 when they don't.
Example: Scoring 2 when residues agree, -1 when they do not agree. If x = GAATTC and y = GGATTA are aligned without gaps as
GAATTC
GGATTA
the score is $2 - 1 + 2 + 2 + 2 - 1 = 6$.
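The ad-hoc scheme can be written in a few lines. This is a minimal sketch that assumes the two sequences are already aligned, have equal length and contain no gaps. The function name `adhoc_score` and its default parameters are illustrative.

```python
def adhoc_score(x: str, y: str, match: int = 2, mismatch: int = -1) -> int:
    """Sum per-site scores for two aligned, gap-free sequences."""
    if len(x) != len(y):
        raise ValueError("aligned sequences must have equal length")
    # Score each aligned pair of residues independently and add them up.
    return sum(match if a == b else mismatch for a, b in zip(x, y))


if __name__ == "__main__":
    print(adhoc_score("GAATTC", "GGATTA"))  # 4 matches, 2 mismatches -> 6
```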

Model of random sequences
Want a model-based method of scoring.
Sequences $x$ and $y$ both have length $n$.
$x$ and $y$ are random sequences that are independent of each other.
Residue $a$ appears with frequency $q_a$.
The probability (or likelihood) of sequence $x$ is
$$P(x) = \prod_{i=1}^{n} q_{x_i}.$$
The likelihood of an alignment of $x$ and $y$ is just the joint probability,
$$P(x, y) = P(x)P(y) = \prod_{i=1}^{n} q_{x_i} \prod_{i=1}^{n} q_{y_i} = \prod_{i=1}^{n} q_{x_i} q_{y_i}.$$

Model of homologous sequences
If $x$ and $y$ are related, they are not independent so don't get $P(x, y) = P(x)P(y)$.
Instead, let the probability of observing $a$ (from $x$) and $b$ (from $y$) aligned at a locus be $p_{ab}$.
Probability (likelihood) of the alignment is the product
$$P(x, y) = \prod_{i=1}^{n} p_{x_i y_i}.$$

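Now compare models by taking ratio of likelihoods
$$\frac{\prod_{i=1}^{n} p_{x_i y_i}}{\prod_{i=1}^{n} q_{x_i} q_{y_i}} = \prod_{i=1}^{n} \frac{p_{x_i y_i}}{q_{x_i} q_{y_i}}.$$
Easier to work with log-likelihoods so take logs and define the score for an alignment
$$S = \sum_{i=1}^{n} s(x_i, y_i), \qquad s(a, b) = \log\left(\frac{p_{ab}}{q_a q_b}\right).$$
$s$ is the score matrix or substitution matrix. If there are $r$ possible residues, $s$ is $r \times r$.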
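To see that the summed score equals the log of the likelihood ratio, the sketch below computes both for a small DNA example. The background frequencies `q` and the pair probabilities `P_MATCH` and `P_MISMATCH` are hypothetical toy values chosen only so that the pair probabilities sum to 1. They are not estimated from data and are not the BLOSUM values.

```python
import math

# Hypothetical background frequencies q_a (uniform over DNA bases).
q = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# Hypothetical joint probabilities p_ab for homologous sites:
# 4 matching pairs * 0.16 + 12 mismatching pairs * 0.03 = 1.0
P_MATCH, P_MISMATCH = 0.16, 0.03


def p(a: str, b: str) -> float:
    """Toy joint probability of seeing a aligned with b."""
    return P_MATCH if a == b else P_MISMATCH


def s(a: str, b: str) -> float:
    """Log-odds score s(a, b) = log(p_ab / (q_a * q_b))."""
    return math.log(p(a, b) / (q[a] * q[b]))


def alignment_score(x: str, y: str) -> float:
    """S = sum over sites of s(x_i, y_i), for gap-free aligned x and y."""
    return sum(s(a, b) for a, b in zip(x, y))


def log_likelihood_ratio(x: str, y: str) -> float:
    """log of P(x, y | homologous) / P(x, y | independent), computed directly."""
    homologous = math.prod(p(a, b) for a, b in zip(x, y))
    independent = math.prod(q[a] * q[b] for a, b in zip(x, y))
    return math.log(homologous / independent)


if __name__ == "__main__":
    x, y = "GAATTC", "GGATTA"
    print(alignment_score(x, y), log_likelihood_ratio(x, y))
```

Under these toy values a matching pair scores positively and a mismatching pair negatively, so a positive total favours the homologous model over the random one.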

BLOSUM45 matrix

Scoring gaps
Gaps are rare and want to penalise their use. Also want to encourage gaps to fall into clumps.
For gap of length $k$, let $\gamma(k)$ be the penalty. Two types of penalty:
Linear penalty: $\gamma(k) = -dk$, with $d > 0$.
Affine penalty: $\gamma(k) = -d - (k - 1)e$, with $d > e > 0$.
$d$ is open gap penalty, $e$ is extension penalty.
Affine gap forces nearby gaps together.
More complex penalties can be used but come at computational cost.
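The effect of the two penalties can be compared directly. The sketch below implements the linear penalty $\gamma(k) = -dk$ and the affine penalty $\gamma(k) = -d - (k-1)e$. The values d = 8 and e = 2 are hypothetical, chosen only to satisfy d > e > 0.

```python
def linear_gap(k: int, d: float) -> float:
    """Linear penalty: every gap position costs d."""
    return -d * k


def affine_gap(k: int, d: float, e: float) -> float:
    """Affine penalty: d to open the gap, e for each further position."""
    return -d - (k - 1) * e


if __name__ == "__main__":
    d, e = 8, 2  # hypothetical penalties with d > e > 0

    # One gap of length 3 versus three separate gaps of length 1.
    print("linear:", linear_gap(3, d), "vs", 3 * linear_gap(1, d))
    print("affine:", affine_gap(3, d, e), "vs", 3 * affine_gap(1, d, e))
```

With the linear penalty both arrangements cost the same, while with the affine penalty one long gap is cheaper than three short ones, which is why the affine form encourages gaps to clump.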