# 程序代写代做代考 Bioinformatics c++ matlab scheme algorithm HW4

HW4

Homework #4 for Bioimaging and Bioinformatics
BME2210 – Spring 2017
Bioinformatics portion

Due (as 5 files – 4 .cpp files and 1 .txt file – to the ICON dropbox) by 11am on Monday, February
20

Implement Needleman-Wunsch algorithm

Implement the global alignment Needleman-Wunsch algorithm for protein sequences in an
iterative fashion (i.e., do not use recursion, which simplifies the implementation for those that
do not have experience with recursion.) For an iterative solution, we will use 3 matrices:

• The “A” matrix will store the final alignment scores.
• The “B” matrix is simply an initialized “A” matrix with Blosum62 scores for each pair of

residues.
• The “D” matrix will store our directions.

If you wish, you can look at the sample code (included below), which provides you with
functions to:

• initialize A and/or B matrices
• print matrices, and
• use the BLOSUM62 substitution matrix for scoring

However, note that the sample code is written in Matlab, to help you understand the overall
algorithm, but you need to turn in your code for parts 1-4 in C++ (.cpp) files. Thus, while the
Matlab code limits the number of amino acids to 25, your code should use strings in order to
handle any size of input; and while the Matlab code merely initializes the matrix to be filled
with zeros (instead of undefined values), you actually need to fill the matrix with real values. Its’
main use is to show you how a matrix can be fairly easily printed.

Hint 1: C++ can easily handle padding a string with spaces using the setw and setfill commands,
for which you would need to include iomanip as well as iostream – i.e., as described at
http://www.cplusplus.com/reference/iomanip/setw/. (actually you won’t even need setfill(),
since spaces are the default padding material – I just mention that for future reference) If you
try to use the older C style of using a statement such as ‘printf (“% 3s
”, x);’, for which you
would need to include stdio.h, you will have a much harder time printing off the matrices in this
assignment, since you will have to figure out a way to convert an object of string class (and
being of any arbitrarily-given size) to the older C-style ‘character array’ that printf demands.
Thus, I highly recommend the C++ style, as shown in the code below where I print an arbitrary
string (here, a value of -2) with an arbitrary amount of space-padding (here, 3 spaces, although
note that the maximum width of any value in the BLOSUM62 matrix is only 2 character spaces,
since its’ maximum value is ‘11’ and its’ minimum value is ‘-4’):

#include
#include
#include
using namespace std;

int main()
{
string x=”-2″;
that one reason to do this is that if you do not deal with such scenarios, then the code becomes
non-deterministic (meaning that if I reverse the order of the two input sequences – i.e., swap
sequence #1 and #2 – I can get a different answer, mainly b/c of arbitrarily picking V, and
therefore swapping H & V will affect the outcome).

Continuing the sample output from above:

D matrix
A R N C D E
0 H H H H H H
A V D H H H H H

R V V D H H H H
N V V V D H H H
D V V V V D D H
C V V V V D H H
Q V V V V V D D
E V V V V V D D

Part 4: Create a file hw4.4.cpp. In this part, it is now straight-forward to use the “D” matrix to
generate the alignment. Do that, and then print out that alignment.

Continuing the sample output from above, the alignment would be:

ARNDCQE
||| | |
ARN-CDE

Note how the vertical bar is only shown for identical letters, but not for gaps or mismatches.

[Optional Challenge]. Purely for your own interest, you could implement this Needleman-
Wunsch algorithm using recursion. There are a number of reasons that you might want to do
this – but if you end up doing that, which do you find that you prefer?

Part 5: Create a plain-text file hw4.5.txt. In this part, write a few sentences to briefly discuss
(i.e., just a few sentences each) the following issues:

a) What were the major obstacles/hurdles that you had to overcome during this
assignment?

a. Personally, was it your mastery of the tool that you used (C++), or basic biology
(knowing what an ‘amino acid’ is), or bioinformatics (knowing what an
‘alignment’ is), or just finding time to do the homework? Those are all honest
and valid responses to this question.

b. What I am particularly interested in, though, is: what were the major engineering
challenges that had to be overcome to solve problems of this type?

b) Now that you have code that implements the Needleman-Wunsch algorithm, what
would be the next step(s) for someone to take? (don’t worry, it doesn’t have to be you,

at least not for this class:-)

a. If you were going to make improvements to your code, what are some ideas that
you could start off with, and why might you want to do those improvements?

(I gave you some ideas above, like adding symbols that go beyond just H/V/D,
which would allow more options to be explored instead of being completely
arbitrary; and another idea would be to read in the BLOSUM62 matrix from a file
instead of hard-coding it in, b/c that would allow you to use other, alternative
matrices in the future, such as BLOSUM45, or PAM250, etc. But since I gave you
these ideas, please pick other ones – note that recursion is fine to include, as
long as you state why it would be useful, but in that case please also include at
least one more idea as well)

b. Going beyond just this code, what are some more general improvements that
you could make, and why would you want to do them?

Hint: the main answer that I am looking for here is the name of an algorithm that
is better than Needleman-Wunsch…can you think of any?

c) Pretend that I am a world-class expert at the subject of bioinformatics (e.g., I know all
about alignments, and algorithms, and scoring, etc.), but yet somehow I knew nothing
about the Needleman-Wunsch algorithm (as if that were possible! Maybe I’m from an
alternate universe or something:-). Describe for me what Needleman-Wunsch is in
general, what it does in particular, and what it can do for me. Like, when should I use it?
When would I be better off not to use it, and should use something else instead?
Notably, what are the key concepts that distinguish it from all other bioinformatics
algorithms?

15 points for working code, part 1.
15 points for working code, part 2.
30 points for working code, part 3.
40 points for working code, part 4.
5 points for answers, part 5.

NOTE: If your program does not compile you will receive a zero on that part of your homework.
In addition, late homework will not be accepted. �DO NOT WORK TOGETHER! Students caught
working together on this or any assignment will drop a whole letter grade for the course! �

% Code to start part HW4 from.
function HW4_0

% A trial sequence with all amino acids:
% ARNDCQEGHILKMFPSTWYV

% The trial sequences in the HW description:
% ARNDCQE
% ARNCDE

% Ask the user for protein sequence 1.
prompt = ‘Enter the first amino acid sequence of length 1 to 25: ‘;
seq1 = input(prompt, ‘s’);
len1 = length(seq1);
fprintf(1, ‘Seq 1 %s has %d residues
’, seq1, len1);

% Ask the user for protein sequence 2.
prompt = ‘Enter the second amino acid sequence of length 1 to 25: ‘;
seq2 = input(prompt, ‘s’);
len2 = length(seq2);
fprintf(1, ‘Seq 2 %s has %d residues
’, seq2, len2);

% Initialize a 25×25 matrix of zeros.
A = zeros(25,25);

% The supplied initMatrix must be modified!
A = initMatrix(A, seq1, seq2);

% Print the matrix A.
printMatrix(A, seq1, seq2);

end

% Print out a matrix
function mat = initMatrix(mat, seq1, seq2)
numRows = length(seq1) + 1;
numCols = length(seq2) + 1;

mat(1,1) = 0; % Init the value at (1,1) to 0.

% Initialize the first column.
for i=2:numRows
% Do something here!
end

% Initialize the first row.
for c=2:numCols
% Do something here!
end

% Init the rest of the matrix.
for r=2:numRows
for c=2:numCols
% Do something here!
end
end

end

% Print out a matrix
function printMatrix(mat, seq1, seq2)
len1 = length(seq1);
len2 = length(seq2);

% Print the first row of sequence letters
fprintf(1,’ ’);
for r=1:len2
fprintf(1,’%2c ’, seq2(r));
end
fprintf(1,’
’);

% Print the first row of scores.
fprintf(1,’ ’);
for c=1:len2+1
fprintf(‘%2d ’, mat(1,c));
end
fprintf(1,’
’);

% Print the rest of the scores.
for r=2:len1+1
% Print the first letter of vertical sequence
fprintf(1, ‘%c ’, seq1(r-1));
for c=1:len2+1
fprintf(1,’%2d ’,mat(r,c));
end
fprintf(1,’
’);
end
end

function value = scorePair(char1, char2)
% Order: ARNDCQEGHILKMFPSTWYV
i1 = aaToInt(char1);
i2 = aaToInt(char2);
A = [
4 -1 -2 -2 0 -1 -1 0 -2 -1 -1 -1 -1 -2 -1 1 0 -3 -2 0;
-1 5 0 -2 -3 1 0 -2 0 -3 -2 2 -1 -3 -2 -1 -1 -3 -2 -3;
-2 0 6 1 -3 0 0 0 1 -3 -3 0 -2 -3 -2 1 0 -4 -2 -3;
-2 -2 1 6 -3 0 2 -1 -1 -3 -4 -1 -3 -3 -1 0 -1 -4 -3 -3;
0 -3 -3 -3 9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1;
-1 1 0 0 -3 5 2 -2 0 -3 -2 1 0 -3 -1 0 -1 -2 -1 -2;
-1 0 0 2 -4 2 5 -2 0 -3 -3 1 -2 -3 -1 0 -1 -3 -2 -2;
0 -2 0 -1 -3 -2 -2 6 -2 -4 -4 -2 -3 -3 -2 0 -2 -2 -3 -3;
-2 0 1 -1 -3 0 0 -2 8 -3 -3 -1 -2 -1 -2 -1 -2 -2 2 -3;
-1 -3 -3 -3 -1 -3 -3 -4 -3 4 2 -3 1 0 -3 -2 -1 -3 -1 3;
-1 -2 -3 -4 -1 -2 -3 -4 -3 2 4 -2 2 0 -3 -2 -1 -2 -1 1;
-1 2 0 -1 -3 1 1 -2 -1 -3 -2 5 -1 -3 -1 0 -1 -3 -2 -2;
-1 -1 -2 -3 -1 0 -2 -3 -2 1 2 -1 5 0 -2 -1 -1 -1 -1 1;
-2 -3 -3 -3 -2 -3 -3 -3 -1 0 0 -3 0 6 -4 -2 -2 1 3 -1;
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4 7 -1 -1 -4 -3 -2;
1 -1 1 0 -1 0 0 0 -1 -2 -2 0 -1 -2 -1 4 1 -3 -2 -2;
0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 1 5 -2 -2 0;
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1 1 -4 -3 -2 11 2 -3;
-2 -2 -2 -3 -2 -1 -2 -3 2 -1 -1 -2 -1 3 -3 -2 -2 2 7 -1;

0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4];
value = A(i1, i2);
end

function value = aaToInt(aa)
switch aa
case ‘A’,
value = 1;
case ‘R’,
value = 2;
case ‘N’,
value = 3;
case ‘D’,
value = 4;
case ‘C’,
value = 5;
case ‘Q’,
value = 6;
case ‘E’,
value = 7;
case ‘G’,
value = 8;
case ‘H’,
value = 9;
case ‘I’,
value = 10;
case ‘L’,
value = 11;
case ‘K’,
value = 12;
case ‘M’,
value = 13;
case ‘F’,
value = 14;
case ‘P’,
value = 15;
case ‘S’,
value = 16;
case ‘T’,
value = 17;
case ‘W’,
value = 18;
case ‘Y’,
value = 19;
case ‘V’,
value = 20;
otherwise,
value = 1;
end
end

Posted in Uncategorized