Alignment is a core tool of bioinformatics. It is used to quantify and visualize sequence similarity, on the assumption that a good alignment reflects real similarity between the sequences. Because the total number of possible alignments is huge, we need an objective function (a scoring function) to identify the best one, called the "optimal alignment". Similar sequences are likely to have similar structure and function. For example, if one sequence has a known structure and function, sequence alignment can be used to "map" this knowledge onto other similar sequences.
In bioinformatics, pairwise sequence alignment is used to compare two sequences, which can be DNA, RNA or protein. Biological sequences that are similar but not identical provide useful information about the structural, functional or evolutionary relationship between them. Comparing two sequences directly with each other is the most common way of measuring their similarity, and it tells us whether or not they are similar.
Strategies for Pairwise Alignment
There are two strategies for pairwise sequence alignment.
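• Local Alignment: local alignment looks for the best-matching short stretches (subsequences) of the two sequences instead of aligning them from end to end. This method is useful when we need to find or align domains (functional units of proteins), or when the sequences are related only in part. It is less suitable when the two sequences are similar along their whole length and must be compared end to end.
• Global Alignment: global alignment attempts to align every residue of each sequence, from one end to the other. It is most useful when the sequences have similar lengths and are related along their entire length.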
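The two strategies are usually computed by dynamic programming: Needleman-Wunsch for global alignment and Smith-Waterman for local alignment. The sketch below computes only the optimal score for each. The scoring scheme (match +1, mismatch -1, gap -2) is a hypothetical choice for illustration; real protein aligners use a substitution matrix such as BLOSUM62 and affine gap penalties.

```python
# Hypothetical scoring scheme, chosen only for illustration.
MATCH, MISMATCH, GAP = 1, -1, -2


def sub(a: str, b: str) -> int:
    """Score for aligning residue a against residue b."""
    return MATCH if a == b else MISMATCH


def global_score(x: str, y: str) -> int:
    """Needleman-Wunsch: best score for aligning x and y end to end."""
    n, m = len(x), len(y)
    # f[i][j] = best score for aligning the prefixes x[:i] and y[:j]
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = i * GAP  # x[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        f[0][j] = j * GAP  # y[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(
                f[i - 1][j - 1] + sub(x[i - 1], y[j - 1]),  # match/mismatch
                f[i - 1][j] + GAP,  # gap in y
                f[i][j - 1] + GAP,  # gap in x
            )
    return f[n][m]


def local_score(x: str, y: str) -> int:
    """Smith-Waterman: best score of any pair of substrings of x and y."""
    n, m = len(x), len(y)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The 0 lets a local alignment start afresh at any position.
            h[i][j] = max(
                0,
                h[i - 1][j - 1] + sub(x[i - 1], y[j - 1]),
                h[i - 1][j] + GAP,
                h[i][j - 1] + GAP,
            )
            best = max(best, h[i][j])
    return best


print(global_score("MVLYQD", "MVQYQR"))
print(local_score("MVLYQD", "MVQYQR"))
```

With these hypothetical scores, the global alignment of MVLYQD and MVQYQR must absorb the final D/R mismatch. The local alignment can stop after MVLYQ/MVQYQ, so it scores higher. A full aligner would also keep traceback pointers to recover the aligned strings, not just the score.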
Significance of alignment
The quality of an alignment depends on the length of the stretches of similar residues. Each aligned position falls into one of three categories: an exact match, an inexact match (mismatch), or a gap (insertion/deletion). The significance of a similarity is expressed as a P-value (probability value) or an E-value (expected value).
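The three kinds of aligned positions can be made concrete by scoring an alignment that has already been written out, column by column. This is a minimal sketch: '-' is assumed to mark a gap, and the default match/mismatch/gap values are hypothetical illustration values, not a standard scheme.

```python
def score_alignment(top: str, bottom: str, match: int = 1,
                    mismatch: int = -1, gap: int = -2):
    """Score two already-aligned rows of equal length.

    Each column is an exact match, an inexact match (mismatch), or a gap
    ('-' in either row). Returns the total score and the per-category counts.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            counts["gap"] += 1
            total += gap
        elif a == b:
            counts["match"] += 1
            total += match
        else:
            counts["mismatch"] += 1
            total += mismatch
    return total, counts


print(score_alignment("MVLYQD", "MVQYQR"))  # four matches, two mismatches
print(score_alignment("MV-YQD", "MVQYQ-"))  # four matches, two gaps
```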
P-value: the P-value of a similarity score 'S' is the probability that a score of at least 'S' would be obtained by chance when aligning two unrelated protein sequences of similar composition and length. If p ≤ 0.01, we assume that the target sequence is homologous to the retrieved sequence.
E-value: the E-value is related to the P-value. It is the expected number of alignments with a score of at least 'S' that would occur by chance. If the E-value is less than 1 × 10^-80, the two sequences are homologous and very similar, and the E-value may even be reported as zero. If the E-value lies between 1 × 10^-50 and 1 × 10^-2, the sequences may be similar or homologous, but their similarity is lower. If the E-value is greater than 1 × 10^-2, there is no significant similarity.
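One standard way to connect the two measures is Karlin-Altschul statistics. These give E = K·m·n·e^(-λS) for a query of length m searched against a database of total length n, and P = 1 - e^(-E). The sketch below applies these formulas and the rule of thumb above. The λ and K values in the usage line are illustrative placeholders: real values depend on the scoring matrix and gap penalties, and search tools such as BLAST report them. The text gives no rule for E-values between 1 × 10^-80 and 1 × 10^-50, so the sketch leaves that band unclassified.

```python
import math


def karlin_altschul_evalue(score: float, m: int, n: int,
                           lam: float, k: float) -> float:
    """E = K * m * n * exp(-lambda * S): expected number of chance hits
    scoring at least S (query length m, database length n)."""
    return k * m * n * math.exp(-lam * score)


def pvalue_from_evalue(e: float) -> float:
    """P = 1 - exp(-E), computed with expm1 to stay accurate for tiny E."""
    return -math.expm1(-e)


def interpret_evalue(e: float) -> str:
    """Rule of thumb from the text; thresholds are copied from it."""
    if e < 1e-80:
        return "homologous, very similar"
    if 1e-50 <= e <= 1e-2:
        return "may be similar or homologous"
    if e > 1e-2:
        return "no significant similarity"
    return "not covered by the rule of thumb"  # 1e-80 <= e < 1e-50


# Illustrative placeholder parameters, not tied to any real scoring matrix.
e = karlin_altschul_evalue(score=100, m=300, n=10**8, lam=0.267, k=0.041)
print(e, pvalue_from_evalue(e), interpret_evalue(e))
```

For small E, P and E are almost equal. They diverge only when E approaches 1, which is why the two values are often used interchangeably for strong hits.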
Drawback of Pairwise sequence alignment
Consider the following pairwise alignment:
-----M V L Y Q D-----
-----M V Q Y Q R-----
From this single comparison, we cannot be sure whether the shared residues reflect a region that is conserved throughout the family, or whether the differences (L/Q and D/R) are simply due to point mutations (changes in the DNA). If a stretch of sequence does not change because it is essential for protein function, we call this stretch a "domain".
Suppose I have a novel protein and want to place it in a family whose members share its characteristics. Such a family has many members, so we use "multiple alignment" (an extension of pairwise sequence alignment in which more than two sequences are aligned). If we align 10 sequences and 'V' appears at the same position in every one, 'V' probably has a very important role (it is conserved). If the domain is the same in all 10 sequences, my confidence in the family assignment increases. The more sequences we include in the multiple alignment, the higher our confidence.
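Finding conserved positions such as the 'V' above can be sketched as a scan over the columns of a multiple alignment. A column counts as conserved when every sequence has the same residue there. This minimal sketch assumes the alignment is already computed, that all rows have equal length, and that '-' marks a gap. The example sequences are hypothetical.

```python
def conserved_columns(msa: list[str]) -> list[tuple[int, str]]:
    """Return (column index, residue) for every column of a multiple
    alignment where all sequences carry the same residue and no gap."""
    if not msa:
        return []
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned rows must have the same length")
    result = []
    for col in range(length):
        residues = {row[col] for row in msa}
        if len(residues) == 1 and "-" not in residues:
            result.append((col, residues.pop()))
    return result


# Hypothetical aligned sequences.
msa = ["MVLYQD", "MVQYQR", "MVLYKD"]
print(conserved_columns(msa))  # [(0, 'M'), (1, 'V'), (3, 'Y')]
```

Requiring strict identity is the simplest choice. Practical tools often score conservation more softly, for example by the fraction of sequences sharing the most common residue. Such scores tolerate an occasional substitution in a large family.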