|
Discontinuous Alignments
|
|
|
The alignments displayed by Cn3D, whether calculated by VAST or BLAST and related algorithms, are discontinuous pairwise alignments. This means that each sequence is related to the master sequence through a set of nonoverlapping alignments separated by unaligned regions.
Because the unaligned regions are completely unconstrained, they can be shown as spaces, or the sequences can be centered or left- or right-justified; all of these are viewer options in DDV. Similarly, a sequence may have an N- or C-terminal region that is not contained in the alignment, so it is left up to the user whether to display one or both of these "tails". Unaligned regions are shown in lowercase to distinguish them from aligned regions. Gaps in unaligned regions exist purely for display purposes: if one unaligned sequence is longer than another, gaps must be added to the shorter sequence so that the rows line up. For this reason, unaligned gaps are represented by ~ instead of the traditional -.
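The display conventions above can be illustrated with a short sketch. This is not DDV's code; the function name `render_unaligned` and the option names `"left"`, `"right"`, and `"center"` are assumptions made for illustration. It lowercases the residues of one unaligned region and pads the shorter rows with ~ so that all rows have the same width.

```python
def render_unaligned(segments, justify="left"):
    """Pad the rows of one unaligned region to a common width.

    segments: one residue string per row (hypothetical input format).
    justify:  "left", "right", or "center" (hypothetical option names).
    """
    width = max((len(s) for s in segments), default=0)
    rows = []
    for seg in segments:
        seg = seg.lower()            # unaligned residues are shown in lowercase
        pad = width - len(seg)       # number of display-only gaps needed
        if justify == "left":
            rows.append(seg + "~" * pad)
        elif justify == "right":
            rows.append("~" * pad + seg)
        elif justify == "center":
            left = pad // 2
            rows.append("~" * left + seg + "~" * (pad - left))
        else:
            raise ValueError(f"unknown justify option: {justify!r}")
    return rows


if __name__ == "__main__":
    for row in render_unaligned(["MKV", "MKVLAT"], justify="right"):
        print(row)
```

The ~ characters carry no alignment information; they only fill out the shorter row.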
|
|
|
Pairwise-to-multiple: the Intersect By Master algorithm
|
|
|
While the input alignments are all pairwise, the display shows what seems to be a multiple alignment. The display is created by combining and truncating the pairwise alignments, so that the residues of the master sequence which are shown as aligned in the final display are those residues which are aligned with all of the other sequences shown. The residues shown as aligned in the other rows are the residues aligned with those parts of the master sequence.
To see the intersect by master algorithm at work, hide one or more of the rows in the display (you are not allowed to hide the master row). In most cases, the aligned regions will get bigger and may even merge. This is because the pairwise alignment represented by the row that was hidden did not include some residues of the master sequence that were contained in the other alignments in the set, so those residues were designated "unaligned" by the intersect by master algorithm. When a row is hidden, the alignment that contains that row is removed from the set and the intersect by master algorithm is re-run, so those residues of the master sequence that are contained in all the alignments in the new set are now designated "aligned".
Note that sometimes the intersect by master algorithm will find that there are no residues of the master sequence which are aligned to all the other sequences; in these cases a null alignment is generated and only unaligned regions are shown.
Example:
Alignment 1 (black is unaligned, blue is aligned): seq1, seq2
Alignment 2 (black is unaligned, yellow is aligned): seq1, seq3
Intersect by master output for these alignments (green is aligned): seq1, seq2, seq3
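The idea behind intersect by master can be sketched in a few lines. This is a minimal illustration, not Cn3D's implementation. It assumes each pairwise alignment is given as a list of ungapped blocks `(master_start, other_start, length)` with zero-based positions; that representation and the function name `intersect_by_master` are hypothetical. A master residue stays aligned only if every pairwise alignment covers it, and an output block is split wherever any row would otherwise need a gap.

```python
def intersect_by_master(alignments):
    """Combine pairwise alignments that share a master sequence.

    alignments: list of pairwise alignments, each a list of ungapped
                blocks (master_start, other_start, length).
    Returns a list of (master_start, [other_start per alignment], length).
    An empty list corresponds to a null alignment.
    """
    if not alignments:
        return []
    # Map each aligned master position to its partner position, per alignment.
    maps = []
    for blocks in alignments:
        mapping = {}
        for master_start, other_start, length in blocks:
            for i in range(length):
                mapping[master_start + i] = other_start + i
        maps.append(mapping)
    # Keep only master positions that are aligned in every pairwise alignment.
    common = sorted(set.intersection(*(set(m) for m in maps)))
    result = []
    for pos in common:
        others = [m[pos] for m in maps]
        if result:
            start, other_starts, length = result[-1]
            # Extend the current block only if the master and every other
            # row continue without a break.
            if pos == start + length and all(
                o == s + length for o, s in zip(others, other_starts)
            ):
                result[-1] = (start, other_starts, length + 1)
                continue
        result.append((pos, others, 1))
    return result


if __name__ == "__main__":
    aln1 = [(0, 0, 10), (15, 12, 5)]   # master vs. second sequence
    aln2 = [(5, 2, 12)]                # master vs. third sequence
    print(intersect_by_master([aln1, aln2]))
    print(intersect_by_master([aln1]))  # "hiding" the third row
```

Calling the function with one alignment removed mimics hiding a row: the remaining blocks can only stay the same or grow.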
|
|
|
Importing a sequence
|
|
|
You can add a sequence to the sequence displayed in UDV or to the alignment displayed by DDV. You can choose to use either gapped or ungapped statistics to calculate the alignment between your sequence and the master sequence. The results from either gapped or ungapped BLAST are processed to give a new discontinuous alignment between your imported sequence and the master sequence.
- If you choose ungapped BLAST, the BLAST algorithm will produce several separate alignments, sometimes overlapping. These alignments are sorted by a greedy algorithm to find the highest-scoring consistent set of alignments which cover the sequences. Because ungapped BLAST often produces large alignments which overlap by only a few residues, some alignments may be trimmed by a few residues so that they no longer overlap their neighbors; without this trimming, the overlapping neighbors would be thrown away.
- If you choose gapped BLAST, the algorithm again produces several separate and sometimes conflicting alignments. These alignments are processed by a greedy algorithm to generate the longest and highest-scoring set of alignments. Cn3D does not currently allow aligned gaps, so the gaps are removed from the alignments generated by gapped BLAST. The process that removes the gaps generates a set of smaller ungapped alignments, separated by small unaligned regions.
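The two processing steps described in this list can be sketched as follows. This is an illustration under stated assumptions, not the code used by Cn3D. The hit format `(score, master_start, other_start, length)`, the use of - for gaps in the input rows, and the function names are all hypothetical. The greedy step here simply rejects any hit that conflicts with a higher-scoring one; the trimming of small overlaps mentioned above is not implemented.

```python
def greedy_select(hits):
    """Pick a consistent set of ungapped hits, best score first.

    hits: list of (score, master_start, other_start, length).
    A hit is kept only if it lies entirely before or entirely after
    every hit already kept, on both sequences.
    """
    kept = []
    for hit in sorted(hits, key=lambda h: h[0], reverse=True):
        _, m, o, n = hit
        consistent = True
        for _, km, ko, kn in kept:
            before = m + n <= km and o + n <= ko
            after = m >= km + kn and o >= ko + kn
            if not (before or after):
                consistent = False
                break
        if consistent:
            kept.append(hit)
    return sorted(kept, key=lambda h: h[1])  # order by master position


def split_at_gaps(master_row, other_row, master_start=0, other_start=0):
    """Turn one gapped pairwise alignment into ungapped blocks.

    master_row, other_row: equal-length strings with '-' marking gaps.
    Returns a list of (master_start, other_start, length).
    """
    if len(master_row) != len(other_row):
        raise ValueError("alignment rows must have equal length")
    blocks = []
    m, o, run = master_start, other_start, 0
    for a, b in zip(master_row, other_row):
        if a != "-" and b != "-":
            run += 1                      # column aligns two residues
        elif run:
            blocks.append((m - run, o - run, run))
            run = 0                       # a gap column ends the block
        if a != "-":
            m += 1
        if b != "-":
            o += 1
    if run:
        blocks.append((m - run, o - run, run))
    return blocks


if __name__ == "__main__":
    print(greedy_select([(50, 0, 0, 10), (30, 8, 9, 10), (40, 20, 22, 5)]))
    print(split_at_gaps("MKVL-ATG", "MKV-QATG"))
```

Residues that sat opposite a gap end up in the small unaligned regions between the resulting blocks.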
After a new pairwise alignment is generated by gapped or ungapped BLAST, this alignment is added to the current set of alignments and the intersect by master algorithm is applied. The new alignment can therefore cause aligned regions of the original alignment to shrink, split, or disappear altogether.
|
|
|
Definitions
|
|
|
- The master sequence is the sequence which is present in all of the pairwise alignments. Every sequence is aligned to the master sequence, so it defines the relationship among the sequences in the set. This is why the master sequence cannot be hidden; the other sequences are not actually aligned to each other, only to the master, and without the master sequence there is no alignment.
- DDV, the alignment viewer in Cn3D, has the ability to show unaligned regions. These are regions which are not included in the alignment, and are therefore only defined implicitly. Because they still have to be displayed in rows and columns, we allow the user to choose the display style so that these regions do not appear to be aligned.
- Sometimes, after importing a sequence or after the intersect by master algorithm is applied to an alignment, there is no alignment left. This is designated a null alignment. In order to have something in the alignment display, we represent null alignments as one unaligned region containing the full length of all the sequences in the set. All the residues will be lowercase, and all the gaps (which are only there to fill out the display anyway) are ~. To explore the relationships between the sequences in the set, hide rows until aligned regions start to emerge.
|
|