The consensus calculation uses only the used (i.e. non-hidden) data aligned at each position in a contig. It depends on the individual readings, the consensus cutoff value, and the quality cutoff value. The consensus character for a position is derived by calculating a weighted sum for each base type and dividing each sum by the total of all the sums. If the resulting value for a base type exceeds the consensus cutoff, that base appears in the consensus; otherwise a "-" is assigned. If more than one base type exceeds the cutoff, the base is assigned according to the descending order of precedence: A, C, G, T.
If the user has set the quality cutoff to -1, the weights assigned to each character are as shown in the table below. If the user has set a quality cutoff Q > -1, all characters with accuracy estimates below Q are given a weight of zero and the rest are given a weight equal to their accuracy estimate. This is Rule II of Bonfield, J.K. and Staden, R., "The application of numerical estimates of base calling accuracy to DNA sequencing projects", Nucleic Acids Research 23, 1406-1410 (1995).
Character   Base type   Weight
-----------------------------------------
C           C           1.00
T           T           1.00
A           A           1.00
G           G           1.00
*           *           averaged*
-           none        0.10
1           C           0.75
2           T           0.75
3           A           0.75
4           G           0.75
D           C           1.00
V           T           1.00
B           A           1.00
H           G           1.00
K           C           1.00
L           T           1.00
M           A           1.00
N           G           1.00
R           none        0.10
Y           none        0.10
5           none        0.10
6           none        0.10
7           none        0.10
8           none        0.10
* The quality value for a pad is computed as the average of the quality values of the bases on either side (except when these are also pads, whereupon the nearest non-pad is used).
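
The calculation described above, for the case where the quality cutoff Q is greater than -1, can be sketched as follows. This is a minimal illustration and not the gap4 source code: the function names are hypothetical, the consensus cutoff is assumed to be expressed as a fraction between 0 and 1, only the four base types A, C, G and T take part in the vote, and a pad with a non-pad neighbour on only one side is assumed to take that neighbour's quality.

```python
PRECEDENCE = "ACGT"  # order of precedence stated in the text


def weights_from_quality(calls, quality_cutoff):
    """Turn (base, accuracy) pairs into (base, weight) pairs.

    Calls with accuracy below the quality cutoff get weight zero;
    the rest are weighted by their accuracy estimate.
    """
    return [(base, acc if acc >= quality_cutoff else 0.0)
            for base, acc in calls]


def consensus_base(weighted_calls, consensus_cutoff):
    """Return the consensus character for one contig position.

    Assumption: consensus_cutoff is a fraction in the range 0..1.
    """
    sums = {base: 0.0 for base in PRECEDENCE}
    for base, weight in weighted_calls:
        if base in sums:
            sums[base] += weight
    total = sum(sums.values())  # the "sum of sums"
    if total == 0:
        return "-"
    # The first base in precedence order that exceeds the cutoff wins.
    for base in PRECEDENCE:
        if sums[base] / total > consensus_cutoff:
            return base
    return "-"


def pad_quality(bases, quals, i):
    """Quality for the pad at index i: the average of the nearest
    non-pad quality on each side.

    Assumption: if only one side has a non-pad, that value is used
    alone; if neither side has one, 0.0 is returned.
    """
    left = next((quals[j] for j in range(i - 1, -1, -1)
                 if bases[j] != "*"), None)
    right = next((quals[j] for j in range(i + 1, len(bases))
                  if bases[j] != "*"), None)
    found = [q for q in (left, right) if q is not None]
    return sum(found) / len(found) if found else 0.0


# One position covered by four readings, with Q = 10 and a cutoff of 0.66.
column = [("A", 40), ("A", 30), ("C", 10), ("G", 5)]
call = consensus_base(weights_from_quality(column, 10), 0.66)
```

In this example the G call falls below Q and is given weight zero, so A holds 70 of the remaining 80 units of weight (0.875) and is reported as the consensus. Raising the consensus cutoff to 0.9 would give "-" instead.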
Both the consensus cutoff and quality cutoff values can be set by using
the "Configure cutoffs" command (in the Options menu) of Gap. Within
the Contig Editor (see section Editing in gap4) these
values can be adjusted by clicking on the "<" and ">" symbols adjacent
to the "C:" (consensus cutoff) and "Q:" (quality cutoff) displays in the
top left corner of the editor. These buttons are repeating buttons - the
values will adjust for as long as the left mouse button is held down.
Changes to these values last only for that invocation of the
contig editor.
The quality and the consensus calculations are really one and the same. The difference is simply that the output produced by the quality calculation is a measure of how reliable the consensus produced is. This quality is used as the basis for problem searches, such as find next problem, and the quality display within the Template Display (see section Template Display).
One mode of operation for both the quality and consensus calculations derives information for each strand separately. This is useful for comparing one strand with another to check that the sequence has been determined consistently on both strands. This calculation is affected by the configuration option (set using the main "Configure Cutoffs" command) that controls the treatment of sequences flagged using the "Special Chemistry" Experiment File line (CH field, bit 0). When the option is set, such a sequence is used in the consensus and quality calculations for both strands, effectively pretending that there are two sequences, one on each strand.
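
The effect of the "Special Chemistry" setting on the strand-separated calculation can be pictured with a small sketch. The `Read` type, its field names and the `split_by_strand` function are hypothetical and are not part of gap4; the sketch only shows how a flagged reading would be counted on both strands before each strand is passed to the consensus or quality calculation.

```python
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    strand: str                      # "+" or "-"
    special_chemistry: bool = False  # stands in for the CH field, bit 0


def split_by_strand(reads, special_on_both_strands):
    """Partition reads into (forward, reverse) lists.

    When special_on_both_strands is True, a read flagged as special
    chemistry is placed in both lists, as if it were two sequences.
    """
    forward, reverse = [], []
    for read in reads:
        if special_on_both_strands and read.special_chemistry:
            forward.append(read)
            reverse.append(read)
        elif read.strand == "+":
            forward.append(read)
        else:
            reverse.append(read)
    return forward, reverse


reads = [Read("r1", "+"), Read("r2", "-"), Read("r3", "+", True)]
fwd, rev = split_by_strand(reads, special_on_both_strands=True)
```

Here `r3` appears in both `fwd` and `rev`, so a position covered only by `r2` and `r3` would still have data on both strands.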