
Dialogue

[picture]

Several alternative types of search can be selected. Each compares a "probe" of size "Probe length", in both senses, against "all contigs" or against a subset of contigs whose names are stored in a file or a list. If "file" or "list" is selected, the browse button is activated and gives access to the file or list browser.
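
Comparing a probe "in both senses" means trying it as given and as its reverse complement. A minimal sketch of that idea follows; the function names are hypothetical and are not taken from the program.

```python
# Hypothetical helpers, for illustration only.
# Characters other than A, C, G and T (in either case) are left unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the opposite-sense version of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def probe_in_both_senses(probe: str) -> list[tuple[str, str]]:
    """Return the probe in its forward ("+") and reverse ("-") senses."""
    return [("+", probe), ("-", reverse_complement(probe))]
```

For example, `probe_in_both_senses("AACG")` yields the forward probe `AACG` and the reverse-sense probe `CGTT`.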

"Ends against all" compares segments from the ends of each contig against the full length of all other contigs. "All against all" compares the full length of each contig in nonoverlapping segments against all other contigs. "Ends against ends" compares only the ends of each contig. "Probe with single segment" compares a single segment from a single contig against the full length of all other contigs. In this case the "Contig identifier", "Start position" and "End position" dialogues are activated.
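
The four search types differ only in which segments are used as probes and which parts of the other contigs they are compared against. The sketch below illustrates this under stated assumptions: coordinates are 0-based and half-open, a final short segment in "All against all" is kept as a shorter probe, and the mode names and function names are hypothetical.

```python
def end_segments(length: int, probe_length: int) -> list[tuple[int, int]]:
    """Return the left and right end segments of a contig."""
    n = min(probe_length, length)
    left, right = (0, n), (length - n, length)
    # A contig no longer than the probe has only one distinct end segment.
    return [left] if left == right else [left, right]


def tiled_segments(length: int, probe_length: int) -> list[tuple[int, int]]:
    """Cut the full contig into non-overlapping segments."""
    return [(s, min(s + probe_length, length))
            for s in range(0, length, probe_length)]


def plan_search(mode, probe_contig_length, target_contig_length,
                probe_length, segment=None):
    """Return (probe segments, target regions) for one pair of contigs."""
    whole_target = [(0, target_contig_length)]
    if mode == "ends_against_all":
        return end_segments(probe_contig_length, probe_length), whole_target
    if mode == "all_against_all":
        return tiled_segments(probe_contig_length, probe_length), whole_target
    if mode == "ends_against_ends":
        return (end_segments(probe_contig_length, probe_length),
                end_segments(target_contig_length, probe_length))
    if mode == "single_segment":
        if segment is None:
            raise ValueError("single_segment needs a (start, end) segment")
        return [segment], whole_target
    raise ValueError(f"unknown mode: {mode}")
```

With a 100 base contig and a probe length of 30, "Ends against all" would use the segments (0, 30) and (70, 100), whereas "All against all" would use (0, 30), (30, 60), (60, 90) and (90, 100).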

All the searches have a first phase in which they find an exact match of size "Minimum initial match", then an alignment phase in which gaps (pads) are inserted. If the alignment is within the "Maximum pads per sequence" and "Maximum percent mismatch" criteria it is reported as a match.
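
The two phases can be illustrated with a small sketch. It is not the program's own code: the word index used to find exact matches, the use of `*` as the pad character, and the decision to count a pad opposite a base as a mismatch are all assumptions made for the example, and the function names are hypothetical. The alignment step itself is omitted; the second function only applies the acceptance criteria to an alignment that has already been produced.

```python
def find_seeds(probe: str, target: str, min_match: int) -> list[tuple[int, int]]:
    """Phase 1: list (probe_pos, target_pos) of exact matches of min_match bases."""
    index: dict[str, list[int]] = {}
    for j in range(len(target) - min_match + 1):
        index.setdefault(target[j:j + min_match], []).append(j)
    seeds = []
    for i in range(len(probe) - min_match + 1):
        for j in index.get(probe[i:i + min_match], []):
            seeds.append((i, j))
    return seeds


def passes_criteria(aligned_a: str, aligned_b: str, max_pads: int,
                    max_percent_mismatch: float, pad: str = "*") -> bool:
    """Phase 2 check: accept an alignment only if it meets both limits."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return False
    # Assumption: every differing column, including pad versus base,
    # counts towards the mismatch percentage.
    mismatches = sum(a != b for a, b in zip(aligned_a, aligned_b))
    percent = 100.0 * mismatches / len(aligned_a)
    pads_ok = (aligned_a.count(pad) <= max_pads
               and aligned_b.count(pad) <= max_pads)
    return pads_ok and percent <= max_percent_mismatch
```

Under these assumptions, the alignment of `ACGT*ACGTA` with `ACGTTACGAA` has one pad in the first sequence and 2 differing columns out of 10 (20 percent), so it would be reported with limits of 1 pad and 25 percent mismatch but rejected with a limit of 10 percent.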

The default is to "Use hidden data" which means that where possible the contigs are extended using the poor quality data from the readings near their ends. To ensure that this additional data is not so poor that matches will be missed, the program uses the following algorithm. It slides a window of size "Window size for good data scan" along the hidden data for each reading and stops if it finds a window that contains more than "Max dashes in scan window" non-ACGT characters. The data that extends the contig the furthest is added to its consensus sequence. If the user toggles off the use of hidden data the "Window size for good data scan" and "Max number of dashes in scan window" dialogues will be greyed out.
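
The good data scan described above can be sketched as follows. The text does not say exactly how much data is kept once a bad window is found, so this example assumes the data is cut at the start of the first failing window. It also assumes, as a simplification, that every reading's hidden data begins exactly at the contig end. The function names are hypothetical.

```python
def usable_hidden_length(hidden: str, window_size: int, max_dashes: int) -> int:
    """Return how many leading bases of the hidden data pass the scan."""
    hidden = hidden.upper()
    # Data shorter than the window is judged as a single short window.
    last_start = max(len(hidden) - window_size, 0)
    for start in range(last_start + 1):
        window = hidden[start:start + window_size]
        bad = sum(ch not in "ACGT" for ch in window)
        if bad > max_dashes:
            # Assumption: keep only the data before the failing window.
            return start
    return len(hidden)


def best_extension(hidden_per_reading: list[str], window_size: int,
                   max_dashes: int) -> str:
    """Return the usable hidden data that extends the contig the furthest."""
    best = ""
    for hidden in hidden_per_reading:
        usable = hidden[:usable_hidden_length(hidden, window_size, max_dashes)]
        if len(usable) > len(best):
            best = usable
    return best
```

With a window size of 5 and at most 2 dashes per window, the hidden data `ACGTACGT--A-CGT` is cut after 7 bases, because the window starting at the eighth base (`T--A-`) contains 3 non-ACGT characters.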

If users elect not to "Use standard consensus" they can either "Mark active tags" or "Mask active tags", in which case the "Select tags" button will be activated. Clicking on this button will bring up a check box dialogue that lets the user select the tag types they wish to activate. Masking the active tags means that all segments covered by tags that are "active" will not be used in the first phase of the matching algorithm, but will be used in the second phase. That is, matches will not be initiated within these segments, but if they extend into them the alignment will be performed in the normal way. A typical use of this mode is to avoid finding matches in segments covered by tags of type ALUS (i.e. segments thought to be Alu sequence) or REPT (i.e. segments that are known to be repeated elsewhere in the data); see section Tag types. "Marking" is of less use: matches will be found in marked segments during the first phase of searching, but in the alignment shown in the Output Window, marked segments will be shown in lower case.
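
The difference between masking and marking can be illustrated with a short sketch. It assumes tagged segments are given as 0-based, half-open (start, end) intervals, and that an initial exact match is rejected if it overlaps a masked segment at all; the text only states that matches are not initiated within such segments. The function names are hypothetical.

```python
def seed_allowed(start: int, length: int,
                 masked: list[tuple[int, int]]) -> bool:
    """Masking: True if an initial match avoids every masked segment."""
    end = start + length
    return all(end <= m_start or start >= m_end for m_start, m_end in masked)


def mark_segments(consensus: str, marked: list[tuple[int, int]]) -> str:
    """Marking: show marked segments in lower case, the rest in upper case."""
    chars = list(consensus.upper())
    for m_start, m_end in marked:
        chars[m_start:m_end] = [c.lower() for c in chars[m_start:m_end]]
    return "".join(chars)
```

With a masked segment at (4, 8), an initial match of 4 bases starting at position 0 is allowed, but one starting at position 2 is not. Marking the segment (2, 5) of `ACGTACGT` displays it as `ACgtaCGT`; the search itself is unaffected.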

The "Ends against ends" algorithm is the fastest and will find the fewest spurious matches, but is the least thorough. "Ends against all" is more thorough but can, in rare cases, still miss certain poor matches, so, in desperation, "All against all" can be applied. In highly repetitive sequences the masking options are very valuable.


This page is maintained by James Bonfield. Last generated on 29 April 1996.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/gap4_83.html