
Introduction

Gap4 (Bonfield, J.K., Smith, K.F. and Staden, R. A new DNA sequence assembly program. Nucleic Acids Res. 23, 4992-4999 (1995)) is a genome assembly program. (In the near future gap4 will be renamed gap, and the old gap program will be renamed gap3.) It is used for both large and small projects, although most are of cosmid size. By default the program creates a database large enough to store approximately 8000 readings, but this can be extended. The program contains all the tools expected of an assembly program, plus many unique features and an easy-to-use interface.

Gap can handle data produced by a variety of sequencing instruments, including ABI 373A, ABI 377, Pharmacia A.L.F. and LiCor. It can also handle data entered using digitizers or typed in by hand. Trace files in proprietary formats, such as those of ABI, are usually converted to SCF files (see section SCF introduction). By analysing the traces we also calculate base accuracy values (Bonfield, J.K. and Staden, R. The application of numerical estimates of base calling accuracy to DNA sequencing projects. Nucleic Acids Research 23, 1406-1410 (1995)), which are then stored in the SCF files. All the preassembly steps, including those already mentioned plus quality clipping and the removal of sequencing vector and cloning (cosmid) vector, are controlled by the script PREGAP (see section Pregap introduction). During this processing the readings are stored in Experiment files (see section Experiment files).

Experiment file format is similar to that of EMBL sequence entries in that each record starts with a two-letter identifier, but we have invented new records specific to sequencing experiments. One of PREGAP's tasks is to augment the experiment files with data about the vectors, primers and templates used to produce each reading; if necessary it can extract this information from external databases. Some of the information is needed by PREGAP and some by gap.
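As a rough illustration of the "two-letter identifier" layout described above, the following Python sketch groups the lines of an experiment-file-like text by their leading identifier. It is not the parser used by PREGAP or gap. The function name `parse_records`, the sample identifiers (`ID`, `LN`, `CC`) and their values are assumptions made for the example; the full record set is described in the Experiment files section.

```python
from collections import defaultdict

# Illustrative sample only: the identifiers and values here are assumptions,
# not records documented in this section.
SAMPLE = """\
ID   reading_001
LN   reading_001.scf
CC   first comment
CC   second comment
"""


def parse_records(text):
    """Group record values by the two-letter identifier that starts each line."""
    records = defaultdict(list)
    for line in text.splitlines():
        ident = line[:2]
        # Skip anything that does not begin with a two-letter identifier
        # (blank lines, indented continuation lines, and so on).
        if len(ident) < 2 or not ident.isalpha():
            continue
        records[ident].append(line[2:].strip())
    return dict(records)


records = parse_records(SAMPLE)
print(records["ID"])
print(records["CC"])
```

Keeping a list per identifier allows a record type to appear more than once, as the two `CC` lines do in the sample. A real reader would also need to handle multi-line records, which this sketch simply skips.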

NOTE that to get the most from gap it is essential to make sure that it is supplied, via the experiment files, with all the information it needs. (We are aware of programs from other groups that perform similar tasks to PREGAP but create incomplete or incorrect experiment files, which lessens the usefulness of gap and can increase the time taken to complete projects.)

Gap reads the reading data stored in experiment files and stores it in its own database. The only other files required during a project are the trace files from sequencing instruments, but these are not copied into the database. The experiment file for a reading should contain the name of the trace file from which it was derived; this name is copied into the database so that gap4 can read the trace whenever it is required.

The final result of a sequencing project is a consensus sequence, which gap4 can write in experiment file format, fasta format or staden format. The whole database and all the trace files are also useful for future reference, as they allow any queries about the accuracy of the sequence to be answered quickly.
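For readers unfamiliar with fasta format, the sketch below shows its general shape: a header line beginning with `>` followed by the sequence on wrapped lines. It is a minimal illustration, not gap4's own output routine. The function name `format_fasta`, the entry name `contig1` and the 60-character line width are assumptions chosen for the example.

```python
def format_fasta(name, sequence, width=60):
    """Return a fasta-style entry: a '>' header line, then wrapped sequence lines."""
    lines = [">" + name]
    # Break the sequence into fixed-width lines (the width is an assumed convention).
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# Hypothetical consensus sequence of 80 bases, used only for demonstration.
consensus = "ACGT" * 20
print(format_fasta("contig1", consensus), end="")
```

With the 80-base example sequence, the entry consists of the header line, one full line of 60 bases and a final line of 20 bases.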

The main window for gap contains File, Edit, View, Options, Experiments, Lists and Assembly menus. The File menu includes database opening and copying functions and consensus calculation options. The Edit menu contains options that alter the contents of the database such as Edit Contig (see section Editor introduction), Join Contigs (see section Editor joining), Break Contig (see section Break Contig), Disassemble Readings (see section Break Contig), Double Strand (see section Double Strand), and Doctor Database (see section Doctor database).

The View menu contains Contig Selector (see section Contig Selector), Results Manager (see section Results Manager), Find Internal Joins (see section Find Internal Joins), Find Read Pairs (see section Find Read Pairs), Find Repeats (see section Find repeats), Check Assembly (see section Check Assembly), Find Oligos (see section Find Oligos), Show Templates (see section Template Display), Show Relationships (see section Show Relationships), Restriction Enzyme Map (see section Restriction Enzyme Search) and Stop Codon Map (see section Stop Codon Map).

The Options menu contains Configure Cutoffs (see section Configure Cutoffs) and Select tags (see section Configure Cutoffs).

The Experiments menu contains options to analyse the contigs and to suggest experimental solutions to problems, including Suggest Long Readings (see section Suggest Long Readings), Suggest Primers (see section Suggest Primers), Compressions and Stops (see section Compressions and Stops) and Suggest Probes (see section Suggest Probes).

The Lists menu contains a set of options for creating and editing lists for use in other parts of the program (see section Lists Introduction), including Minimal Coverage (see section Lists Minimum Coverage), and Unattached Readings (see section Lists Unattached Readings).

The Assembly menu contains various assembly modes including Normal Shotgun Assembly (see section Normal Shotgun Assembly), Directed Assembly (see section Directed Assembly), Screen Only (see section Assembly Screen Only), Enter Pre-assembled data (see section Assemble Pre), and Assembly Independently (see section Assembly Independently).

The main window (shown below) contains an Output window for textual results and an Error window for error messages.

[picture]

Other displays used by the program include (shown below) the Contig Selector,

[picture]

(shown below) the Contig Comparator,

[picture]

(shown below) the Template Display,

[picture]

(shown below) the Restriction Enzyme Map,

[picture]

(shown below) the Stop Codon Map,

[picture]

(shown below) the Contig Editor,

[picture]

and (shown below) the Contig Joining Editor.

[picture]

Only one copy each of the Contig Selector and Contig Comparator can be shown, but any number of the other types of display can be used simultaneously, even on the same contig. For example, it is possible to have several contig editors running on the same contig.


This page is maintained by James Bonfield. Last generated on 29 April 1996.
URL: http://www.mrc-lmb.cam.ac.uk/pubseq/manual/gap4_1.html