When processing a batch of trace files, it is unlikely that every reading will share the same values for all Experiment File fields. As described above, Pregap supports using an external database for storing such information. In the following sections we outline methods for providing a very simple text-based database.
The simplest of these cases is where all the readings to be processed
with Pregap are from the same project (let's call this project
TEST), sequenced using the same vectors, and all have the same
expected vector insert size. The only differing information for each
reading is its primer type (whether a forward or reverse reading, and
whether using the universal primer or from a custom oligo), the reading
name, the template name, and whether the template is double stranded.
This information corresponds to the ID, ST, PR and
TN Experiment File line types. As the ID field is read
from the initial Experiment File created we only need to provide
database hooks for the other line types. The other necessary line types
can be defined as constants.
We add the following to our `.pregaprc' file.
#-----------------------------------------------------------------------------
# Our database hooks

# Constant things
CN=TEST
SF=m13mp18.vec
SV=m13mp18
SP=41
SC=6249
SI=1400..2000
CF=lawrist7.seq
CV=lawrist7

# Read from database file
ST_com='lookup db-short $ID 1'
PR_com='lookup db-short $ID 2'
TN_com='lookup db-short $ID 3'

# Constants evaluated as commands at pregap startup
OP=`whoami`
DT=`date`
The constant line types are defined simply using, for example,
CN=TEST (to define the gap database name). Following these are
the _com commands. These use a small program (supplied with the
package distribution) named lookup. As used here, it is given the name
of a plain text file followed by two arguments. The program searches
the file for lines whose first word matches the first of these
arguments ($ID in this case, which is the reading name). It then
prints the nth subsequent word on the line, where n is the second
argument to lookup. This means we can
store our primer, strand and template information in a simple text file,
named `db-short' in our example, as follows.
#ID      ST PR TN
a11bc.f1 2  1  a11bc
a11bc.r1 2  2  a11bc
a11bc.f2 2  3  a11bc
a11bc.r2 2  4  a11bc
a22bc.s1 1  1  a22bc
Note that the PR field now can hold both the old PR and
DR Experiment File line types. See section Experiment File.
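
The lookup behaviour described above can be approximated with a few lines of shell and awk. This is only a sketch for illustration, not the source of the supplied lookup program: the function name lookup_sketch, the use of a temporary file, and the choice to stop at the first matching line are all assumptions made here.

```shell
# Hypothetical stand-in for the supplied lookup program (not its real source).
# Usage: lookup_sketch FILE KEY N
# Finds the first line of FILE whose first word equals KEY and prints the
# Nth word after it. Stopping at the first match is an assumption.
lookup_sketch() {
    awk -v key="$2" -v n="$3" '$1 == key { print $(n + 1); exit }' "$1"
}

# Sample database with the same contents as the db-short example,
# written to a temporary file so the sketch is self-contained.
db=$(mktemp)
cat > "$db" <<'EOF'
#ID ST PR TN
a11bc.f1 2 1 a11bc
a11bc.r1 2 2 a11bc
a11bc.f2 2 3 a11bc
a11bc.r2 2 4 a11bc
a22bc.s1 1 1 a22bc
EOF

lookup_sketch "$db" a11bc.r2 2   # PR column for reading a11bc.r2
lookup_sketch "$db" a22bc.s1 3   # TN column for reading a22bc.s1
```

With the sample data, the two calls print 4 and a22bc respectively. A reading name that does not appear in the file produces no output in this sketch; how the real lookup program handles that case is not stated here.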
Finally, some line types (chiefly DT for date and OP for
operator) are not known at the time of creating a `.pregaprc' file,
but are still constant for all readings. These have been defined last in
the `.pregaprc' file. Note the use of backquotes rather than the single
quotes used for the _com commands. For example, DT=`date` sets the DT field for all
readings to be the output of the date command.
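
The difference between the two quoting styles can be seen in a short shell session. This sketch assumes that the `.pregaprc' file follows ordinary Bourne shell quoting rules. The echo command stands in for a real database lookup, and the later eval step is only an illustration of deferred evaluation, not a description of how Pregap itself runs the _com commands.

```shell
# Single quotes: the text is stored verbatim, so $ID is not expanded yet.
# The echo command here is a hypothetical placeholder for a database lookup.
PR_com='echo primer-for $ID'

# Backquotes: the command runs immediately and its output is stored.
DT=`date`

# Later, once a reading name is known, the stored command can be evaluated.
ID=a11bc.f1
eval "$PR_com"          # prints: primer-for a11bc.f1

# DT already holds the date captured when the assignment was made.
echo "$DT"
```

The value of DT is fixed once, when the assignment is read, which is why it is the same for every reading. The single-quoted command still contains the literal text $ID and yields a different result for each reading name.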