Longer Read Lengths on the Roche-454 GS20
University of Oklahoma Advanced Center for Genome Technology
Steve Kenton, Doug White, Graham Wiley, Simone MacMil and Bruce A. Roe
Version 3, Revised August 12, 2006
A. Background:
The Roche-454 GS20 sequencer requires three modifications to
obtain read lengths longer than the ~100 bases obtained at 42 flows:
1. Changing the number of flows by modifying the appropriate control script file,
2. Chilling the apyrase (definitely needed) and other reagents (recommended), and
3. Providing sufficient memory to run the 454 assembly software for large projects
requiring multiple 454 runs.
B. Modifying the control script (xxx.icl) files:
The 454 GS20 software and related files are located in:
/usr/local/rig/*
The run scripts are in:
/usr/local/rig/runScripts/*
There is a directory for each plate geometry in:
/usr/local/rig/runScripts/25x75/*
/usr/local/rig/runScripts/40x75/*
/usr/local/rig/runScripts/70x75/*
There is a subdirectory called TACG in each one.
/usr/local/rig/runScripts/25x75/TACG/*
/usr/local/rig/runScripts/40x75/TACG/*
/usr/local/rig/runScripts/70x75/TACG/*
The control scripts in each of these directories end with .icl
For example:
/usr/local/rig/runScripts/40x75/TACG/42xTACG_40X75.icl
We copied the 42xTACG file and renamed the copies 63x, 84x, 105x and 126x,
creating one file for each new flow count, with the obvious names:
/usr/local/rig/runScripts/40x75/TACG/42xTACG_40X75.icl
/usr/local/rig/runScripts/40x75/TACG/63xTACG_40X75.icl
/usr/local/rig/runScripts/40x75/TACG/84xTACG_40X75.icl
/usr/local/rig/runScripts/40x75/TACG/105xTACG_40X75.icl
/usr/local/rig/runScripts/40x75/TACG/126xTACG_40X75.icl
and
/usr/local/rig/runScripts/70x75/TACG/42x_TACG_70x75.icl
/usr/local/rig/runScripts/70x75/TACG/63x_TACG_70x75.icl
/usr/local/rig/runScripts/70x75/TACG/84x_TACG_70x75.icl
/usr/local/rig/runScripts/70x75/TACG/105x_TACG_70x75.icl
/usr/local/rig/runScripts/70x75/TACG/126x_TACG_70x75.icl
In each of these files we then changed the single DEFINE
line from 42 to 63, 84, 105 or 126.
Note that these are all increments of 21, half of 42. Don't ask why.
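The copy-and-edit step above could be scripted roughly as follows. This is only a sketch under stated assumptions, not the procedure we actually used: the exact syntax of the DEFINE line is not given here, so the code assumes each .icl file has exactly one line containing the word DEFINE and that the number 42 appears exactly once on it. The function names are hypothetical.

```python
import re
from pathlib import Path

BASE_FLOWS = 42
STEP = BASE_FLOWS // 2  # 21, the increment noted in the text
NEW_FLOWS = [BASE_FLOWS + STEP * k for k in range(1, 5)]  # 63, 84, 105, 126


def variant_name(template_name, flows):
    """Swap the leading '42x' of a script name for the new flow count."""
    prefix = f"{BASE_FLOWS}x"
    if not template_name.startswith(prefix):
        raise ValueError(f"{template_name!r} does not start with {prefix!r}")
    return f"{flows}x" + template_name[len(prefix):]


def retarget_script(text, new, old=BASE_FLOWS):
    """Return the script text with `old` replaced by `new` on the DEFINE line.

    ASSUMPTION: exactly one line contains 'DEFINE', and `old` occurs on it
    exactly once as a whole number. Anything else raises instead of guessing.
    """
    lines = text.splitlines(keepends=True)
    hits = [i for i, line in enumerate(lines) if "DEFINE" in line]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one DEFINE line, found {len(hits)}")
    new_line, count = re.subn(rf"\b{old}\b", str(new), lines[hits[0]])
    if count != 1:
        raise ValueError(f"expected {old} exactly once on the DEFINE line")
    lines[hits[0]] = new_line
    return "".join(lines)


def make_variants(template_path, flows=NEW_FLOWS):
    """Write one edited copy of the 42x template per flow count, beside it."""
    template = Path(template_path)
    text = template.read_text()
    written = []
    for n in flows:
        target = template.with_name(variant_name(template.name, n))
        target.write_text(retarget_script(text, n))
        written.append(target)
    return written
```

For example, make_variants("/usr/local/rig/runScripts/40x75/TACG/42xTACG_40X75.icl") would write the 63x, 84x, 105x and 126x files beside the original. Keep a backup of the original directory before trying anything like this on the instrument.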
Now, when we click on the GS20 "RUN" icon and
start a new run, a graphical view of this directory structure is available so
that the operator can choose the desired run conditions files from the revised
list available on the pull down menus.
C. Chilling the apyrase and other reagents:
Increasing the number of flows also requires cooling the apyrase, since it is fairly unstable for extended
periods at room temperature. We
initially tried external chilling by placing the apyrase bottle in a styrofoam
ice chest and extending the flow lines back into the 454. Recently, however, we have devised a
copper cooling system in which externally chilled water flows through
copper tubing surrounding the reagent bottles in the 454 chamber. A second modification diverts the waste
to an external collection bottle to prevent it from adding heat to the reagent
chamber during the run.
PowerPoint describing our present external cooling system.
D. Run conditions:
454 Long Read Caution:
At present we are limiting our long read runs to only 63 flows (slightly greater than 150 bases), because the accuracy of the base calls drops dramatically as the number of flows is increased to 84 and beyond. The lower-quality regions at longer read lengths make it difficult to obtain an accurate assembly with Newbler and subsequently with phrap.
63 flows with either a half plate or a full plate:
For 63 flows with a half plate we use the 70x75
(full-plate) GS 20 Sequencing Kit, and for 63 flows with a full plate we
pool a 70x75 (full-plate) GS 20 Sequencing Kit with a 40x75
(half-plate) GS 20 Sequencing Kit.
Adjusting the 454 phred quality scores:
Since the phred values calculated by the 454 Newbler Assembler are approximately 4 times higher than we think they should be, we have written a script that changes the reported phred quality scores from a maximum of 64 to a maximum of 16 by dividing all 454 phred quality score values by 4. This script is incorporated into a tool called "replace 454_data", written by Jim White in our informatics group. When run with the "-qd4" ("quality divide by 4") option on a Sun workstation, the tool deletes any old 454 data present in the target project directory and employs "get_contig_ends" and "fastaq2phd", derived from the NCBI's fasta2phd program, to create the phd files and matching "fake" chromatogram files for the phrap assembly in the target phd_dir and chromat_dir, respectively. "Replace 454_data" and the associated fastaq2phd are available on our informatics ftp site at http://www.genome.ou.edu/informatics.html
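The divide-by-4 rescaling can be illustrated with a few lines of Python. This is a sketch of the arithmetic only, not the "replace 454_data" code itself. It assumes the quality values arrive as FASTA-style text (a ">" header line followed by lines of space-separated integers) and that fractions are rounded down; the rounding rule is not stated above, so treat it as an assumption. The function names are hypothetical.

```python
def divide_qualities(values, divisor=4):
    """Scale phred-style quality values down by integer (floor) division.

    ASSUMPTION: rounding down; the text above does not say how fractions
    are handled.
    """
    return [q // divisor for q in values]


def rescale_qual_text(text, divisor=4):
    """Rescale every score line of FASTA-style quality text.

    Header lines (starting with '>') and blank lines pass through unchanged.
    """
    out = []
    for line in text.splitlines():
        if line.startswith(">") or not line.strip():
            out.append(line)
        else:
            scores = divide_qualities([int(tok) for tok in line.split()], divisor)
            out.append(" ".join(str(s) for s in scores))
    return "\n".join(out) + "\n"
```

With this rule a reported score of 64 becomes 16 and a score of 40 becomes 10, which matches the 64-to-16 change in the maximum described above.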
High stringency phrap assembly:
Once this transfer to the Sun workstations is complete, the assembly and quality value files are combined with plasmid-based ABI 3730 paired-end shotgun reads using phrap to produce the combined assembly. As is our custom, we continue to use only a high stringency phrap assembly, in which min_match and min_score are increased from their default values to 30 and 55, respectively.
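As a sketch of how the high stringency settings might be passed to phrap, the snippet below builds the command line as an argument list. It assumes that min_match and min_score above correspond to phrap's -minmatch and -minscore command-line options. The input file name is a placeholder, and any other options used in a real assembly are omitted.

```python
import shlex


def phrap_command(reads_file, minmatch=30, minscore=55):
    """Build a phrap argument list with the high stringency values.

    ASSUMPTION: min_match/min_score in the text map to phrap's
    -minmatch/-minscore options; `reads_file` is a placeholder name.
    """
    return [
        "phrap", reads_file,
        "-minmatch", str(minmatch),
        "-minscore", str(minscore),
    ]


if __name__ == "__main__":
    # Show the command as it would be typed at a shell prompt.
    print(shlex.join(phrap_command("combined_reads.fasta")))
```

The list can be handed to subprocess.run on a machine where phrap is installed.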
PowerPoint showing the accuracy of the control beads at flows representing 50,
100, 150 and 200 bases.
Before attempting flows greater than 63, as described below, please read the
454 Long Read Caution above.
84 flows with a half plate:
With a half plate, we use a full-plate reagent kit and add
15 ml of ddwater to all bottles except the Tween20, bleach and washAB
bottles, since these reagents are used only during the final clean-up and their
volumes are independent of the number of flows. We then use the 84xTACG_40X75.icl runScript to obtain high
quality reads out to slightly over 200 bases. One measure of the accuracy of a sequence run is
the accuracy of the control beads, as reported by the runPhoenix program supplied by
454. Below are screen dumps of
typical results for the control beads at various base read lengths using this
protocol.
84 flows with a full plate:
With a full plate, we pool two full-plate reagent kits with
no added ddwater, as the doubled volumes are sufficient for a
double-length run. We then use the 84xTACG_70X75.icl
runScript to obtain high quality reads out to slightly over 200 bases.
126 flows with a half plate:
With a half plate, we pool one full-plate and one half-plate
reagent kit with no added ddwater, as the combined volumes are
sufficient for the longer run.
We then use the 126xTACG_40X75.icl runScript to obtain high quality
reads out to roughly 280 bases.
E. Memory requirements for the 454 assembler:
At present the 454 software requires that all reads be the
same length. If short and long reads are combined, the 454 software
pads the short reads so that all reads are the same length in memory. Thus, the amount of memory needed
for the 454 assembly increases greatly (n^2) with the longer and
padded reads.
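To get a rough feel for the cost of padding alone, the sketch below compares the number of base positions actually sequenced with the number held once every read is padded to the longest read. It only illustrates the padding overhead under that single assumption. It does not model the assembler's internal data structures or the n^2 growth mentioned above, and the function name is hypothetical.

```python
def padded_footprint(read_lengths):
    """Return (actual_bases, padded_bases) for a collection of read lengths.

    ASSUMPTION: every read is padded to the length of the longest read,
    as described in the text; nothing else about the assembler is modeled.
    """
    if not read_lengths:
        return 0, 0
    longest = max(read_lengths)
    return sum(read_lengths), longest * len(read_lengths)
```

For example, two 100-base reads mixed with one 250-base read contain 450 bases but occupy 750 padded positions.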
F. Ongoing studies:
Because of physical space constraints, increasing the number
of flows past 84 with a full plate is not yet practical with the reagents
contained in the internal reagent chamber. We are investigating additional changes that may allow this,
including modifying the reagent holder and/or removing the reagents from the
internal chamber and placing them outside the instrument with external cooling.
We have also investigated diluting the reagents but have
not yet been sufficiently pleased with the results, and at present have no recourse but
to use the full-strength reagents.
Please direct any questions to Bruce Roe.
Bruce Roe, broe@ou.edu