Longer Read Lengths on the Roche-454 GS20

University of Oklahoma Advanced Center for Genome Technology

 Steve Kenton, Doug White, Graham Wiley, Simone MacMil and Bruce A. Roe

Version 3, Revised August 12, 2006

 

A. Background:

 

Three modifications are required for the Roche-454 GS20 sequencer to produce reads longer than the ~100 bases obtained at 42 flows.

 

Changing the number of flows by modifying the appropriate control script file,

Chilling the apyrase (definitely needed) and other reagents (recommended), and

Providing sufficient memory to run the 454 assembly software for large projects requiring multiple 454 runs.

 

B. Modifying the control script (xxx.icl) files:

 

The 454 GS20 software and related files are located in:

/usr/local/rig/*

 

The run scripts are in:

            /usr/local/rig/runScripts/*

 

There is a directory for each plate geometry in:

/usr/local/rig/runScripts/25x75/*

/usr/local/rig/runScripts/40x75/*

/usr/local/rig/runScripts/70x75/*

 

There is a subdirectory called TACG in each one.

/usr/local/rig/runScripts/25x75/TACG/*

/usr/local/rig/runScripts/40x75/TACG/*

/usr/local/rig/runScripts/70x75/TACG/*

 

The control scripts in each of these directories end with .icl

For example:   /usr/local/rig/runScripts/40x75/TACG/42xTACG_40X75.icl

 

We copied the 42xTACG file in each directory and renamed the copies 63x, 84x, 105x and 126x, creating one file for each new flow count, with the obvious names:

 

/usr/local/rig/runScripts/40x75/TACG/42xTACG_40X75.icl

/usr/local/rig/runScripts/40x75/TACG/63xTACG_40X75.icl

/usr/local/rig/runScripts/40x75/TACG/84xTACG_40X75.icl

/usr/local/rig/runScripts/40x75/TACG/105xTACG_40X75.icl

/usr/local/rig/runScripts/40x75/TACG/126xTACG_40X75.icl

and

/usr/local/rig/runScripts/70x75/TACG/42x_TACG_70x75.icl

/usr/local/rig/runScripts/70x75/TACG/63x_TACG_70x75.icl

/usr/local/rig/runScripts/70x75/TACG/84x_TACG_70x75.icl

/usr/local/rig/runScripts/70x75/TACG/105x_TACG_70x75.icl

/usr/local/rig/runScripts/70x75/TACG/126x_TACG_70x75.icl

 

In each of these files we then changed the single DEFINE line from 42 to 63, 84, 105 or 126.  Notice that these values all increase in steps of 21, half of 42.  Don't ask why.
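The copy-and-edit step described above could be scripted roughly as follows. This is only a sketch under stated assumptions: the exact syntax of the DEFINE line is not reproduced in this document, so the code assumes a single line beginning with DEFINE that carries the flow count as a standalone integer, and the function name retarget_flows is hypothetical.

```python
import re


def retarget_flows(script_text, new_flows, old_flows=42):
    """Return script_text with old_flows replaced by new_flows on the DEFINE line.

    Assumption: exactly one line starts with DEFINE and holds the flow
    count as a standalone integer. Other lines are left untouched.
    """
    out = []
    changed = 0
    for line in script_text.splitlines(keepends=True):
        if line.lstrip().startswith("DEFINE"):
            # Replace only a whole-number match, not digits inside a longer token.
            line, n = re.subn(r"\b%d\b" % old_flows, str(new_flows), line)
            changed += n
        out.append(line)
    if changed != 1:
        raise ValueError("expected one flow count on a DEFINE line, found %d" % changed)
    return "".join(out)
```

Reading the contents of a 42x script, passing them through retarget_flows(text, 63) and writing the result to the new 63x file would give the 63-flow script while leaving the original file untouched. The function raises an error rather than guessing if it does not find exactly one value to change.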

 

Now, when we click on the GS20 "RUN" icon and start a new run, a graphical view of this directory structure is available, so the operator can choose the desired run script from the revised list on the pull-down menus.

 

C. Chilling the apyrase and other reagents:

 

Increasing the number of flows also requires cooling the apyrase, since it is unstable for extended periods at room temperature.  We initially tried external chilling by placing the apyrase bottle in a styrofoam ice chest and extending the flow lines back into the 454.  Recently, however, we have devised a copper cooling system in which externally chilled water flows through copper tubing surrounding the reagent bottles in the 454 chamber.  A second modification diverts the waste to an external collection bottle to prevent it from adding heat to the reagent chamber during the run.

 

PowerPoint describing our present external cooling system.

 

D. Run conditions:

454 Long Read Caution:

At present we are limiting our long read runs to only 63 flows (slightly greater than 150 bases), because the accuracy of the base calls drops dramatically as the number of flows is increased to 84 and beyond.  The lower-quality regions at longer read lengths make it difficult to obtain an accurate assembly with Newbler and subsequently with phrap.
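The relationship between flow count and read length quoted in this document (~100 bases at 42 flows, slightly over 150 at 63, slightly over 200 at 84) is close to linear. The sketch below simply scales from the 42-flow figure; the function name is hypothetical and the linear model is an assumption, not a property of the instrument.

```python
def approx_read_length(flows, bases_at_42=100):
    """Rough read length in bases, scaled linearly from ~100 bases at 42 flows."""
    return flows * bases_at_42 / 42


# Print the linear estimate for each flow count used in the run scripts.
for flows in (42, 63, 84, 105, 126):
    print(flows, round(approx_read_length(flows)))
```

The estimate for 126 flows (300 bases) is somewhat higher than the roughly 280 bases reported for that run later in this document, so the linear scaling should be treated as an upper guide only.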

 

63 flows with either a half plate or a full plate:

For 63 flows with a half plate we use the 70x75 (full-plate) GS 20 Sequencing Kit, and for 63 flows with a full plate we pool a 70x75 (full-plate) GS 20 Sequencing Kit with a 40x75 (half-plate) GS 20 Sequencing Kit.

 

Adjusting the 454 phred quality scores:

Since the phred values calculated by the 454 Newbler Assembler are approximately 4 times higher than we think they should be, we have written a script that changes the reported phred quality scores from a maximum of 64 to a maximum of 16 by dividing all 454 phred quality score values by 4.  This script is incorporated into a tool called "replace 454_data", written by Jim White in our informatics group.  When run with the "-qd4" ("quality divide by 4") option on a Sun workstation, the tool deletes any old 454 data present in the target project directory and then employs "get_contig_ends" and "fastaq2phd" (derived from the NCBI's fasta2phd program) to create the phd files and matching "fake" chromatogram files for the phrap assembly in the target phd_dir and chromat_dir, respectively.  "Replace 454_data" and the associated fastaq2phd are available on our informatics ftp site at http://www.genome.ou.edu/informatics.html
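The divide-by-4 rescaling itself is simple arithmetic, illustrated by the sketch below. This is not the code of "replace 454_data"; it assumes FASTA-style quality text with space-separated integer scores, and rounding down with integer division is an assumption, since the rounding rule used by the actual tool is not stated here.

```python
def divide_quals(qual_text, divisor=4):
    """Divide every score in FASTA-style quality text by divisor.

    Header lines (starting with '>') and blank lines pass through unchanged.
    Integer division (rounding down) is an assumption of this sketch.
    """
    out = []
    for line in qual_text.splitlines():
        if line.startswith(">") or not line.strip():
            out.append(line)
        else:
            out.append(" ".join(str(int(q) // divisor) for q in line.split()))
    return "\n".join(out)
```

With this rule a reported score of 64 becomes 16, matching the maximum values quoted above, and a score of 27 becomes 6.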

High stringency phrap assembly:

Once this transfer to the Sun workstations is complete, the assembly and quality value files are combined with plasmid-based ABI 3730 end-paired shotgun reads using phrap to produce the combined assembly. As is our custom, we continue to use only a high stringency phrap assembly where the default values for min_match and min_score have been increased to 30 and 55, respectively.
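For readers scripting the assembly, the stringency settings above can be passed on the phrap command line. The sketch below only builds the argument list and does not run anything. It assumes the command-line spelling of the two parameters is -minmatch and -minscore, as in the phrap documentation, and the input file name in the usage note is hypothetical.

```python
def phrap_command(fasta_path, minmatch=30, minscore=55):
    """Build an argument list for a high-stringency phrap run.

    Assumption: the min_match and min_score parameters named in the text
    correspond to phrap's -minmatch and -minscore options.
    """
    return ["phrap", fasta_path,
            "-minmatch", str(minmatch),
            "-minscore", str(minscore)]
```

Calling phrap_command("combined.fasta") returns a list that could be handed to subprocess.run on a machine where phrap is installed.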

PowerPoint showing the accuracy of the control beads at flows representing 50, 100, 150 and 200 bases.

Before attempting flows greater than 63, as described below, please read the 454 Long Read Caution above.

84 flows with a half plate:

With a half plate, we use a full-plate reagent kit and add 15 ml of ddwater to all bottles except the Tween20, bleach and washAB bottles, since these reagents are used only during the final clean-up and their volumes are independent of the number of flows.  We then use the 84xTACG_40X75.icl runScript to obtain high quality reads out to slightly over 200 bases.  One measure of the accuracy of a sequence run is the accuracy of the control beads, which can be examined using the runPhoenix program supplied by 454.  Below are screen dumps of typical results for the control beads at various base read lengths using this protocol.

 

84 flows with a full plate:

With a full plate, we pool two full-plate reagent kits with no added ddwater, as the double volumes are sufficient for a double length run.  We then use the 84xTACG_70X75.icl runScript to obtain high quality reads out to slightly over 200 bases.

 

126 flows with a half plate:

With a half plate, we pool one full-plate and one half-plate reagent kit with no added ddwater, as the combined volumes are sufficient for the longer run.  We then use the 126xTACG_40X75.icl runScript to obtain high quality reads out to approximately 280 bases.
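The kit combinations listed in this section can be checked with simple arithmetic. The sketch below rests on two assumptions that are not stated in this document: that reagent need scales with both plate area and flow count relative to a 42-flow run, and that a half-plate kit holds half the volume of a full-plate kit. It does not account for the 15 ml of ddwater added in the 84-flow half-plate protocol.

```python
FULL_KIT = 1.0   # one 70x75 (full-plate) kit, taken as the unit
HALF_KIT = 0.5   # assumption: a 40x75 (half-plate) kit holds half as much


def kits_needed(flows, plate_fraction):
    """Full-kit equivalents needed, assuming need scales with flows and plate area."""
    return plate_fraction * flows / 42


# (flows, plate fraction) -> kits pooled in the protocols described in this section
pooled = {
    (63, 0.5): FULL_KIT,
    (63, 1.0): FULL_KIT + HALF_KIT,
    (84, 0.5): FULL_KIT,
    (84, 1.0): 2 * FULL_KIT,
    (126, 0.5): FULL_KIT + HALF_KIT,
}

# Print estimated need next to the amount supplied for each protocol.
for (flows, plate), supplied in sorted(pooled.items()):
    print(flows, plate, kits_needed(flows, plate), supplied)
```

Under these assumptions every listed combination supplies at least the estimated requirement, with the 63-flow half-plate run having the largest margin.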

 

E. Memory requirements for the 454 assembler:

 

At present the 454 software requires that all reads be the same length; if short and long reads are combined, the 454 software pads the short reads so that, in memory, all reads are the same length.  Thus, the amount of memory needed for the 454 assembly increases greatly (n^2) with the longer and padded reads.
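The effect of padding alone can be illustrated with a toy calculation. The read lengths below are invented purely for illustration, and the sketch shows only the storage overhead of padding every read to the longest one; it does not model the assembler's actual memory use or the n^2 growth mentioned above.

```python
def padded_cells(read_lengths):
    """Cells stored when every read is padded to the longest read."""
    return len(read_lengths) * max(read_lengths)


def unpadded_cells(read_lengths):
    """Cells stored if each read kept its own length."""
    return sum(read_lengths)


# Hypothetical mix: three short reads and one long read.
mixed = [100, 100, 100, 200]
print(padded_cells(mixed), unpadded_cells(mixed))
```

In this toy case a single long read forces every short read to occupy twice its own length.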

 

F. Ongoing studies:

 

Because of physical space constraints, increasing the number of flows past 84 with a full plate is not yet reasonable with the reagents contained in the internal reagent chamber.  We are investigating additional changes that may allow this, including modifying the reagent holder and/or removing the reagents from the internal chamber and placing them outside the instrument with external cooling.

 

We have also investigated diluting the reagents but have not yet been sufficiently pleased with the results, and at present have no recourse but to use the full strength reagents.

 

 



Please direct any questions to Bruce Roe.

Bruce Roe, broe@ou.edu