Comparative Genomic Sequencing via Exon-Specific Custom Synthetic Primer Based PCR
Bruce Roe, University of Oklahoma, 12-30-99
Hypothesis: Now that the entire sequence of a human chromosome,
chromosome 22, is known, it would be possible to sequence the syntenic regions
of other species using relatively inexpensive exon-specific primers in a single
or multiplex PCR-based sequencing format rather than a shotgun-based mapped BAC approach.
- The coding regions of higher eucaryotes are conserved while their intronic and
intergenic regions usually are not conserved.
- A similar primer-based genomic sequencing approach has been developed and used
successfully to sequence mitochondrial genomes (M. Tanaka and T. Ozawa,
in "Protocols in Molecular Neurobiology" (A. Langstaff and P. Revest, eds.),
p. 25, Humana Press, Totowa, New Jersey, 1992;
M. Tanaka, M. Hayakawa and T. Ozawa, "Automated Sequencing of Mitochondrial
DNA," in Methods in Enzymology, vol. 264 (Academic Press, NY), pp. 407-421).
For Initial primer picking and the first round of PCRs.
Experiment: Once the sequence of a region of genomic DNA is known, as is the case for human
chromosome 22, a custom synthetic primer PCR-based approach might be developed for rapidly sequencing
syntenic regions of other eucaryotic genomes.
Other Applications of this PCR-based approach to comparative sequencing:
- This idea is based on the observation that exons typically are
conserved between evolutionarily distant species such as humans and mice,
even though their intronic sequences are not. The sequence of human
chromosome 22 reveals that the average genomic size for a gene is 19.2 Kbp
(median 3.7 Kbp) with a mean exon number of 5.3 (median 3.0).
Therefore the average size of an intron is calculated to be 5.2 Kbp (median 1.2 Kbp),
a region easily PCR'd using existing methods. Thus, once the exons
in one genome (human for example) are predicted, a set of exon-specific
custom synthetic primers could be synthesized using this known genomic sequence.
These primers could be used first to PCR a portion of adjacent exons and their
intervening intronic or intergenic sequences using another, related genome
(mouse for example) as the template and then as specific sequencing primers
off the PCR'd related genomic DNA.
For this to be successful, the exons must be predicted with reasonable assurance of being correct. The
exon-specific primers are then produced for the PCR off the target genomic DNA, followed by primer-based DNA
synthesis off the corresponding PCR product.
This approach can be tested by PCR using human-exon specific primers with a region of known mouse genomic
DNA sequence to determine the efficiency and whether any additional rules must be developed for improved primer
picking. Target exons can be predicted by standard in silico approaches such as BLAST searches of the GenBank nr
and EST databases, overlaid with results from FGenesh, XGrail, Genscan, and/or other ab initio methods.
Primers will be picked using a modified version of PrimOU and synthesized on the MerMade in 96 well format at
a cost of less than $1 per 20-mer. Once the efficiency of this approach is determined using the known, completed
human and mouse sequence-based comparison, then other regions of the mouse genome could be sequenced
using human specific primers to provide additional information about the success rate of both the genomic PCR
and subsequent PCR-based sequencing using the human specific primers for mouse genomic DNA PCR-
produced templates. All primers and PCR products will be produced and archived in 96 or 384 well bar-coded
microtiter plates for bookkeeping and easy retrieval and stored at -70 deg C for later use if needed.
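
As an illustration of the primer-picking step described above, the following is a minimal sketch of how one forward/reverse 20-mer pair might be chosen from two adjacent predicted exons. It is not the PrimOU algorithm, whose rules are not given here. The function names, the 40-60% GC window, and the rule-of-thumb Tm estimate (2 deg C per A/T, 4 deg C per G/C) are all assumptions made only for this example.

```python
# Hypothetical sketch only: NOT PrimOU. All names and thresholds are assumptions.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rule-of-thumb melting temperature: 2 per A/T plus 4 per G/C (deg C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def candidate_windows(exon, length=20, gc_min=0.4, gc_max=0.6):
    """Yield (start, window) for every unambiguous window within the GC range."""
    exon = exon.upper()
    for start in range(len(exon) - length + 1):
        window = exon[start:start + length]
        if set(window) <= set("ACGT") and gc_min <= gc_fraction(window) <= gc_max:
            yield start, window


def pick_spanning_pair(upstream_exon, downstream_exon, length=20):
    """Pick a forward primer in the upstream exon and a reverse primer in the
    downstream exon, so the product spans exon-intron-exon.

    The 5'-most acceptable forward window and the 3'-most acceptable reverse
    window are used, so that as much of each exon as possible is included and
    neighboring products can overlap within the shared exon.
    Returns (forward, reverse) or None if either exon has no acceptable window.
    """
    forward = list(candidate_windows(upstream_exon, length))
    reverse = list(candidate_windows(downstream_exon, length))
    if not forward or not reverse:
        return None
    _, fwd = forward[0]
    _, rev_site = reverse[-1]
    return fwd, reverse_complement(rev_site)
```

A real primer picker would also need to screen for repeats, self-complementarity, and matched Tm within a pair. Those rules are exactly what the human-versus-known-mouse test is meant to establish, so they are left out here.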
Individual PCR'd regions that are not closed using the custom synthetic PCR primers for sequencing will be
closed by additional rounds of custom primer synthesis, using primers picked from the newly generated
sequences, and subsequent sequencing off the already produced PCR products. Any regions that are not spanned
by PCR products, such as regions with introns or intergenic regions larger than 10 Kbp, regions representing
multiple gene copies (either true genes or pseudogenes), or regions with large, low copy number repeated
sequences, will require more classical BAC-based sequencing, choosing BACs whose end sequences fall in the
PCR-based sequenced regions. With ~800 genes containing
~4,240 exons representing ~40 % of human chromosome 22, approximately 8,480 primers initially will be
required to produce the exon-specific PCR products and an additional ~50,000 primers will be needed to complete
the sequence of these PCR products. The ~58,480 primers, with an average read length of 500 bases, will give
~29.2 Mbp of double stranded sequence covering approximately 90% of the entire 32.5 Mb region of the mouse
genome syntenic to human chromosome 22. Therefore, the number of shotgun sequence BACs needed will
represent ~10 % of the target syntenic regions based on the above calculations taking into account the known,
predicted exons on human chromosome 22. If the PCR-based sequencing is only 70% efficient, then the total
number of additional BAC-based shotgun sequences needed will represent 30 % of the total.
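
The arithmetic behind these estimates can be reproduced with a short sketch. The function and parameter names are invented for illustration. The inputs are the figures quoted above (4,240 exons, ~50,000 closure primers, 500-base reads, a 32.5 Mb target). Like the estimate in the text, it treats every read as contributing unique sequence.

```python
def primer_budget(exons=4240, closure_primers=50_000,
                  read_length=500, target_mb=32.5):
    """Reproduce the primer and coverage estimate from the figures in the text.

    Assumes two primers (one forward, one reverse) per exon for the first
    round, and that every read contributes read_length bases of new sequence.
    """
    initial_primers = 2 * exons
    total_primers = initial_primers + closure_primers
    sequence_mb = total_primers * read_length / 1e6
    pcr_fraction = sequence_mb / target_mb
    return {
        "initial_primers": initial_primers,
        "total_primers": total_primers,
        "sequence_mb": sequence_mb,
        "pcr_fraction": pcr_fraction,
        "bac_fraction": 1 - pcr_fraction,
    }


if __name__ == "__main__":
    for name, value in primer_budget().items():
        print(f"{name}: {value}")
```

With the default inputs this gives 8,480 first-round primers, 58,480 primers in total, and about 29.2 Mb of sequence, or roughly 90% of the 32.5 Mb target. Changing `closure_primers` or `read_length` shows how sensitive the BAC fraction is to those two assumptions.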
- The cost of this Exon PCR-based approach to sequence regions of
other mammalian genomes that are syntenic to regions of the human genome
is ~one-tenth that of a mapped BAC shotgun approach if the cost of custom
synthetic primers is less than $1 per 20-mer.
If successful for human-mouse comparative sequencing, this approach will eliminate the need for highly
expensive and labor intensive BAC target clone mapping, target clone isolation as well as shotgun library
production, isolation and sequencing, except for those regions (between 10 and 30% of the total syntenic regions)
which have very large introns (>10 kb) that may be difficult to PCR and/or regions of large repeats which are not
comparable between evolutionarily distant species.
- If individuals already have the clones for a region of high biological interest mapped in a strain of
mouse other than B6, that region could be sequenced from the non-B6 strain, and this PCR-based approach
could then be used to obtain the sequence from the B6 strain. Since most mouse strains are nearly
identical, with only ~1 SNP per 500 bases, primers could be made throughout the region (from introns as well
as exons) instead of using the exon-specific primer-based approach proposed for cases where the genomic
sequences have diverged greatly.
Similarly, this PCR primer-based approach also could be used for other comparative genomic sequence studies,
such as comparative primate, feline, bovine, etc., in each case followed by primer-based sequencing.
For Additional rounds of PCRs and sequencing for closure and finishing each
PCR product's sequence.
- Obtain the exact position of the left and right end of each exon
in both the 'real' genes and the pseudogenes on human chromosome 22.
- Modify PrimOU to pick suitable primers for the exon-specific genomic
PCRs so that the forward and reverse primer pairs from each exon will
produce overlapping sequences to aid in the later, adjacent contiguous
PCR product set assemblies (see below).
- When producing the custom synthetic primers on the MerMade, produce
two sets of 96 well microtiter plates, where one plate represents the forward
primers and the other plate in the pair represents the corresponding reverse primers.
Have the plate of forward primers contain primers F1 through F96, while the plate of
reverse primers should contain primers R2 through R97. This will ensure that the
correct primer pairs are matched for pipetting the PCRs and subsequent sequencing
reactions on the Hydra.
- Do the sequence assembly separately for the sequences obtained for
each exon-PCR primer pair.
- Develop a database to keep track of successful PCRs, successful sequencing
reactions, multiple PCR products from the same primer set, etc.
- Once the first round of PCRs produces products, distribute the PCR products
in sets of 96 along with the two plates of respective primers to sequencing
groups for closure and finishing. Each closer-finisher should have at least
10 contiguous PCR product sets. Different closer-finishers will close and
finish the same two PCR product sets (i.e. overlapping PCR products)
on each end of their adjacent contiguous PCR product sets. This will give an
extra level of redundancy at each end of the contiguous, multiple PCR product
region to aid in the final sequence assembly of large contiguous regions of
the genome (see below).
- Modify the PrimOU program to pick walking primers for the second and any
subsequent rounds of sequencing by primer walking to close any gaps remaining
in the sequence of each PCR product.
- Develop a computer program for the Biomek that will re-distribute the PCR
products to correspond to the correct wells for Hydra pipetting of the second
and/or subsequent rounds of PCRs and sequencing.
- Completely sequence each PCR product to a Consed level of fewer than 1
uncertain base/10Kb with automatic GenBank and local web-site submission once
contigs greater than 2 Kbp are obtained.
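
The paired-plate layout described above (forward primers F1 through F96 on one plate, reverse primers R2 through R97 in the same wells of the other plate) can be sketched as follows. The row-major well order (A1, A2, ..., H12) and the function names are assumptions made for illustration. The actual MerMade and Hydra well ordering should be checked before relying on it.

```python
import string

ROWS = string.ascii_uppercase[:8]   # rows A through H of a 96 well plate
COLUMNS = range(1, 13)              # columns 1 through 12


def well_names():
    """Well names in row-major order: A1, A2, ..., A12, B1, ..., H12."""
    return [f"{row}{col}" for row in ROWS for col in COLUMNS]


def plate_layout(first_exon=1):
    """Map each well to the (forward, reverse) primer pair it should hold.

    Well n of the forward plate holds F(n) and the same well of the reverse
    plate holds R(n+1), so pipetting well-to-well pairs each exon's forward
    primer with the next exon's reverse primer.
    """
    layout = {}
    for offset, well in enumerate(well_names()):
        exon = first_exon + offset
        layout[well] = (f"F{exon}", f"R{exon + 1}")
    return layout


if __name__ == "__main__":
    layout = plate_layout()
    print(layout["A1"], layout["H12"])
```

Passing `first_exon=97` would give the layout for the next pair of plates, which holds F97 through F192 against R98 through R193.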
For Assembly of large regions built from the completed sequence of multiple,
overlapping PCR products anchored in portions of the same exons.
- Once a PCR product has been completely sequenced to a Consed level of fewer
than 1 uncertain base/10Kb, a consensus sequence can be converted to a xxx.phd
file (i.e. 'fake' phd file) and used to assemble a large region consisting of
several individual PCR-based consensus sequences.
- Determine the bar-coding processes used at other centers and implement them.
- In the cases of multiple PCR products being obtained from an individual
primer pair, clone the PCR products and sequence representative subcloned
PCR products separately.
- An early experiment should be to use primers F1 and R1, F2 and R2, etc.,
to PCR and sequence a test set of exons.
- As a control set, use both human and comparative target genomic DNA
initially to obtain both the human and comparative target Exon-Intron-Exon
PCR product and sequence both using the individual respective PCR primers.
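
To illustrate how finished PCR products anchored in the same exons could be joined into a larger region, here is a minimal sketch that merges consensus sequences by exact suffix/prefix overlap. It stands in for the real assembly step, which the plan above does with 'fake' phd files and the Consed tool chain. The function names and the exact-match requirement are assumptions chosen to keep the example short.

```python
def overlap_length(left, right, min_overlap=20):
    """Length of the longest suffix of left that exactly matches a prefix of
    right, or 0 if no overlap of at least min_overlap bases exists."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def merge_products(consensus_seqs, min_overlap=20):
    """Join consensus sequences given in genomic order into one sequence.

    Each sequence must overlap the growing merged sequence (in practice,
    within the shared exon) by at least min_overlap exactly matching bases.
    """
    merged = consensus_seqs[0]
    for nxt in consensus_seqs[1:]:
        size = overlap_length(merged, nxt, min_overlap)
        if size == 0:
            raise ValueError("adjacent PCR products do not overlap")
        merged += nxt[size:]
    return merged


if __name__ == "__main__":
    products = ["AAAACCCCGGGG", "CCGGGGTTTT", "GGTTTTACGT"]
    print(merge_products(products, min_overlap=4))
```

Exact matching is only reasonable for sequence already finished to the stated quality level. A failure to merge flags a pair of products that may need the BAC-based approach or additional primer walking.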
Bruce Roe, firstname.lastname@example.org