This is to be used in conjunction with the User's Manual, which describes all FPC features.
Contents:
When you start using FPC for a project, you must
determine the best tolerance and cutoff
for your data.
TOLERANCE: You can have fixed or variable tolerance.
I have seen the following cases:
CUTOFF: You can use Equation 1 or Equation 2 (see CABIOS Vol 13 No. 5).
Both take into account the tolerance, number of matching bands, and
number of bands in each clone in order to calculate the probability that
the matching bands are a coincidence. The lower the number (i.e. the
larger the negative exponent, as in 1e-12 versus 1e-08), the more stringent the probability for calling
two clones overlapping.
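The text does not reproduce Equations 1 and 2. As a rough illustration of how tolerance, band counts, and matching bands can combine into one coincidence probability, here is a sketch of the widely used Sulston score. Treating it as equivalent to either FPC equation is an assumption, and `gel_len` (the usable gel length, in the same units as the tolerance) is a hypothetical parameter.

```python
from math import comb


def coincidence_score(bands_1, bands_2, matches, tol, gel_len):
    """Sulston-style probability that `matches` or more shared bands arise by chance.

    bands_1, bands_2: band counts of the two clones
    matches: number of matching bands
    tol: fixed tolerance, in the same units as gel_len
    gel_len: usable gel length (hypothetical parameter)
    """
    n_high, n_low = max(bands_1, bands_2), min(bands_1, bands_2)
    b = 2.0 * tol / gel_len            # chance a band lies within tol of one given band
    p = 1.0 - (1.0 - b) ** n_high      # chance a band of the smaller clone matches any band of the larger
    # Binomial tail: probability that at least `matches` of the n_low bands match
    return sum(
        comb(n_low, k) * p ** k * (1.0 - p) ** (n_low - k)
        for k in range(matches, n_low + 1)
    )
```

For example, `coincidence_score(22, 15, 11, tol=7, gel_len=3300)` scores the Case 1 clone pair under an assumed gel length of 3300; the value need not match the table. In this simplified form, tightening the tolerance with the match count held fixed always lowers the score, so it does not reproduce Case 2.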
A frequent source of confusion is the fact that the
tolerance is used in the equation. For example,
using fixed tolerance and equation 1, making the tolerance more stringent
can cause the coincidence score to become more stringent (Case 1) or
LESS stringent (Case 2).
It is important to determine the best tolerance and then not vary it.
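The tolerance also decides which bands count as matching in the first place. A minimal sketch of counting matches under a fixed tolerance, assuming each band may be paired at most once and band values are plain numbers (e.g. migration rates); this is an illustration, not FPC's own matching code.

```python
def count_matches(bands_a, bands_b, tol):
    """Count one-to-one band matches whose values differ by at most tol.

    Greedy two-pointer pass over the sorted band lists; each band is used
    at most once. Pairing the smallest compatible bands first gives the
    maximum number of matches for this kind of tolerance window.
    """
    a = sorted(bands_a)
    b = sorted(bands_b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            matches += 1          # pair these two bands and move past both
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1                # a[i] is too small to match any remaining b
        else:
            j += 1                # b[j] is too small to match any remaining a
    return matches
```

For example, `count_matches([100, 200, 300], [103, 250, 301], tol=5)` finds 2 matches, while `tol=2` finds only 1, so changing the tolerance can change both the match count and the score.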
Another source of confusion is that the number of bands is used in
the equation, so you can get the following:
The results from the simulation (see V.) are very informative for setting the cutoff.
The following are some additional suggestions.
1. View a set of clones using the fingerprint window and try different
tolerances to see the effect. The tolerance can be changed on the window,
and selecting a clone will highlight all bands in other clones that
are within the tolerance of a band in the highlighted clone.
(Thanks to Sam Cartinhour for this suggestion.)
2. On the Main Analysis window, enter a clone name beside "--> FPC"
and select "--> FPC". You will get output of the following type:
3. On Main Analysis, run Build Contig.
You will get the following style output:
If you do not like the resulting contigs (e.g. too many or too few),
adjust the tolerance and/or cutoff,
enter the maximum size beside "Kill",
and execute Build Contig.
Continue to do this until you get contigs of the size you expect.
The 'Build Contigs' function uses the CB routine, which clusters clones based on good
overlap probabilities. Every clone in a resulting contig has a good overlap
with at least one other clone in the contig. It also buries clones (see manual)
and gives an approximate initial ordering. Since it is a fast approximation for
an NP-complete problem, the initial ordering can be wrong due to: 1) local minima,
2) clones with incorrect bands, which can corrupt the ordering, and 3)
the error in the map, which increases with the number of clones (see Soderlund, Longden,
and Mott, CABIOS, Vol 13 No 5). Therefore,
it is important to verify and refine any contig before picking clones for
sequencing, publication, etc.
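The clustering property above (every clone has a good overlap with at least one other clone in its contig) is single-linkage clustering. A minimal sketch with a union-find, assuming pairwise scores are already computed and that a pair "overlaps" when its score is at or below the cutoff. It ignores burying and ordering, which the CB routine also does, and all names here are hypothetical.

```python
def cluster_clones(clones, pair_scores, cutoff):
    """Group clones into contigs by single linkage.

    clones: iterable of clone names
    pair_scores: dict mapping (clone_a, clone_b) -> coincidence score
    cutoff: a pair overlaps if its score <= cutoff
    Returns a list of sets; clones with no passing pair come back as singletons.
    """
    parent = {c: c for c in clones}

    def find(c):
        # Follow parent links to the root, halving the path as we go
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), score in pair_scores.items():
        if score <= cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b   # merge the two clusters

    groups = {}
    for c in parent:
        groups.setdefault(find(c), set()).add(c)
    return list(groups.values())
```

Because one good pair is enough to join two clusters, a single false overlap can merge two unrelated contigs, which is one reason the text recommends verifying every contig.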
If you add clones to your database daily, the following scenario
is recommended: after running Update .cor to add all new clones to the database,
run Build Contigs, which will kill all
contigs beneath a given size and rebuild contigs that include the
new clones. When a contig is of 'acceptable' size, refine it. To
keep it from being 'killed' on the next Build Contigs, set the sequence
state of one or more clones in the contig to something other than NONE;
the Kill command will not kill any contig with a clone set for sequencing.
Once you start to refine a contig, you have to semi-manually add
clones and merge other contigs with it. This is why it is best to put off
refining a contig as long as possible: until then,
Build Contigs automatically adds
clones and merges contigs, since unrefined contigs are recomputed each time.
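The Kill rule described above can be sketched as a simple filter. The `Clone` and `Contig` classes are hypothetical stand-ins for FPC's data; measuring contig size in clones and also killing contigs of exactly the maximum size are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Clone:
    name: str
    seq_state: str = "NONE"   # hypothetical field mirroring the sequence state


@dataclass
class Contig:
    number: int
    clones: list = field(default_factory=list)


def contigs_to_kill(contigs, max_size):
    """Return the numbers of contigs a Kill would remove.

    Assumption: a contig is killed when it has at most max_size clones and
    none of its clones has a sequence state other than NONE.
    """
    doomed = []
    for ctg in contigs:
        protected = any(c.seq_state != "NONE" for c in ctg.clones)
        if len(ctg.clones) <= max_size and not protected:
            doomed.append(ctg.number)
    return doomed
```

Setting a single clone's state is enough to protect a whole contig, which is what lets refined contigs survive each nightly rebuild.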
A common practice is to add a special standard remark to all important clones. A keyset
of the clones with that remark can then be created, and the Keyset option of the project window will list the
contigs containing clones in the keyset. In this way, you can keep track
of important clones even though they may end up in differently numbered contigs after
each Build Contigs.
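The remark-to-keyset bookkeeping amounts to a filter followed by a group-by. A sketch assuming each clone is a hypothetical `(name, contig_number, remarks)` tuple; the remark text `KEY` is made up for illustration.

```python
def contigs_with_remark(clones, remark):
    """Map each contig number to the keyset clones it contains.

    clones: list of (name, contig_number, remarks) tuples (hypothetical layout)
    remark: the standard remark that marks important clones
    """
    keyset = [(name, ctg) for name, ctg, remarks in clones if remark in remarks]
    by_contig = {}
    for name, ctg in keyset:
        by_contig.setdefault(ctg, []).append(name)
    return by_contig
```

Re-running this after each Build Contigs shows where the marked clones landed, whatever their new contig numbers are.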
Here again, the simulation results (see V.) are of interest. The following are some
additional suggestions.
To use Calc, lower the cutoff (e.g. if the contigs were built with 1e-08, use 1e-09 or
1e-10). Run Calc; it should create one or more CBmaps of well-ordered clones and
ignore any clones that do not have a good overlap with any other clone at that cutoff. If you get too many clones with
A's above them, or too many maps, change the cutoff and try again.
Alternatively, use Expert. For this, keep the cutoff the same or increase it (e.g. from 1e-08 to 1e-06). Run Expert. It tries to find the best set of clones and orders them.
You will generally get multiple CBmaps. If the maps are not satisfactory, alter the cutoff and execute Expert again.
Q. On the CB display, how many extra bands are too many?
Q. When do you decide that Expert or Calc gives a reasonable answer?
A good test when using Calc: try running Again a few times. The list order remains the
same between runs, so you can tell whether the clones jump around much (sorting by Left makes this easiest to
see). If the order is 'fairly' stable, select the solution with the fewest 'o's and extras.
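One way to put a number on "jumping around" is each clone's largest change in rank across runs. The sketch below assumes each run is recorded as a list of clone names sorted by left end, and that "fewest 'o's and extras" means the smallest sum of the two counts; both the measure and the record layout are assumptions, not part of FPC.

```python
def max_displacement(orders):
    """For each clone, the largest change in rank across runs.

    orders: list of runs, each a list of clone names sorted by left end.
    Assumes every run contains the same clones.
    """
    ranks = [{name: i for i, name in enumerate(run)} for run in orders]
    return {
        name: max(r[name] for r in ranks) - min(r[name] for r in ranks)
        for name in orders[0]
    }


def pick_solution(solutions):
    """Pick the run with the fewest 'o's plus extras.

    solutions: list of dicts with 'o_count' and 'extras' keys (hypothetical).
    """
    return min(solutions, key=lambda s: s["o_count"] + s["extras"])
```

Clones with large displacements are the ones to inspect by hand before accepting any of the runs.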
'Eval Contig' shows any pairs of clones that have a good overlap score but do not overlap in the contig.
I have run this test on many contigs that were built by hand and compared the resulting number
of 'no overlaps' with that from the CB solution; they are generally about the same.
A more stringent test is 'Bad Olaps', but trying to get a perfect score with this test may
not be worth the effort.
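The 'no overlap' check can be sketched as follows, assuming clone coordinates are inclusive `(left, right)` pairs and pairwise scores are precomputed; the data layout is hypothetical.

```python
def no_overlap_pairs(coords, pair_scores, cutoff):
    """Pairs that pass the cutoff but whose contig coordinates do not overlap.

    coords: dict clone -> (left, right), inclusive, in contig units
    pair_scores: dict (clone_a, clone_b) -> coincidence score
    """
    flagged = []
    for (a, b), score in pair_scores.items():
        if score > cutoff:
            continue                      # not a good overlap; nothing to check
        (la, ra), (lb, rb) = coords[a], coords[b]
        if ra < lb or rb < la:            # intervals are disjoint
            flagged.append((a, b))
    return flagged
```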
If you are using the Gel Image window or Fingerprint window: Set the display so
that the buried clones are not shown, select all the rest using the Edit Map window and
then have them loaded into the Gel or Fingerprint window. Verify or change ordering;
if the order is changed, use FpOrder or GelOrder to instantiate the changes.
The data set has no ambiguities except the end fragments. There are 4278 clones.
The bands have migration rate values within the range of [766,3289].
All the following runs used a tolerance of 7, which gives 1890 unique bands out of 80115 total with
an average bin size of 516 (e.g. there are 538 bands in the data set within 3228 +/- 7).
At a 1e-06 cutoff, clones with less than approximately 50% shared bands do not pass the overlap test.
Chromosomes get split into multiple contigs where there are weak overlaps.
The contigs with clones from multiple chromosomes have areas of complete chaos where
multiple contigs have mapped to the same space. The out-of-order counts
are atrocious in these areas. Where chaotic behavior is evident, the CB algorithm is run on each chaotic contig as
described in the next paragraph. Though we identified the chaotic contigs with the scoring
program (see below), they can also be identified visually, e.g. 100 clones should not
be mapping to the same space.
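A quick way to spot such chaotic areas is the maximum number of clones covering any single position. With the simulation's 5x coverage, a depth many times higher is suspicious. The sweep-line sketch below assumes inclusive integer coordinates; any threshold you apply to its result is your own choice.

```python
def max_depth(intervals):
    """Largest number of clones covering any single position.

    intervals: list of (left, right) contig coordinates, inclusive integers.
    """
    events = []
    for left, right in intervals:
        events.append((left, 1))          # clone starts covering at `left`
        events.append((right + 1, -1))    # clone stops covering after `right`
    depth = best = 0
    # At equal positions, -1 sorts before +1, so an end is processed before a start
    for _, delta in sorted(events):
        depth += delta
        best = max(best, depth)
    return best
```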
The fourth entry of the table using 1e-08 Calc was created as follows:
The database from 1e-06 was loaded. For the 5 contigs with chaotic areas, the
Calc algorithm was run on each contig individually using a 1e-08 cutoff. Ok was executed
to instantiate the new order, using the options: (1) merge CBmaps with end clones
having 1e-06 overlap or better, (2) overlap adjacent maps by 2, and (3) move disconnected CBmaps to new contigs.
The effect of this is to only allow the 1e-06 overlap on the ends which adds an
additional constraint, and hence removed all false negative joins.
The fifth entry is the same as the fourth except the Expert algorithm was used instead of the Calc algorithm.
The Expert routine tries to find a path through the clones and does not attempt to use all of them; unused
clones are buried. The 'bad' clones are all buried clones; because they are pseudo-buried, you
know they are approximately positioned, so they are really not so bad.
Scoring: The clones were named so that each name contains the chromosome number and ordering information
(e.g. ele4_256). Ken McDonald wrote a program to determine where multiple chromosomes are in a contig and
to determine if two clones are out of order, e.g. if ele4_256 starts after ele4_257. Note that out of order
just means that their starting points are off by a few bands but they still overlap. The 'bad' count indicates
pairs that do not overlap at all; in all cases, the bad clones are within a few units of each other.
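The scoring rules above can be sketched for a single contig. This is not Ken McDonald's program; it assumes names follow the `ele<chromosome>_<serial>` pattern of the example `ele4_256`, that a higher serial number means a later genomic position, and that all pairs on the same chromosome are compared.

```python
import re
from itertools import combinations

NAME = re.compile(r"ele(\d+)_(\d+)$")  # assumed pattern, e.g. ele4_256


def score_contig(coords):
    """Return (chromosomes, out_of_order, bad) for one contig.

    coords: dict clone name -> (left, right) contig coordinates, inclusive.
    A same-chromosome pair is out of order when the clone with the lower
    serial number starts after the one with the higher number; it is also
    'bad' if their coordinates do not overlap at all.
    """
    parsed = {}
    for name, span in coords.items():
        m = NAME.match(name)
        if m:
            parsed[name] = (int(m.group(1)), int(m.group(2)), span)
    chromosomes = {chrom for chrom, _, _ in parsed.values()}
    out_of_order = bad = 0
    for (c1, n1, (l1, r1)), (c2, n2, (l2, r2)) in combinations(parsed.values(), 2):
        if c1 != c2 or n1 == n2:
            continue
        if n1 > n2:  # make clone 1 the one with the lower serial number
            l1, r1, l2, r2 = l2, r2, l1, r1
        if l1 > l2:
            out_of_order += 1
            if r1 < l2 or r2 < l1:  # disjoint coordinates
                bad += 1
    return chromosomes, out_of_order, bad
```

A contig counts as mixed when the returned chromosome set has more than one member.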
Note also that the scoring program does not indicate where clones overlap more or less than they should. For example,
when viewing a contig of 100 clones in the 1e-07 FPC file, Eval Contig for Bad Overlaps reports
5; that is, 5 clones fail the following test: for 30% or more of the clones they overlap with,
the overlap in coordinates differs by at least 30% from the number of bands shared.
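The Bad Overlaps test quoted above can be sketched as follows. Assumptions: coordinates are inclusive and in band units, shared-band counts are precomputed, and "30% different" is measured relative to the number of shared bands; FPC's exact definition may differ.

```python
def bad_overlap_clones(coords, shared, frac_clones=0.3, frac_diff=0.3):
    """Clones failing a Bad Overlaps style test.

    coords: dict clone -> (left, right), inclusive, in band units
    shared: dict frozenset({a, b}) -> number of shared bands
    A clone fails when, for at least frac_clones of the clones it overlaps,
    the coordinate overlap differs from the shared-band count by at least
    frac_diff of that count.
    """
    failing = []
    for a, (la, ra) in coords.items():
        overlapping = discrepant = 0
        for b, (lb, rb) in coords.items():
            if a == b:
                continue
            olap = min(ra, rb) - max(la, lb) + 1   # length of the coordinate overlap
            if olap <= 0:
                continue
            overlapping += 1
            n_shared = shared.get(frozenset((a, b)), 0)
            if abs(olap - n_shared) >= frac_diff * max(n_shared, 1):
                discrepant += 1
        if overlapping and discrepant / overlapping >= frac_clones:
            failing.append(a)
    return failing
```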
15 Jan 98 - V3.2
13 Oct 97 - The User's Manual has been updated. It includes all FPC V3 features.
Oct 97 - V3
10 Sept 97 - V2.9
I. Setting the tolerance and cutoff
II. Creating contigs and incremental updates
III. Refining contigs
IV. Adding clones to existing contigs
V. Simulation results
VI. Release Notes for the last 3 releases
I. Setting the tolerance and cutoff
By default, FPC uses fixed tolerance. To use variable tolerance, change the setting on the
main Configure window.
(i) Bands measured on acrylamide gels using Image are migration
rates, so a tolerance of around 7 is appropriate.
(ii) Bands measured on acrylamide gels using the ABI software are
band sizes with a fixed tolerance, so a tolerance of 1 or 2 is appropriate
(i.e. in this case the tolerance is in bases).
(iii) Bands measured on agarose gels use a variable tolerance, so a
tolerance of 3 gives a 0.03% uncertainty.
Case 1:
Tol  Clone 1 bands  Clone 2 bands  Matching bands  Cutoff
7    22             15             11              3e-09
5    22             15             11              3e-11
Case 2:
Tol  Clone 1 bands  Clone 2 bands  Matching bands  Cutoff
7    22             36             14              3e-12
5    22             36             14              3e-09
Clone 1 bands  Clone 2 bands  Matching bands  Cutoff
52             38             12              3e-02
52             16             12              3e-06
HINTS on how to determine tolerance and cutoff
>> bk7d10 ctg0 22b --> Fpc (Tol 5, Cutoff 1e-08)
Ctg0 bK123A11 15b 11 9e-11
Ctg0 bK256D12 36b 14 2e-09
Ctg0 cB58D5 12b 9 4e-09
Ctg0 cB73B9 15b 10 3e-09
Ctg0 dJ76B20 40b 18 5e-14
If you have 5-fold coverage, this number of matches would be appropriate.
If you get too many or too few matches, vary the tolerance and/or cutoff.
Singletons 86
Begin 2048 comparisons:
CBmap 1 Clones 25 CBs 48 A's 2
CBmap 2 Clones 17 CBs 40 A's 0
CBmap 3 Clones 3 CBs 27 A's 0
Time User 0 min, 3 sec, 405088 microsec System 0 min, 7 sec, 644256 microsec
CB Single: Tol 7 Cutoff 1e-08 DiffBury 0.10
Ignore 19 Total: Used 45 Maps 3 Gap 91 Extra 215 CB 115
AvgGap 2.02 AvgExtra 4.78 AvgSpan 8.904
Unbury 9 Less than MinBands 22
Three maps were created with 25, 17, and 3 clones respectively.
II. Building contigs and incremental updates
III. Refining contigs
A. Even if your data is perfect and you have no end fragments, you will usually
get some bands that will not place; e.g. if two bands that are not the same genomic
sequence are called the same because they
have the same measurement and are in overlapping clones,
this will generally cause a subsequent mistake, possibly resulting in an extra band.
How many extra bands are acceptable depends on your data: the more bands per clone,
the more extra bands you will probably have. Also, the number will increase as clones are added
to the right of the CBmap, since error accumulates. To give an example, an EcoRI restriction
digest of cosmid clones generally has between 1 and 5 extra bands per clone
in the maps.
A. There are many possible sets of consensus bands and then many possible arrangements
of each set; hence, there tend to be many equally likely solutions.
Basically, you have to spend some time on your first few contigs to get a feel for when
a CB solution is closest to correct for your data.
IV. Adding clones to existing contigs
A keyset of the new clones can be created by requesting
the clones added after a given date (e.g. yesterday's date).
From the Main Analysis window,
compare the new clones against the rest of the clones in the
database by executing Keyset-->Fpc.
The project window will be displayed listing the number of hits
for each contig.
For each contig with hits, display the contig and
select 'Compare Keyset' from the Contig Analysis window.
An internal list is
created of clones that match the contig. Step through the list by
selecting Next; each time, a clone is listed and all the clones it matches are
highlighted. If you want to add the clone, select Add and refine its
position, or select Add&Bury.
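The keyset and hit-count steps above can be sketched as follows. Assumptions: clones live in a hypothetical `name -> (contig_number, date_added)` dict, contig 0 holds clones not yet in a contig (as in the Ctg0 lines of the sample output), the date boundary is inclusive, and each new clone counts at most once per contig.

```python
from collections import Counter
from datetime import date


def new_clone_keyset(clones, since: date):
    """Names of clones added on or after `since` (inclusive boundary is an assumption)."""
    return [name for name, (_, added) in clones.items() if added >= since]


def hits_per_contig(keyset, matches, clones):
    """Count, for each contig, how many keyset clones match a clone in it.

    matches: dict clone name -> list of matching clone names, as produced by
    the comparison at the current tolerance and cutoff (hypothetical layout).
    """
    hits = Counter()
    for name in keyset:
        contigs = {clones[m][0] for m in matches.get(name, []) if clones[m][0] != 0}
        hits.update(contigs)   # one hit per (new clone, contig) pair
    return hits
```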
V. Simulation results
The following data set was provided by LaDeana Hillier and Ken McDonald, GSC, St. Louis.
For each of the 6 C.elegans chromosomes, the N's and gaps were removed from the sequence (hence, one
sequence for each chromosome). Clones were generated to give 5x coverage with approximately 80%
overlap. A simulated double digest with HindIII and PstI was performed, resulting in an average of 18 bands
per clone.
The clones were not started or stopped at either of these sites; hence, they have end
fragments.
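To make the digest concrete, the following sketch cuts a clone sequence at HindIII (A^AGCTT) and PstI (CTGCA^G) sites and returns fragment lengths. It is not the GSC simulation code; it only shows why clone ends that are not restriction sites leave end fragments. Both recognition sites are palindromic, so scanning one strand is enough.

```python
import re

# (recognition site, cut offset within the site on the top strand)
HINDIII = ("AAGCTT", 1)  # A^AGCTT
PSTI = ("CTGCAG", 5)     # CTGCA^G


def double_digest(clone_seq):
    """Fragment lengths of a clone cut by HindIII and PstI together.

    The clone's own ends are not restriction sites, so the first and last
    fragments are end fragments.
    """
    seq = clone_seq.upper()
    cuts = set()
    for site, offset in (HINDIII, PSTI):
        for m in re.finditer(f"(?={site})", seq):  # lookahead also finds overlapping sites
            cuts.add(m.start() + offset)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:]) if right > left]
```

The first and last values in the returned list are the end fragments, whose sizes depend on where the clone happens to start and stop rather than on the genome's restriction map.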
TABLE 1. Simulation using C.elegans data.
Cutoff                 Contigs  F+  F-   Mixed*  Out-of-order pairs**
1e-06                  51       22  49   5       319(5)
1e-07                  167      2   129  2       71(0)
1e-08                  301      0   271  0       57(0)
1e-08 Calc on 1e-06    74       -   -    0       75(1)
1e-08 Expert on 1e-06  74       -   -    0       79(5)
* Contigs containing more than one chromosome.
** Pairs of clones in the wrong order. The number in parentheses is the
number of pairs whose coordinates do not overlap.
Release Notes
Fingerprint window -
Bug fix: Previously, if two bands from one clone matched one band
in the highlighted clone, both turned red. This has been corrected so that
only one turns red.
New Feature: matched bands for the highlighted clone turn blue; the rest
stay black. A switch called HighGreen has been added; if it is on, the highlighted
clone will have matched bands in green instead of blue (I can't tell the black
from the blue, others cannot see the green, so to make everyone happy, there is this switch).
NB: These changes have NOT been made to the Gel Image window.
Gaps in a contig display are shown with yellow bars at the bottom of the contig.
Maximum marker name size is 16 (maximum clone name size is still 10).
Build Contigs - was using CBcoords, now uses Len=Bands
OkAll on CBmap for Contig Analysis: bug fixes and big improvements for arranging
CBmaps.
More batch processing: 1) Ace merge, 2) Dump as Ace, 3) Submit for Sequencing.
Modified dates work correctly for Ace Merge.
Many small bug fixes, e.g.:
1. Entering a new contig number on the main window now always works.
2. For Bury-all-selected and Bury-all-unselected (from Rule Menu), identical clones
will get the same parent if both are being buried.
3. Faster moving of clones for clones with markers
Known bugs:
Occasionally a Box-out-of-range error occurs.
Limited batch processing
1. Run Kill/Build/Ok.
2. Output matches for a clone
Execute: fpc project -batch help
to see arguments.
CBmap -
Add OkAll to instantiate all CBmaps in current contig
FPC Analysis
Since the command sequence Kill/Singles/OkAll has become a
common nightly set of commands to run,
this set of commands has been bundled into "Build Contigs".
Contig Analysis/CBmap/Ok window
Has an option "Left End", which is used with Gel Order or FP Order.
That is, if you have selected an order via the Gel or Fp window and
instantiated it with the Analysis Gel Order or Fp Order, using the
Left End option ensures that the left end of each clone comes after
that of the previous clone.
Project Window
The pull-down menu in the white space has two new options:
GoTo Current will always go to the displayed contig,
or, if the Framework is showing, to the marker
whose text window is showing. Every time you change text
windows, you must re-execute this function (this is not
true for contigs).
GoTo Top turns GoTo off and displays the top of the window.
File...
Check Cor
Occasionally you may get a "Box out of bounds" message, after which
you have no choice but to quit. In the past, this meant you lost
changes since the last save; now you have the option to save first.
Contig Analysis
Expert - a function to pick the best clones from contig and order them.