FPC V3.2 User's Guide

FPC V3.2 was released on 15 Jan 98. Anyone using FPC should upgrade to this release. There are NO changes to the input or output files, and no previous features have been removed. There are only bug fixes and enhancements, and the assembly routines have been optimized using GSC C. elegans simulation data (see V). Just get the new executable or source files, install, and continue as before. A few beta users have told me that the new features are highly desirable. The manual has not been updated since release V3, but the new options are covered in this user's guide.

This is to be used in conjunction with the User's Manual, which describes all FPC features.

Contents:
I. Setting the tolerance and cutoff
II. Building contigs and incremental updates
III. Refining contigs
IV. Adding clones to existing contigs
V. Simulation results
VI. Release Notes for the last 3 releases

I. Setting the tolerance and cutoff

When you start using FPC for a project, you must determine the best tolerance and cutoff for your data.

TOLERANCE: You can have fixed or variable tolerance.

By default, FPC uses fixed tolerance. Switch to variable tolerance on the main Configure window.

I have seen the following cases:
(i) Bands measured on acrylamide gels using Image result in migration rates, so a tolerance of around 7 is appropriate.
(ii) Bands measured on acrylamide gels using the ABI software result in band sizes with a fixed tolerance, so a tolerance of 1 or 2 is appropriate (i.e. in this case it refers to bases).
(iii) Bands measured on agarose gels use a variable tolerance, so using a tolerance of 3 will give a 0.03% uncertainty.
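To make the effect of a fixed tolerance concrete, the following is a minimal sketch of counting matching bands between two clones. It is not FPC's code: the greedy one-to-one pairing, the inclusive comparison (a difference equal to the tolerance counts as a match), and the band values are all assumptions made for illustration.

```python
def count_matches(bands_a, bands_b, tol):
    """Count one-to-one band matches within a fixed tolerance.

    Assumption: a band may be used in at most one match, and two bands
    match when their values differ by no more than `tol`.
    """
    a = sorted(bands_a)
    b = sorted(bands_b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        diff = a[i] - b[j]
        if abs(diff) <= tol:
            # Pair these two bands and move past both of them.
            matches += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # a[i] is too small to match b[j] or anything after it
        else:
            j += 1  # b[j] is too small to match a[i] or anything after it
    return matches


# Hypothetical migration-rate values for two clones.
clone1 = [1000, 1012, 1500, 2200, 3100]
clone2 = [1005, 1498, 2210, 2900]

for tol in (2, 7, 10):
    print(tol, count_matches(clone1, clone2, tol))
```

With these made-up values the count rises from 1 to 2 to 3 as the tolerance is loosened from 2 to 7 to 10, which is the kind of sensitivity to check on your own data before fixing the tolerance.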

CUTOFF: You can use Equation 1 or Equation 2 (see CABIOS Vol 13 No. 5). Both take into account the tolerance, the number of matching bands, and the number of bands in each clone in order to calculate the probability that the matching bands are a coincidence. The lower the number (i.e. the larger the negative exponent), the more stringent the test for calling two clones overlapping.

A frequent source of confusion is the fact that the tolerance is used in the equation. For example, using fixed tolerance and equation 1, making the tolerance more stringent can cause the coincidence score to become more stringent (Case 1) or LESS stringent (Case 2).

Case 1:
  Tol    Clone 1          Clone 2       Matching Bands     Cutoff
  7        22               15             11               3e-09
  5        22               15             11               3e-11

Case 2:
  Tol    Clone 1          Clone 2       Matching Bands     Cutoff
  7        22               36             14               3e-12
  5        22               36             14               3e-09

It is important to determine the best tolerance and then not vary it.

Another source of confusion is that the number of bands in each clone is used in the equation, so the same number of matching bands can give very different scores:

   Clone 1          Clone 2       Matching Bands     Cutoff
     52               38             12               3e-02
     52               16             12               3e-06

HINTS on how to determine tolerance and cutoff

The results from the simulation (see V.) are very informative for setting the cutoff. The following are some additional suggestions.

1. View a set of clones using the fingerprint window and try different tolerances to see the effect. The tolerance can be changed on the window, and selecting a clone will highlight all bands in other clones that are within the tolerance of a band in the highlighted clone. (Thanks to Sam Cartinhour for this suggestion.)

2. On the Main Analysis window, enter a clone name beside "--> FPC" and select "--> FPC". You will get output of the following type:

  >> bk7d10 ctg0 22b --> Fpc  (Tol 5, Cutoff 1e-08)
  Ctg0   bK123A11 15b    11 9e-11   
  Ctg0   bK256D12 36b    14 2e-09           
  Ctg0     cB58D5 12b     9 4e-09                      
  Ctg0     cB73B9 15b    10 3e-09                      
  Ctg0    dJ76B20 40b    18 5e-14          
If you have 5-fold coverage, this number of matches would be appropriate. If you get too many or too few matches, vary the tolerance and/or cutoff.

3. On Main Analysis, run Build Contig. You will get output of the following style:

  Singletons 86
  Begin 2048 comparisons:

  CBmap 1    Clones  25  CBs  48  A's  2
  CBmap 2    Clones  17  CBs  40  A's  0
  CBmap 3    Clones   3  CBs  27  A's  0
  Time User 0 min, 3 sec, 405088 microsec System 0 min, 7 sec, 644256 microsec 
  CB Single: Tol 7 Cutoff 1e-08 DiffBury 0.10
  Ignore 19  Total: Used  45 Maps  3  Gap  91  Extra 215  CB 115
  AvgGap 2.02  AvgExtra 4.78  AvgSpan 8.904
  Unbury 9   Less than MinBands 22
Three maps were created with 25, 17, and 3 clones respectively.

If you decide you do not like them (e.g. too many or too few), adjust the tolerance and/or cutoff, enter the maximum size beside "Kill" and execute Build Contig. Continue to do this until you get contigs of the size you expect.
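When repeating Build Contig with different settings, it can help to tabulate the map sizes from each run. The following is a minimal sketch that pulls the CBmap lines out of output like that shown in hint 3. The regular expression is written against that sample only, and treating the Kill size as inclusive is an assumption.

```python
import re

# Matches lines such as "CBmap 1    Clones  25  CBs  48  A's  2".
CBMAP_RE = re.compile(r"CBmap\s+(\d+)\s+Clones\s+(\d+)\s+CBs\s+(\d+)\s+A's\s+(\d+)")


def parse_cbmaps(text):
    """Return one dict per CBmap line found in Build Contig output."""
    maps = []
    for match in CBMAP_RE.finditer(text):
        number, clones, cbs, a_marks = (int(g) for g in match.groups())
        maps.append({"map": number, "clones": clones, "cbs": cbs, "a_marks": a_marks})
    return maps


sample = """\
  Singletons 86
  Begin 2048 comparisons:

  CBmap 1    Clones  25  CBs  48  A's  2
  CBmap 2    Clones  17  CBs  40  A's  0
  CBmap 3    Clones   3  CBs  27  A's  0
"""

maps = parse_cbmaps(sample)
sizes = [m["clones"] for m in maps]
print(sizes)  # [25, 17, 3]

# Maps that a Kill size of 5 would remove (assumption: the limit is inclusive).
kill_size = 5
small = [m for m in maps if m["clones"] <= kill_size]
print([m["map"] for m in small])  # [3]
```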

II. Building contigs and incremental updates

The 'Build Contigs' function uses the CB routine, which clusters clones based on good overlap probabilities. All clones in a resulting contig have a good overlap with at least one other clone in the contig. It also buries clones (see manual) and gives an approximate initial ordering. Since it is a fast approximation for an NP-complete problem, the initial ordering can be wrong for three reasons: 1) local minima, 2) clones with incorrect bands can disrupt the ordering, and 3) the error in the map increases with the number of clones (see Soderlund, Longden, and Mott, CABIOS, Vol 13 No 5). Therefore, it is important to verify and refine any contig before picking clones for sequencing, publication, etc.

If you add clones to your database daily, the following scenario is recommended: after running Update .cor to add all new clones to the database, run Build Contigs, which kills all contigs beneath a given size and rebuilds contigs to include the new clones. When a contig is of 'acceptable' size, refine it. In order to keep it from being 'killed' on the next Build Contigs, set the sequence state of one or more clones in the contig to something other than NONE; the Kill command will not kill any contig with a clone set for sequencing. Once you start to refine a contig, you have to semi-manually add clones and merge other contigs with it. This is why it is best to put off refining a contig as long as possible; i.e. Build Contigs will automatically add clones and merge contigs since they are recomputed.
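The Kill rule described above can be sketched as follows. This is an illustration of the policy only, not FPC's implementation: the `Clone` and `Contig` classes are hypothetical, "PICKED" is a placeholder for any sequence state other than NONE, and treating the size limit as inclusive is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Clone:
    name: str
    seq_state: str = "NONE"  # "NONE" means the clone is not set for sequencing


@dataclass
class Contig:
    number: int
    clones: list = field(default_factory=list)


def contigs_to_kill(contigs, max_size):
    """Return the numbers of contigs that a Kill at `max_size` would remove.

    A contig survives if it is larger than `max_size` or if any of its
    clones has a sequence state other than NONE.
    """
    killed = []
    for ctg in contigs:
        protected = any(c.seq_state != "NONE" for c in ctg.clones)
        if len(ctg.clones) <= max_size and not protected:
            killed.append(ctg.number)
    return killed


contigs = [
    Contig(1, [Clone("c1"), Clone("c2"), Clone("c3")]),
    # "PICKED" is a placeholder state name, not necessarily one FPC uses.
    Contig(2, [Clone("c4"), Clone("c5", seq_state="PICKED"), Clone("c6")]),
    Contig(3, [Clone(f"d{i}") for i in range(30)]),
]

print(contigs_to_kill(contigs, max_size=10))  # [1]
```

Contig 2 is as small as contig 1 but survives because one of its clones has a sequence state set, which is the mechanism recommended above for protecting a refined contig.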

A common practice is to add a special standard remark to all important clones, then a keyset of clones with the remark can be created, and the project window will list the contigs with clones in the keyset by using the Keyset option. In this way, you can keep track of important clones even though they may end up in different numbered contigs after each Build Contigs.
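The bookkeeping behind this practice is simple enough to sketch. The function below groups clones carrying a given remark by contig number, which is roughly what the Keyset option on the project window shows. The tuple layout, the remark text "KEY", and the contig numbers are hypothetical.

```python
def contigs_for_remark(clones, remark):
    """Map contig number -> names of clones in that contig with `remark`.

    `clones` is an iterable of (name, contig_number, remarks) tuples;
    this layout is an assumption made for the example.
    """
    hits = {}
    for name, contig, remarks in clones:
        if remark in remarks:
            hits.setdefault(contig, []).append(name)
    return hits


# The same clones before and after a rebuild that renumbered the contigs.
before = [("bK123A11", 4, ["KEY"]), ("cB58D5", 4, []), ("dJ76B20", 9, ["KEY"])]
after = [("bK123A11", 2, ["KEY"]), ("cB58D5", 2, []), ("dJ76B20", 2, ["KEY"])]

print(contigs_for_remark(before, "KEY"))  # {4: ['bK123A11'], 9: ['dJ76B20']}
print(contigs_for_remark(after, "KEY"))   # {2: ['bK123A11', 'dJ76B20']}
```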

III. Refining contigs

Here again, the simulation results (see V.) are of interest. The following are some additional suggestions.

  • From the Contig Analysis window, execute Calc or Expert.

    To use Calc, lower the cutoff (e.g. if the contigs were built with 1e-08, use 1e-09 or 1e-10). Run Calc and it should create one or more CBmaps of well-ordered clones and ignore any clones which do not have a good overlap with any other clones based on the cutoff. If you get too many clones with A's above them, or too many maps, change the cutoff and try again.

    Alternatively, use Expert. For this, keep the cutoff the same or increase it (e.g. from 1e-08 to 1e-06). Run Expert. It tries to find the best set of clones and orders them. You will generally get multiple CBmaps. If the maps are not satisfactory, alter the cutoff and execute Expert again.

    Q. On the CB display, how many extra bands are too many?
    A. Even if your data is perfect and you have no end fragments, you will usually get some bands that will not place; e.g. if two bands that are not the same genomic sequence are called the same because they have the same measurement and are in overlapping clones, this will generally cause a subsequent mistake, possibly resulting in an extra band. How many extra bands are acceptable depends on your data: the more bands per clone, the more extra bands you will probably have. Also, the number will increase as clones are added to the right of the CBmap, since error accumulates. But to give an example, an EcoRI restriction digest of cosmid clones generally has between 1 and 5 extra bands per clone in the maps.

    Q. When do you decide that Expert or Calc gives a reasonable answer?
    A. There are many possible sets of consensus bands and then many possible arrangements of each set; hence, there tend to be many equally likely solutions. Basically, you have to spend some time on your first few contigs to get a feel for when a CB solution is closest to correct for your data.

    A good test when using Calc: try running Again a few times. The display order will remain the same, so you can tell whether the clones jump around much (sort by Left makes it the easiest to view). If the order is 'fairly' stable, select the solution with the fewest 'o's and extras.

  • Execute Ok which brings up a menu. Execute OkAll.

  • Verification.

    The 'Eval Contig' shows any pairs of clones that have a good overlap score but are not overlapping. I have run this test on many contigs that have been done by hand, and compared the resulting number of 'no overlaps' with that resulting from the CB solution, and they are generally about the same. A more stringent test is the 'Bad Olaps', but trying to get a perfect score with this test may not be worth the effort.

    If you are using the Gel Image window or Fingerprint window: Set the display so that the buried clones are not shown, select all the rest using the Edit Map window and then have them loaded into the Gel or Fingerprint window. Verify or change ordering; if the order is changed, use FpOrder or GelOrder to instantiate the changes.
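The idea behind the 'Eval Contig' check described under Verification can be sketched in a few lines: list the pairs whose score passes the cutoff but whose map coordinates share no position. This is not FPC's code. The dictionaries, the inclusive (left, right) coordinates, and the example scores are assumptions.

```python
def good_pairs_without_overlap(coords, scores, cutoff):
    """Return pairs with a score at or below `cutoff` whose coordinates do not overlap.

    coords: {clone_name: (left, right)}, inclusive map coordinates (assumed).
    scores: {(clone_a, clone_b): coincidence probability}.
    """
    flagged = []
    for (a, b), score in scores.items():
        if score > cutoff:
            continue  # not a good overlap score, so nothing to verify
        left_a, right_a = coords[a]
        left_b, right_b = coords[b]
        # The spans overlap only if the larger left end is within the smaller right end.
        if max(left_a, left_b) > min(right_a, right_b):
            flagged.append((a, b))
    return flagged


coords = {"A": (0, 10), "B": (8, 20), "C": (25, 40)}
scores = {("A", "B"): 3e-10, ("A", "C"): 2e-09, ("B", "C"): 1e-04}

print(good_pairs_without_overlap(coords, scores, cutoff=1e-08))  # [('A', 'C')]
```

Here A and C score well but sit apart on the map, so the pair is reported; B and C are also apart, but their score does not pass the cutoff, so they are not.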

IV. Adding clones to existing contigs

    A keyset of the new clones can be created by requesting the clones added after a given date (e.g. yesterday's date). From the Main Analysis window, compare the new clones against the rest of the clones in the database by executing Keyset-->Fpc. The project window will be displayed listing the number of hits for each contig. For each contig with hits: display the contig and select 'Compare Keyset' from the Contig Analysis window. An internal list is created of clones that match the contig. Step through the list by selecting Next; a clone is listed and all the clones it matches are highlighted. If you want to add the clone, select Add and refine the position, or select Add&Bury.

V. Simulation results

    The following data set was provided by LaDeana Hillier and Ken McDonald, GSC, St. Louis. For each of the 6 C. elegans chromosomes, the N's and gaps were removed from the sequence (hence, one sequence for each chromosome). Clones were generated to give a 5x coverage with an approximately 80% overlap. A simulated double digest of HindIII and PstI was performed, resulting in an average of 18 bands per clone. The clones were not started and stopped at either of these sites; hence, they have end fragments.

    The data set has no ambiguities except the end fragments. There are 4278 clones. The bands have migration rate values within the range of [766,3289]. All the following runs used a tolerance of 7, which gives 1890 unique bands out of 80115 total with an average bin size of 516 (e.g. there are 538 bands in the data set within 3228 +/- 7).

    TABLE 1. Simulation using C. elegans data.

    Cutoff                  Contigs    F+     F-   Mixed*   Out-of-order-Pairs**
    1e-06                      51      22     49      5          319(5)
    1e-07                     167       2    129      2           71(0)
    1e-08                     301       0    271      0           57(0)
    1e-08 Calc on 1e-06        74       -      -      0           75(1)
    1e-08 Expert on 1e-06      74       -      -      0           79(5)

    * Contigs containing more than one chromosome.
    ** Pairs of clones in the wrong order. The number in parentheses is the number of pairs whose coordinates do not overlap.

    At a 1e-06 cutoff, clones with less than approximately 50% shared bands do not pass the overlap test. Chromosomes get split into multiple contigs where there are weak overlaps.

    The contigs with clones from multiple chromosomes have areas of complete chaos where multiple contigs have mapped to the same space. The out-of-order counts are atrocious in these areas. Where chaotic behavior is evident, the CB algorithm is run on each chaotic contig as described in the next paragraph. Though we were able to determine the chaotic contigs by the scoring program (see below), they can be determined visually, e.g. 100 clones should not be mapping to the same space.

    The fourth entry of the table (1e-08 Calc) was created as follows: The database from 1e-06 was loaded. For the 5 contigs with chaotic areas, the Calc algorithm was run on these contigs individually using a 1e-08 cutoff. Ok was then executed to instantiate the new order, using the options: (1) merge CBmaps with end clones having 1e-06 overlap or better, (2) overlap adjacent maps by 2, and (3) move disconnected CBmaps to new contigs. The effect of this is to only allow the 1e-06 overlap on the ends, which adds an additional constraint and hence removed all false negative joins.

    The fifth entry is the same as the fourth except the Expert algorithm was used instead of the Calc algorithm. The Expert routine tries to find a path through the clones and does not attempt to use all of them; unused clones are buried. The 'bad' clones are all buried clones, i.e. because they are pseudo buried, you know they are approximately positioned -- so they are really not so bad.

    Scoring: The clones were named such that each name contains the chromosome number and ordering information (e.g. ele4_256). Ken McDonald wrote a program to determine where multiple chromosomes are in a contig and to determine if two clones are out of order, e.g. if ele4_256 starts after ele4_257. Note that this just means that their starting points are off by a few bands but they still overlap. The 'bad' count records pairs that do not overlap at all; in all cases, the bad clones are within a few units of each other. Note also that this does not indicate where clones overlap more or less than they should. For example, viewing a contig of 100 clones in the 1e-07 FPC file, the Eval Contig for Bad Overlaps shows a number of 5; that is, 5 clones fail the following test: for 30% or more of the clones they overlap with, the overlap in coordinates is at least 30% different from the number of bands shared.
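The scoring idea above can be sketched as follows. This is not Ken McDonald's program; it is an illustration under stated assumptions: names follow the ele<chromosome>_<index> pattern from the example, only clones that are neighbors in index order within a chromosome are compared, and coordinates are inclusive (left, right) pairs.

```python
import re

# Pattern taken from the example name "ele4_256": chromosome 4, index 256.
NAME_RE = re.compile(r"^ele(\d+)_(\d+)$")


def parse_name(name):
    """Return (chromosome, index) for a clone name such as 'ele4_256'."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected clone name: {name}")
    return int(match.group(1)), int(match.group(2))


def chromosomes_in_contig(names):
    """Set of chromosome numbers present; more than one means a mixed contig."""
    return {parse_name(n)[0] for n in names}


def score_order(placements):
    """Return (out_of_order, bad) for a list of (name, left, right) tuples.

    A neighboring pair is out of order if the lower-indexed clone starts
    after the higher-indexed one; it is also 'bad' if the two spans do
    not overlap at all.
    """
    keyed = sorted(parse_name(n) + (left, right) for n, left, right in placements)
    out_of_order = bad = 0
    for prev, cur in zip(keyed, keyed[1:]):
        if prev[0] != cur[0]:
            continue  # different chromosomes, no expected order
        if prev[2] > cur[2]:
            out_of_order += 1
            if max(prev[2], cur[2]) > min(prev[3], cur[3]):
                bad += 1
    return out_of_order, bad


# Hypothetical placements within one contig.
placements = [
    ("ele4_255", 0, 18),
    ("ele4_256", 6, 22),
    ("ele4_257", 4, 20),   # starts before 256 but still overlaps it
    ("ele4_258", 30, 45),
    ("ele4_259", 24, 29),  # starts before 258 and does not overlap it
]

print(chromosomes_in_contig(n for n, _, _ in placements))  # {4}
print(score_order(placements))  # (2, 1)
```

In the notation of Table 1, this made-up contig would be reported as 2(1): two out-of-order pairs, one of which does not overlap.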

VI. Release Notes

    15 Jan 98 - V3.2

    Fingerprint window -  
    Bug fix: Previously, if two bands from one clone matched one band
    in the highlighted clone, they both turned red. This has been corrected so that
    only one will turn red. 
    New Feature: matched bands for the highlighted clone turn blue, the rest 
    stay black.  A switch has been added called HighGreen, if on, the highlighted 
    clone will have matched bands in green instead of blue (I can't tell the black 
    from the blue, others cannot see the green, so to make everyone happy, this switch).
    NB: These changes have NOT been made to the Gel Image window.
    
    Gaps in a contig display are shown with yellow bars at the bottom of the contig.
    
    Maximum marker name size is 16 (maximum clone name size is still 10)
    
    Build Contigs - was using CBcoords, now uses Len=Bands
    OKAll on CBmap for Contig Analysis, bug fixes and Big improvements for arranging 
    CBmaps.
    
    More batch processing: 1) Ace merge, 2) Dump as Ace, 3) Submit for Sequencing.
    Modified dates work correctly for Ace Merge.
    
    Many small bug fixes, e.g.:
    1. Entering a new contig number on the main window now always works.
    2. For Bury-all-selected and Bury-all-unselected (from Rule Menu), identical clones
    will get the same parent if both are being buried.
    3. Faster moving of clones for clones with markers
    
    Known bugs:
    Sometimes gets a Box-out-of-range error.
    

    13 Oct 97 - The User's Manual has been updated. It includes all FPC V3 features.

    Oct 97 - V3

    Limited batch processing
      1. Run Kill/Build/Ok.
      2. Output matches for a clone
    Execute: fpc project -batch help
    to see arguments.
    
    CBmap -
      Add OkAll to instantiate all CBmaps in current contig
    

    10 Sept 97 - V2.9

    FPC Analysis
      Since the command sequence Kill/Singles/OkAll has become a 
    common nightly set of commands to run,
    this set of commands has been bundled into "Build Contigs". 
    
    Contig Analysis/CBmap/Ok window
      Has an option "Left End" which is used with Gel Order or FP Order.
    That is, if you have selected an order via the Gel or Fp window and
    instantiated it with the Analysis Gel Order or Fp Order, using the 
    Left End option ensures that the left end of each clone comes after
    the previous.
    
    Project Window
      Pull down menu in white space, there are two new options:
      GoTo Current will always go to the displayed contig,
        or if the Framework is showing, it will go to the marker 
        whose text window is showing. Every time you change text 
        windows, you must re-execute this function (this is not 
        true for contigs). 
      GoTo Top turns the go to off and displays the top of the window.
    
    File...
      Check Cor
    
    Occasionally you may get a "Box out of bounds" message and then
     you have no choice but to quit. In the past, this has meant you lose
     changes since the last save; now you have the option to save first.
    
    Contig Analysis 
      Expert - a function to pick the best clones from contig and order them.