| Poisson calculations | |||||||||
| The strategy for the shotgun approach follows the Lander and Waterman | |||||||||
| application of the Poisson distribution: Lander ES, Waterman MS. | |||||||||
| Genomic mapping by fingerprinting random clones: a mathematical | |||||||||
| analysis. Genomics 2(3): 231-239 (1988). | |||||||||
| updated 02-06-06 | |||||||||
| Table 1: The probability a base is not sequenced | |||||||||
| The probability that a base is not sequenced is given by P0 = e^-c, where c = fold sequence coverage (c = LN/G), | |||||||||
| L = read length, N = number of reads (so LN = number of bases sequenced), G = target sequence length, and e is the constant 2.718281828459 (rounded here to 2.718) | |||||||||
| Fold sequence coverage (c) | P0 = e^-c | P0 = 1/e^c | 1/(value of e^c) | P0 | % not sequenced (P0 x 100) | % sequenced | |||
| 0.25 | P0 = e^-0.25 | 1/e^0.25 = | 1/1.284 = | 0.78 | 78% | 22% | |||
| 0.5 | P0 = e^-0.50 | 1/e^0.50 = | 1/1.649 = | 0.61 | 61% | 39% | |||
| 0.75 | P0 = e^-0.75 | 1/e^0.75 = | 1/2.117 = | 0.47 | 47% | 53% | |||
| 1 | P0 = e^-1 | 1/e^1 = | 1/2.718 = | 0.37 | 37% | 63% | |||
| 1.5 | P0 = e^-1.5 | 1/e^1.5 = | 1/4.482 = | 0.223 | 22.30% | 77.70% | |||
| 2 | P0 = e^-2 | 1/e^2 = | 1/7.389 = | 0.135 | 13.50% | 86.50% | |||
| 3 | P0 = e^-3 | 1/e^3 = | 1/20.086 = | 0.050 | 5.00% | 95.00% | |||
| 4 | P0 = e^-4 | 1/e^4 = | 1/54.598 = | 0.018 | 1.80% | 98.20% | |||
| 5 | P0 = e^-5 | 1/e^5 = | 1/148.4 = | 0.0067 | 0.67% | 99.33% | |||
| 6 | P0 = e^-6 | 1/e^6 = | 1/403.4 = | 0.0025 | 0.25% | 99.75% | |||
| 7 | P0 = e^-7 | 1/e^7 = | 1/1096.63 = | 0.0009 | 0.09% | 99.91% | |||
| 8 | P0 = e^-8 | 1/e^8 = | 1/2980.96 = | 0.0003 | 0.03% | 99.97% | |||
| 9 | P0 = e^-9 | 1/e^9 = | 1/8103.08 = | 0.0001 | 0.01% | 99.99% | |||
| 10 | P0 = e^-10 | 1/e^10 = | 1/22026.5 = | 0.000045 | 0.0045% | 99.995% | |||
| Note: at a given fold coverage, % sequenced is independent of read length, because c = LN/G already accounts for read length | |||||||||
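The P0 column of Table 1 can be recomputed directly from the formula. The following is a minimal sketch in Python; the function name `fraction_unsequenced` is a hypothetical helper chosen for illustration and is not part of the original page.

```python
import math


def fraction_unsequenced(coverage: float) -> float:
    """Poisson probability that a base is covered by zero reads: P0 = e^-c."""
    return math.exp(-coverage)


# Recompute Table 1 for the same fold-coverage values.
for c in (0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 5, 6, 7, 8, 9, 10):
    p0 = fraction_unsequenced(c)
    print(f"{c:>5}  P0 = {p0:.6f}  not sequenced = {p0:.4%}  sequenced = {1 - p0:.4%}")
```

The printed values are unrounded, so they can differ from the table in the last displayed digit.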
| Table 2: Total Gap Length vs % Sequenced | |||||||||
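| Total gap length = G x e^-c, where c = fold coverage, G = target sequence length, and e^-c = P0. Gap lengths are in bases and are computed from rounded P0 (for example, 0.37 at 1-fold coverage). | |||||||||
| Genome size = | 50kb | 150kb | 300kb | 2Mb | 4Mb | 20Mb | 40Mb | 500Mb | % |
| Fold coverage | G x e^-c | G x e^-c | G x e^-c | G x e^-c | G x e^-c | G x e^-c | G x e^-c | G x e^-c | sequenced |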
| 1 | 18,500 | 55,500 | 111,000 | 740,000 | 1,480,000 | 7,400,000 | 14,800,000 | 185,000,000 | 63.00% |
| 2 | 6,750 | 20,250 | 40,500 | 270,000 | 540,000 | 2,700,000 | 5,400,000 | 67,500,000 | 86.50% |
| 3 | 2,500 | 7,500 | 15,000 | 100,000 | 200,000 | 1,000,000 | 2,000,000 | 25,000,000 | 95.00% |
| 4 | 900 | 2,700 | 5,400 | 36,000 | 72,000 | 360,000 | 720,000 | 9,000,000 | 98.20% |
| 5 | 335 | 1,005 | 2,010 | 13,400 | 26,800 | 134,000 | 268,000 | 3,350,000 | 99.33% |
| 6 | 125 | 375 | 750 | 5,000 | 10,000 | 50,000 | 100,000 | 1,250,000 | 99.75% |
| 7 | 45 | 135 | 270 | 1,800 | 3,600 | 18,000 | 36,000 | 450,000 | 99.91% |
| 8 | 15 | 45 | 90 | 600 | 1,200 | 6,000 | 12,000 | 150,000 | 99.97% |
| 9 | 5 | 15 | 30 | 200 | 400 | 2,000 | 4,000 | 50,000 | 99.99% |
| 10 | 2 | 7 | 14 | 90 | 180 | 900 | 1,800 | 22,500 | 99.995% |
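The gap lengths in Table 2 follow from multiplying the target length by P0. A minimal sketch, assuming genome sizes are given in bases; `total_gap_length` is a hypothetical helper name used only for illustration.

```python
import math


def total_gap_length(genome_size: int, coverage: float) -> float:
    """Expected number of unsequenced bases: G * e^-c."""
    return genome_size * math.exp(-coverage)


# Genome sizes from Table 2, expressed in bases.
genome_sizes = {
    "50kb": 50_000,
    "150kb": 150_000,
    "300kb": 300_000,
    "2Mb": 2_000_000,
    "4Mb": 4_000_000,
    "20Mb": 20_000_000,
    "40Mb": 40_000_000,
    "500Mb": 500_000_000,
}

for c in range(1, 11):
    gaps = "  ".join(f"{total_gap_length(g, c):>13,.0f}" for g in genome_sizes.values())
    print(f"{c:>2}  {gaps}")
```

Because this sketch uses the unrounded value of e^-c, its output differs slightly from the table: for a 50kb target at 1-fold coverage it gives about 18,394 bases, where the table (using P0 = 0.37) lists 18,500.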
| Table 3: Number of Sequence Reads for x-fold Coverage | |||||||||
| Number of sequence reads for x-fold coverage: N = (genome size x fold coverage) / bases per read, i.e. N = Gc/L | |||||||||
| genome size = 50kb | |||||||||
| Read Length = | 100 | 200 | 300 | 400 | 500 | 600 | |||
| Fold cov. | Number of sequence reads for x-fold coverage | ||||||||
| 1 | 500 | 250 | 167 | 125 | 100 | 83 | |||
| 2 | 1,000 | 500 | 333 | 250 | 200 | 167 | |||
| 3 | 1,500 | 750 | 500 | 375 | 300 | 250 | |||
| 4 | 2,000 | 1,000 | 667 | 500 | 400 | 333 | |||
| 5 | 2,500 | 1,250 | 833 | 625 | 500 | 417 | |||
| 6 | 3,000 | 1,500 | 1,000 | 750 | 600 | 500 | |||
| 7 | 3,500 | 1,750 | 1,167 | 875 | 700 | 583 | |||
| 8 | 4,000 | 2,000 | 1,333 | 1,000 | 800 | 667 | |||
| 9 | 4,500 | 2,250 | 1,500 | 1,125 | 900 | 750 | |||
| 10 | 5,000 | 2,500 | 1,667 | 1,250 | 1,000 | 833 | |||
| genome size = 150kb | |||||||||
| Read Length = | 100 | 200 | 300 | 400 | 500 | 600 | |||
| Fold cov. | Number of sequence reads for x-fold coverage | ||||||||
| 1 | 1,500 | 750 | 500 | 375 | 300 | 250 | |||
| 2 | 3,000 | 1,500 | 1,000 | 750 | 600 | 500 | |||
| 3 | 4,500 | 2,250 | 1,500 | 1,125 | 900 | 750 | |||
| 4 | 6,000 | 3,000 | 2,000 | 1,500 | 1,200 | 1,000 | |||
| 5 | 7,500 | 3,750 | 2,500 | 1,875 | 1,500 | 1,250 | |||
| 6 | 9,000 | 4,500 | 3,000 | 2,250 | 1,800 | 1,500 | |||
| 7 | 10,500 | 5,250 | 3,500 | 2,625 | 2,100 | 1,750 | |||
| 8 | 12,000 | 6,000 | 4,000 | 3,000 | 2,400 | 2,000 | |||
| 9 | 13,500 | 6,750 | 4,500 | 3,375 | 2,700 | 2,250 | |||
| 10 | 15,000 | 7,500 | 5,000 | 3,750 | 3,000 | 2,500 | |||
| genome size = 300kb | |||||||||
| Read Length = | 100 | 200 | 300 | 400 | 500 | 600 | |||
| Fold cov. | Number of sequence reads for x-fold coverage | ||||||||
| 1 | 3,000 | 1,500 | 1,000 | 750 | 600 | 500 | |||
| 2 | 6,000 | 3,000 | 2,000 | 1,500 | 1,200 | 1,000 | |||
| 3 | 9,000 | 4,500 | 3,000 | 2,250 | 1,800 | 1,500 | |||
| 4 | 12,000 | 6,000 | 4,000 | 3,000 | 2,400 | 2,000 | |||
| 5 | 15,000 | 7,500 | 5,000 | 3,750 | 3,000 | 2,500 | |||
| 6 | 18,000 | 9,000 | 6,000 | 4,500 | 3,600 | 3,000 | |||
| 7 | 21,000 | 10,500 | 7,000 | 5,250 | 4,200 | 3,500 | |||
| 8 | 24,000 | 12,000 | 8,000 | 6,000 | 4,800 | 4,000 | |||
| 9 | 27,000 | 13,500 | 9,000 | 6,750 | 5,400 | 4,500 | |||
| 10 | 30,000 | 15,000 | 10,000 | 7,500 | 6,000 | 5,000 | |||
| genome size = 2Mb | |||||||||
| Read Length = | 100 | 200 | 300 | 400 | 500 | 600 | |||
| Fold cov. | Number of sequence reads for x-fold coverage | ||||||||
| 1 | 20,000 | 10,000 | 6,667 | 5,000 | 4,000 | 3,333 | |||
| 2 | 40,000 | 20,000 | 13,333 | 10,000 | 8,000 | 6,667 | |||
| 3 | 60,000 | 30,000 | 20,000 | 15,000 | 12,000 | 10,000 | |||
| 4 | 80,000 | 40,000 | 26,667 | 20,000 | 16,000 | 13,333 | |||
| 5 | 100,000 | 50,000 | 33,333 | 25,000 | 20,000 | 16,667 | |||
| 6 | 120,000 | 60,000 | 40,000 | 30,000 | 24,000 | 20,000 | |||
| 7 | 140,000 | 70,000 | 46,667 | 35,000 | 28,000 | 23,333 | |||
| 8 | 160,000 | 80,000 | 53,333 | 40,000 | 32,000 | 26,667 | |||
| 9 | 180,000 | 90,000 | 60,000 | 45,000 | 36,000 | 30,000 | |||
| 10 | 200,000 | 100,000 | 66,667 | 50,000 | 40,000 | 33,333 | |||
| genome size = 4Mb | |||||||||
| Read Length = | 100 | 200 | 300 | 400 | 500 | 600 | |||
| Fold cov. | Number of sequence reads for x-fold coverage | ||||||||
| 1 | 40,000 | 20,000 | 13,333 | 10,000 | 8,000 | 6,667 | |||
| 2 | 80,000 | 40,000 | 26,667 | 20,000 | 16,000 | 13,333 | |||
| 3 | 120,000 | 60,000 | 40,000 | 30,000 | 24,000 | 20,000 | |||
| 4 | 160,000 | 80,000 | 53,333 | 40,000 | 32,000 | 26,667 | |||
| 5 | 200,000 | 100,000 | 66,667 | 50,000 | 40,000 | 33,333 | |||
| 6 | 240,000 | 120,000 | 80,000 | 60,000 | 48,000 | 40,000 | |||
| 7 | 280,000 | 140,000 | 93,333 | 70,000 | 56,000 | 46,667 | |||
| 8 | 320,000 | 160,000 | 106,667 | 80,000 | 64,000 | 53,333 | |||
| 9 | 360,000 | 180,000 | 120,000 | 90,000 | 72,000 | 60,000 | |||
| 10 | 400,000 | 200,000 | 133,333 | 100,000 | 80,000 | 66,667 | |||
| genome size = 20Mb | |||||||||
| Read Length = | 100 | 200 | 300 | 400 | 500 | 600 | |||
| Fold cov. | Number of sequence reads for x-fold coverage | ||||||||
| 1 | 200,000 | 100,000 | 66,667 | 50,000 | 40,000 | 33,333 | |||
| 2 | 400,000 | 200,000 | 133,333 | 100,000 | 80,000 | 66,667 | |||
| 3 | 600,000 | 300,000 | 200,000 | 150,000 | 120,000 | 100,000 | |||
| 4 | 800,000 | 400,000 | 266,667 | 200,000 | 160,000 | 133,333 | |||
| 5 | 1,000,000 | 500,000 | 333,333 | 250,000 | 200,000 | 166,667 | |||
| 6 | 1,200,000 | 600,000 | 400,000 | 300,000 | 240,000 | 200,000 | |||
| 7 | 1,400,000 | 700,000 | 466,667 | 350,000 | 280,000 | 233,333 | |||
| 8 | 1,600,000 | 800,000 | 533,333 | 400,000 | 320,000 | 266,667 | |||
| 9 | 1,800,000 | 900,000 | 600,000 | 450,000 | 360,000 | 300,000 | |||
| 10 | 2,000,000 | 1,000,000 | 666,667 | 500,000 | 400,000 | 333,333 | |||
| genome size = 40Mb | |||||||||
| Read Length = | 100 | 200 | 300 | 400 | 500 | 600 | |||
| Fold cov. | Number of sequence reads for x-fold coverage | ||||||||
| 1 | 400,000 | 200,000 | 133,333 | 100,000 | 80,000 | 66,667 | |||
| 2 | 800,000 | 400,000 | 266,667 | 200,000 | 160,000 | 133,333 | |||
| 3 | 1,200,000 | 600,000 | 400,000 | 300,000 | 240,000 | 200,000 | |||
| 4 | 1,600,000 | 800,000 | 533,333 | 400,000 | 320,000 | 266,667 | |||
| 5 | 2,000,000 | 1,000,000 | 666,667 | 500,000 | 400,000 | 333,333 | |||
| 6 | 2,400,000 | 1,200,000 | 800,000 | 600,000 | 480,000 | 400,000 | |||
| 7 | 2,800,000 | 1,400,000 | 933,333 | 700,000 | 560,000 | 466,667 | |||
| 8 | 3,200,000 | 1,600,000 | 1,066,667 | 800,000 | 640,000 | 533,333 | |||
| 9 | 3,600,000 | 1,800,000 | 1,200,000 | 900,000 | 720,000 | 600,000 | |||
| 10 | 4,000,000 | 2,000,000 | 1,333,333 | 1,000,000 | 800,000 | 666,667 | |||
| genome size = 500Mb | |||||||||
| Read Length = | 100 | 200 | 300 | 400 | 500 | 600 | |||
| Fold cov. | Number of sequence reads for x-fold coverage | ||||||||
| 1 | 5,000,000 | 2,500,000 | 1,666,667 | 1,250,000 | 1,000,000 | 833,333 | |||
| 2 | 10,000,000 | 5,000,000 | 3,333,333 | 2,500,000 | 2,000,000 | 1,666,667 | |||
| 3 | 15,000,000 | 7,500,000 | 5,000,000 | 3,750,000 | 3,000,000 | 2,500,000 | |||
| 4 | 20,000,000 | 10,000,000 | 6,666,667 | 5,000,000 | 4,000,000 | 3,333,333 | |||
| 5 | 25,000,000 | 12,500,000 | 8,333,333 | 6,250,000 | 5,000,000 | 4,166,667 | |||
| 6 | 30,000,000 | 15,000,000 | 10,000,000 | 7,500,000 | 6,000,000 | 5,000,000 | |||
| 7 | 35,000,000 | 17,500,000 | 11,666,667 | 8,750,000 | 7,000,000 | 5,833,333 | |||
| 8 | 40,000,000 | 20,000,000 | 13,333,333 | 10,000,000 | 8,000,000 | 6,666,667 | |||
| 9 | 45,000,000 | 22,500,000 | 15,000,000 | 11,250,000 | 9,000,000 | 7,500,000 | |||
| 10 | 50,000,000 | 25,000,000 | 16,666,667 | 12,500,000 | 10,000,000 | 8,333,333 | |||
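The read counts in Table 3 come from rearranging c = LN/G into N = Gc/L. A minimal sketch, assuming genome size and read length are both in bases and that fractional results are rounded to the nearest whole read, which matches how the smaller tables are filled in; `reads_for_coverage` is a hypothetical helper name.

```python
import math


def reads_for_coverage(genome_size: int, coverage: float, read_length: int) -> int:
    """Number of reads N = G * c / L, rounded to the nearest whole read."""
    exact = genome_size * coverage / read_length
    # floor(x + 0.5) rounds halves up, avoiding Python's round-half-to-even.
    return math.floor(exact + 0.5)


read_lengths = (100, 200, 300, 400, 500, 600)

# Reproduce the sub-table for a 50kb target.
for c in range(1, 11):
    row = "  ".join(f"{reads_for_coverage(50_000, c, L):>6,}" for L in read_lengths)
    print(f"{c:>2}  {row}")
```

When planning an actual run, rounding up with `math.ceil` may be the safer choice, since rounding down leaves the project slightly short of the target coverage.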