Poisson calculations
The strategy for the shotgun approach follows the Lander and Waterman
application of the Poisson distribution:
Lander ES, Waterman MS. Genomic mapping by fingerprinting random clones:
a mathematical analysis. Genomics 2(3): 231-239 (1988).
updated 02-06-06
Table 1: The probability a base is not sequenced
The probability that a base is not sequenced is given by P0 = e^(-c), where c = fold sequence coverage (c = LN/G),
L = read length, N = number of reads (so LN = number of bases sequenced), G = target sequence length, and the constant e = 2.718281828459 (approximately 2.718).
Fold coverage (c)  P0 = e^(-c)  P0 = 1/e^c              % not sequenced  % sequenced
0.25               e^(-0.25)    1/1.284 = 0.78          78%              22%
0.5                e^(-0.50)    1/1.649 = 0.61          61%              39%
0.75               e^(-0.75)    1/2.117 = 0.47          47%              53%
1                  e^(-1)       1/2.718 = 0.37          37%              63%
1.5                e^(-1.5)     1/4.482 = 0.223         22.30%           77.70%
2                  e^(-2)       1/7.389 = 0.135         13.50%           86.50%
3                  e^(-3)       1/20.086 = 0.050        5.00%            95.00%
4                  e^(-4)       1/54.598 = 0.018        1.80%            98.20%
5                  e^(-5)       1/148.4 = 0.0067        0.67%            99.33%
6                  e^(-6)       1/403.4 = 0.0025        0.25%            99.75%
7                  e^(-7)       1/1096.63 = 0.0009      0.09%            99.91%
8                  e^(-8)       1/2980.96 = 0.0003      0.03%            99.97%
9                  e^(-9)       1/8103.08 = 0.0001      0.01%            99.99%
10                 e^(-10)      1/22026.5 = 0.000045    0.005%           99.995%
Note: at a given fold coverage, % sequenced is independent of read length, because read length enters only through c = LN/G
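The Table 1 values can be recomputed with a few lines of Python. This is a minimal sketch, not part of the original sheet; the function names prob_not_sequenced and fold_coverage are made up for illustration, and the output is unrounded, so it can differ in the last digit from the rounded table entries.

```python
import math

def fold_coverage(read_length: int, num_reads: int, genome_size: int) -> float:
    """Fold coverage c = L*N/G (hypothetical helper name)."""
    return read_length * num_reads / genome_size

def prob_not_sequenced(coverage: float) -> float:
    """P0 = e^(-c): probability that a given base is covered by no read."""
    return math.exp(-coverage)

# Recompute the rows of Table 1 without intermediate rounding.
for c in (0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 5, 6, 7, 8, 9, 10):
    p0 = prob_not_sequenced(c)
    print(f"{c:>5}  P0 = {p0:.6f}  not sequenced = {p0:.4%}  sequenced = {1 - p0:.4%}")
```

For instance, fold_coverage(500, 100, 50_000) gives 1.0, i.e. 100 reads of 500 bases cover a 50kb target once on average.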
Table 2: Total Gap Length vs % Sequenced
Total gap length = G x e^(-c), where c = fold coverage, G = target sequence length (bases), and e^(-c) = P0. Entries are in bases and use the rounded P0 values from Table 1.
Total gap length in bases (G x e^(-c)) for genome size G =
Fold coverage    50kb   150kb    300kb       2Mb        4Mb       20Mb        40Mb        500Mb  % sequenced
1              18,500  55,500  111,000   740,000  1,480,000  7,400,000  14,800,000  185,000,000       63.00%
2               6,750  20,250   40,500   270,000    540,000  2,700,000   5,400,000   67,500,000       86.50%
3               2,500   7,500   15,000   100,000    200,000  1,000,000   2,000,000   25,000,000       95.00%
4                 900   2,700    5,400    36,000     72,000    360,000     720,000    9,000,000       98.20%
5                 335   1,005    2,010    13,400     26,800    134,000     268,000    3,350,000       99.33%
6                 125     375      750     5,000     10,000     50,000     100,000    1,250,000       99.75%
7                  45     135      270     1,800      3,600     18,000      36,000      450,000       99.91%
8                  15      45       90       600      1,200      6,000      12,000      150,000       99.97%
9                   5      15       30       200        400      2,000       4,000       50,000       99.99%
10                  2       7       14        90        180        900       1,800       22,500      99.995%
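The Table 2 formula can be sketched in Python as follows. This is an illustrative sketch under the stated formula only; total_gap_length and GENOME_SIZES are assumed names, not part of the original. It uses the unrounded value of e^(-c), so results differ slightly from the table (about 18,394 rather than 18,500 bases for a 50kb target at 1-fold coverage).

```python
import math

def total_gap_length(genome_size: int, coverage: float) -> float:
    """Expected number of unsequenced bases: G * e^(-c)."""
    return genome_size * math.exp(-coverage)

# Genome sizes used in Table 2, in bases (labels are for display only).
GENOME_SIZES = {
    "50kb": 50_000, "150kb": 150_000, "300kb": 300_000, "2Mb": 2_000_000,
    "4Mb": 4_000_000, "20Mb": 20_000_000, "40Mb": 40_000_000, "500Mb": 500_000_000,
}

print("cov", *GENOME_SIZES.keys())
for c in range(1, 11):
    row = [f"{total_gap_length(g, c):,.0f}" for g in GENOME_SIZES.values()]
    print(c, *row)
```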
Table 3: Number of Sequence Reads for x-fold Coverage
Number of sequence reads for x-fold coverage:
N = (genome size x fold coverage) / bases per read, i.e. N = cG/L
genome size = 50kb
Read Length = 100 200 300 400 500 600
Fold cov. Number of sequence reads for x-fold coverage
1 500 250 167 125 100 83
2 1,000 500 333 250 200 167
3 1,500 750 500 375 300 250
4 2,000 1,000 667 500 400 333
5 2,500 1,250 833 625 500 417
6 3,000 1,500 1,000 750 600 500
7 3,500 1,750 1,167 875 700 583
8 4,000 2,000 1,333 1,000 800 667
9 4,500 2,250 1,500 1,125 900 750
10 5,000 2,500 1,667 1,250 1,000 833
genome size = 150kb
Read Length = 100 200 300 400 500 600
Fold cov. Number of sequence reads for x-fold coverage
1 1,500 750 500 375 300 250
2 3,000 1,500 1,000 750 600 500
3 4,500 2,250 1,500 1,125 900 750
4 6,000 3,000 2,000 1,500 1,200 1,000
5 7,500 3,750 2,500 1,875 1,500 1,250
6 9,000 4,500 3,000 2,250 1,800 1,500
7 10,500 5,250 3,500 2,625 2,100 1,750
8 12,000 6,000 4,000 3,000 2,400 2,000
9 13,500 6,750 4,500 3,375 2,700 2,250
10 15,000 7,500 5,000 3,750 3,000 2,500
genome size = 300kb
Read Length = 100 200 300 400 500 600
Fold cov. Number of sequence reads for x-fold coverage
1 3,000 1,500 1,000 750 600 500
2 6,000 3,000 2,000 1,500 1,200 1,000
3 9,000 4,500 3,000 2,250 1,800 1,500
4 12,000 6,000 4,000 3,000 2,400 2,000
5 15,000 7,500 5,000 3,750 3,000 2,500
6 18,000 9,000 6,000 4,500 3,600 3,000
7 21,000 10,500 7,000 5,250 4,200 3,500
8 24,000 12,000 8,000 6,000 4,800 4,000
9 27,000 13,500 9,000 6,750 5,400 4,500
10 30,000 15,000 10,000 7,500 6,000 5,000
genome size = 2Mb
Read Length = 100 200 300 400 500 600
Fold cov. Number of sequence reads for x-fold coverage
1 20,000 10,000 6,667 5,000 4,000 3,333
2 40,000 20,000 13,333 10,000 8,000 6,667
3 60,000 30,000 20,000 15,000 12,000 10,000
4 80,000 40,000 26,667 20,000 16,000 13,333
5 100,000 50,000 33,333 25,000 20,000 16,667
6 120,000 60,000 40,000 30,000 24,000 20,000
7 140,000 70,000 46,667 35,000 28,000 23,333
8 160,000 80,000 53,333 40,000 32,000 26,667
9 180,000 90,000 60,000 45,000 36,000 30,000
10 200,000 100,000 66,667 50,000 40,000 33,333
genome size = 4Mb
Read Length = 100 200 300 400 500 600
Fold cov. Number of sequence reads for x-fold coverage
1 40,000 20,000 13,333 10,000 8,000 6,667
2 80,000 40,000 26,667 20,000 16,000 13,333
3 120,000 60,000 40,000 30,000 24,000 20,000
4 160,000 80,000 53,333 40,000 32,000 26,667
5 200,000 100,000 66,667 50,000 40,000 33,333
6 240,000 120,000 80,000 60,000 48,000 40,000
7 280,000 140,000 93,333 70,000 56,000 46,667
8 320,000 160,000 106,667 80,000 64,000 53,333
9 360,000 180,000 120,000 90,000 72,000 60,000
10 400,000 200,000 133,333 100,000 80,000 66,667
genome size = 20Mb
Read Length = 100 200 300 400 500 600
Fold cov. Number of sequence reads for x-fold coverage
1 200,000 100,000 66,667 50,000 40,000 33,333
2 400,000 200,000 133,333 100,000 80,000 66,667
3 600,000 300,000 200,000 150,000 120,000 100,000
4 800,000 400,000 266,667 200,000 160,000 133,333
5 1,000,000 500,000 333,333 250,000 200,000 166,667
6 1,200,000 600,000 400,000 300,000 240,000 200,000
7 1,400,000 700,000 466,667 350,000 280,000 233,333
8 1,600,000 800,000 533,333 400,000 320,000 266,667
9 1,800,000 900,000 600,000 450,000 360,000 300,000
10 2,000,000 1,000,000 666,667 500,000 400,000 333,333
genome size = 40Mb
Read Length = 100 200 300 400 500 600
Fold cov. Number of sequence reads for x-fold coverage
1 400,000 200,000 133,333 100,000 80,000 66,667
2 800,000 400,000 266,667 200,000 160,000 133,333
3 1,200,000 600,000 400,000 300,000 240,000 200,000
4 1,600,000 800,000 533,333 400,000 320,000 266,667
5 2,000,000 1,000,000 666,667 500,000 400,000 333,333
6 2,400,000 1,200,000 800,000 600,000 480,000 400,000
7 2,800,000 1,400,000 933,333 700,000 560,000 466,667
8 3,200,000 1,600,000 1,066,667 800,000 640,000 533,333
9 3,600,000 1,800,000 1,200,000 900,000 720,000 600,000
10 4,000,000 2,000,000 1,333,333 1,000,000 800,000 666,667
genome size = 500Mb
Read Length = 100 200 300 400 500 600
Fold cov. Number of sequence reads for x-fold coverage
1 5,000,000 2,500,000 1,666,667 1,250,000 1,000,000 833,333
2 10,000,000 5,000,000 3,333,333 2,500,000 2,000,000 1,666,667
3 15,000,000 7,500,000 5,000,000 3,750,000 3,000,000 2,500,000
4 20,000,000 10,000,000 6,666,667 5,000,000 4,000,000 3,333,333
5 25,000,000 12,500,000 8,333,333 6,250,000 5,000,000 4,166,667
6 30,000,000 15,000,000 10,000,000 7,500,000 6,000,000 5,000,000
7 35,000,000 17,500,000 11,666,667 8,750,000 7,000,000 5,833,333
8 40,000,000 20,000,000 13,333,333 10,000,000 8,000,000 6,666,667
9 45,000,000 22,500,000 15,000,000 11,250,000 9,000,000 7,500,000
10 50,000,000 25,000,000 16,666,667 12,500,000 10,000,000 8,333,333
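The Table 3 formula, N = (genome size x fold coverage) / read length, can be sketched in Python as below. The function name reads_needed is an assumption for illustration. The sketch rounds to the nearest whole read, as the table does; a more conservative planner might round up with math.ceil instead.

```python
def reads_needed(genome_size: int, coverage: float, read_length: int) -> int:
    """Number of reads N = G*c/L, rounded to the nearest whole read."""
    return round(genome_size * coverage / read_length)

READ_LENGTHS = (100, 200, 300, 400, 500, 600)  # bases per read, as in Table 3

def reads_table(genome_size: int) -> list[list[int]]:
    """One Table 3 block: rows are 1- to 10-fold coverage, columns are read lengths."""
    return [[reads_needed(genome_size, c, L) for L in READ_LENGTHS]
            for c in range(1, 11)]

# Print the block for a 50kb target.
for c, row in enumerate(reads_table(50_000), start=1):
    print(c, *[f"{n:,}" for n in row])
```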