ALLPATHS: de novo assembly of whole-genome shotgun microreads

Jonathan Butler
Iain MacCallum
Michael Kleber
Ilya A Shlyakhter
Matthew K Belmonte
Eric S Lander
Chad Nusbaum
David B Jaffe

Genome Research 18(5):810-820 (May 2008).

ABSTRACT

New DNA sequencing technologies deliver data at dramatically lower costs, but demand new analytical methods to take full advantage of the very short reads that they produce. We provide an initial, theoretical solution to the challenge of de novo assembly from whole-genome shotgun microreads. For eleven genomes of size up to 39 Mb, we generated high-quality assemblies from 80X coverage by paired 30-base simulated reads modeled after real Illumina-Solexa reads. The bacterial genomes of C. jejuni and E. coli assemble optimally, yielding single perfect contigs, and larger genomes yield assemblies that are highly connected and accurate. Assemblies are presented in a graph form that retains intrinsic ambiguities such as those arising from polymorphism, thereby providing information that has been absent from previous genome assemblies. For both C. jejuni and E. coli, this assembly graph is a single edge encompassing the entire genome. Larger genomes produce more complicated graphs, but the vast majority of the bases in their assemblies are present in long edges that are nearly always perfect. We describe a general method for genome assembly that can be applied to all types of DNA sequence data, not only short read data, but also conventional sequence reads.

CITED IN PUBLICATIONS BY OTHERS:

Holt RA, Jones SJM. The new paradigm of flow cell sequencing. Genome Research 18(6):839-846 (June 2008).
Salzberg SL, Sommer DD, Puiu D, Lee VT. Gene-boosted assembly of a novel bacterial genome from very short reads. PLoS Computational Biology 4(9):e1000186 (September 2008).
Shendure J, Ji HL. Next-generation DNA sequencing. Nature Biotechnology 26(10):1135-1145 (October 2008).
Coe BP, Chari R, Lockwood WW, Lam WL. Evolving strategies for a global gene expression analysis of cancer. Journal of Cellular Physiology 217(3):590-597 (December 2008).
Ossowski S, Schneeberger K, Clark RM, Lanz C, Warthmann N, Weigel D. Sequencing of natural strains of Arabidopsis thaliana with short reads. Genome Research 18(12):2024-2033 (December 2008).
Noguchi H, Taniguchi T, Itoh T. MetaGeneAnnotator: Detecting species-specific patterns of ribosomal binding site for precise gene prediction in anonymous prokaryotic and phage genomes. DNA Research 15(6):387-396 (December 2008).
Denoeud F, Aury JM, Da Silva C, Noel B, Rogier O, Delledonne M, Morgante M, Valle G, Wincker P, Scarpelli C, Jaillon O, Artiguenave F. Annotating genomes with massive-scale RNA sequencing. Genome Biology 9(12):R175 (2008).
Nusbaum C, Ohsumi TK, Gomez J, Aquadro J, Victor TC, Warren RM, Hung DT, Birren BW, Lander ES, Jaffe DB. Sensitive, specific polymorphism discovery in bacteria using massively parallel sequencing. Nature Methods 6(1):67-69 (January 2009).
Jackson BG, Schnable PS, Aluru S. Parallel short sequence assembly of transcriptomes. BMC Bioinformatics 10(S14) (30 January 2009).
Hossain MS, Azimi N, Skiena S. Crystallizing short-read assemblies around seeds. BMC Bioinformatics 10(S16) (30 January 2009).
Farrer RA, Kemen E, Jones JDG, Studholme DJ. De novo assembly of the Pseudomonas syringae pv. syringae B728a genome using Illumina/Solexa short sequence reads. FEMS Microbiology Letters 291(1):103-111 (February 2009).
Chaisson MJ, Brinza D, Pevzner PA. De novo fragment assembly with short mate-paired reads: Does the read length matter? Genome Research 19(2):336-346 (February 2009).
Bryant DW, Wong WK, Mockler TC. QSRA - a quality-value guided de novo short read assembler. BMC Bioinformatics 10:69 (24 February 2009).
MacLean D, Jones JDG, Studholme DJ. Application of ‘next-generation’ sequencing technologies to microbial genetics. Nature Reviews Microbiology 7(4):287-296 (April 2009).
Scheibye-Alsing K, Hoffmann S, Frankel A, Jensen P, Stadler PF, Mang Y, Tommerup N, Gilchrist MJ, Nygard AB, Cirera S, Jorgensen CB, Fredholm M, Gorodkin J. Sequence assembly. Computational Biology and Chemistry 33(2):121-136 (April 2009).
Rokas A, Abbot P. Harnessing genomics for evolutionary insights. Trends in Ecology and Evolution 24(4):192-200 (April 2009).
Cook JJ, Zilles C. Characterizing and optimizing the memory footprint of de novo short read DNA sequence assembly. IEEE International Symposium on Performance Analysis of Systems and Software 143-152 (26-28 April 2009).
Jackson BG, Schnable PS, Aluru S. Assembly of large genomes from paired short reads. Bioinformatics and Computational Biology, Proceedings 5462:30-43 (2009).
Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJM, Birol I. ABySS: a parallel assembler for short read sequence data. Genome Research 19(6):1117-1123 (June 2009).
Turner DJ, Keane TM, Sudbery I, Adams DJ. Next-generation sequencing of vertebrate experimental organisms. Mammalian Genome 20(6):327-338 (June 2009).
Pop M. Genome assembly reborn: recent computational challenges. Briefings in Bioinformatics 10(4):354-366 (July 2009).
Qu W, Hashimoto S, Morishita S. Efficient frequency-based de novo short-read clustering for error trimming in next-generation sequencing. Genome Research 19(7):1309-1315 (July 2009).
Du J, Bjornson RD, Zhang ZDD, Kong Y, Snyder M, Gerstein MB. Integrating sequencing technologies in personal genomics: optimal low cost reconstruction of structural variants. PLoS Computational Biology 5(7):e1000432 (July 2009).
Sogin ML. Characterizing microbial population structures through massively parallel sequencing. In: Uncultivated Microorganisms (Epstein SS, ed.), pp 19-33. New York: Springer (24 July 2009).
Medvedev P, Brudno M. Maximum likelihood genome assembly. Journal of Computational Biology 16(8):1101-1116 (August 2009).
Studholme DJ, Ibanez SG, MacLean D, Dangl JL, Chang JH, Rathjen JP. A draft genome sequence and functional screen reveals the repertoire of type III secreted proteins of Pseudomonas syringae pathovar tabaci 11528. BMC Genomics 10:395 (24 August 2009).
Soderlund C, Johnson E, Bomhoff M, Descour A. PAVE: Program for assembling and viewing ESTs. BMC Genomics 10:400 (26 August 2009).
Schröder J, Schröder H, Puglisi SJ, Sinha R, Schmidt B. SHREC: a short-read error correction method. Bioinformatics 25(17):2157-2163 (1 September 2009).
DiGuistini S, Liao NY, Platt D, Robertson G, Seidel M, Chan SK, Docking TR, Birol I, Holt RA, Hirst M, Mardis E, Marra MA, Hamelin RC, Bohlmann J, Breuil C, Jones SJM. De novo genome sequence assembly of a filamentous fungus using Sanger, 454 and Illumina sequence data. Genome Biology 10(9):R94 (11 September 2009).
Imelfort M. Sequence comparison tools. In: Bioinformatics: Tools and Applications (Edwards D, Stajich JE, Hansen D, eds.), pp 13-38. New York: Springer (22 September 2009).
Milos PM. Emergence of single-molecule sequencing and potential for molecular diagnostic applications. Expert Review of Molecular Diagnostics 9(7):659-666 (October 2009).
MacCallum I, Przybylski D, Gnerre S, Burton J, Shlyakhter I, Gnirke A, Malek J, McKernan K, Ranade S, Shea TP, Williams L, Young S, Nusbaum C, Jaffe DB. ALLPATHS 2: small genomes assembled accurately and with high continuity from short paired reads. Genome Biology 10(10):R103 (1 October 2009).
Zhou X, Su Z, Sammons RD, Peng YH, Tranel PJ, Stewart CN, Yuan JS. Novel software package for cross-platform transcriptome analysis (CPTRA). BMC Bioinformatics 10 Supplement 11: S16 (8 October 2009).
Kerstens HHD, Crooijmans RPMA, Veenendaal A, Dibbits BW, Chin-A-Woeng TFC, den Dunnen JT, Groenen MAM. Large scale single nucleotide polymorphism discovery in unsequenced genomes using second generation high throughput sequencing technology: applied to turkey. BMC Genomics 10:479 (16 October 2009).
Zhao F, Hou H, Bao Q, Wu J. PGA4genomics for comparative genome assembly based on genetic algorithm optimization. Genomics 94(4):284-286 (October 2009).
Imelfort M, Edwards D. De novo sequencing of plant genomes using second-generation technologies. Briefings in Bioinformatics 10(6):609-618 (November 2009).
Flicek P, Birney E. Sense from sequence reads: methods for alignment and assembly. Nature Methods 6(11):S6-S12 (November 2009).
Nielsen CB, Jackman SD, Birol I, Jones SJM. ABySS-Explorer: visualizing genome sequence assemblies. IEEE Transactions on Visualization and Computer Graphics 15(6):881-888 (November-December 2009).
Gibbons JG, Janson EM, Hittinger CT, Johnston M, Abbot P, Rokas A. Benchmarking next-generation transcriptome sequencing for functional and evolutionary genomics. Molecular Biology and Evolution 26(12):2731-2744 (December 2009).
Zerbino DR, McEwen GK, Margulies EH, Birney E. Pebble and rock band: heuristic resolution of repeats and scaffolding in the Velvet short-read de novo assembler. PLoS ONE 4(12):e8407 (22 December 2009).
Zhou H, Zhao Z, Wang H. A new repeat family detection method based on sparse de Bruijn graph. Second International Symposium on Knowledge Acquisition and Modeling 3:147-150 (2009).
Metzker ML. Sequencing technologies — the next generation. Nature Reviews Genetics 11(1):31-46 (January 2010).
Clement NL, Snell Q, Clement MJ, Hollenhorst PC, Purwar J, Graves BJ, Cairns BR, Johnson WE. The GNUMAP algorithm: unbiased probabilistic mapping of oligonucleotides from next-generation sequencing. Bioinformatics 26(1):38-45 (1 January 2010).
Wooley JC, Ye Y. Metagenomics: facts and artifacts, and computational challenges. Journal of Computer Science and Technology 25(1):71-81 (January 2010).
Dalca AV, Brudno M. Genome variation discovery with high-throughput sequencing data. Briefings in Bioinformatics 11(1):3-14 (January 2010).
Kingsford C, Schatz MC, Pop M. Assembly complexity of prokaryotic genomes using short reads. BMC Bioinformatics 11:21 (12 January 2010).
Santuari L, Pradervand S, Amiguet-Vercher AM, Thomas J, Dorcey E, Harshman K, Xenarios I, Juenger TE, Hardtke CS. Substantial deletion overlap among divergent Arabidopsis genomes revealed by intersection of short reads and tiling arrays. Genome Biology 11(1):R4 (12 January 2010).
Palmer LE, Dejori M, Bolanos R, Fasulo D. Improving de novo sequence assembly using machine learning and comparative genomics for overlap correction. BMC Bioinformatics 11:33 (15 January 2010).
Jackman SD, Birol I. Assembling genomes using short-read sequencing technology. Genome Biology 11(1):202 (28 January 2010).
Marguerat S, Bahler J. RNA-seq: from technology to biology. Cellular and Molecular Life Sciences 67(4):569-579 (February 2010).
Young AL, Abaan HO, Zerbino D, Mullikin JC, Birney E, Margulies EH. A new strategy for genome assembly using short sequence reads and reduced representation libraries. Genome Research 20(2):249-256 (February 2010).
Li R, Zhu H, Ruan J, Qian W, Fang X, Shi Z, Li Y, Li S, Shan G, Kristiansen K, Li S, Yang H, Wang J, Wang J. De novo assembly of human genomes with massively parallel short read sequencing. Genome Research 20(2):265-272 (February 2010).
Horner DS, Pavesi G, Castrignano T, De Meo PD, Liuni S, Sammeth M, Picardi E, Pesole G. Bioinformatics approaches for genomics and post genomics applications of next-generation sequencing. Briefings in Bioinformatics 11(2):181-197 (March 2010).
Kelley DR, Salzberg SL. Detection and correction of false segmental duplications caused by genome mis-assembly. Genome Biology 11(3):R28 (10 March 2010).
Ratan A, Yu Z, Hayes VM, Schuster SC, Miller W. Calling SNPs without a reference sequence. BMC Bioinformatics 11:130 (15 March 2010).
Shi H, Schmidt B, Liu W, Müller-Wittig W. A parallel algorithm for error correction in high-throughput short-read data on CUDA-enabled graphics hardware. Journal of Computational Biology 17(4):603-615 (April 2010).
Kao W, Song YS. naiveBayesCall: an efficient model-based base-calling algorithm for high-throughput sequencing. In: Proceedings of the 14th Annual International Conference on Research in Computational Molecular Biology (Bonnie Berger, ed.), Lisbon, 25-28 April 2010. Berlin: Springer (Lecture Notes in Computer Science 6044), pp 233-247 (2010).
Laserson J, Jojic V, Koller D. Genovo: de novo assembly for metagenomes. In: Proceedings of the 14th Annual International Conference on Research in Computational Molecular Biology (Bonnie Berger, ed.), Lisbon, 25-28 April 2010. Berlin: Springer (Lecture Notes in Computer Science 6044), pp 341-356 (2010).
Peng Y, Leung HCM, Yiu SM, Chin FYL. IDBA - a practical iterative de Bruijn graph de novo assembler. In: Proceedings of the 14th Annual International Conference on Research in Computational Molecular Biology (Bonnie Berger, ed.), Lisbon, 25-28 April 2010. Berlin: Springer (Lecture Notes in Computer Science 6044), pp 426-440 (2010).
Kuroshu RM, Watanabe J, Sugano S, Morishita S, Suzuki Y, Kasahara M. Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly. PLoS ONE 5(5):e10517 (7 May 2010).
Salmela L. Correction of sequencing errors in a mixed set of reads. Bioinformatics 26(10):1284-1290 (15 May 2010).
Shi H, Schmidt B, Liu W, Müller-Wittig W. Quality-score guided error correction for short-read sequencing data using CUDA. Proceedings of the 10th International Conference on Computational Science / Procedia Computer Science 1(1):1129-1138 (31 May – 2 June 2010).
Miller NJ, Richards S, Sappington TW. The prospects for sequencing the western corn rootworm genome. Journal of Applied Entomology 134(5):420-428 (June 2010).
Miller JR, Koren S, Sutton G. Assembly algorithms for next-generation sequencing data. Genomics 95(6):315-327 (June 2010).
Nagarajan H, Butler JE, Klimes A, Qiu Y, Zengler K, Ward J, Young ND, Methe BA, Palsson BO, Lovley DR, Barrett CL. De novo assembly of the complete genome of an enhanced electricity-producing variant of Geobacter sulfurreducens using only short reads. PLoS ONE 5(6):e10922 (8 June 2010).
Dayarian A, Michael TP, Sengupta AM. SOPRA: Scaffolding algorithm for paired reads via statistical optimization. BMC Bioinformatics 11:345 (24 June 2010).
Schatz MC, Delcher AL, Salzberg SL. Assembly of large genomes using second-generation sequencing. Genome Research 20(9):1165-1173 (September 2010).
Nowrousian M. Next-generation sequencing techniques for eukaryotic microorganisms: sequencing-based solutions to biological problems. Eukaryotic Cell 9(9):1300-1310 (September 2010).
Paszkiewicz K, Studholme DJ. De novo assembly of short sequence reads. Briefings in Bioinformatics 11(5):457-472 (September 2010).
Surget-Groba Y, Montoya-Burgos JI. Optimization of de novo transcriptome assembly from next-generation sequencing data. Genome Research 20(10):1432-1440 (October 2010).
Pham SK, Pevzner PA. DRIMM-Synteny: decomposing genomes into evolutionary conserved segments. Bioinformatics 26(20):2509-2516 (October 2010).
Yang XA, Dorman KS, Aluru S. Reptile: representative tiling for short read error correction. Bioinformatics 26(20):2526-2533 (October 2010).
Schadt EE, Turner S, Kasarskis A. A window into third-generation sequencing. Human Molecular Genetics 19(Sp. Iss. 2):R227-R240 (15 October 2010).
Boisvert S, Laviolette F, Corbeil J. Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies. Journal of Computational Biology 17(11):1401-1415 (November 2010).
Zhao X, Palmer LE, Bolanos R, Mircean C, Fasulo D, Wittenberg GM. EDAR: an efficient error detection and removal algorithm for next generation sequencing data. Journal of Computational Biology 17(11):1431-1442 (November 2010).
Kelley DR, Schatz MC, Salzberg SL. Quake: quality-aware detection and correction of sequencing errors. Genome Biology 11(11):R116 (29 November 2010).
Xi R, Kim T, Park PJ. Detecting structural variations in the human genome using next generation sequencing. Briefings in Functional Genomics 9(5-6):405-415 (December 2010).
Celton JM, Christoffels A, Sargent DJ, Xu XM, Rees DJG. Genome-wide SNP identification by high-throughput sequencing and selective mapping allows sequence assembly positioning using a framework genetic linkage map. BMC Biology 8:155 (30 December 2010).
Wurtzel O, Dori-Bachash M, Pietrokovski S, Jurkevitch E, Sorek R. Mutation detection with next-generation resequencing through a mediator genome. PLoS ONE 5(12):e15628 (31 December 2010).
Ariyaratne PN, Sung WK. PE-Assembler: de novo assembler using short paired-end reads. Bioinformatics 27(2):167-174 (15 January 2011).
Gnerre S, MacCallum I, Przybylski D, Ribeiro FJ, Burton JN, Walker BJ, Sharpe T, Hall G, Shea TP, Sykes S, Berlin AM, Aird D, Costello M, Daza R, Williams L, Nicol R, Gnirke A, Nusbaum C, Lander ES, Jaffe DB. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proceedings of the National Academy of Sciences of the United States of America 108(4):1513-1518 (25 January 2011).
Ilie L, Fazayeli F, Ilie S. HiTEC: accurate error correction in high-throughput sequencing data. Bioinformatics 27(3):295-302 (1 February 2011).
Yang X, Aluru S, Dorman KS. Repeat-aware modeling and correction of short read errors. BMC Bioinformatics 12:S52 (15 February 2011).
Kao W, Song YS. naiveBayesCall: an efficient model-based base-calling algorithm for high-throughput sequencing. Journal of Computational Biology 18(3):365-377 (March 2011).
Laserson J, Jojic V, Koller D. Genovo: de novo assembly for metagenomes. Journal of Computational Biology 18(3):429-443 (March 2011).
Liu Y, Schmidt B, Maskell DL. DecGPU: distributed error correction on massively parallel graphics processing units using CUDA and MPI. BMC Bioinformatics 12:85 (29 March 2011).
Donmez N, Brudno M. Hapsembler: an assembler for highly polymorphic genomes. Research in Computational Molecular Biology 15 (Lecture Notes in Computer Science 6577):38-52 (28-31 March 2011).
Medvedev P, Pham S, Chaisson M, Tesler G, Pevzner P. Paired de Bruijn graphs: a novel approach for incorporating mate pair information into genome assemblers. Research in Computational Molecular Biology 15 (Lecture Notes in Computer Science 6577):238-251 (28-31 March 2011).
Wetzel J, Kingsford C, Pop M. Assessing the benefits of using mate-pairs to resolve repeats in de novo short-read prokaryotic assemblies. BMC Bioinformatics 12:95 (13 April 2011).
Narzisi G, Mishra B. Comparing de novo genome assembly: the long and short of it. PLoS ONE 6(4):e19175 (29 April 2011).
Chain PSG, Xie G, Starkenburg SR, Scholz MB, Beckloff N, Lo C, Davenport KW, Reitenga KG, Daligault HE, Detter JC, Freitas TAK, Gleasner CD, Green LD, Han CS, McMurry KK, Meincke LJ, Shen X, Zeytun A. Genomics for key players in the N cycle: from guinea pigs to the next frontier. Methods in Enzymology 496 (Research on Nitrification and Related Processes, Part B): 289-318 (2011).
Healy J, Chambers D. Fast and accurate genome anchoring using fuzzy hash maps. 5th International Conference on Practical Applications of Biotechnology and Bioinformatics 93:149-156 (2011).
Lai AG, Denton-Giles M, Mueller-Roeber B, Schippers JHM, Dijkwel PP. Positional information resolves structural variations and uncovers an evolutionarily divergent genetic locus in accessions of Arabidopsis thaliana. Genome Biology and Evolution 3:627-640 (27 May 2011).
Nielsen R, Paul JS, Albrechtsen A, Song YS. Genotype and SNP calling from next-generation sequencing data. Nature Reviews Genetics 12(6):443-451 (June 2011).
Bao S, Jiang R, Kwan W, Wang B, Ma X, Song Y. Evaluation of next-generation sequencing software in mapping and assembly. Journal of Human Genetics 56(6):406-414 (June 2011).
Kao W, Chan AH, Song YS. ECHO: A reference-free short-read error correction algorithm. Genome Research 21(7):1181-1192 (July 2011).
Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng QD, Chen ZH, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology 29(7):644-652 (July 2011).
Walshaw J, Etherington GJ, MacLean D. Next-generation sequencing approaches to metagenomics. In: Metagenomics: Current Innovations and Future Trends (Diana Marco, ed.), pp 63-88. Norwich: Caister Academic Press (12 July 2011).
Charuvaka A, Rangwala H. Evaluation of short read metagenomic assembly. BMC Genomics 12(Supplement 2):S8 (27 July 2011).
Parrish N, Hormozdiari F, Eskin E. Assembly of non-unique insertion content using next-generation sequencing. BMC Bioinformatics 12(Supplement 6):S3 (28 July 2011).
Cerdeira LT, Carneiro AR, Ramos RTJ, de Almeida SS, D’Afonseca V, Schneider MPC, Baumbach J, Tauch A, McCulloch JA, Azevedo VAC, Silva A. Rapid hybrid de novo assembly of a microbial genome using only short reads: Corynebacterium pseudotuberculosis I19 as a case study. Journal of Microbiological Methods 86(2):218-223 (August 2011).
Melsted P, Pritchard JK. Efficient counting of k-mers in DNA sequences using a Bloom filter. BMC Bioinformatics 12:333 (10 August 2011).
Chapman JA, Ho I, Sunkara S, Luo SJ, Schroth GP, Rokhsar DS. Meraculous: de novo genome assembly with short paired-end reads. PLoS ONE 6(8):e23501 (18 August 2011).
Zhang X, Tan J, Yang M, Yin Y, Al-Mssallem IS, Yu J. Date palm genome project at the Kingdom of Saudi Arabia. Date Palm Biotechnology (Shri Mohan Jain, Jameel M Al-Khayri, Dennis V. Johnson, eds.), pp 427-448. New York: Springer (19 August 2011).
Liu Y, Schmidt B, Maskell DL. Parallelized short read assembly of large genomes using de Bruijn graphs. BMC Bioinformatics 12:354 (25 August 2011).
Jackson SA, Iwata A, Lee SH, Schmutz J, Shoemaker R. Sequencing crop genomes: approaches and applications. New Phytologist 191(4):915-925 (September 2011).
Wei W, Qi X, Wang L, Zhang Y, Hua W, Li D, Lv H, Zhang X. Characterization of the sesame (Sesamum indicum L.) global transcriptome using Illumina paired-end sequencing and development of EST-SSR markers. BMC Genomics 12:451 (19 September 2011).
Henry CS, Overbeek R, Xia FF, Best AA, Glass E, Gilbert J, Larsen P, Edwards R, Disz T, Meyer F, Vonstein V, DeJongh M, Bartels D, Desai N, D'Souza M, Devoid S, Keegan KP, Olson R, Wilke A, Wilkening J, Stevens RL. Connecting genotype to phenotype in the era of high-throughput sequencing. Biochimica et Biophysica Acta — General Subjects 1810(10):967-977 (October 2011).
Martin JA, Wang Z. Next-generation transcriptome assembly. Nature Reviews Genetics 12(10):671-682 (October 2011).
Compeau PEC, Pevzner PA, Tesler G. How to apply de Bruijn graphs to genome assembly. Nature Biotechnology 29(11):987-991 (November 2011).
Koren S, Treangen TJ, Pop M. Bambus 2: scaffolding metagenomes. Bioinformatics 27(21):2964-2971 (1 November 2011).
Earl D, Bradnam K, St John J, Darling A, Lin DW, Fass J, Hung OKY, Buffalo V, Zerbino DR, Diekhans M, Nguyen N, Ariyaratne PN, Sung WK, Ning ZM, Haimel M, Simpson JT, Fonseca NA, Birol I, Docking TR, Ho IY, Rokhsar DS, Chikhi R, Lavenier D, Chapuis G, Naquin D, Maillet N, Schatz MC, Kelley DR, Phillippy AM, Koren S, Yang SP, Wu W, Chou WC, Srivastava A, Shaw TI, Ruby JG, Skewes-Cox P, Betegon M, Dimon MT, Solovyev V, Seledtsov I, Kosarev P, Vorobyev D, Ramirez-Gonzalez R, Leggett R, MacLean D, Xia FF, Luo RB, Li ZY, Xie YL, Liu BH, Gnerre S, MacCallum I, Przybylski D, Ribeiro FJ, Yin SY, Sharpe T, Hall G, Kersey PJ, Durbin R, Jackman SD, Chapman JA, Huang XQ, DeRisi JL, Caccamo M, Li YR, Jaffe DB, Green RE, Haussler D, Korf I, Paten B. Assemblathon 1: a competitive assessment of de novo short read assembly methods. Genome Research 21(12):2224-2241 (December 2011).
Salmela L, Makinen V, Valimaki N, Ylinen J, Ukkonen E. Fast scaffolding with small independent mixed integer programs. Bioinformatics 27(23):3259-3265 (1 December 2011).
Chen G, Li R, Shi L, Qi J, Hu P, Luo J, Liu M, Shi T. Revealing the missing expressed genes beyond the human reference genome by RNA-seq. BMC Genomics 12:590 (2 December 2011).
Matsutani M, Hirakawa H, Saichana N, Soemphol W, Yakushi T, Matsushita K. Genome-wide phylogenetic analysis of differences in thermotolerance among closely related Acetobacter pasteurianus strains. Microbiology 158(1):229-239 (January 2012).
Lee HC, Lai K, Lorenc MT, Imelfort M, Duran C, Edwards D. Bioinformatics tools and databases for analysis of next-generation sequence data. Briefings in Functional Genomics 11(1):12-24 (January 2012).
Iqbal Z, Caccamo M, Turner I, Flicek P, McVean G. De novo assembly and genotyping of variants using colored de Bruijn graphs. Nature Genetics 44(2):226-232 (February 2012).
McCouch SR, McNally KL, Wang W, Hamilton RS. Genomics of gene banks: a case study in rice. American Journal of Botany 99(2):407-423 (February 2012).
Lassen KS, Schultz H, Heegaard NHH, He M. A novel DNAseq program for enhanced analysis of Illumina GAII data: a case study on antibody complementarity-determining regions. New Biotechnology 29(3):271-278 (15 February 2012).
Lee H, Tang HX. Next-generation sequencing technologies and fragment assembly algorithms. Methods in Molecular Biology 855 (Evolutionary Genomics: Statistical And Computational Methods), vol. 1: 155-174 (2012).
Seok J, Xu WH, Jiang H, Davis RW, Xiao WZ. Knowledge-based reconstruction of mRNA transcripts with short sequencing reads for transcriptome research. PLoS ONE 7(2):e31440 (1 February 2012).
Bryant DW, Mockler TC. De novo short-read assembly. Bioinformatics For High Throughput Sequencing 85-105 (Springer, 2012).
Wajid B, Serpedin E. Review of general algorithmic features for genome assemblers for next generation sequencers. Genomics Proteomics & Bioinformatics 10(2):58-73 (April 2012).
Schulz MH, Zerbino DR, Vingron M, Birney E. Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics 28(8):1086-1092 (15 April 2012).
Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology 19(5):455-477 (May 2012).
Chandra YG, Lee J, Kong BW. Genome sequence comparison of two United States live attenuated vaccines of infectious laryngotracheitis virus (ILTV). Virus Genes 44(3):470-474 (June 2012).
Henson J, Tischler G, Ning Z. Next-generation sequencing and large genome assemblies. Pharmacogenomics 13(8):901-915 (June 2012).
Peng Y, Leung HCM, Yiu SM, Chin FYL. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28(11):1420-1428 (1 June 2012).
Ronen R, Boucher C, Chitsaz H, Pevzner P. SEQuel: improving the accuracy of genome assemblies. Bioinformatics 28(12):i188-i196 (15 June 2012).
Ahn JH. ccTSA: a coverage-centric threaded sequence assembler. PLoS ONE 7(6):e0039232 (19 June 2012).
Bashir A, Klammer AA, Robins WP, Chin CS, Webster D, Paxinos E, Hsu D, Ashby M, Wang S, Peluso P, Sebra R, Sorenson J, Bullard J, Yen J, Valdovino M, Mollova E, Luong K, Lin S, Lamay B, Joshi A, Rowe L, Frace M, Tarr CL, Turnsek M, Davis BM, Kasarskis A, Mekalanos JJ, Waldor MK, Schadt EE. A hybrid approach for the automated finishing of bacterial genomes. Nature Biotechnology 30(7):701-707 (1 July 2012).
Shen X, Vikalo H. ParticleCall: a particle filter for base calling in next-generation sequencing systems. BMC Bioinformatics 13:160 (9 July 2012).
Wang XV, Blades N, Ding J, Sultana R, Parmigiani G. Estimation of sequencing error rates in short reads. BMC Bioinformatics 13:185 (30 July 2012).
Wang S, Wang X, He Q, Liu X, Xu W, Li L, Gao J, Wang F. Transcriptome analysis of the roots at early and late seedling stages using Illumina paired-end sequencing and development of EST-SSR markers in radish. Plant Cell Reports 31(8):1437-1447 (August 2012).
Lin HC, Goldstein S, Mendelowitz L, Zhou S, Wetzel J, Schwartz DC, Pop M. AGORA: assembly guided by optical restriction alignment. BMC Bioinformatics 13:189 (2 August 2012).
Xu B, Gao J, Li C. An efficient algorithm for DNA fragment assembly in MapReduce. Biochemical and Biophysical Research Communications 426(3):395-398 (28 September 2012).
Roy RS, Chen KC, Sengupta AM, Schliep A. SLIQ: simple linear inequalities for efficient contig scaffolding. Journal of Computational Biology 19(10):1162-1175 (October 2012).
Ribeiro FJ, Przybylski D, Yin S, Sharpe T, Gnerre S, Abouelleil A, Berlin AM, Montmayeur A, Shea TP, Walker BJ, Young SK, Russ C, Nusbaum C, MacCallum I, Jaffe DB. Finished bacterial genomes from shotgun sequence data. Genome Research 22(11):2270-2277 (November 2012).
Liu GE, Bickhart DM. Copy number variation in the cattle genome. Functional & Integrative Genomics 12(4):609-624 (November 2012).
English AC, Richards S, Han Y, Wang M, Vee V, Qu J, Qin X, Muzny DM, Reid JG, Worley KC, Gibbs RA. Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology. PLoS ONE 7(11):e47768 (21 November 2012).
Ren X, Liu T, Dong J, Sun L, Yang J, Zhu Y, Jin Q. Evaluating de Bruijn graph assemblers on 454 transcriptomic data. PLoS ONE 7(12):e51188 (7 December 2012).
Wu X, Heo Y, El Hajj I, Hwu W, Chen D, Ma J. TIGER: tiled iterative genome assembler. BMC Bioinformatics 13(S19):S18 (19 December 2012).
Fancello L, Raoult D, Desnues C. Computational tools for viral metagenomics and their application in clinical research. Virology 434(2):162-174 (20 December 2012).
