ALLPATHS:
de novo assembly of whole-genome shotgun microreads

Jonathan Butler
Iain MacCallum
Michael Kleber
Ilya A Shlyakhter
Matthew K Belmonte
Eric S Lander
Chad Nusbaum
David B Jaffe

Genome Research 18(5):810-820 (May 2008).

ABSTRACT

New DNA sequencing technologies deliver data at dramatically lower costs, but demand new analytical methods to take full advantage of the very short reads that they produce. We provide an initial, theoretical solution to the challenge of de novo assembly from whole-genome shotgun microreads. For eleven genomes of size up to 39 Mb, we generated high-quality assemblies from 80X coverage by paired 30-base simulated reads modeled after real Illumina-Solexa reads. The bacterial genomes of C. jejuni and E. coli assemble optimally, yielding single perfect contigs, and larger genomes yield assemblies that are highly connected and accurate. Assemblies are presented in a graph form that retains intrinsic ambiguities such as those arising from polymorphism, thereby providing information that has been absent from previous genome assemblies. For both C. jejuni and E. coli, this assembly graph is a single edge encompassing the entire genome. Larger genomes produce more complicated graphs, but the vast majority of the bases in their assemblies are present in long edges that are nearly always perfect. We describe a general method for genome assembly that can be applied to all types of DNA sequence data: not only short-read data, but also conventional sequence reads.
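
To give a sense of the data volume implied by "80X coverage by paired 30-base reads", the sketch below applies the standard relation coverage = (number of reads x read length) / genome size. It is only an illustration, not part of the ALLPATHS method: the function name `reads_needed` is hypothetical, and treating 39 Mb as exactly 39,000,000 bases is an assumption made for round numbers.

```python
import math


def reads_needed(genome_size_bp: int, coverage: float, read_length_bp: int) -> int:
    """Reads required so that total sequenced bases = coverage * genome size.

    Hypothetical helper for illustration; rounds up to a whole read.
    """
    return math.ceil(coverage * genome_size_bp / read_length_bp)


# Figures taken from the abstract: 80X coverage, 30-base reads,
# largest genome about 39 Mb (assumed here to be exactly 39,000,000 bases).
n_reads = reads_needed(39_000_000, 80, 30)

# The reads are paired, so the number of read pairs is half the read count.
n_pairs = n_reads // 2

print(f"reads: {n_reads:,}  pairs: {n_pairs:,}")
```

Under these assumptions, the largest genome in the study would correspond to roughly 104 million reads, or 52 million pairs.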



CITED IN PUBLICATIONS BY OTHERS:

  1. Holt RA, Jones SJM. The new paradigm of flow cell sequencing. Genome Research 18(6):839-846 (June 2008).
  2. Hossain S, Azimi N, Skiena S. Crystallizing short-read assemblies around lone Sanger reads. Bioinformatics, in press.