Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species

Keith R Bradnam1*, Joseph N Fass1, Anton Alexandrov36, Paul Baranay2, Michael Bechner39, Inanç Birol33, Sébastien Boisvert10,11, Jarrod A Chapman20, Guillaume Chapuis7,9, Rayan Chikhi7,9, Hamidreza Chitsaz6, Wen-Chi Chou14,16, Jacques Corbeil10,13, Cristian Del Fabbro17, T Roderick Docking33, Richard Durbin34, Dent Earl40, Scott Emrich3, Pavel Fedotov36, Nuno A Fonseca30,35, Ganeshkumar Ganapathy38, Richard A Gibbs32, Sante Gnerre22, Élénie Godzaridis11, Steve Goldstein39, Matthias Haimel30, Giles Hall22, David Haussler40, Joseph B Hiatt41, Isaac Y Ho20, Jason Howard38, Martin Hunt34, Shaun D Jackman33, David B Jaffe22, Erich D Jarvis38, Huaiyang Jiang32, Sergey Kazakov36, Paul J Kersey30, Jacob O Kitzman41, James R Knight37, Sergey Koren24,25, Tak-Wah Lam29, Dominique Lavenier7,8,9, François Laviolette12, Yingrui Li28,29, Zhenyu Li28, Binghang Liu28, Yue Liu32, Ruibang Luo28,29, Iain MacCallum22, Matthew D MacManes5, Nicolas Maillet8,9, Sergey Melnikov36, Delphine Naquin8,9, Zemin Ning34, Thomas D Otto34, Benedict Paten40, Octávio S Paulo31, Adam M Phillippy24,25, Francisco Pina-Martins31, Michael Place39, Dariusz Przybylski22, Xiang Qin32, Carson Qu32, Filipe J Ribeiro23, Stephen Richards32, Daniel S Rokhsar20,21, J Graham Ruby26,27, Simone Scalabrin17, Michael C Schatz4, David C Schwartz39, Alexey Sergushichev36, Ted Sharpe22, Timothy I Shaw14,15, Jay Shendure41, Yujian Shi28, Jared T Simpson34, Henry Song32, Fedor Tsarev36, Francesco Vezzi19, Riccardo Vicedomini17,18, Bruno M Vieira31, Jun Wang28, Kim C Worley32, Shuangye Yin22, Siu-Ming Yiu29, Jianying Yuan28, Guojie Zhang28, Hao Zhang28, Shiguo Zhou39 and Ian F Korf1*

Author Affiliations

1 Genome Center, UC, Davis, CA 95616, USA

2 Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA

3 Department of Computer Science and Engineering, University of Notre Dame, South Bend, IN 46556, USA

4 Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA

5 Berkeley California Institute for Quantitative Biosciences, University of California, Berkeley, CA 94720, USA

6 Department of Computer Science, Wayne State University, Detroit, MI 48202, USA

7 Computer Science department, ENS Cachan/IRISA, 35042 Rennes, France

8 INRIA, Rennes Bretagne Atlantique, 35042 Rennes, France

9 CNRS/Symbiose, IRISA, 35042 Rennes, France

10 Infectious Diseases Research Center, Université Laval, Québec, QC G1V 4G2, Canada

11 Faculty of Medicine, Université Laval, Québec, QC G1V 4G2, Canada

12 Department of Computer Science and Software Engineering, Faculty of Science and Engineering, Université Laval, Québec, QC G1V 4G2, Canada

13 Department of Molecular Medicine, Faculty of Medicine, Université Laval, Québec, QC G1V 4G2, Canada

14 Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA

15 Department of Epidemiology and Biostatistics, College of Public Health, University of Georgia, Athens, GA 30602, USA

16 Institute of Aging Research, Hebrew SeniorLife, Boston, MA 02131, USA

17 IGA, Institute of Applied Genomics, 33100 Udine, Italy

18 Department of Mathematics and Computer Science, University of Udine, 33100 Udine, Italy

19 Science for Life Laboratory, KTH Royal Institute of Technology, 17121 Solna, Sweden

20 DOE Joint Genome Institute, Walnut Creek, CA 94598, USA

21 Department of Molecular and Cell Biology, UC Berkeley, Berkeley, CA 94720, USA

22 Broad Institute, Cambridge, MA 02142, USA

23 New York Genome Center, New York, NY 10022, USA

24 National Biodefense Analysis and Countermeasures Center, Frederick, MD 21702, USA

25 Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA

26 Department of Biochemistry and Biophysics, University of California, San Francisco, CA 94143, USA

27 Howard Hughes Medical Institute, Bethesda, MD 20814, USA

28 BGI-Shenzhen, Shenzhen, Guangdong 518083, China

29 HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory, The University of Hong Kong, Pok Fu Lam Rd, Hong Kong, Hong Kong

30 EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

31 Computational Biology & Population Genomics Group, Centre for Environmental Biology, Department of Animal Biology, Faculty of Sciences of the University of Lisbon, Campo Grande, P-1749-016 Lisbon, Portugal

32 Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA

33 Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 4E6, Canada

34 The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK

35 CRACS - INESC TEC, 4200-465 Porto, Portugal

36 National Research University of Information Technology, Mechanics and Optics (University ITMO), St. Petersburg 197101, Russia

37 454 Life Sciences, 15 Commercial Street, Branford, CT 06405, USA

38 Duke University Medical Center, Durham, NC 27710, USA

39 Laboratory for Molecular and Computational Genomics, Departments of Chemistry and Genetics, UW-Biotechnology Center, 425 Henry Mall, Madison, WI 53706, USA

40 Howard Hughes Medical Institute, Center for Biomolecular Science & Engineering, University of California, Santa Cruz, CA 95064, USA

41 Department of Genome Sciences, School of Medicine, University of Washington, Seattle, WA 98195, USA

GigaScience 2013, 2:10  doi:10.1186/2047-217X-2-10

Published: 22 July 2013

Abstract

Background

The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly.

Results

In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies.

Conclusions

Many current genome assemblers produced useful assemblies that contained a significant representation of each species' genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly, and that approaches that work well in assembling the genome of one species may not necessarily work well for another.

Keywords:
Genome assembly; N50; Scaffolds; Assessment; Heterozygosity; COMPASS