Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species
- Equal contributors
1 Genome Center, UC Davis, Davis, CA 95616, USA
2 Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA
3 Department of Computer Science and Engineering, University of Notre Dame, South Bend, IN 46556, USA
4 Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
5 Berkeley California Institute for Quantitative Biosciences, University of California, Berkeley, CA 94720, USA
6 Department of Computer Science, Wayne State University, Detroit, MI 48202, USA
7 Computer Science department, ENS Cachan/IRISA, 35042 Rennes, France
8 INRIA, Rennes Bretagne Atlantique, 35042 Rennes, France
9 CNRS/Symbiose, IRISA, 35042 Rennes, France
10 Infectious Diseases Research Center, Université Laval, Québec, QC G1V 4G2, Canada
11 Faculty of Medicine, Université Laval, Québec, QC G1V 4G2, Canada
12 Department of Computer Science and Software Engineering, Faculty of Science and Engineering, Université Laval, Québec, QC G1V 4G2, Canada
13 Department of Molecular Medicine, Faculty of Medicine, Université Laval, Québec, QC G1V 4G2, Canada
14 Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
15 Department of Epidemiology and Biostatistics, College of Public Health, University of Georgia, Athens, GA 30602, USA
16 Institute of Aging Research, Hebrew SeniorLife, Boston, MA 02131, USA
17 IGA, Institute of Applied Genomics, 33100 Udine, Italy
18 Department of Mathematics and Computer Science, University of Udine, 33100 Udine, Italy
19 Science for Life Laboratory, KTH Royal Institute of Technology, 17121 Solna, Sweden
20 DOE Joint Genome Institute, Walnut Creek, CA 94598, USA
21 Department of Molecular and Cell Biology, UC Berkeley, Berkeley, CA 94720, USA
22 Broad Institute, Cambridge, MA 02142, USA
23 New York Genome Center, New York, NY 10022, USA
24 National Biodefense Analysis and Countermeasures Center, Frederick, MD 21702, USA
25 Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
26 Department of Biochemistry and Biophysics, University of California, San Francisco, CA 94143, USA
27 Howard Hughes Medical Institute, Bethesda, MD 20814, USA
28 BGI-Shenzhen, Shenzhen, Guangdong 518083, China
29 HKU-BGI Bioinformatics Algorithms and Core Technology Research Laboratory, The University of Hong Kong, Pok Fu Lam Rd, Hong Kong, Hong Kong
30 EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
31 Computational Biology & Population Genomics Group, Centre for Environmental Biology, Department of Animal Biology, Faculty of Sciences of the University of Lisbon, Campo Grande, P-1749-016 Lisbon, Portugal
32 Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
33 Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia V5Z 4E6, Canada
34 The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
35 CRACS - INESC TEC, 4200-465 Porto, Portugal
36 National Research University of Information Technology, Mechanics and Optics (University ITMO), St. Petersburg 197101, Russia
37 454 Life Sciences, 15 Commercial Street, Branford, CT 06405, USA
38 Duke University Medical Center, Durham, NC 27710, USA
39 Laboratory for Molecular and Computational Genomics, Departments of Chemistry and Genetics, UW-Biotechnology Center, 425 Henry Mall, Madison, WI 53706, USA
40 Howard Hughes Medical Institute, Center for Biomolecular Science & Engineering, University of California, Santa Cruz, CA 95064, USA
41 Department of Genome Sciences, School of Medicine, University of Washington, Seattle, WA 98195, USA
GigaScience 2013, 2:10 doi:10.1186/2047-217X-2-10. Published: 22 July 2013
Additional file 1:
Supplementary Data Description. Full details of the Illumina, Roche 454, and Pacific Biosciences sequencing data that were made available to participating teams.
Format: DOCX Size: 28KB
Additional file 2:
Supplementary Results. Additional figures and tables to accompany the main text.
Format: DOCX Size: 2.6MB
Additional file 3:
Assembly Instructions. Details provided by participating teams on how to use software to recreate their assemblies. All teams were asked to provide this information.
Format: DOCX Size: 47KB
Additional file 4:
Master spreadsheet containing all results. Details of 102 different metrics for every assembly. The first sheet contains a detailed README explaining all columns, the second sheet contains the data, the third sheet shows z-score values for 10 key metrics for all assemblies, and the fourth sheet shows average rankings for all 10 key metrics.
Format: XLSX Size: 109KB
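The third and fourth sheets of Additional file 4 summarize the assemblies by per-metric z-scores and by average rankings across 10 key metrics. The Python sketch below illustrates one way to compute such summaries; it is not the authors' script. Assumptions: every metric has already been oriented so that higher values are better (a lower-is-better metric would need its sign flipped first), z-scores use the population standard deviation, tied values share an averaged rank, and summing an assembly's z-scores is shown only as one possible aggregate. The function names, assembly names, and values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize one metric across assemblies: (x - mean) / population SD."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All assemblies identical on this metric: no spread to standardize.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def ranks_high_is_best(values):
    """1-based ranks for one metric (rank 1 = highest value); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block over any assemblies tied with the one at `pos`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg_rank
        pos = end + 1
    return result


def summarize(table):
    """table: assembly name -> list of metric values, same metric order for all.

    Returns (sum of z-scores per assembly, average rank per assembly).
    """
    names = list(table)
    n_metrics = len(table[names[0]])
    z = {name: [] for name in names}
    r = {name: [] for name in names}
    for m in range(n_metrics):
        column = [table[name][m] for name in names]
        for name, zs, rk in zip(names, z_scores(column), ranks_high_is_best(column)):
            z[name].append(zs)
            r[name].append(rk)
    sum_z = {name: sum(z[name]) for name in names}
    avg_rank = {name: mean(r[name]) for name in names}
    return sum_z, avg_rank


# Hypothetical example: three assemblies scored on two metrics.
example = {"A": [10, 3], "B": [20, 1], "C": [30, 2]}
sum_z, avg_rank = summarize(example)
print(avg_rank)  # {'A': 2.0, 'B': 2.5, 'C': 1.5}
```

Reporting both views can be useful because z-scores reflect how far an assembly is from the pack on each metric, whereas average ranks only reflect its ordering.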
Additional file 5:
Details of all SRA/ENA/DDBJ accessions for input read data. This spreadsheet contains identifiers for all Project, Study, Sample, Experiment, and Run accessions for bird, fish, and snake input read data.
Format: XLSX Size: 22KB
Additional file 6:
All results. This file contains the same information as in sheet 2 of the master spreadsheet (Additional file 4), but in a format more suitable for parsing by computer scripts.
Format: CSV Size: 31KB
Additional file 7:
Bird scaffolds mapped to bird Fosmids. Results of using BLAST to align 46 assembled Fosmid sequences to bird scaffold sequences. Each figure represents an assembled Fosmid sequence with tracks showing read coverage, presence of repeats, and alignments to each assembly.
Format: PDF Size: 229KB
Additional file 8:
Snake scaffolds mapped to snake Fosmids. Results of using BLAST to align 24 assembled Fosmid sequences to snake scaffold sequences. Each figure represents an assembled Fosmid sequence with tracks showing read coverage, presence of repeats, and alignments to each assembly.
Format: PDF Size: 117KB
Additional file 9:
Bird and snake Validated Fosmid Region (VFR) data. The validated regions of the bird and snake Fosmids are available as two FASTA-formatted files. This dataset also includes two FASTA files that represent the 100 nt 'tag' sequences that were extracted from the VFRs.
Format: GZ Size: 521KB