Open Access Highly Accessed Data Note

A locally funded Puerto Rican parrot (Amazona vittata) genome sequencing project increases avian data and advances young researcher education

Taras K Oleksyk1*, Jean-Francois Pombert2, Daniel Siu3, Anyimilehidi Mazo-Vargas1, Brian Ramos1, Wilfried Guiblet1, Yashira Afanador1, Christina T Ruiz-Rodriguez1,4, Michael L Nickerson4, David M Logue1, Michael Dean4, Luis Figueroa5, Ricardo Valentin6 and Juan-Carlos Martinez-Cruzado1

Author Affiliations

1 University of Puerto Rico at Mayagüez, Mayagüez, Puerto Rico

2 University of British Columbia, Vancouver, BC, Canada

3 Axeq Technologies, Seoul, South Korea

4 Cancer and Inflammation Program, National Cancer Institute, NIH, Frederick, MD, USA

5 Compañía de Parques Nacionales de Puerto Rico, San Juan, Puerto Rico

6 Department of Natural and Environmental Resources, San Juan, Puerto Rico

GigaScience 2012, 1:14  doi:10.1186/2047-217X-1-14

Published: 28 September 2012

Additional files

Additional file 1:

Supplementary materials.

Format: DOC Size: 55KB

Open Data

Additional file 2:

Table S1. Quality and volume of four DNA samples extracted from whole blood of two Amazona vittata parrots selected for the genome sequencing.

Format: DOC Size: 36KB

Open Data

Additional file 3:

Table S2. Results of the genome sequencing (Illumina HiSeq, Axeq Technologies). Pa9a_1 and Pa9a_2 represent the opposite ends of the 300 bp short reads, and the Pa9a-MP_1 and Pa9a-MP_2 are the 2,500 bp mate pairs (MP). All sequences were 101 bp long.

Format: DOC Size: 37KB

Open Data

Additional file 4:

Table S3. Results of the genome assembly by SOAPdenovo [8].

Format: DOC Size: 42KB

Open Data

Additional file 5:

Supplementary figures.
Figure S1. Venn diagram of the overlap between the number of A. vittata scaffolds and the G. gallus transcripts from GenBank that were mapped to them by BLAST.
Figure S2. A single example of a chimera detected on scaffold-74754 after visual inspection of reads mapped to the 100 largest scaffolds.
Figure S3. Percentage of scaffolds containing fragments with > 95% similarity to GenBank sequences.
Figure S4. Comparison between categories of A. vittata scaffolds (described earlier in Figure 2). The box plots show the medians, Q1, Q3 and the extreme values. The means are shown in Table 3. A. Distribution of scaffold lengths. B. Distribution of densities of genes mapped per kbp of scaffold length. C. Differences in the distribution of the proportion of scaffold length mapped to a G. gallus transcript from the NCBI Entrez Gene database. D. Differences in the distribution of the proportion of scaffold length mapped to a known repeat class using RepeatMasker software [5].
Figure S5. Distribution of major classes of repetitive sequences found on A. vittata scaffolds.
Figure S6. Relationship between the quality scores of the alignments of the parrot scaffolds to the chicken and zebra finch genomes. A. All scaffolds. B. Mismatched scaffolds only (those that shared similarity with sequences of the G. gallus and T. guttata genomes but mapped to different chromosomes in the two species; see classification in Figure 2). C. Matched sequences only (those that mapped to the same chromosome in the reference genomes of the two avian species).
Figure S7. Relationship between the size of a scaffold and the quality of its alignment to the T. guttata and/or G. gallus genome sequence. A. All scaffolds aligned to the T. guttata genome. B. All scaffolds aligned to the G. gallus genome. C. Mismatched scaffolds aligned to the T. guttata genome (those that mapped to different chromosomes in G. gallus; see classification in Figure 2). D. Mismatched scaffolds aligned to the G. gallus genome (those that mapped to different chromosomes in T. guttata). E. Matched sequences from T. guttata only (those that mapped to the same chromosome in the reference genomes of the two avian species). F. Matched sequences from G. gallus only (those that mapped to the same chromosome in the reference genomes of the two avian species).
Figure S8. Small fragments are repeat-rich and gene-rich. A. Relationship between the length of a scaffold and the proportion of its length matched to G. gallus sequences from the NCBI Entrez Gene database. B. Relationship between the length of a scaffold and the proportion of its length designated by RepeatMasker as repetitive sequence.

Format: PDF Size: 3MB

Open Data

Additional file 6:

Table S4A. Summary of the alignment of A. vittata sequences to the G. gallus genome sequence containing only the top alignment for each scaffold, its chromosomal position and quality scores.

Format: XLS Size: 9.7MB

Open Data

Additional file 7:

Table S4B. Summary of the alignment of A. vittata sequences to the T. guttata genome sequence containing only the top alignment for each scaffold, its chromosomal position and quality scores.

Format: XLS Size: 9.8MB

Open Data

Additional file 8:

Table S4C. The database of the alignment information of A. vittata sequences to G. gallus and T. guttata genome sequence by BLAST.

Format: TXT Size: 12.7MB

Open Data

Additional file 9:

Table S5. Proportions of sequences with some similarity that mapped to chromosomes of two reference avian genomes (G. gallus and T. guttata).

Format: XLS Size: 42KB

Open Data

Additional file 10:

Table S6A. Summary of the database of GenBank sequences with more than 95% similarity to the parrot scaffolds.

Format: XLS Size: 4.8MB

Open Data

Additional file 11:

Table S6B. The database of GenBank sequences with more than 95% similarity to the parrot scaffolds found by BLAST. Table S7A. A map of G. gallus transcripts from the NCBI Entrez Gene database that mapped to one of the A. vittata scaffolds.

Format: TXT Size: 2.1MB

Open Data

Additional file 12:

Table S7A. A map of G. gallus transcripts from NCBI Entrez Gene.

Format: XLSX Size: 11KB

Open Data

Additional file 13:

Table S7B. The database of alignments between G. gallus transcripts from the NCBI Entrez Gene database and A. vittata scaffolds by BLAST.

Format: TXT Size: 13.3MB

Open Data

Additional file 14:

Table S8. Distribution of different cases of repetitive elements among different classes of A. vittata scaffolds.

Format: XLSX Size: 25KB

Open Data

Additional file 15:

Table S9. Bioinformatics tools and outputs for scaffold and gene annotation.

Format: DOC Size: 33KB

Open Data

Additional file 16:

Table S10. An example of annotation output produced by a student in the Genome annotation class using the A. vittata genome.

Format: XLSX Size: 6.8MB

Open Data