Technical Note

CGtag: complete genomics toolkit and annotation in a cloud-based Galaxy

Saskia Hiltemann1,2*, Hailiang Mei3, Mattias de Hollander4, Ivo Palli1, Peter van der Spek1, Guido Jenster2 and Andrew Stubbs1

Author Affiliations

1 Department of Bioinformatics, Erasmus MC, Dr. Molewaterplein 5, 3015 GE Rotterdam, The Netherlands

2 Department of Urology, Erasmus MC, Dr. Molewaterplein 5, 3015 GE Rotterdam, The Netherlands

3 Netherlands Bioinformatics Center, NBIC, Geert Grooteplein 28, 6525 GA Nijmegen, The Netherlands

4 Department of Microbial Ecology, Netherlands Institute of Ecology, NIOO-KNAW, Droevendaalsesteeg 10, 6708 PB Wageningen, The Netherlands


GigaScience 2014, 3:1  doi:10.1186/2047-217X-3-1

Published: 24 January 2014



Complete Genomics provides an open-source suite of command-line tools for the analysis of their CG-formatted mapped sequencing files. Determining, for example, the functional impact of detected variants requires annotation with various databases, which often requires command-line and/or programming experience and thus limits their use by the average research scientist. We have therefore implemented this CG toolkit, together with a number of annotation, visualisation and file manipulation tools, in Galaxy as CGtag (Complete Genomics Toolkit and Annotation in a Cloud-based Galaxy).


To provide research scientists with simple, accurate, web-based analysis and visualisation applications for selecting candidate mutations from Complete Genomics data, we have implemented the open-source Complete Genomics tool set, CGATools, in Galaxy. In addition, we implemented some of the most popular command-line annotation and visualisation tools, allowing research scientists to select candidate pathological mutations (SNVs and indels). Furthermore, we have developed a cloud-based public Galaxy instance to host the CGtag toolkit and other associated modules.
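The annotate-then-filter idea behind candidate selection can be illustrated with a minimal sketch. This is not CGtag's or CGATools' implementation: the `Variant` record, the in-memory annotation dictionary and the impact labels `HIGH`/`MODERATE`/`LOW` are hypothetical stand-ins for the external annotation databases a real workflow would query. Variants are classified as SNVs (single-base substitutions) or indels (reference and alternate alleles of different lengths), annotated by exact key lookup, and kept only if their predicted impact falls in a chosen "damaging" set.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

# Key used to look a variant up in the (hypothetical) annotation database.
VariantKey = Tuple[str, int, str, str]


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str

    @property
    def key(self) -> VariantKey:
        return (self.chrom, self.pos, self.ref, self.alt)

    @property
    def var_type(self) -> str:
        # One base replaced by one base: single-nucleotide variant.
        if len(self.ref) == 1 and len(self.alt) == 1:
            return "snv"
        # Allele lengths differ: insertion or deletion.
        if len(self.ref) != len(self.alt):
            return "indel"
        # Equal-length multi-base substitutions are left out of this sketch.
        return "other"


def annotate(
    variants: List[Variant], db: Dict[VariantKey, str]
) -> List[Tuple[Variant, Optional[str]]]:
    # Attach the predicted impact to each variant; None if not in the database.
    return [(v, db.get(v.key)) for v in variants]


def select_candidates(
    variants: List[Variant],
    db: Dict[VariantKey, str],
    damaging: FrozenSet[str] = frozenset({"HIGH", "MODERATE"}),
) -> List[Variant]:
    # Keep SNVs and indels whose annotated impact is in the damaging set.
    return [
        v
        for v, impact in annotate(variants, db)
        if v.var_type in ("snv", "indel") and impact in damaging
    ]


if __name__ == "__main__":
    calls = [
        Variant("chr1", 100, "A", "G"),
        Variant("chr2", 200, "AT", "A"),
        Variant("chr3", 300, "C", "T"),
    ]
    impacts = {
        calls[0].key: "HIGH",
        calls[1].key: "MODERATE",
        calls[2].key: "LOW",
    }
    for v in select_candidates(calls, impacts):
        print(v.chrom, v.pos, v.var_type)
```

Unannotated variants receive `None` and are therefore never selected; a real pipeline would more likely report them separately rather than silently drop them.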


CGtag provides a user-friendly interface for all research scientists wishing to select candidate variants from CG or other next-generation sequencing platforms' data. By using a cloud-based infrastructure, we can also ensure sufficient, on-demand computation and storage resources to handle the analysis tasks. The tools are freely available for use from an NBIC/CTMM-TraIT (The Netherlands Bioinformatics Center/Center for Translational Molecular Medicine) cloud-based Galaxy instance, or can be installed to a local (production) Galaxy via the NBIC Galaxy tool shed.

Keywords: Complete genomics; Next generation sequencing; Genetic variation; Pathogenic gene selection