First Indian germline database to catalog the variations in the genome of Indian population


25 Aug 2016

TMC-SNPdb: first Indian germline database to catalog the variations in the genome of Indian population

By Dr Amit Dutt, Intermediate Fellow

ACTREC Mumbai

Cancer is a genetic disease caused by the sequential accumulation of mutations in the genome. Unlike a normal cell genome, a typical cancer cell genome harbors two kinds of variation: somatic mutations acquired after birth, which are associated with the disease, and germline variations, called single nucleotide polymorphisms (SNPs), whose frequency varies by ethnicity. A crucial step in any tumor genome sequence analysis is therefore the subtraction of SNPs that are also present in a normal cell of the same individual, followed by subtraction of known SNPs present in the population, as reported in public databases such as dbSNP and the 1000 Genomes Project. However, the current build of dbSNP, the most comprehensive public SNP database, inadequately represents several non-European Caucasian populations, which limits cancer genomic analyses of data from these populations.

To fill this gap, we present the first concerted effort to comprehensively identify and catalogue novel SNPs present exclusively in the Indian population, generating a normal baseline reference database, with some caveats as detailed in the research manuscript. The TMC-SNPdb (Tata Memorial Center SNPdb) is the first open-source, freely available, flexible and upgradable SNP database built from whole exome data of 62 normal samples derived from cancer patients of Indian origin, and it is hosted at ANNOVAR. It consists of 114,309 unique germline variants prevalent exclusively in the Indian population at varying frequencies. The TMC-SNPdb comes with a companion subtraction tool, which can be run from the command line or through an easy-to-use graphical user interface (GUI), to deplete Indian population-specific SNPs in addition to those in dbSNP and the 1000 Genomes Project.
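The subtraction logic described above can be illustrated with a minimal Python sketch. This is not the TMC-SNPdb companion tool or its interface: the `Variant` type, the `subtract_germline` function and the toy variants are all hypothetical, and a variant is assumed to match only when chromosome, position, reference allele and alternate allele are identical.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not the tool's real format)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def subtract_germline(
    tumor_calls: Iterable[Variant],
    matched_normal: Iterable[Variant],
    population_dbs: Iterable[Iterable[Variant]],
) -> List[Variant]:
    """Return tumor variants that survive germline subtraction.

    Step 1 removes variants also seen in the same individual's normal sample.
    Step 2 removes variants listed in each population database in turn.
    """
    normal = set(matched_normal)
    remaining = [v for v in tumor_calls if v not in normal]
    for db in population_dbs:
        known = set(db)
        remaining = [v for v in remaining if v not in known]
    return remaining


# Toy data, invented purely for illustration.
tumor = [
    Variant("chr1", 100, "A", "G"),  # also present in the matched normal
    Variant("chr2", 200, "C", "T"),  # listed in a global SNP database
    Variant("chr3", 300, "G", "A"),  # listed only in a population-specific set
    Variant("chr4", 400, "T", "C"),  # in none of the sets: candidate somatic
]
normal = [Variant("chr1", 100, "A", "G")]
global_snps = [Variant("chr2", 200, "C", "T")]
population_specific_snps = [Variant("chr3", 300, "G", "A")]

candidates = subtract_germline(
    tumor, normal, [global_snps, population_specific_snps]
)
print(candidates)
```

In this toy case the third variant would survive filtering against the global set alone and is removed only because a population-specific set is supplied, which is the gap a database such as the TMC-SNPdb is meant to close.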

Beyond cancer genome analyses, we anticipate broad utility of the TMC-SNPdb in several Mendelian germline diseases. Any researcher sequencing genomes for any disease would want a normal germline SNP database from the Indian population. More importantly, researchers can add their own germline sequences to the TMC-SNPdb to filter Indian-specific SNPs. This database has been incorporated into dbSNP (the official database of all known SNPs in the world) and will be officially released with the next dbSNP build (build 149), scheduled for later this year. The TMC-SNPdb is also available for immediate download from ANNOVAR and from our lab web page: http://www.actrec.gov.in/pi-webpages/AmitDutt/TMCSNP/TMCSNPdp.html

TMC-SNPdb: an Indian germline variant database derived from whole exome sequences
Pawan Upadhyay, Nilesh Gardi, Sanket Desai, Bikram Sahoo, Ankita Singh, Trupti Togar, Prajish Iyer, Ratnam Prasad, Pratik Chandrani, Sudeep Gupta and Amit Dutt*

Read the paper online here