Database of exome sequencing data
Over the last several years we have built the technical capacity for exome sequencing in mice and implemented this method of mutation identification in our Mouse Mutant Resource (MMR). We were the first to publish this approach for the discovery of mutations in the mouse genome (Fairfield et al., 2011, Genome Biology). Over 175 strains of mice with a variety of Mendelian diseases have been sequenced and analyzed using this technology. To maximize the usefulness of our sequencing data, we developed an analysis pipeline and a genetic variant database. The database currently houses over 4 million single nucleotide variants (SNVs) and small insertions or deletions (INDELs), providing a strain variation dataset of unprecedented breadth.
The database will be of high interest not only to the mouse genetics community but also to the human genetics community, which is increasingly reliant on mouse phenotype/genotype data to associate newly discovered rare variants with disease.
Our exome sequencing analysis pipeline runs current, well-established tools for alignment and SNV/INDEL calling, all customized for mouse exome sequencing. The pipeline generates an annotated VCF file and a data file containing essential statistics; both are uploaded to the database along with the sample data. The database uses the Clinical Genome Analytics framework and allows custom data queries by strain or by phenotype. It also nominates candidate mutations based on a series of rules that we have created from our prior experience finding causative mutations in mouse exome data. Several options for downloading and sharing data are also available.
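As a rough illustration of what rule-based candidate nomination over an annotated VCF can look like, the Python sketch below parses a small single-sample VCF and keeps variants that pass three example rules. Everything specific here is an assumption made for the example and not the rule set or schema used in the database: the rules themselves (homozygous alternate genotype, not previously seen in other strains, predicted protein-altering effect), the INFO keys `EFFECT` and `KNOWN`, and the function names `parse_vcf`, `is_candidate` and `nominate`.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    info: dict
    genotype: str


def parse_vcf(text):
    """Parse single-sample VCF text into Variant records, skipping header lines."""
    variants = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
        # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags.
        info_dict = {}
        for item in info.split(";"):
            key, _, value = item.partition("=")
            info_dict[key] = value
        # Column 9 names the per-sample fields; column 10 is the first sample.
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        genotype = sample_values[format_keys.index("GT")]
        variants.append(Variant(chrom, int(pos), ref, alt, info_dict, genotype))
    return variants


# Hypothetical set of effects treated as likely protein-altering.
PROTEIN_ALTERING = {
    "stop_gained",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
    "missense_variant",
}


def is_candidate(variant):
    """Apply three illustrative rules; all must hold for nomination."""
    homozygous_alt = variant.genotype in ("1/1", "1|1")
    novel = "KNOWN" not in variant.info  # hypothetical flag: seen in other strains
    damaging = variant.info.get("EFFECT") in PROTEIN_ALTERING  # hypothetical key
    return homozygous_alt and novel and damaging


def nominate(variants):
    """Return the variants that satisfy every rule."""
    return [v for v in variants if is_candidate(v)]


# Made-up example records, not real data.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tmutant",
    "1\t1000\t.\tC\tT\t50\tPASS\tEFFECT=stop_gained\tGT:DP\t1/1:30",
    "2\t2000\t.\tG\tA\t50\tPASS\tEFFECT=synonymous_variant\tGT:DP\t1/1:28",
    "3\t3000\t.\tA\tG\t50\tPASS\tEFFECT=missense_variant;KNOWN\tGT:DP\t1/1:35",
    "4\t4000\t.\tT\tTA\t50\tPASS\tEFFECT=frameshift_variant\tGT:DP\t0/1:22",
])

candidates = nominate(parse_vcf(EXAMPLE_VCF))
for v in candidates:
    print(v.chrom, v.pos, v.ref, v.alt, v.info.get("EFFECT"))
```

In this toy input only the first record passes all three rules: the second is synonymous, the third is flagged as already known, and the fourth is heterozygous. A real rule set would likely also weigh read depth, call quality and the expected mode of inheritance.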
We’ve found that about 10% of the pathogenic mutations discovered are in novel mouse genes for which we now provide the first allele associated with a phenotype. While little is known about the function of these genes in mice, all are well conserved across vertebrates and, importantly, in humans. Another 37% of the disease-causing mutations we identified are new alleles of genes that have yet to be associated with a human disease.