FusorSV

An algorithm for optimally combining data from multiple structural variation detection methods


FusorSV on Genome Biology 



Overview

Comprehensive and accurate Structural Variation (SV) discovery from next-generation sequencing data remains a major challenge. A popular approach to overcoming the performance limitations of existing SV-calling algorithms is to run multiple complementary algorithms to determine the SV loci and then merge the results heuristically. However, such approaches do not take into account the strengths and weaknesses of the individual algorithms and hence either under-merge or over-merge the variant loci, resulting in missing and/or false SV calls. Here, we present FusorSV, an open-source tool that uses a data mining approach to assess performance and merge callsets from an ensemble of SV-calling algorithms. We also developed a FusorSV fusion model that was built on an ensemble of eight SV-calling algorithms and the analysis of 27 deep-coverage (50X) human genomes in the 1000 Genomes Project (1000GP). The model can be used for analysis of any newly sequenced sample and can be updated for other ensembles of SV-calling algorithms or datasets. Our model identified 843 novel SV calls (610 deletions, 202 duplications and 31 inversions) that were not reported by the 1000GP for the 27 samples. Experimental validation of a subset of these novel SV calls yielded a validation rate of 86.7%. To provide an easy-to-use SV detection pipeline, we built the Structural Variation Engine (SVE), which consists of eight state-of-the-art SV-calling algorithms and FusorSV and is capable of performing gold-standard SV analysis for whole genome sequencing projects that use Illumina paired-end read data.
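The general idea of weighting each caller by its measured performance before merging can be illustrated with a small sketch. This is not FusorSV's actual fusion model. It is a deliberately simplified weighted vote, and everything in it is an assumption made for illustration: the `Call` tuple, the caller names, the per-caller weights, the 50% reciprocal-overlap cutoff, and the score threshold.

```python
from collections import namedtuple

# Hypothetical minimal representation of one SV call (not FusorSV's data model).
Call = namedtuple("Call", "chrom start end svtype caller")


def reciprocal_overlap(a, b):
    """Return the smaller of the two overlap fractions between intervals a and b."""
    inter = min(a.end, b.end) - max(a.start, b.start)
    if inter <= 0:
        return 0.0
    return min(inter / (a.end - a.start), inter / (b.end - b.start))


def cluster_calls(calls, min_ro=0.5):
    """Greedily group calls of the same chromosome and type that overlap the cluster's first call."""
    clusters = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start, c.end)):
        last = clusters[-1] if clusters else None
        if (last
                and last[0].chrom == call.chrom
                and last[0].svtype == call.svtype
                and reciprocal_overlap(last[0], call) >= min_ro):
            last.append(call)
        else:
            clusters.append([call])
    return clusters


def merge_calls(calls, weights, threshold=1.0, min_ro=0.5):
    """Keep a cluster only if the summed weights of its distinct callers reach the threshold."""
    merged = []
    for cluster in cluster_calls(calls, min_ro):
        callers = sorted({c.caller for c in cluster})
        svtype = cluster[0].svtype
        # Weights are keyed by (caller, SV type); unknown pairs contribute nothing.
        score = sum(weights.get((caller, svtype), 0.0) for caller in callers)
        if score >= threshold:
            start = min(c.start for c in cluster)
            end = max(c.end for c in cluster)
            merged.append((cluster[0].chrom, start, end, svtype, callers, score))
    return merged


# Made-up example data: caller names and weights are purely illustrative.
calls = [
    Call("chr1", 1000, 2000, "DEL", "callerA"),
    Call("chr1", 1050, 1980, "DEL", "callerB"),
    Call("chr1", 5000, 5600, "DEL", "callerB"),
    Call("chr1", 1000, 2000, "DUP", "callerC"),
]
weights = {
    ("callerA", "DEL"): 0.6,
    ("callerB", "DEL"): 0.5,
    ("callerC", "DUP"): 0.4,
}

merged = merge_calls(calls, weights)
for record in merged:
    print(record)
```

In this toy input only the deletion near chr1:1000-2000 is retained, because two callers support it and their combined weight reaches the threshold. The single-caller deletion and the duplication fall below it. A purely heuristic merge that ignores caller performance would treat every caller's vote equally, which is the limitation the abstract describes.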

Team

  • Timothy Becker (1,2)
  • Wan-Ping Lee (1)
  • Joseph Leone (1)
  • Qihui Zhu (1)
  • Chengsheng Zhang (1)
  • Silvia Liu (1)
  • Jack Sargent (1)
  • Kritika Shanker (1)
  • Adam Mil-homens (1)
  • Eliza Cerveira (1)
  • Mallory Ryan (1)
  • Jane Cha (1)
  • Fabio C. P. Navarro (3,4)
  • Timur Galeev (3,4)
  • Mark Gerstein (3,4,5)
  • Ryan Mills (6,7)
  • Dong-Guk Shin (2,8)
  • Charles Lee (1,9)

1. The Jackson Laboratory for Genomic Medicine, Farmington, CT
2. Department of Computer Science and Engineering, University of Connecticut, Storrs, CT
3. Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT
4. Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT
5. Department of Computer Science, Yale University, New Haven, CT
6. Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI
7. Department of Human Genetics, University of Michigan, Ann Arbor, MI
8. Senior authors
9. Senior and corresponding authors

 

Download and install from GitHub



Copyright (C) 2017  Timothy Becker, Wan-Ping Lee
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.