Data Analysis

Finely tuned and field-tested across hundreds of thousands of samples spanning flagship genomic initiatives, our tools are the industry standard for genomic variant discovery. Coupled with the Broad-produced data used to train them, the tools produce clean output with a low false-positive rate and the high sensitivity and specificity needed to discover causal variants.

Germline Variant Analysis

  • VCF (Variant Call Format) File
  • Includes SNV and Indel variants


Applicable to Human Whole Genome and Human Whole Exome Sequencing, our Germline Variant Analysis follows the GATK Best Practices for pre-processing of raw sequencing reads and variant discovery on analysis-ready reads to deliver germline variants. Pre-processing includes mapping to the reference using BWA-MEM, duplicate marking, local realignment around indels, and base quality score recalibration (BQSR). Variant discovery is decomposed into three separate steps: variant calling (performed per sample), joint genotyping (performed per cohort), and variant filtering (also performed per cohort). The first two steps are designed to maximize sensitivity, while the filtering step aims to deliver a level of specificity that can be customized for each project.
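The per-sample versus per-cohort split described above can be pictured with a small scheduling sketch. This is only an illustration of how the three variant discovery steps fan out over a cohort; the `Stage` class, the `plan_tasks` helper, and the sample names are hypothetical and are not GATK tool names or part of our production pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    scope: str  # "sample" = run once per sample, "cohort" = run once for all samples


# Stage names and scopes are taken from the description above.
# They are labels for illustration, not actual GATK commands.
VARIANT_DISCOVERY = [
    Stage("variant calling", "sample"),
    Stage("joint genotyping", "cohort"),
    Stage("variant filtering", "cohort"),
]


def plan_tasks(samples, stages=VARIANT_DISCOVERY):
    """Expand the stages into an ordered list of (stage name, target) tasks."""
    tasks = []
    for stage in stages:
        if stage.scope == "sample":
            # One independent task per sample.
            tasks.extend((stage.name, sample) for sample in samples)
        else:
            # A single task that sees every sample in the cohort.
            tasks.append((stage.name, "cohort"))
    return tasks


if __name__ == "__main__":
    # Hypothetical sample identifiers.
    for name, target in plan_tasks(["sample_A", "sample_B", "sample_C"]):
        print(f"{name}: {target}")
```

In this sketch, adding a sample adds one more variant calling task, while joint genotyping and variant filtering each remain a single cohort-wide task.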

Somatic Variant Analysis

  • VCF (Variant Call Format) File
  • Somatic MAF (Mutation Annotation Format) File
  • Includes Somatic SNV and Indel Variants


Applicable to Human Whole Exome Sequencing across Tumor/Normal sample pairs, our Somatic Variant Analysis follows the GATK Best Practices core variant calling workflow, including pre-processing and variant discovery. Pre-processing includes mapping (BWA-MEM) and duplicate marking for each individual sequencing output. Local indel realignment is then performed jointly on the tumor/normal pair, prior to base quality score recalibration (BQSR) and contrastive evaluation between the tumor and normal using Mutect and Indelocator. The result is a set of somatic SNV and indel calls at our desired level of quality, fine-tuned to balance specificity and sensitivity.
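Both analyses deliver VCF files containing SNV and indel variants. As a rough sketch of how the two can be told apart when reading a delivered file, the snippet below classifies a VCF data line by comparing the lengths of its REF and ALT alleles. The function names and the example record are hypothetical, and the logic assumes only the standard tab-separated VCF column order (CHROM, POS, ID, REF, ALT, ...); it is not how the calling tools themselves label variants.

```python
def classify_allele(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair as 'SNV', 'indel', or 'other'."""
    # Symbolic alleles (<DEL>), breakends, spanning deletions (*) and
    # missing values (.) are outside the scope of this sketch.
    if alt.startswith("<") or "[" in alt or "]" in alt or alt in {"*", "."}:
        return "other"
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # same-length multi-base substitution


def classify_record(line: str):
    """Return (chrom, pos, ref, alt, kind) for each ALT allele on a VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    return [
        (chrom, int(pos), ref, allele, classify_allele(ref, allele))
        for allele in alt.split(",")  # multi-allelic sites list ALTs comma-separated
    ]


if __name__ == "__main__":
    # Hypothetical record, made up for illustration only.
    example = "chr1\t100\t.\tA\tG,AT\t50\tPASS\t."
    for row in classify_record(example):
        print(row)
```

Header lines beginning with `#` are not handled here and would need to be skipped before calling `classify_record`.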


Additional materials

Broad Institute FireCloud

For more information on our proprietary cloud-based data delivery platform, please visit the FireCloud Website.

GATK Best Practices

For more detailed information on analysis solutions, please visit the GATK Website.

FireCloud Data Delivery (163kb/pdf)

Broad Institute Genomic Services delivers sequencing data through a proprietary cloud-based platform called FireCloud. The platform offers many benefits over traditional data delivery services, allowing you to securely store, manage, analyze, and share large bioinformatics datasets and analyses with collaborators worldwide.

ASHG 2016 Poster on Scaling Variant Calling Up to Hundreds of Thousands of Samples with GATK (446kb/pdf)

Cohort sizes for modern sequencing studies continue to rise into the hundreds of thousands of samples. Processing must be efficient, and variant calling accuracy must be preserved without sacrificing sensitivity to rare variation.