Genomic Data Analysis

Fine-tuned and field-tested across hundreds of thousands of samples from flagship genomic initiatives, our tools are the industry standard for genomic variant discovery. Trained on Broad-produced data, they deliver clean output with a low false-positive rate and the high sensitivity and specificity needed to discover causal variants.

Germline Variant Analysis

  • VCF (Variant Call Format) File
  • Includes SNV and Indel variants


Applicable to human whole genome and human whole exome sequencing, our Germline Variant Analysis follows the GATK Best Practices for pre-processing of raw sequencing reads and variant discovery on analysis-ready reads to deliver germline variants. Pre-processing includes mapping to the reference using BWA-MEM, duplicate marking, local realignment around indels, and base quality score recalibration (BQSR). Variant discovery is split into three steps: variant calling (performed per sample), joint genotyping (performed per cohort), and variant filtering (also performed per cohort). The first two steps are designed to maximize sensitivity, while the filtering step delivers a level of specificity that can be customized for each project.

Somatic Variant Analysis

  • VCF (Variant Call Format) File
  • Somatic MAF (Mutation Annotation Format) File
  • Includes Somatic SNV and Indel Variants


Applicable to human whole exome sequencing of tumor/normal sample pairs, our Somatic Variant Analysis uses the GATK Best Practices core variant calling workflow, including pre-processing and variant discovery. Pre-processing includes mapping (BWA-MEM) and duplicate marking for each individual sequencing output. Local indel realignment is then performed jointly on the tumor-normal pair, prior to base quality score recalibration (BQSR) and contrastive evaluation between the tumor and normal using Mutect and Indelocator. The result is somatic SNV and indel calls at our desired level of quality, fine-tuned to balance specificity and sensitivity.

Somatic SNV, Indel and CNV Calling

  • Tumor-Only or Matched Tumor-Normal Analysis
  • Many add-ons available, including: annotation with snpEff, consolidation of MAF files from multiple individuals into one, conversion of CNV output to the older ReCapSeg format, generation of PASS-only MAFs, and comparison of sets of variant calls with Venn diagrams


With your choice of either the GATK3 or GATK4 version of Mutect2 and the GATK4 version of the CNV caller, this service provides somatic SNV, insertion, deletion, and copy number calls with or without the use of a matched normal. It also reports the coding and non-coding mutational load for a sample or sample set, along with quality control metrics such as the percentage of somatically callable bases, cross-individual contamination, and mutational spectra shown as lego plots. When a matched normal is available, the analysis workflow provides germline calls with HaplotypeCaller. A panel of normals is used as a noise model to improve the specificity of calls.


Somatic Structural Variation Detection

  • Tumor-Only or Matched Tumor-Normal Structural Variation Detection


SvABA, formerly known as Snowman, is a method for detecting structural variants in sequencing data using genome-wide local assembly.


Somatic Mutational Burden Calculation


This tool calculates both the coding and non-coding mutational load for a sample or sample set. It can be run on the whole genome, the whole exome, the list of cfDNA genes, or a custom-defined gene list. The required input is a MAF file.
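As an illustration only, the kind of tally described above can be sketched in Python. This is not the service's implementation: the function name `mutational_load`, the `genes` parameter, and in particular the split of `Variant_Classification` values into coding and non-coding are assumptions made for this sketch. The column names are those of the standard MAF format.

```python
import csv
import io
from collections import Counter
from typing import Dict, Optional, Set

# Assumption: which MAF Variant_Classification values count as "coding".
# The actual tool may draw this line differently.
CODING_CLASSES = {
    "Missense_Mutation", "Nonsense_Mutation", "Nonstop_Mutation", "Silent",
    "Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
    "Translation_Start_Site",
}


def mutational_load(maf_text: str, genes: Optional[Set[str]] = None) -> Dict[str, Counter]:
    """Count coding and non-coding mutations per sample in tab-delimited MAF text.

    If `genes` is given, only mutations in those genes (Hugo_Symbol) are counted,
    which mimics restricting the analysis to a custom gene list.
    """
    # Skip MAF comment lines such as "#version 2.4" before parsing the header.
    data_lines = (line for line in io.StringIO(maf_text) if not line.startswith("#"))
    rows = csv.DictReader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)

    load: Dict[str, Counter] = {}
    for row in rows:
        if genes is not None and row["Hugo_Symbol"] not in genes:
            continue
        kind = "coding" if row["Variant_Classification"] in CODING_CLASSES else "non_coding"
        load.setdefault(row["Tumor_Sample_Barcode"], Counter())[kind] += 1
    return load


# Toy MAF with two samples (made-up records, for illustration only).
maf_text = (
    "#version 2.4\n"
    "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n"
    "TP53\tMissense_Mutation\tS1\n"
    "KRAS\tMissense_Mutation\tS1\n"
    "TP53\tIntron\tS1\n"
    "EGFR\tNonsense_Mutation\tS2\n"
)

load = mutational_load(maf_text)                       # whole input
tp53_only = mutational_load(maf_text, genes={"TP53"})  # custom gene list
print(load["S1"]["coding"], load["S1"]["non_coding"])  # 2 1
```

Dividing these counts by the number of callable megabases would give a per-megabase burden; that denominator is not part of the MAF and would have to come from elsewhere.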

ULP-WGS ichorCNA Purity/Ploidy Analysis


This tool calculates the purity and ploidy of an ultra-low-pass whole genome (ULP-WGS) sample as part of the blood biopsy workflow.

Germline Rearrangement Detection


Manta (version 1.3.1) is used to call structural variants in germline sequencing data.  This is a version-controlled workflow in FireCloud.

Coverage Analysis


This workflow identifies under-covered regions (defined as <20x coverage in 20% or more of samples) and is compatible with hg19 or hg38 data. It is useful for characterizing and optimizing panels, exomes, and genomes.
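The under-coverage rule stated above (<20x in 20% or more of samples) can be sketched as follows. This is a minimal illustration, not the workflow's code: the function name, the parameter names, and the input layout (one list of per-sample depths per region) are assumptions.

```python
from typing import Dict, List


def under_covered_regions(
    coverage: Dict[str, List[float]],
    min_depth: float = 20.0,
    min_sample_fraction: float = 0.20,
) -> List[str]:
    """Return regions where at least `min_sample_fraction` of samples are below `min_depth`.

    `coverage` maps a region name to the depth observed in each sample.
    """
    flagged = []
    for region, depths in coverage.items():
        if not depths:
            continue  # no samples, nothing to evaluate
        low = sum(1 for depth in depths if depth < min_depth)
        if low / len(depths) >= min_sample_fraction:
            flagged.append(region)
    return flagged


# Made-up depths for five samples across three hypothetical regions.
coverage = {
    "GENE_A:exon1": [35.0, 42.0, 18.0, 50.0, 61.0],  # 1 of 5 below 20x (20%)
    "GENE_A:exon2": [35.0, 42.0, 28.0, 50.0, 61.0],  # none below 20x
    "GENE_B:exon1": [5.0, 12.0, 19.0, 25.0, 30.0],   # 3 of 5 below 20x (60%)
}
flagged = under_covered_regions(coverage)
print(flagged)  # ['GENE_A:exon1', 'GENE_B:exon1']
```

Exposing both thresholds as parameters makes it easy to see how the flagged set changes when a panel is evaluated against a stricter or looser definition.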


Genotype Concordance Assessments


Genotype concordance compares variants called in a callset (typically a VCF) to a truth dataset. The truth dataset is currently NIST hg19 NA12878, but this can be changed if needed. VCF is the most efficient input format, but BAM and CRAM are also accepted.
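To make the comparison concrete, here is a simplified, site-level sketch in Python. It treats each variant as a (chromosome, position, ref, alt) tuple and ignores genotypes, confident-region masks, and variant normalization, all of which a production concordance tool would need to handle. The names `site_concordance` and `Concordance` are hypothetical.

```python
from typing import NamedTuple, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


class Concordance(NamedTuple):
    true_positives: int
    false_positives: int
    false_negatives: int
    sensitivity: float
    precision: float


def site_concordance(calls: Set[Variant], truth: Set[Variant]) -> Concordance:
    """Compare called variant sites against a truth set."""
    tp = len(calls & truth)   # called and in truth
    fp = len(calls - truth)   # called but not in truth
    fn = len(truth - calls)   # in truth but missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return Concordance(tp, fp, fn, sensitivity, precision)


# Made-up variants, for illustration only.
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "GA")}
calls = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "GA"), ("chr3", 10, "T", "C")}

result = site_concordance(calls, truth)
print(result.true_positives, result.false_positives, result.false_negatives)  # 2 1 1
```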

Auto-Validation


This on-premises suite of tools calculates the false positive rate, the sensitivity of variant calling by allele fraction and read depth, and the repeatability of variant calling using the Jaccard similarity index. Replicates of NA12878 (6 recommended) are required for the false positive rate calculation. Replicates of the 5-, 10-, and 20-plex HapMap cell line DNA pools (more than 3 of each required) are required for the sensitivity and repeatability calculations. These tools may be used for assessing process changes or characterizing new panels. Note: the Mutect1_Indelocator version of this tool is compatible with the versioning of the CRSP pipeline.
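For readers unfamiliar with the Jaccard similarity index, the sketch below shows how it can measure repeatability between replicate call sets: the number of variants shared by two replicates divided by the number of variants found in either. Averaging over all replicate pairs is an assumption of this example; how the suite itself aggregates replicates is not stated here, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def jaccard(a: Set[Variant], b: Set[Variant]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def mean_pairwise_jaccard(replicates: List[Set[Variant]]) -> float:
    """Average the Jaccard index over every pair of replicate call sets."""
    pairs = list(combinations(replicates, 2))
    if not pairs:
        raise ValueError("at least two replicates are required")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Made-up call sets for three replicates.
v1, v2, v3, v4 = (
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "GA"),
    ("chr3", 10, "T", "C"),
)
rep1, rep2, rep3 = {v1, v2, v3}, {v1, v2, v4}, {v1, v2, v3}

print(jaccard(rep1, rep2))  # 0.5 (2 shared variants out of 4 in the union)
repeatability = mean_pairwise_jaccard([rep1, rep2, rep3])
```

A value of 1.0 means every replicate produced exactly the same calls; lower values indicate calls that appear in some replicates but not others.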

SmartSeq2 Single Cell RNA expression and QC


This pipeline uses HISAT2 to align reads sequenced from each cell to the reference and produce a BAM file, RSEM to estimate expression values, and Picard to obtain QC metrics such as alignment metrics, GC bias, insert size, quality by cycle, and RNA coverage metrics. Future versions are planned to include quality metrics across the samples in a plate as well.




Additional materials

Genomic Data Analysis Datasheet (114kb/pdf)

Broad Genomics Analysis provides the next step after sequence data generation, supporting analytical activities for the Broad community and, as a service, for external users.

Broad Institute FireCloud

For more information on our proprietary cloud-based data delivery platform, please visit the FireCloud Website.

GATK Best Practices

For more detailed information on analysis solutions, please visit the GATK Website.

FireCloud Data Delivery (163kb/pdf)

Broad Institute Genomic Services delivers sequencing data through a proprietary cloud-based platform called FireCloud. The platform offers many benefits over traditional data delivery services, allowing you to securely store, manage, analyze, and share large bioinformatics datasets and analyses with collaborators worldwide.

ASHG 2016 Poster on Scaling Variant Calling Up to Hundreds of Thousands of Samples with GATK (446kb/pdf)

Cohort size for modern sequencing studies continues to rise into the hundreds of thousands of samples. There is a need for processing to be efficient and variant calling accuracy to be preserved without sacrificing rare variation sensitivity.