Genomic Data Analysis

Finely tuned and field-tested across hundreds of thousands of samples spanning flagship genomic initiatives, our tools are the industry standard for genomic variant discovery. Combined with the Broad-produced data used to train them, the tools produce clean call sets with few false positives and the high sensitivity and specificity needed to discover causal variants.

Germline Variant Analysis

  • VCF (Variant Call Format) File
  • Includes SNV and Indel variants

Details

Applicable to human whole genome and human whole exome sequencing, our Germline Variant Analysis follows the GATK Best Practices: pre-processing of raw sequencing reads, followed by variant discovery on the analysis-ready reads, to deliver germline variants. Pre-processing includes mapping to the reference using BWA mem, duplicate marking, local realignment around indels, and base quality score recalibration (BQSR). Variant discovery is decomposed into three steps: variant calling (performed per sample), joint genotyping (performed per cohort), and variant filtering (also performed per cohort). The first two steps are designed to maximize sensitivity, while the filtering step aims to deliver a level of specificity that can be customized for each project.
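As a rough illustration of what the delivered VCF contains, the sketch below counts SNV and indel alleles in VCF text. It relies only on the standard tab-separated VCF column layout (REF in column 4, ALT in column 5). The function names and the example records are hypothetical, and the classification rule (length-1 REF and ALT for an SNV, unequal lengths for an indel) is a simplifying assumption, not the logic used by the analysis workflow.

```python
from collections import Counter


def classify_allele(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair as 'SNV', 'indel', or 'other' (simplified rule)."""
    # Symbolic alleles, breakends, spanning deletions, and missing ALTs are not counted.
    if alt.startswith("<") or "[" in alt or "]" in alt or alt in {"*", "."}:
        return "other"
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. a multi-nucleotide substitution


def count_variant_types(vcf_lines):
    """Count allele types across VCF data lines, skipping header lines."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]
        # A multi-allelic site lists several ALT alleles separated by commas.
        for alt in alts.split(","):
            counts[classify_allele(ref, alt)] += 1
    return counts


# Hypothetical records for illustration only.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t.\tA\tG\t50\tPASS\t.",
    "chr1\t2000\t.\tAT\tA\t50\tPASS\t.",
    "chr2\t3000\t.\tC\tCTT,T\t50\tPASS\t.",
]
variant_counts = count_variant_types(example_vcf)
print(dict(variant_counts))
```

For a compressed file, the lines can be read with `gzip.open(path, "rt")` from the Python standard library and passed to the same function.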

Somatic Variant Analysis

  • VCF (Variant Call Format) File
  • Somatic MAF (Mutation Annotation Format) File
  • Includes Somatic SNV and Indel Variants

Details

Applicable to human whole exome sequencing of tumor/normal sample pairs, our Somatic Variant Analysis follows the GATK Best Practices core variant calling workflow, including pre-processing and variant discovery. Pre-processing includes mapping (BWA mem) and duplicate marking for each individual sequencing output. Local indel realignment is then performed jointly on the tumor/normal pair, prior to base quality score recalibration (BQSR). The tumor and normal are then evaluated contrastively using Mutect and Indelocator to provide somatic SNV and indel calls at our desired level of quality, fine-tuned to balance specificity and sensitivity.

Somatic SNV, Indel and CNV Calling

  • Tumor-Only or Matched Tumor-Normal Analysis
  • Many add-ons are available, including annotation with snpEff; consolidation of MAF files from multiple individuals into one; conversion of CNV output to the older ReCapSeg format; generation of PASS-only MAFs; and Venn diagram comparison of variant call sets

Details

With your choice of the GATK3 or GATK4 version of Mutect2, together with the GATK4 version of the CNV caller, this service provides somatic SNV, insertion, deletion, and copy number calls with or without a matched normal. It reports both coding and non-coding mutational load for a sample or sample set, along with quality control metrics such as the percentage of somatically callable bases, cross-individual contamination, and mutational spectra shown as lego plots. When a matched normal is available, the workflow also provides germline calls from HaplotypeCaller. A panel of normals serves as a noise model to improve the specificity of calls and is used in lieu of a matched normal when none is available.

Somatic Structural Variation Detection

  • Tumor-Only or Matched Tumor-Normal Structural Variation Detection

Details

SvABA, formerly known as Snowman, is a method for detecting structural variants in sequencing data using genome-wide local assembly.

Somatic Mutational Burden Calculation

Details

This tool calculates both coding and non-coding mutational load for a sample or sample set. It can be run on the whole genome, the whole exome, a list of cfDNA genes, or a custom-defined gene list. The required input is a MAF file.
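To make the idea of coding and non-coding mutational load concrete, here is a minimal sketch that tallies MAF rows per sample and divides by the size of the evaluated territory. Several things are assumptions for illustration: that load is expressed as mutations per megabase, the split of `Variant_Classification` values into coding and non-coding, the hypothetical function name `mutational_burden`, and the territory sizes passed in by the caller. It does not reproduce the service's own calculation.

```python
import csv

# Assumption: Variant_Classification values treated as coding in this sketch.
# Every other value (Intron, IGR, 5'UTR, 3'UTR, ...) is counted as non-coding.
CODING_CLASSES = {
    "Missense_Mutation", "Nonsense_Mutation", "Nonstop_Mutation", "Silent",
    "Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
    "Splice_Site", "Translation_Start_Site",
}


def mutational_burden(maf_lines, coding_mb, noncoding_mb):
    """Return per-sample coding and non-coding mutations per megabase.

    maf_lines: iterable of tab-separated MAF lines (header row included).
    coding_mb / noncoding_mb: size of each evaluated territory in megabases
    (hypothetical inputs; they depend on the target region being analyzed).
    """
    rows = csv.DictReader(
        (line for line in maf_lines if not line.startswith("#")),
        delimiter="\t",
    )
    counts = {}
    for row in rows:
        tally = counts.setdefault(
            row["Tumor_Sample_Barcode"], {"coding": 0, "noncoding": 0}
        )
        is_coding = row["Variant_Classification"] in CODING_CLASSES
        tally["coding" if is_coding else "noncoding"] += 1
    return {
        sample: {
            "coding_per_mb": tally["coding"] / coding_mb,
            "noncoding_per_mb": tally["noncoding"] / noncoding_mb,
        }
        for sample, tally in counts.items()
    }


# Hypothetical MAF content for illustration only.
example_maf = [
    "#version 2.4",
    "Hugo_Symbol\tTumor_Sample_Barcode\tVariant_Classification",
    "TP53\tS1\tMissense_Mutation",
    "KRAS\tS1\tNonsense_Mutation",
    "Unknown\tS1\tIGR",
    "EGFR\tS2\tIntron",
]
burden = mutational_burden(example_maf, coding_mb=2.0, noncoding_mb=4.0)
print(burden)
```

Restricting the calculation to a gene list would amount to filtering rows on `Hugo_Symbol` before counting and passing the matching territory size.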

ULP-WGS ichorCNA Purity/Ploidy Analysis

Details

This tool uses ichorCNA to estimate the purity and ploidy of a sample from ultra-low-pass whole genome sequencing (ULP-WGS) data.

Germline Rearrangement Detection

Details

Manta (version 1.3.1) is used to call structural variants in germline sequencing data.

Coverage Analysis

Details

This workflow identifies under-covered regions (defined as <20x coverage in 20% or more of samples) and is compatible with hg19 or hg38 data. It is useful for characterizing and optimizing panels, exomes, and genomes.
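The under-coverage rule can be sketched in a few lines. The example below flags a region when at least 20% of samples fall below 20x. It assumes each region is summarized by one depth value per sample (for instance a mean depth), which is an assumption about how coverage is aggregated; the function name and region labels are hypothetical.

```python
def under_covered_regions(coverage, min_depth=20, min_fraction=0.20):
    """Return regions where at least `min_fraction` of samples are below `min_depth`.

    coverage: dict mapping a region label to a list of per-sample depths.
    """
    flagged = []
    for region, depths in coverage.items():
        if not depths:
            continue  # no samples to evaluate for this region
        low_samples = sum(1 for depth in depths if depth < min_depth)
        if low_samples / len(depths) >= min_fraction:
            flagged.append(region)
    return flagged


# Hypothetical per-sample depths for three regions across five samples.
example_coverage = {
    "chr1:1000-1200": [30, 25, 19, 40, 22],  # 1 of 5 samples below 20x (20%)
    "chr1:5000-5200": [30, 30, 30, 30, 30],  # no samples below 20x
    "chr2:9000-9200": [10, 12, 25, 30, 8],   # 3 of 5 samples below 20x (60%)
}
flagged_regions = under_covered_regions(example_coverage)
print(flagged_regions)
```

Both thresholds are parameters, so the same function can be reused if a project defines under-coverage differently.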

Genotype Concordance Assessments

Details

Genotype concordance compares the variants in a callset (typically a VCF) to a truth dataset. The truth dataset is currently NIST hg19 NA12878, but this can be changed if needed. VCF is the most efficient input format, but BAM and CRAM are also accepted.
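A simplified view of the comparison is sketched below: variants are keyed by position and alleles, sites are split into shared, call-only, and truth-only, and genotypes are compared at shared sites. The function name, the metric names, and the toy data are hypothetical, and this is not the method used by the assessment itself, which may treat no-calls, filtered sites, and confident regions differently.

```python
def genotype_concordance(calls, truth):
    """Compare two callsets keyed by (chrom, pos, ref, alt).

    Values are genotypes written as sorted tuples of allele indices,
    e.g. (0, 1) for a heterozygous call and (1, 1) for homozygous alternate.
    """
    shared = calls.keys() & truth.keys()
    true_pos = len(shared)
    false_pos = len(calls.keys() - truth.keys())
    false_neg = len(truth.keys() - calls.keys())
    matching = sum(1 for site in shared if calls[site] == truth[site])
    return {
        "sensitivity": true_pos / (true_pos + false_neg) if true_pos + false_neg else None,
        "precision": true_pos / (true_pos + false_pos) if true_pos + false_pos else None,
        "genotype_concordance": matching / true_pos if true_pos else None,
    }


# Hypothetical truth set and callset for illustration only.
truth_set = {
    ("chr1", 100, "A", "G"): (0, 1),
    ("chr1", 200, "C", "T"): (1, 1),
    ("chr1", 300, "G", "A"): (0, 1),
    ("chr1", 400, "T", "C"): (0, 1),
}
call_set = {
    ("chr1", 100, "A", "G"): (0, 1),  # site and genotype match
    ("chr1", 200, "C", "T"): (0, 1),  # site matches, genotype differs
    ("chr1", 300, "G", "A"): (0, 1),  # site and genotype match
    ("chr1", 500, "A", "T"): (0, 1),  # not in the truth set
}
metrics = genotype_concordance(call_set, truth_set)
print(metrics)
```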

Somatic Performance Assessments

Details

This assessment calculates the false positive rate, the sensitivity for calling variants by allele fraction and read depth, and the repeatability of somatic variant calling using the Jaccard similarity index. Replicates of NA12878 (6 recommended) are required for the false positive rate calculation. Replicates of the 5-, 10-, and 20-plex HapMap cell line DNA pools (more than 3 of each required) are required for the sensitivity and repeatability calculations. These tools may be used to assess process changes or characterize new panels. Either Mutect1 + Indelocator or Mutect2 is used, depending on the analysis pipeline.
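The Jaccard similarity index for two call sets is the size of their intersection divided by the size of their union. The sketch below computes it for pairs of replicates and averages over all pairs. Averaging across pairs, returning 1.0 for two empty call sets, and the function names are assumptions made for illustration; the assessment may summarize repeatability differently.

```python
from itertools import combinations


def jaccard(calls_a, calls_b):
    """Jaccard index |A ∩ B| / |A ∪ B| for two collections of variant keys."""
    set_a, set_b = set(calls_a), set(calls_b)
    union = set_a | set_b
    if not union:
        return 1.0  # assumption: two empty call sets are treated as identical
    return len(set_a & set_b) / len(union)


def mean_pairwise_jaccard(replicates):
    """Average the Jaccard index over every pair of replicate call sets."""
    pairs = list(combinations(replicates, 2))
    if not pairs:
        raise ValueError("at least two replicates are required")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical variant keys called in three replicates of the same sample.
replicate_1 = {"chr1:100:A>G", "chr1:200:C>T", "chr2:300:G>A"}
replicate_2 = {"chr1:200:C>T", "chr2:300:G>A", "chr3:400:T>C"}
replicate_3 = {"chr1:100:A>G", "chr1:200:C>T", "chr2:300:G>A"}

repeatability = mean_pairwise_jaccard([replicate_1, replicate_2, replicate_3])
print(round(repeatability, 3))
```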

SmartSeq2 Single Cell RNA expression and QC

Details

This pipeline uses HISAT2 to align the reads sequenced from each cell to the reference and produce a BAM file, RSEM to estimate expression values, and Picard to obtain QC metrics such as alignment metrics, GC bias, insert size, quality by cycle, and RNA coverage metrics. Future plans include quality metrics across the samples in a plate.

Exome Sequences Completed in 2015

2,116

Additional materials

Genomic Data Analysis Datasheet (115kb/pdf)

Broad Genomics Analysis provides the next step after sequence data generation, supporting analytical activities both for the Broad community and externally as a service.

Broad Institute FireCloud

For more information on our proprietary cloud-based data delivery platform, please visit the FireCloud Website.

GATK Best Practices

For more detailed information on analysis solutions, please visit the GATK Website.

FireCloud Data Delivery (163kb/pdf)

Broad Institute Genomic Services delivers sequencing data through a proprietary cloud-based platform called FireCloud. The platform offers many benefits over traditional data delivery services, allowing you to securely store, manage, analyze, and share large bioinformatics datasets and analyses with collaborators worldwide.

ASHG 2016 Poster on Scaling Variant Calling Up to Hundreds of Thousands of Samples with GATK (446kb/pdf)

Cohort sizes for modern sequencing studies continue to rise into the hundreds of thousands of samples. Processing must be efficient, and variant calling accuracy must be preserved without sacrificing sensitivity to rare variation.