
Finely tuned and field-tested across hundreds of thousands of samples spanning flagship genomic initiatives, our tools are the industry standard for genomic variant discovery. Combined with the Broad-produced data used to train them, the tools produce clean output with few false positives and with high sensitivity and specificity for discovering causal variants.
- VCF (Variant Call Format) File
- Includes SNV and Indel variants
- VCF (Variant Call Format) File
- Somatic MAF (Mutation Annotation Format) File
- Includes Somatic SNV and Indel Variants
Applicable to human whole exome sequencing of tumor/normal sample pairs, our Somatic Variant Analysis uses the GATK Best Practices core variant calling workflow, which comprises pre-processing and variant discovery. Pre-processing consists of read mapping (BWA-MEM) and duplicate marking for each individual sequencing output. Local indel realignment is then performed jointly on the tumor-normal pair, followed by base quality score recalibration (BQSR) and a contrastive evaluation of the tumor against the normal using Mutect and Indelocator. This produces somatic SNV and indel calls at our target level of quality, tuned to balance specificity and sensitivity.
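The idea behind a contrastive tumor/normal evaluation can be illustrated with a log-odds (LOD) comparison in the spirit of MuTect's approach: the tumor must support a variant over a reference-only model, and the normal must support reference-only over a heterozygous germline model. This is a minimal sketch under loud assumptions, not the production caller: it uses one uniform base error rate, ignores base and mapping qualities and all artifact filters, and the thresholds (6.3 for tumor, 2.2 for normal) and every function name are illustrative choices.

```python
import math

def log10_likelihood(n_ref: int, n_alt: int, f: float, err: float) -> float:
    """log10 P(reads | alt allele present at fraction f).

    ASSUMPTION: every base has the same error rate `err`, and an error turns
    the true base into each of the 3 other bases with equal probability.
    """
    p_alt = f * (1 - err) + (1 - f) * (err / 3)
    p_ref = (1 - f) * (1 - err) + f * (err / 3)
    return n_alt * math.log10(p_alt) + n_ref * math.log10(p_ref)

def tumor_lod(n_ref: int, n_alt: int, err: float = 1e-3) -> float:
    """Evidence for a variant at the observed allele fraction vs. reference only."""
    depth = n_ref + n_alt
    if depth == 0:
        return 0.0
    f = n_alt / depth
    return log10_likelihood(n_ref, n_alt, f, err) - log10_likelihood(n_ref, n_alt, 0.0, err)

def normal_lod(n_ref: int, n_alt: int, err: float = 1e-3) -> float:
    """Evidence that the normal is reference rather than a germline het (f = 0.5)."""
    return log10_likelihood(n_ref, n_alt, 0.0, err) - log10_likelihood(n_ref, n_alt, 0.5, err)

def is_somatic(tumor: tuple, normal: tuple,
               tumor_threshold: float = 6.3, normal_threshold: float = 2.2) -> bool:
    """Somatic if the tumor supports the variant AND the normal looks reference.

    `tumor` and `normal` are (ref_reads, alt_reads); thresholds are illustrative.
    """
    return (tumor_lod(*tumor) >= tumor_threshold
            and normal_lod(*normal) >= normal_threshold)

# Hypothetical read counts at two sites
print(is_somatic((30, 10), (40, 0)))   # alt reads only in the tumor
print(is_somatic((30, 10), (20, 20)))  # alt reads also in the normal (germline)
```

The second site is rejected because its normal is far better explained by a heterozygous germline genotype than by a reference-only one.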
- Tumor-Only or Matched Tumor-Normal Analysis
- Many add-ons available, including:
  - Annotation with snpEff
  - Consolidation of MAF files from multiple individuals into one
  - Conversion of CNV output to the older ReCapSeg format
  - Generation of PASS-only MAFs
  - Venn diagram comparison of variant call sets
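Two of the add-ons listed above, consolidating MAFs from several individuals and producing PASS-only MAFs, reduce to merging tab-separated tables and filtering rows. The sketch below shows one way to do that with Python's standard `csv` module. It assumes MAF files are tab-separated with optional `#` comment lines before a header row, and that filter status sits in a column named `FILTER` holding `PASS` for passing calls; that column name is an assumption, since MAF flavors differ. The function names are hypothetical.

```python
import csv
import io

def read_maf(handle):
    """Yield one dict per variant from a tab-separated MAF, skipping '#' lines."""
    data_lines = (line for line in handle if not line.startswith("#"))
    yield from csv.DictReader(data_lines, delimiter="\t")

def consolidate_mafs(handles, pass_only=False, filter_column="FILTER"):
    """Merge several MAFs into one table using the union of their columns.

    ASSUMPTION: filter status lives in `filter_column` and is 'PASS' for
    passing calls.
    """
    columns, rows = [], []
    for handle in handles:
        for row in read_maf(handle):
            for col in row:
                if col not in columns:
                    columns.append(col)  # keep first-seen column order
            if pass_only and row.get(filter_column) != "PASS":
                continue
            rows.append(row)
    return columns, rows

def write_maf(columns, rows, out):
    """Write rows as a tab-separated MAF; columns a row lacks are left empty."""
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t",
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)

# Usage with two small in-memory MAFs (hypothetical content)
maf_a = io.StringIO("#version 2.4\nHugo_Symbol\tTumor_Sample_Barcode\tFILTER\n"
                    "TP53\tS1\tPASS\nKRAS\tS1\tclustered_events\n")
maf_b = io.StringIO("Hugo_Symbol\tTumor_Sample_Barcode\tFILTER\nEGFR\tS2\tPASS\n")
columns, rows = consolidate_mafs([maf_a, maf_b], pass_only=True)
out = io.StringIO()
write_maf(columns, rows, out)
print(out.getvalue())
```

Taking the union of columns lets MAFs annotated with slightly different column sets be combined without dropping information.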
With your choice of either the GATK3 or the GATK4 version of Mutect2, together with the GATK4 CNV caller, this service provides somatic SNV, insertion, deletion, and copy number calls with or without a matched normal. It calculates both coding and non-coding mutational load for a sample or sample set, along with quality control metrics such as the percentage of somatically callable bases, cross-individual contamination, and mutational spectra displayed as lego plots. When a matched normal is available, the workflow also provides germline calls with HaplotypeCaller. A panel of normals serves as a noise model to improve the specificity of calls, and takes the place of a matched normal in tumor-only mode.
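Lego plots of a mutational spectrum are commonly built from a 96-channel classification: six substitution types (written on the pyrimidine strand, so C>A, C>G, C>T, T>A, T>C, T>G) times 16 combinations of 5' and 3' neighboring bases. The sketch below shows only that classification step. Whether the service bins mutations exactly this way is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASES = {"A", "C", "G", "T"}

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def spectrum_channel(context: str, alt: str) -> str:
    """Map an SNV to one of 96 channels, written like 'T[C>T]G'.

    `context` is the reference base with its 5' and 3' neighbors; `alt` is the
    mutant base. Purine (A/G) reference bases are folded onto the opposite
    strand, so G>A in 'CGA' and C>T in 'TCG' land in the same channel.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or not set(context) <= BASES or alt not in BASES or alt == context[1]:
        raise ValueError(f"not an SNV: {context} -> {alt}")
    if context[1] in "AG":
        context = reverse_complement(context)
        alt = alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"

def mutational_spectrum(snvs):
    """Count SNVs per channel; `snvs` is an iterable of (context, alt) pairs."""
    return Counter(spectrum_channel(context, alt) for context, alt in snvs)

# Every possible SNV (64 contexts x 3 alternate bases) folds into 96 channels
channels = {spectrum_channel("".join(ctx), alt)
            for ctx in product("ACGT", repeat=3) for alt in "ACGT" if alt != ctx[1]}
print(len(channels))
print(mutational_spectrum([("TCG", "T"), ("CGA", "A"), ("ACA", "G")]))
```

Folding onto the pyrimidine strand halves the 192 strand-specific substitutions, since a mutation and its reverse complement describe the same event on double-stranded DNA.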
- Tumor-Only or Matched Tumor-Normal Structural Variation Detection
SvABA, formerly known as Snowman, is a method for detecting structural variants in sequencing data using genome-wide local assembly.
This tool calculates both coding and non-coding mutational load for a sample or sample set. It can be run on the whole genome, the whole exome, a list of cfDNA genes, or a custom-defined gene list. The required input is a MAF.
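Mutational load is usually expressed as mutations per megabase: the number of qualifying MAF rows divided by the size of the territory they were drawn from. The sketch below assumes the standard MAF columns `Hugo_Symbol` and `Variant_Classification`; the set of classifications counted as coding is a hypothetical choice (definitions vary between tools), and the territory size must be supplied by the caller to match the chosen region (genome, exome, or gene list). Function names are illustrative.

```python
import math

# HYPOTHETICAL definition of "coding" classifications; real definitions vary.
CODING = {
    "Missense_Mutation", "Nonsense_Mutation", "Nonstop_Mutation", "Silent",
    "Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
    "Splice_Site", "Translation_Start_Site",
}

def is_coding(row: dict) -> bool:
    return row["Variant_Classification"] in CODING

def is_non_coding(row: dict) -> bool:
    return row["Variant_Classification"] not in CODING

def mutations_per_mb(maf_rows, territory_bp: int, keep, genes=None) -> float:
    """Mutations per megabase among rows where keep(row) is true.

    `territory_bp` must be the size in bases of the region the rows come from
    (e.g. callable coding bases for coding load); it is not derived here.
    `genes`, if given, restricts counting to a custom gene list.
    """
    if territory_bp <= 0:
        raise ValueError("territory_bp must be positive")
    count = sum(
        1 for row in maf_rows
        if keep(row) and (genes is None or row["Hugo_Symbol"] in genes)
    )
    return count / (territory_bp / 1_000_000)

rows = [  # hypothetical MAF rows, reduced to the two columns used here
    {"Hugo_Symbol": "TP53", "Variant_Classification": "Missense_Mutation"},
    {"Hugo_Symbol": "TP53", "Variant_Classification": "Silent"},
    {"Hugo_Symbol": "KRAS", "Variant_Classification": "Missense_Mutation"},
    {"Hugo_Symbol": "TP53", "Variant_Classification": "Intron"},
]
print(mutations_per_mb(rows, 30_000_000, is_coding))
print(mutations_per_mb(rows, 2_000_000, is_non_coding))
print(mutations_per_mb(rows, 20_000, is_coding, genes={"TP53"}))
```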
This tool estimates tumor purity and ploidy from ultra-low-pass whole genome sequencing data.
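Purity and ploidy estimates generally rest on a mixture model: a segment's observed copy ratio reflects tumor cells (fraction `purity`, with an integer copy number and an average genome-wide `ploidy`) diluted by diploid normal cells. The sketch below implements only that forward model and a squared-error score for one candidate (purity, ploidy) pair. It is not necessarily the method this tool uses; real callers search over candidates and add priors or other signals, which are omitted here, and the function names are hypothetical.

```python
import math

def expected_log2_ratio(copy_number: int, purity: float, ploidy: float) -> float:
    """Expected log2 copy ratio of a segment under a tumor/normal mixture.

    Normal cells are assumed diploid; `ploidy` is the tumor's average copy number.
    """
    segment_dna = purity * copy_number + 2 * (1 - purity)
    average_dna = purity * ploidy + 2 * (1 - purity)
    # Floor avoids log2(0) for a homozygous deletion in a fully pure sample.
    return math.log2(max(segment_dna, 1e-9) / average_dna)

def fit_error(observed_log2, purity: float, ploidy: float, max_copy_number: int = 8) -> float:
    """Sum of squared residuals when each segment snaps to its best integer copy number."""
    return sum(
        min((obs - expected_log2_ratio(cn, purity, ploidy)) ** 2
            for cn in range(max_copy_number + 1))
        for obs in observed_log2
    )

# Simulate segments from a hypothetical tumor: purity 0.6, ploidy 2.0
true_copy_numbers = [2, 2, 3, 1, 2, 2]
observed = [expected_log2_ratio(cn, 0.6, 2.0) for cn in true_copy_numbers]
print(fit_error(observed, 0.6, 2.0))  # the generating parameters fit exactly
print(fit_error(observed, 0.8, 2.0))  # a wrong purity leaves residual error
```

The fit is not unique: for these same segments, purity 0.3 at ploidy 2.0 also fits exactly by reassigning copy numbers 1, 2, 3 to 0, 2, 4. This ambiguity is why practical estimators need constraints beyond the copy-ratio model alone.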
Manta (version 1.3.1) is used to call structural variants in germline sequencing data.
This workflow identifies under-covered regions (defined as <20x coverage in 20% or more of samples) and is compatible with hg19 or hg38 data. It is useful for characterizing and optimizing panels, exomes, and genomes.
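The under-coverage rule above (<20x in 20% or more of samples) can be expressed directly. This sketch assumes the input is one depth value per sample for each region, for example each target's mean coverage per sample; how those depths are computed, and the region labels used, are assumptions. The function name is hypothetical.

```python
def under_covered_regions(depths_by_region: dict, min_depth: float = 20,
                          min_sample_fraction: float = 0.20) -> list:
    """Return regions whose depth is below `min_depth` in at least
    `min_sample_fraction` of samples (defaults: <20x in >=20% of samples).

    `depths_by_region` maps a region label to one depth value per sample.
    """
    flagged = []
    for region, depths in depths_by_region.items():
        if not depths:
            continue  # no samples, nothing to judge
        low = sum(1 for depth in depths if depth < min_depth)
        if low / len(depths) >= min_sample_fraction:
            flagged.append(region)
    return flagged

# Hypothetical per-sample depths for three targets across five samples
coverage = {
    "chr1:1000-1200": [35, 28, 19, 44, 51],  # 1 of 5 samples below 20x (20%)
    "chr1:5000-5200": [35, 28, 22, 44, 51],  # all samples at or above 20x
    "chr2:300-500":   [10, 12, 30, 31, 29],  # 2 of 5 samples below 20x (40%)
}
print(under_covered_regions(coverage))
```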
Genotype concordance compares variants called in a callset (typically a VCF) to a truth dataset. The truth dataset is currently the NIST NA12878 call set on hg19, but it can be changed if needed. Input is most efficient as a VCF, but BAM and CRAM files are also accepted.
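At its core, a concordance check matches called sites against truth sites and then compares genotypes at the shared sites. The minimal sketch below assumes both sets are already normalized to a single variant representation and restricted to the truth set's confident regions, which real concordance tools handle themselves. Sites are keyed by a hypothetical `(chrom, pos, ref, alt)` tuple, and the function names are illustrative.

```python
import math

def normalize_gt(gt: str) -> str:
    """Compare genotypes ignoring phase and allele order: '1|0' -> '0/1'."""
    return "/".join(sorted(gt.replace("|", "/").split("/")))

def is_variant(gt: str) -> bool:
    """False for homozygous-reference or no-call genotypes."""
    return any(a not in ("0", ".") for a in normalize_gt(gt).split("/"))

def concordance(calls: dict, truth: dict) -> dict:
    """Site-level sensitivity/precision plus genotype agreement at shared sites.

    Both dicts map (chrom, pos, ref, alt) -> VCF-style genotype string.
    """
    calls = {k: g for k, g in calls.items() if is_variant(g)}
    truth = {k: g for k, g in truth.items() if is_variant(g)}
    shared = calls.keys() & truth.keys()
    tp = len(shared)
    fp = len(calls.keys() - truth.keys())
    fn = len(truth.keys() - calls.keys())
    genotype_matches = sum(normalize_gt(calls[k]) == normalize_gt(truth[k]) for k in shared)
    return {
        "true_positives": tp, "false_positives": fp, "false_negatives": fn,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "genotype_concordance": genotype_matches / tp if tp else 0.0,
    }

truth = {("chr1", 100, "A", "G"): "0/1", ("chr1", 200, "C", "T"): "1/1",
         ("chr2", 50, "G", "A"): "0/1"}
calls = {("chr1", 100, "A", "G"): "1|0", ("chr1", 200, "C", "T"): "0/1",
         ("chr3", 10, "T", "C"): "0/1", ("chr4", 5, "A", "T"): "0/0"}
print(concordance(calls, truth))
```

Separating site-level matching from genotype agreement distinguishes a missed variant from one that was found with the wrong zygosity.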
These tools calculate the false positive rate, the sensitivity of variant calling by allele fraction and read depth, and the repeatability of somatic variant calling using the Jaccard similarity index. Replicates of NA12878 (six are recommended) are required for the false positive rate calculation. Replicates of the 5-, 10-, and 20-plex HapMap cell line DNA pools (more than three of each) are required for the sensitivity and repeatability calculations. These tools may be used to assess process changes or to characterize new panels. Either Mutect1 with Indelocator or Mutect2 is used, depending on the analysis pipeline.
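Repeatability via the Jaccard index compares replicate call sets as sets of variants: the size of their intersection divided by the size of their union, averaged here over all replicate pairs. Sensitivity by allele fraction can be computed by binning the expected variants. This sketch assumes the expected allele fraction of each variant is known in advance (for instance from the design of the HapMap pools); that assumption, the bin edges, and the function names are all illustrative.

```python
import math
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Size of intersection / size of union; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def repeatability(callsets: list) -> float:
    """Mean pairwise Jaccard index across replicate call sets."""
    if len(callsets) < 2:
        raise ValueError("need at least two replicates")
    pairs = list(combinations(callsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

def sensitivity_by_allele_fraction(expected: dict, detected: set, bins: list) -> dict:
    """Fraction of expected variants detected, per allele-fraction bin.

    `expected` maps variant -> expected allele fraction (ASSUMED known);
    `bins` are half-open [low, high) intervals; empty bins are omitted.
    """
    result = {}
    for low, high in bins:
        in_bin = [v for v, af in expected.items() if low <= af < high]
        if in_bin:
            result[(low, high)] = sum(v in detected for v in in_bin) / len(in_bin)
    return result

# Hypothetical replicate call sets and expected pool variants
replicates = [{"v1", "v2", "v3"}, {"v2", "v3", "v4"}, {"v1", "v2", "v3"}]
print(repeatability(replicates))
expected = {"v1": 0.05, "v2": 0.08, "v3": 0.25}
print(sensitivity_by_allele_fraction(expected, {"v1", "v3"}, [(0.0, 0.1), (0.1, 0.5)]))
```

The same binning pattern applies to read depth by keying `expected` on depth instead of allele fraction.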
This pipeline uses HISAT2 to align the reads sequenced from each cell to the reference and produce a BAM file, RSEM to estimate expression values, and Picard to obtain QC metrics such as alignment metrics, GC bias, insert size, quality by cycle, and RNA coverage metrics. Planned additions include quality metrics across the samples in a plate.
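One of the expression values RSEM reports is TPM (transcripts per million). The textbook normalization divides each feature's expected read count by its effective length, then rescales so that all values sum to one million. The sketch below shows that formula only; RSEM's own computation from its model parameters may differ in detail, and the function name and input dictionaries are hypothetical.

```python
import math

def tpm(expected_counts: dict, effective_lengths: dict) -> dict:
    """Transcripts per million from expected read counts and effective lengths.

    Features with a non-positive effective length are assigned TPM 0.
    """
    rates = {
        feature: (count / effective_lengths[feature] if effective_lengths[feature] > 0 else 0.0)
        for feature, count in expected_counts.items()
    }
    total = sum(rates.values())
    if total == 0:
        return {feature: 0.0 for feature in rates}
    return {feature: 1e6 * rate / total for feature, rate in rates.items()}

# Hypothetical expected counts and effective lengths for three transcripts
values = tpm({"tx_a": 100, "tx_b": 200, "tx_c": 5},
             {"tx_a": 1000, "tx_b": 1000, "tx_c": 0})
print(values)
print(sum(values.values()))
```

Normalizing by length before scaling is what makes TPM values comparable across transcripts of different lengths within a sample.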
Broad Genomics Analysis provides the next step after sequence data generation, supporting analytical work both for the Broad community and, as a service, for external users.
For more information on our proprietary cloud-based data delivery platform, please visit the Terra Website.
For more detailed information on analysis solutions, please visit the GATK Website.
Broad Institute Genomic Services delivers sequencing data through a proprietary cloud-based platform called Terra. The platform offers many benefits over traditional data delivery services, allowing you to securely store, manage, analyze, and share large bioinformatics datasets and analyses with collaborators worldwide.
Cohort sizes for modern sequencing studies continue to rise into the hundreds of thousands of samples. Processing at this scale must be efficient, and variant calling accuracy must be preserved without sacrificing sensitivity to rare variation.