Resource Library


Low-cost cloud-based pipelines and optimized workflows to compare PacBio Sequel II and ONT PromethION long-read sequencing data

Maura Costello, Kiran Garimella, Ally Day, Erin LaRoche, Tamara Mason, Michael Dasilva, Laurie Holmes, Tera Bowers, and Stacey Gabriel.

Broad Institute, 320 Charles St., Cambridge, MA 02141

As interest in applying long-read sequencing to applications such as human whole genome sequencing (WGS) analysis continues to grow, there is an increasing need for robust, cost-efficient technologies and analysis methods that can enable science at a greater scale. With current methods, data are still costly to both produce and analyze.

Here, we present our experiences with, and a comparison of, the PacBio Sequel II sequencer with 8M SMRT cells and the Oxford Nanopore (ONT) PromethION sequencer with R9.4.1 flow cells, using well-characterized trios from HapMap 1000 Genomes to evaluate the pros and cons of each in terms of WGS library preparation, sequencing workflows, and data output, with an eye towards cost, ease of use, and data utility. Additionally, we present results from a variety of optimization experiments to improve the robustness of long-read workflows. For PacBio, we assess the effect that library insert size has on error rate and genome coverage following circular consensus sequencing (CCS) error correction, and also assess various methods to improve the per-SMRT-cell yield, including predictive loading and improved sequencing chemistry. For ONT, we evaluate the efficiency of the Short Read Eliminator kit from Circulomics in an attempt to increase the mean read length of lower-quality DNA in nanopore sequencing.

Finally, we present an overview of our cloud-based long-read analysis pipeline, publicly available in Terra, which is technology-agnostic and compatible with both ONT and PacBio data. Integrating multiple tools allows us to perform de novo assembly and generate call sets for human WGS from multiple callers in a single pipeline, including GATK/HaplotypeCaller and DeepVariant for SNVs/indels, and PBSV and Sniffles for SVs. Storing and analyzing data in the cloud provides significant time and cost savings, shortens sample-to-result time, and makes collaboration easier.

Custom Targeted Sequencing Datasheet

Datasheet for the custom capture and targeted sequencing product

ASHG19 - Integration of Best Practice RNA-Seq Workflows into Cloud-Based Translational Analysis Platform

Micah Rickles-Young, Junko Tsuji, Alyssa Macbeth, Brian R. Granger, Tera Bowers, Carrie Cibulskis, Niall Lennon


The Broad Institute has a long history in genomic sequencing and in the development of tools for researchers to analyze these data. With improvements in technology and reductions in cost, the rate of sequence generation is increasing, which necessitates a platform to scale the associated analyses. We also need to be able to apply our best practice methods across a range of complex workflows to support the breadth of science among our users. This challenge is what spurred the creation of the Translational Analysis Group (TAG) within the Genomics Platform at the Broad Institute. Over the past two years, our group has developed and maintained over 30 validated, version-controlled workflows and has run over 20,000 analyses on Terra, the Broad Institute’s cloud-based analysis platform. Until recently, we have mainly focused on supporting germline and somatic variant analyses on whole genome and exome libraries; however, there is high demand to integrate RNA sequencing (RNA-seq) into the analyses. In this presentation, we introduce our new RNA-seq workflows for bulk and single-cell RNA experiments. Our suite of RNA-seq workflows starts by mapping RNA reads to a reference genome and then profiles gene and isoform expression. The bulk RNA-seq outputs can be used as inputs for downstream workflows that perform differential expression and RNA variant calling analyses. To evaluate the workflows, we benchmarked them with publicly available datasets such as GTEx, checking the expression and the RNA variant calls against the matched exome. The development of our RNA-seq analysis capabilities increases the scope of projects, both internal and external, for which TAG can provide analysis services with the reproducibility, scalable resources, and version control necessary for consistency in studies that extend over a long period of time, such as clinical trials.

ASHG19 - Application of Lean Manufacturing Methodologies in High Throughput Genomic Sequencing

Tom Howd, Peter Trefry, Samuel DeLuca, Marissa Gildea, Doug Gobron, Michael Nasuti, Shannon Adams, and Tim DeSmet

The utility and application of genomics to understand disease, and the continuing trend to utilize genomics in healthcare, result in an ever-increasing demand for greater sequence data generation. Despite the significant reductions in per-base sequencing cost over the last decade, infrastructure, capital, and reagent costs remain relatively high. Top-of-the-line sequencers can cost over 1 million dollars per instrument, and sequencing run costs can still be tens of thousands of dollars. With such high fixed costs associated with genome data generation, it is important to maximize capacity utilization and reduce non-value-added and wasteful workflow process steps. We demonstrate the application of lean manufacturing methodologies and visual management techniques to the genomic sequencing workflow, which results in a sequencer utilization rate of around 90% while scaling our library preparation process threefold to over 300,000 samples destined for exome and human whole genome sequencing annually.


By combining the sample preparation methods for both exome and whole genome sequencing into a unified, modularized workflow, samples and reagent supply chains can be optimized, resulting in more efficient and cost-effective processing. Additional benefits include reductions in work in process and overall cycle times. Here, we illustrate the methodologies that enable low cost-per-base sequence data generation, applicable to both large sequencing cores and modest-sized data generation groups.


ASHG19 - Broad Institute's Genomics Platform Portfolio Leads Support the Research Community

Tera Bowers, Carrie Cibulskis, Justin Abreu, Andrew Hollinger, Cole Walsh, Maura Costello, Niall Lennon

In 2017, the Genomics Platform (GP) at the Broad Institute created a new set of roles (Portfolio Leads) to better support both the germline and somatic research communities. The creation of these roles has allowed the platform to roll out new products and improve existing products with a focus on the specific features required to serve the scientific questions our communities want to address. One of the key drivers for the germline portfolio is Broad’s Medical and Population Genetics (MPG) group. When we researched the requirements for the new exome, MPG investigators expressed a need for better mitochondrial coverage. After we worked closely with MPG, our R&D team, and Twist Bioscience, a 100-fold increase in mitochondrial genome coverage was achieved while still maintaining abundant and even coverage across the rest of the custom exome design. Other offerings that the Portfolio Leads have been instrumental in developing are the new single-cell RNA-seq products. A collaboration with Aviv Regev’s lab has successfully scaled the SmartSeq2M protocol with full automation from library construction to sequencing. The Platform now has the capacity to construct libraries for and sequence 16 plates a week, giving more researchers access to sequencing of either single cells or populations with full-length transcript capture methods. A suite of long-read sequencing products is also being developed. These aim to provide improved structural variation calling in human whole genome sequencing. The GP was an early access site for PacBio’s Sequel II instrument with higher-yielding 8M SMRT cells and longer run times. In GP’s hands, the Sequel II has delivered raw average read lengths of ~50 kb with 50% of reads being >140 kb. These new offerings will continue to enable the science of our research community.