
Maura Costello, Kiran Garimella, Ally Day, Erin LaRoche, Tamara Mason, Michael Dasilva, Laurie Holmes, Tera Bowers, and Stacey Gabriel.
Broad Institute, 320 Charles St., Cambridge, MA 02141
As interest grows in applying long-read sequencing to applications such as human whole genome sequencing (WGS), there is an increasing need for robust, cost-efficient technologies and analysis methods that enable science at greater scale. With current methods, data remain costly both to produce and to analyze.
Here, we describe our experience with, and compare, the PacBio Sequel II sequencer with 8M SMRT cells and the Oxford Nanopore Technologies (ONT) PromethION sequencer with R9.4.1 flow cells. Using well-characterized trios from HapMap 1000 Genomes, we evaluate the pros and cons of each platform in terms of WGS library preparation, sequencing workflows, and data output, with an eye towards cost, ease of use, and data utility. Additionally, we present results from a variety of optimization experiments aimed at improving the robustness of long-read workflows. For PacBio, we assess the effect of library insert size on error rate and genome coverage after circular consensus sequencing (CCS) error correction, and we evaluate several methods for improving per-SMRT-cell yield, including predictive loading and improved sequencing chemistry. For ONT, we evaluate the effectiveness of the Short Read Eliminator kit from Circulomics at increasing the mean read length obtained from lower-quality DNA in nanopore sequencing.
Finally, we present an overview of our cloud-based long-read analysis pipeline, publicly available in Terra, which is technology-agnostic and compatible with both ONT and PacBio data. Integrating multiple tools allows us to perform de novo assembly and generate call sets for human WGS from multiple callers in a single pipeline, including GATK/HaplotypeCaller and DeepVariant for SNVs/indels, and PBSV and Sniffles for SVs. Storing and analyzing data in the cloud provides significant cost savings, faster sample-to-result times, and easier collaboration.