Technical white paper

Accelerated throughput of next-generation sequencing workflows

Executive summary

Next-generation sequencing (NGS) technologies have revolutionized genomics and fueled the growth of biotechnology, health care, pharmaceutical, and life sciences organizations worldwide. This rapid growth in sequencing production, however, requires significant compute and storage capacity to meet the resulting demand for data processing. In this white paper, we demonstrate an order-of-magnitude improvement in the throughput of whole genome sequencing (WGS) workflows. We achieve this speedup by running the Parabricks GPU-accelerated implementation of the Broad Institute's Genome Analysis Toolkit (GATK) Best Practices pipeline on an HPE Apollo 6500 Gen10 system with NVIDIA GPUs.
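
To make the throughput claim concrete, the sketch below expresses per-node throughput as genomes processed per day and the improvement as the ratio of accelerated to baseline throughput. An "order of magnitude" then corresponds to a ratio of about 10, or a base-10 logarithm of about 1. The function names and inputs are hypothetical, and the model assumes a node kept busy around the clock with a fixed number of genomes in flight. It contains no measured run times from this paper.

```python
import math

HOURS_PER_DAY = 24.0


def genomes_per_day(hours_per_genome: float, concurrent_genomes: int = 1) -> float:
    """Throughput of one node in genomes/day (hypothetical model).

    Assumes each of `concurrent_genomes` simultaneous runs takes
    `hours_per_genome` wall-clock hours and the node never sits idle.
    """
    if hours_per_genome <= 0:
        raise ValueError("hours_per_genome must be positive")
    if concurrent_genomes < 1:
        raise ValueError("concurrent_genomes must be at least 1")
    return concurrent_genomes * HOURS_PER_DAY / hours_per_genome


def throughput_speedup(baseline_per_day: float, accelerated_per_day: float) -> float:
    """Ratio of accelerated throughput to baseline throughput."""
    if baseline_per_day <= 0:
        raise ValueError("baseline_per_day must be positive")
    return accelerated_per_day / baseline_per_day


def orders_of_magnitude(speedup: float) -> float:
    """Base-10 logarithm of the speedup; 1.0 means a 10x improvement."""
    return math.log10(speedup)
```

Because throughput counts completed genomes per unit time, it can improve either by shortening each run or by processing more genomes concurrently on the same node; the model keeps those two levers as separate parameters.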

Introduction

Genomic sequencing has advanced rapidly over the last decade, to the point where a whole human genome can be sequenced in less than a day for about $1,000. The plummeting cost of DNA sequencing, however, has produced an explosion in the volume of genetic data from NGS methods, challenging businesses to adapt to new production processes, changing technologies, and rising global competition.
From a computational standpoint, NGS workflows require high-performance processing and storage infrastructure so that organizations can achieve maximum productivity and gain deeper insight from their most data-intensive workloads. Figure 1 shows a typical NGS workflow: the unknown DNA fragments on the left, the base-calling sequencers and computational analyses in the middle, and the subsequent annotation and interpretation on the right.
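
The left-to-right flow in Figure 1 can be sketched as a linear pipeline in which each stage consumes the previous stage's output. This is a minimal illustration under stated assumptions: the stage names and the placeholder transforms, which only tag the data with what each stage produces, are hypothetical labels and not identifiers from any sequencing or analysis tool.

```python
from typing import Callable, List, Sequence, Tuple

# A stage is a (name, transform) pair. The transform receives the previous
# stage's output and returns this stage's output. Names are illustrative only.
Stage = Tuple[str, Callable[[list], list]]


def run_workflow(stages: Sequence[Stage], sample: list) -> Tuple[list, List[str]]:
    """Apply each stage in order and return the final output plus the
    names of the stages that ran, in execution order."""
    trace: List[str] = []
    data = sample
    for name, transform in stages:
        data = transform(data)
        trace.append(name)
    return data, trace


# Hypothetical placeholder transforms for the Figure 1 stages. Each returns a
# new list, so the input sample is never modified in place.
TYPICAL_NGS_WORKFLOW: List[Stage] = [
    ("base_calling", lambda d: d + ["reads"]),
    ("computational_analysis", lambda d: d + ["variants"]),
    ("annotation_and_interpretation", lambda d: d + ["report"]),
]
```

Calling `run_workflow(TYPICAL_NGS_WORKFLOW, ["dna_fragments"])` walks the stages in the same order as the figure. In a real deployment the computational analysis stage is the one whose compute and storage demands dominate, which is where acceleration pays off.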