Streamlining Solid Tumor Comprehensive Genomic Profiling Workflows Using Learned Variant and Sequencing Noise Autoclassification

oregon; chiles; genomics


Introduction: Use of next-generation sequencing (NGS) has expanded dramatically in the clinical setting, with tests for inherited conditions and somatic alterations in cancer. Variant review is a vital part of the NGS pipeline, ensuring that true genetic alterations are reported for precision matching to the best treatment options and clinical trial eligibility. However, due to an ever-increasing list of cancer-related genes and the increased sensitivity of NGS pipelines, this process can be arduous and time consuming. Here, we present an autoclassification system for routinely observed variants and sequencing artifacts, developed and refined across a cohort of nearly 10,000 samples.

Methods: Cancer patients (stages I-IV) in the Providence St. Joseph Healthcare system received comprehensive genomic profiling (CGP) from October 2019 to June 2022 (n=9,237 patients; 9,839 total sequencing samples). DNA and RNA were extracted from formalin-fixed, paraffin-embedded (FFPE) tissues using Qiagen or Promega FFPE Extraction Kit procedures. NGS libraries were prepared using the Illumina TSO 500 HT protocol and sequenced on the Illumina NovaSeq 6000 platform. Average variant allele frequency (VAF) was calculated from each variant's historical calls. ClinVar and OncoKB were used to determine clinical significance. Variants were selected for the autoclassification list based on multiple factors, including known oncogenicity, number of observed occurrences, average VAF across all occurrences, and sequencing read quality.

Results: A total of 800 distinct DNA variants are autoclassified: 160 as reportable variants, 545 as relatively rare benign polymorphisms, and 95 as sequencing noise. These classifications span 248 different genes (of 523 analyzed in the TSO 500 panel), with TP53 having the most (n=83). Across the 9,839 samples, after initial filtering of common polymorphisms, 192,064 variants were (or retroactively could have been) autoclassified, an average of 19.5 variants per sample (0.85 reportable, 11.06 benign polymorphisms, and 7.61 sequencing noise). Conversely, 229,244 variants were not autoclassified (23.3 variants per sample). On average, 45.56% of the called variants were autoclassified. We also identified several assay chemistry-specific sequencing noise variants that occur routinely and that, prior to autoclassification, required careful investigation to distinguish them from real reportable genomic variants.

Conclusions: With ever-increasing panel size and complexity for CGP testing, autoclassification of variants is a key step in reducing the burden and time commitment of manual variant review. It may also improve the consistency of variant designation and the identification of assay-specific sequencing noise that would otherwise be reported.
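
The workflow described in the abstract can be pictured as a lookup against a curated list, with anything not on the list sent to manual review. The following is a minimal sketch under stated assumptions, not the authors' implementation: the variant keys (`GENE_A:var1` and so on), the function names, and the example VAF values are all hypothetical, and the abstract does not state any thresholds used to curate the list.

```python
from collections import defaultdict
from statistics import mean

# The three categories named in the abstract.
CATEGORIES = ("reportable", "benign_polymorphism", "sequencing_noise")


def average_vaf(historical_calls):
    """Return the mean VAF per variant from (variant_key, vaf) pairs."""
    by_variant = defaultdict(list)
    for key, vaf in historical_calls:
        by_variant[key].append(vaf)
    return {key: mean(vafs) for key, vafs in by_variant.items()}


def autoclassify(sample_variants, auto_list):
    """Split one sample's variants into auto-classified and manual-review sets."""
    classified, manual = {}, []
    for key in sample_variants:
        if key in auto_list:
            classified[key] = auto_list[key]
        else:
            manual.append(key)
    return classified, manual


def fraction_autoclassified(classified, manual):
    """Share of a sample's called variants handled by the curated list."""
    total = len(classified) + len(manual)
    return len(classified) / total if total else 0.0


# Hypothetical curated list; real entries would come from expert review.
auto_list = {
    "GENE_A:var1": "benign_polymorphism",
    "GENE_B:var2": "sequencing_noise",
    "GENE_C:var3": "reportable",
}

# Hypothetical historical calls used to compute average VAF per variant.
history = [("GENE_A:var1", 0.50), ("GENE_A:var1", 0.48), ("GENE_B:var2", 0.03)]
vaf_by_variant = average_vaf(history)

# Hypothetical variants called in one sample after common-polymorphism filtering.
sample = ["GENE_A:var1", "GENE_B:var2", "GENE_D:var4", "GENE_E:var5"]
classified, manual = autoclassify(sample, auto_list)
share = fraction_autoclassified(classified, manual)
```

In this sketch the average VAF is only computed, not used for classification; in practice it would be one of several inputs reviewers weigh when deciding whether a variant belongs on the list.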

Clinical Institute





Earle A. Chiles Research Institute


Presented at the AMP Annual Meeting; November 1-5; Phoenix, AZ. 2022: TT022.
