Best Practices for NGS-Based Cancer Diagnostics

Andreas Scherer, Ph.D., President & CEO


Precision Medicine is based on gathering and analyzing genomic, proteomic, and clinical data from large populations of patients to stratify patients into subpopulations with common traits, health conditions, or response to drugs. For individual patients, these data can then be used to recommend the best actionable clinical decisions in the following areas:

Diagnosis | Treatment | Prevention.

Specifically, in the cancer space, data derived from Next-Generation Sequencing (NGS) are used to diagnose and prognose diseases, select a targeted therapy, and potentially evaluate the suitability of a patient for a clinical trial. The entire NGS-based cancer diagnostic space received a push in 2018 when Medicaid issued standard reimbursement codes for NGS-based tests. NGS allows us to look at any number of genes that are potentially involved in the oncogenesis of a patient’s tumor. An NGS platform is more efficient and less costly than other methods such as Sanger sequencing and provides much better resolution than microarrays. Hence, its adoption is spreading fast on a global basis.

To put this in perspective, I will elaborate on how this field is expanding at the macro level. The global NGS market is currently valued at USD 5.70 billion and is expected to reach USD 16.35 billion by the year 2024. This represents a compound annual growth rate (CAGR) of 19.2%. Contributing to this growth are the following critical factors:

1. An increase in the number of disease treatment options, accompanied by the adoption of precision medicine & molecular diagnostics.

2. Advancements in NGS platforms, with sequencers providing increased throughput and improved data quality.

3. Moore’s Law of NGS: A decline in NGS capital requirements continues across multiple sequencing platform providers even as the sequencing technology itself improves.

4. Changes in the regulatory environment, with an increase in acceptance of utilization of NGS-based tests in the clinic.

5. Changes in reimbursement, with payors increasingly willing to pay for these tests.

6. Funding for large-scale sequencing projects from both government & private sources.

While these developments represent remarkable advances, they also portend a significant dilemma for the industry. We have, on the one hand, a labor-intensive diagnostic process that requires expertise and attention to detail, while on the other hand, a rapidly growing demand for NGS-based tests. Clinical laboratories in this space can expect to multiply their workload within the next four years, with similar, if not greater growth anticipated beyond that timeframe. At the same time, there is already a shortage of clinical experts in the field of genetics with specific expertise in the NGS area. Automation of analysis and patient matching algorithms will become increasingly indispensable tools to solve this dilemma.

To the point: the process of final variant classification and reporting requires adjudicating multiple lines of evidence, developing decision trees, and coding systems to weigh the evidence. Automation of this process will help to eliminate the problems associated with human error and individual subjectivity. In addition, the automation of informatics and the creation of guided workflows will reduce the time and effort required for molecular pathologists and medical geneticists to review, assess, and sign off on clinical reports. The key value of a workflow-supported diagnostic process includes:

  • Delivery of consistent, high-quality interpretations: Those responsible for conducting the analysis must be well trained and have in-depth domain knowledge of the complexities of clinical variant interpretation. Studies have shown that time of day may affect a person’s judgment, yet the quality of the analysis must be equally excellent regardless of when the work is conducted, be it at 8:00 AM, 1:30 PM right after lunch, or well into the evening hours. Consequently, system maintenance and a high level of attention to detail are required throughout a busy workday.
  • Providing a framework for newer, less experienced clinicians: As the volume in a clinical laboratory ramps up, there will be an inevitable need to bring additional staff up to speed in conducting the analytics. We were able to demonstrate that our product can be tremendously helpful in this context. Our solution provides a framework that supports less experienced analysts so that they can confidently conduct high-quality work in this space. In many instances, our product provides clear, specific guidance in how to answer certain questions with a minimum of supervision (as always, a trainee can still engage with more senior lab members when necessary). Overall, our solution affords a potentially significant reduction in the time to bring new members up to speed in the clinical interpretation of variants.
  • Staying abreast of new developments: Golden Helix invests substantial resources and time directly engaging with the community of end-users, attending conferences, engaging with our clients, and consulting with outside experts as part of our product development process. Based on this ongoing feedback, we regularly update the software to ensure that it reflects the state-of-the-art variant analysis and interpretation. This is reassuring to clients who would otherwise face the same burdensome process to stay abreast of new developments but who don’t have the time and resources to perform this leg work.

The widespread implementation of NGS technologies has produced a vast number of variants to analyze and categorize, along with numerous data sources containing a wealth of knowledge about the clinical relevance of variants. And while a wide array of available algorithms may assess how the presence of a variant functionally impacts a particular gene and associated proteins, we are long past the point where these types of analytics can be conducted manually.

It is crucial to have the combined capability of analyzing germline and somatic variants, as cancer can be triggered by variants in either or both cell types. The former (germline) occur in egg and/or sperm cells and are hereditary in nature; BRCA1 and BRCA2 are well-known examples. The latter occur in any other type of cell. In either case, these mutations can be either activating or inactivating. Activating means that the mutation confers new or increased cell activity that is helpful to the development or spreading of the tumor. Inactivating means the loss or dampening of a cell function that inhibits the development or growth of tumors, e.g., switching off a tumor-suppressor gene. In fact, depending on the gene variants and splicing alternatives, a BRCA1 mutation may be either activating or inactivating.

Another example would be the tumor-suppressor protein p53 encoded by TP53. The homozygous loss of p53 is often found in colon cancer, but also in breast and lung cancers. Below is an overview of the variety of different forms of mutations that are clinically most relevant:

  • Single-Nucleotide Variants (SNVs) triggering, for example, a missense or nonsense amino acid substitution.
  • Splice Site Alterations that impact the mRNA transcript. These mutations can potentially render an entire gene useless.
  • Copy Number Variants (CNVs) that either duplicate or delete entire chunks of DNA, causing devastating damage on the molecular level. Deletion of the tumor-suppressor RB1, for example, is often seen in retinoblastomas.
  • Other structural rearrangements such as gene fusions, translocations, inversions.

Depending on the molecular profile of a sample arising from these various mutant types, clinicians can determine suitable treatment options for cancer patients. These include the following:

  • FDA approved cancer drugs.
  • Off-label treatments for specific tumors, sometimes in conjunction with other treatments such as chemotherapy.
  • Recommendation to enroll a patient into a clinical trial based on the molecular profile of the tumor.

Molecular cancer diagnostics is a quickly evolving discipline, as witnessed by the many papers published regularly describing treatment options and new associations between genes and cancers. We have reached a level of complexity at which manual review of the available data, information, and knowledge is nearly impossible and the development of a comprehensive clinical report is extremely difficult. We may have already reached the point where software-aided decision making has become the only viable option to deal with this complex matter. In the next section, we take up the following topics:

  • Clinical Reporting: What do we need to report on?
  • Databases and Annotation Sources
  • The AMP Guidelines
  • Examples of how to apply the AMP guidelines leveraging Golden Helix’s Diagnostic Platform for Cancer

Clinical Reporting

Analogous to the ACMG guidelines for germline mutations (see Richards et al. (2015)), the Association for Molecular Pathology (AMP) has issued guidelines to assess and report on somatic variants. The key paper in this area was a set of consensus recommendations reached jointly by several societies (see Li et al. (2017)).

In a survey that included 67 responses, it was interesting to see to what degree labs differed from each other in terms of the level of detail included in their reports. A majority of the participants (83%) used gene panels as their diagnostic vehicle, ranging from fewer than a dozen to slightly below one hundred genes. To a lesser degree, some labs employed clinical exome- or whole genome-based tests (12% and 5% respectively) to conduct their analytics work. Nearly all labs reported on SNVs and INDELs (95%), whereas CNVs and gene fusions were only reported in about a third of the labs (35% and 37% respectively).

The most striking findings were the following:

  • Numerical cutoff for Minor Allele Frequencies (MAF): Most labs (75%) use ‘1%’ as the cutoff to define what constitutes a minor allele. This is important as this metric constitutes what is considered a rare mutation.
  • Classification scheme: About 40% of the labs are using a 5-level categorization scheme (pathogenic, likely pathogenic, unknown significance, likely benign and benign). 35% of the labs used a 3-level categorization scheme. About 25% of the labs had a scheme that differed from the former two approaches.
  • Reporting on therapeutic implications: A vast majority of labs (80%) reported on the therapeutic implications of a variant.
  • Germline variants: Slightly more than 70% reported on germline variants observed in the sample.
  • Allele frequencies: About half of the labs reported variant allele frequencies.
  • Genomic coordinates: Equally, about half of the labs included genomic coordinates in their report, which typically simplifies external review of the report by a third party.
  • Transcript information: About 80% of the labs included transcript information in their report.
  • QC metrics: Roughly a third of the labs reported genes or regions in which results did not meet QC standards. Another third of the labs did this sometimes, and one third didn’t report on this issue at all.

All these findings highlighted the need to standardize clinical reporting in the cancer space, which led to the issuance of the AMP guidelines (see Li et al., 2017).

Annotation Sources and Computational Prediction Methods

Any bioinformatic pipeline for cancer ultimately calls variants based on the aligned reads that the sequencer generated. Variant calling is the process of reviewing a sequence alignment, typically in the form of a BAM file, to identify loci that differ from the reference genome. Single Nucleotide Variants (commonly called “SNVs” or “SNPs”) are the most common type of variation, followed by insertions and deletions (jointly referred to as “INDELs”) and Multiple Nucleotide Variants (“MNVs” or “substitutions”). It is also possible to detect Copy Number Variations (CNVs) and other structural variants such as inversions and translocations; whole-genome sequence data is often required to identify these features accurately. Variant calls are typically stored in a “VCF” file. The VCF file format is flexible but typically captures the observed genotype at any genomic coordinate where a variant is observed together with technical data, such as read depth and quality scores.
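To make the VCF description more concrete, the following is a minimal sketch of reading one VCF data line and labeling the variant type. It assumes the standard tab-separated VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), a single ALT allele, and no symbolic alleles. The class name `VariantCall`, the helper names, and the example record are invented for illustration; a production pipeline would rely on a dedicated VCF library instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariantCall:
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: Optional[float]  # None when QUAL is "." (missing)
    depth: Optional[int]   # None when the INFO column has no DP entry
    kind: str              # "SNV", "MNV", or "INDEL"


def classify(ref: str, alt: str) -> str:
    """Label a variant by comparing the REF and ALT allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) == len(alt):
        return "MNV"
    return "INDEL"


def parse_vcf_line(line: str) -> VariantCall:
    """Parse one VCF data line (assumes a single ALT allele)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, qual, _filter, info = fields[:8]
    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags.
    info_map = dict(
        item.split("=", 1) for item in info.split(";") if "=" in item
    )
    depth = int(info_map["DP"]) if "DP" in info_map else None
    return VariantCall(
        chrom=chrom,
        pos=int(pos),
        ref=ref,
        alt=alt,
        qual=None if qual == "." else float(qual),
        depth=depth,
        kind=classify(ref, alt),
    )


# Invented example record, not taken from a real sample.
record = "chr1\t12345\t.\tA\tT\t812.5\tPASS\tDP=950"
call = parse_vcf_line(record)
print(call.kind, call.pos, call.depth)
```

The read depth and quality score parsed here are the same technical fields that later quality filters would consult.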

In the context of cancer testing, sequence reads from both tumor tissue and matched normal tissue samples can be aligned and compared to detect somatic variants in cancer cells. Probabilistic techniques have been developed to compute the probabilities of somatic mutations (Roth, 2012; Larson, 2012), and machine learning has been used to train classifiers that can detect somatic mutations (Ding, 2012).

In cancer testing, the somatic variants occurring in the tumor cells of the patient are of interest. Because of the heterogeneity of cell regions within the tumor, mutations that occur in only 50% of tumor cells must be detected, with sensitivity down to 10% of tumor cells desirable (according to the guidelines of the American Society of Clinical Oncology, see Leighl (2014)). Biopsies taken for diagnosis and sequenced will also be a mix of tumor and normal cells. The implication is that variants must be called when somatic mutations occur in only a small fraction of the sequences at a given genomic locus, often at an allelic frequency of only 1-5%, that is, in 1-5% of sequence reads. When a biopsy of normal cells for a patient is taken alongside the tumor, algorithms can consider both in conjunction and use probabilistic techniques to detect somatic mutations (Ding, 2012). The performance and agreement of these algorithms can vary widely depending on the properties of the input data (Xu et al. (2014)).

Cancer panel tests are often performed only on tumor samples (without matched-normal samples sequenced), in which case determining which variants are somatic mutations is done in the filtering and annotation process.

As noted previously, clinical cancer analysis generally focuses on somatic mutations found in tumors. Novel somatic mutations are typically identified by comparing tumor sequences to a matched normal sequence. However, it is increasingly common to sequence only the tumor tissue when performing gene panel tests. The genes tested by standard cancer panels are well characterized, and certain mutations within those genes are known to occur frequently in certain cancers. Mutations at the same sites are extremely rare in non-cancerous tissues. Analysis of tumor sequences in the absence of a matched normal is therefore capable of identifying known cancer-associated mutations, although this lacks the power to distinguish between somatic mutations and germline variants at novel sites.
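A short calculation shows why variant callers need to work at allelic frequencies of only a few percent. The sketch below is a simplified model, not one taken from the cited guidelines: it assumes a heterozygous mutation at a diploid locus with no copy-number change, so each mutated cell contributes one mutant allele out of two. The function names and the example purity value are illustrative assumptions.

```python
def expected_vaf(tumor_purity: float, clonal_fraction: float) -> float:
    """Expected fraction of reads carrying a heterozygous somatic mutation.

    Simplifying assumption: diploid locus, no copy-number change.
    tumor_purity    -- fraction of cells in the biopsy that are tumor cells
    clonal_fraction -- fraction of tumor cells that carry the mutation
    """
    for name, value in (
        ("tumor_purity", tumor_purity),
        ("clonal_fraction", clonal_fraction),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # Each mutated cell carries one mutant allele out of two.
    return tumor_purity * clonal_fraction / 2.0


def observed_vaf(alt_reads: int, total_reads: int) -> float:
    """Allelic frequency observed at a locus: ALT reads over all reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


# A mutation present in 10% of tumor cells, in a biopsy that is
# (hypothetically) half tumor and half normal tissue.
print(f"{expected_vaf(0.5, 0.10):.3f}")
```

Under these assumptions, a mutation in 10% of tumor cells within a biopsy of 50% purity is expected in about 2.5% of reads, which falls inside the 1-5% range mentioned above.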

Filtering and Annotation

After calling variants, care must be taken to confirm the quality of the sequencing, review the identified variants, annotate them according to known or predicted consequences, and identify which variants, if any, may be clinically actionable.

Several public and proprietary databases contain information about previously observed somatic mutations in tumors. One of the most popular databases is COSMIC, a publicly accessible database of mutations extracted from research articles and from The Cancer Genome Atlas (TCGA). COSMIC catalogs many important data points about each mutation, including the tumor location, histology, and internet links to original publications and case reports.

A typical annotation and filtering workflow might include the following steps:

  • Annotate all variants based on gene location and resulting changes to the protein product of the gene.
  • Compare the variant list with COSMIC and retrieve annotations for any matching mutations.
  • Flag variants with poor sequencing quality (based on coverage depth or other metrics).
  • Identify somatic mutations. Compare to normal tissue if available to remove germline variants. Otherwise, remove common germline variants as reported in population databases such as the 1000 Genomes project and focus on variants observed with allelic frequency in the expected range for somatic mutations (for example, variants that appear in 1% to 15% of reads at the site).
  • Prepare a final list of probable somatic mutations, together with all available annotation data.
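The workflow steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation used by any particular product: the COSMIC and population lookups are stand-in Python sets keyed by (chromosome, position, REF, ALT), the gene and protein annotation step is omitted, and the `min_depth` default of 100 is a hypothetical threshold. The 1% to 15% allelic-frequency window is the example range given in the list.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set, Tuple

Key = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


@dataclass
class Variant:
    key: Key
    depth: int            # reads covering the site
    vaf: float            # fraction of reads supporting the ALT allele
    cosmic_hit: bool = False
    flags: List[str] = field(default_factory=list)


def filter_somatic(
    variants: Iterable[Variant],
    cosmic_keys: Set[Key],
    population_keys: Set[Key],
    normal_keys: Optional[Set[Key]] = None,
    min_depth: int = 100,                          # hypothetical threshold
    vaf_range: Tuple[float, float] = (0.01, 0.15),
) -> List[Variant]:
    """Return probable somatic variants, annotated and flagged."""
    low, high = vaf_range
    probable = []
    for v in variants:
        # Retrieve the (stand-in) COSMIC annotation.
        v.cosmic_hit = v.key in cosmic_keys
        # Flag, but do not drop, poorly covered calls.
        if v.depth < min_depth:
            v.flags.append("low_coverage")
        if normal_keys is not None:
            # Matched normal available: anything seen there is germline.
            if v.key in normal_keys:
                continue
        else:
            # Tumor-only: remove common germline variants and keep calls
            # in the allelic-frequency window expected for somatic events.
            if v.key in population_keys:
                continue
            if not low <= v.vaf <= high:
                continue
        probable.append(v)
    return probable
```

Low-coverage calls are flagged and kept in the list on purpose, so that a reviewer can decide whether they should be reported as inconclusive.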

Figures 1, 2, 3, and 4 provide a comprehensive overview of key databases for the interpretation of somatic sequence variants.

Golden Helix curates a carefully reviewed dataset containing assessments of variants and genes in the context of specific cancers under the name “Golden Helix CancerKB.” This database is meant to provide interpretations for the most common cancer genes and biomarkers for specific tumor types. Along with interpretations about the clinical impact of variants and other biomarkers, summary descriptions of the clinical significance of genes and the clinical outcomes associated with mutations in that gene for the patient’s tumor type are provided. Ultimately, these interpretations are a starting point for a lab to develop its own reusable report snippets. But, by providing report-ready content for the most common test results, the Golden Helix CancerKB accelerates the time to report while providing a valuable interpretation resource. Figure 5 shows an example of the content of this database.

In addition to annotation sources and curated specialty databases, it is important to point out that in order to determine the pathogenicity of a variant, clinicians use computational models to predict the functional outcome of a mutation. There are two types of algorithms that are typically used:

  • Splice site prediction algorithms (see Figure 6): These tools tend to overcall and therefore have lower than ideal precision. Yet they are informative for a clinician and need to be considered in the clinical assessment.
  • Prediction algorithms assessing if a particular change in the nucleotide sequence will impact the structure and function of the protein encoded by a particular gene (see Figure 7).

Functional prediction algorithms generally assess the impact of a missense change based on criteria such as the conservation of the amino acid position, the mutation’s location within the protein sequence, and the biochemical consequences of the amino acid substitution. While many of these algorithms rely on different prediction methods, they all have similarities in their underlying basis. Numerous meta-analyses have been performed comparing the various functional prediction methods, finding significant differences in performance between the various algorithms.

Splice site prediction algorithms are used to determine if a variant is likely to either disrupt an existing splice site or introduce a novel splice site. These algorithms generally have higher sensitivity relative to specificity and are prone to producing false-positive classifications (Richards, 2015). Thus, while splice site algorithms can provide some evidence for a damaging effect, further evidence is required to establish pathogenicity.

Figure 5: The interpretations for BRAF V600E for Melanoma in this draft clinical report preview are all provided by Golden Helix CancerKB

Figure 6: Splice Site Prediction Algorithms

Figure 7: Functional Prediction Algorithms

A state-of-the-art clinical workflow tool should also provide functional predictions, conservation scores, and splice site predictions for all relevant variants. This functionality makes it easy to assess the in-silico evidence for a given variant.

The AMP Guidelines

Somatic variants can manifest in different ways:

  • Single Nucleotide Variants
  • INDELs
  • Fusions and
  • Copy Number Variations

Somatic variants occur after birth due to environmental effects or replication errors. The allele frequency of these variants in general population catalogs covers a range of values but is typically less than 0.5%. When we assess the clinical implication of a somatic variant, we are interested in the following:

  • Sensitivity: Does a particular variant imply the sensitivity to a particular drug or treatment?
  • Resistance: Are certain drugs not effective?
  • Toxicity: Does a single biomarker or biomarker panel imply that a certain drug has adverse or toxic effects?

The AMP guidelines recommend clustering the available clinical evidence in four levels:

1. Level A: This is the highest level of available evidence. Biomarkers in this category are clearly established to either create a definitive tumor response for a specific tumor type or are known to indicate the resistance of the tumor based on US-FDA or other professionally accepted guidelines.

2. Level B: There is still very strong evidence of predicted tumor response or resistance based on numerous large studies. Experts agree on the findings, and the assembled body of knowledge is consistent in the clinical interpretation of a given biomarker.

3. Level C: Biomarkers in this category point towards efficacy or resistance in a different type of tumor. Clinicians use Level C to recommend off-label use of a drug. Sometimes they are used to recommend the inclusion of a patient in a clinical trial.

4. Level D: This category collects evidence from pre-clinical studies or from small-scale studies. It is generally seen as the weakest evidence that can still inform a clinical decision.

The available clinical evidence is then sorted into four tiers that describe the clinical impact of any given variant (see Figure 8).

Figure 8: Evidence-based variant categorization

Tier I Variants

With Tier I variants, we have clear evidence of therapeutic, diagnostic, and/or prognostic value. In this tier, we are mainly interested in variants with a predicted response or resistance to FDA-approved drugs or to other therapies documented in professional guidelines for a specific tumor. We include only Level A and B evidence in this tier. Let’s look at a few examples.

One of the most well-known cancer mutations, BRAF V600E, is predictive of a positive response to the drug vemurafenib in melanoma (see Figure 9). Therapeutically, this would be a valid and likely choice based on the observed biomarker.

Other variants have diagnostic or prognostic implications. For example, PML-RARA fusion is pathognomonic for promyelocytic leukemia. On the flip side, an FLT3 internal tandem duplication variant indicates a poor prognosis in acute myeloid leukemia (see National Comprehensive Cancer Network Guidelines in Oncology Acute Myeloid Leukemia – subscription required).

Tier II Variants

The evidence in this tier stems from Level C and D. The clinical recommendation is typically off-label use of a drug or enrollment in a clinical trial. For example, ruxolitinib was approved by the FDA with a primary indication for myelofibrosis. The JAK inhibitor has shown clear efficacy in improving the survival rate for this type of cancer. Recently published study results show that patients with lymphoblastic leukemia can also benefit from the drug. This would be an off-label prescription by an oncologist of an FDA-approved drug due to strong clinical evidence based on well-documented, high-powered studies.

Another example centers around TP53, which is also a commonly mutated cancer gene in humans. As previously described, it encodes the p53 protein, a tumor suppressor. In this context, Nutlins, a group of small-molecule compounds, have shown efficacy in preclinical studies for p53 wild-type tumors. They have also demonstrated efficacy in the treatment of advanced solid tumors, hematological malignancies, and liposarcomas (see

Tier III Variants

Variants in this tier are interesting to assess; they are rare, and thus cannot be ruled out as benign based on population-level allele frequencies. And, unfortunately, no clear evidence has been established for clinical indications or utility. The mutation type might indicate a potentially damaging impact, such as frameshift or missense, but since no conclusive studies link them to any particular cancer, they are categorized as variants of unknown clinical significance.

Figure 9: Impact of vemurafenib

Tier IV Variants

The last tier of biomarkers is benign. Typically, they are ruled out based on their allele frequency in population catalogs. Some of those catalogs are cited in Figure 2. They are simply common variants in the human genome.
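
The relationship between evidence levels and tiers described above can be written down as a small lookup. This is a sketch of the mapping as summarized in this section, with two assumptions that are ours: the function name `amp_tier`, and a population allele-frequency cutoff of 1% for calling a variant common (labs set their own cutoff). The evidence level passed in is assumed to have already been assessed in the context of the patient’s tumor type.

```python
from typing import Optional


def amp_tier(
    evidence_level: Optional[str],
    population_af: Optional[float] = None,
    common_af_cutoff: float = 0.01,  # assumed cutoff; labs define their own
) -> str:
    """Map the best available evidence level to a reporting tier.

    evidence_level -- "A", "B", "C", "D", or None when no clinical
                      evidence has been established for the variant
    population_af  -- allele frequency in population catalogs, if known
    """
    if evidence_level in ("A", "B"):
        return "Tier I"    # strong clinical significance
    if evidence_level in ("C", "D"):
        return "Tier II"   # potential clinical significance
    if evidence_level is not None:
        raise ValueError(f"unknown evidence level: {evidence_level!r}")
    if population_af is not None and population_af >= common_af_cutoff:
        return "Tier IV"   # common variant, treated as benign
    return "Tier III"      # rare, but no established evidence


print(amp_tier("A"), amp_tier("C"), amp_tier(None, 0.20), amp_tier(None, 0.0001))
```

Because the mapping depends only on the current evidence, re-running it after a knowledge base update is what moves a variant from one tier to another.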

Interpretation of Germline Variants

As noted previously, germline variants are inherited from one or both parents, and thus an offspring can have either one or two copies from the germline cells. The interpretation of germline variants implicated in cancer needs to follow the ACMG guidelines. [See my eBook on this subject entitled “Clinical Variant Interpretation – Applying ACMG guidelines to Analyze Germline Diseases,” part of our eBook series at]

In the case of a pathogenic germline variant, it is recommended that the presence of the variant should be confirmed outside of the tumor sample. This may require an extra step to confirm whether this variant is also in the normal tissue sample. Along with the proper assessment of that variant according to the guidelines, it is recommended to follow up with genetic counseling due to the wider potential implication for the impacted family.

The field of NGS-based cancer diagnostics is very dynamic and rapidly expanding. Obviously, the categorization of variants as Level A, B, C, D evidence, and the inclusion of the findings in the various tiers, is always just a snapshot in time. For example, we may see Tier II Variants soon be classified as Tier I, as certain off-label recommendations are becoming officially approved by the FDA. Variants of unknown significance can be suddenly implicated in a preclinical study in certain cancer types elevating their status to Tier II. This is the reason why workflow automation is the only real option to deal with the knowledge explosion in this field. Now, let’s look at a few examples of how these guidelines are applied within Golden Helix’s clinical analysis pipeline.

Diagnostics Examples

Given that detecting cancer at an early stage can make it much more treatable, developing tests and making them clinically actionable is crucial to beat this disease. Of course, more can be done – consequently, we see the field pushing into exome, whole genome, and RNA sequencing to capture more of the complex processes behind oncogenesis and to identify genetic markers for earlier detection.

With VarSeq, Golden Helix has developed a software platform that supports gene panel analysis leading to clinically actionable information. This platform covers all the intermediate steps to reach a set of high‑quality reportable variants, including:

  • Setting the patient’s tumor type and other laboratory information;
  • Reviewing the NGS panel statistics and quality metrics for the current sample;
  • Checking and reporting target regions that fail the thresholds set by the lab test;
  • Confirming that important must-call hotspot regions and variants have sufficient coverage;
  • Adding somatic and potential germline variants to the analysis, including SNV, CNV, and Fusions;
  • Scoring variants that are novel or of uncertain impact to a cancer gene before reporting it as a Biomarker;
  • Writing interpretations following the AMP Tier guidelines that incorporate the supporting clinical evidence for a biomarker in the context of the current patient’s tumor type;
  • Reviewing and finalizing the clinical report.

In the following example, we will review the results of an NGS gene panel covering common cancer genes with a patient diagnosis of Melanoma. We will follow the process from start to finish for the BRAF V600E biomarker and then cover a CNV and Fusion example.

Patient Clinical Information and Tumor Type

At the start of the workflow, the Patient screen provides inputs for various patient information that may need to be displayed on the final clinical report.

One critical patient-specific field that is set in this first screen is the current diagnosis of the patient. The AMP guidelines emphasize that the evaluation of clinical evidence for a cancer biomarker should be made in the context of the patient’s tumor type. The most intuitive way to represent the many cancer types and sub-types is through a tumor type hierarchy organized first by the originating tissue type. A searchable tree-view of the cancer ontology is provided to select the appropriate tumor type. Commonly tested tumor types are available for quick selection. Figure 10 shows the selection of Melanoma for this patient.

Figure 10: Selecting the Tumor Type for the Current Patient

Figure 11: Reviewing the NGS Sequencing Summary and Coverage

Figure 12: Reviewing Failed Target Regions and Must-call Sites

NGS Panel Summary and Quality Check

The first screen also displays the NGS Sequencing Summary and NGS Coverage of Genes (see Figure 11). Critical decisions can be made with these summary statistics, e.g., whether the sample needs to be re‑processed or whether it meets the standards of the test to be used for clinical assessment.

Beyond these summary views, specific details on the target regions and regions of interest are shown on the screen and need to be considered for potential action. In addition, any test result that may meet criteria for reporting but falls below target thresholds should be evaluated and reported as inconclusive. Furthermore, some cancers have very frequent recurrent mutations with actionable clinical evidence; these mutations should be called as either present or absent with high confidence.

The “Must Call Hotspots” sites can be configured to run a coverage statistics algorithm on these specific variant sites. In Figure 12, four target regions fail the coverage thresholds, while the BRAF V600 amino acid hotspot is covered sufficiently. The failed target regions can be selected and added to the clinical report and noted as failing the thresholds.
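The idea behind a must-call check can be illustrated with a few lines of code. This is a generic sketch, not the coverage statistics algorithm used by the software: the function name, the site labels other than BRAF V600, the depth values, and the threshold of 200 reads are all hypothetical.

```python
from typing import Dict, List, Tuple


def check_hotspots(
    site_depths: Dict[str, int], min_depth: int
) -> Tuple[List[str], List[str]]:
    """Split must-call sites into those that meet the depth threshold
    and those that do not, so the latter can be noted in the report."""
    passed = [s for s, d in site_depths.items() if d >= min_depth]
    failed = [s for s, d in site_depths.items() if d < min_depth]
    return passed, failed


# Hypothetical read depths at configured must-call sites.
depths = {"BRAF V600": 950, "HOTSPOT_B": 40, "HOTSPOT_C": 200}
passed, failed = check_hotspots(depths, min_depth=200)
print("sufficient coverage:", passed)
print("report as failing threshold:", failed)
```

A site that fails the check cannot be called absent with confidence, which is why it is listed on the report instead of being silently dropped.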

NGS Variant Example: BRAF V600E

After selecting the patient tumor type and reviewing the NGS summary statistics, we can begin the analysis workflow by adding variants of different types to the patient’s mutation profile. The annotation and filtering process described earlier has reduced the number of raw NGS variants down to high-quality variants while removing most benign variants and passenger mutations with no functional impact on the cancer genes. Variants of different types can be added to provide a comprehensive view of the mutation profile of the patient, including:

  • Small variants (1 bp to 1 kb) such as SNPs, INDELs, and multi-base substitutions
  • Copy number variants ranging from single exon gains or losses to whole genes
  • Gene-fusions defined by the pair of genes with one gene acting as the primary functional driver
  • Wild-type genes that by having no mutations act as relevant biomarkers for the current tumor type

In Figure 13, we have added the BRAF V600E variant, which will be reported as a Biomarker while also including a Secondary Germline frameshift in BRCA2 and a Variant of Uncertain Significance in DST.

Figure 13: The BRAF V600E Variant will be Reported, while a Germline Variant and a Variant of Uncertain Significance are Also Present

Figure 14: The Oncogenicity Scoring Algorithm Results for BRAF V600E


Each variant added gets evaluated automatically based on the scoring guidelines appropriate for the mutation’s origin. Somatic variants use a specialized Oncogenicity scoring system, while germline mutations are evaluated with the full ACMG guidelines scoring rubric. Recurrent somatic mutations may be known immediately as reportable cancer biomarkers, but many detected somatic variants require an evaluation of the bioinformatics and literature evidence. This evaluation is done through the Oncogenicity scoring system and includes many of the annotations we described earlier. Figure 14 shows the summary and recommendations of the automated Oncogenicity scoring of BRAF V600E (p.V600E in figure).

Figure 16: The BRAF V600E Interpretation for Drug Sensitivity

The bulk of the work for reporting a variant is in the evaluation of the clinical evidence for that variant as a cancer biomarker. The workflow aggregates the following sources of clinical evidence statements:

  • DrugBank, as a source of FDA-approved drugs that list specific cancer biomarkers on the label as an indication for use
  • The Precision Medicine Knowledgebase (PMKB) from Weill Cornell Medicine, for interpretations of cancer variants
  • Clinical Interpretation of Variants in Cancer (CIViC), for its aggregation of evidence statements about variants in specific tumor types

The sources are combined, ranked, and filtered to present the most pertinent, high-quality evidence statements under the four reportable categories of Drug Sensitivity, Drug Resistance, Prognostic, and Diagnostic. Figure 15 shows the results of this process for the BRAF V600E biomarker specific to the current patient diagnosis of Melanoma. These filters can be relaxed to consider the therapeutic evidence for other tumor types that may be reportable as Tier II following the AMP guidelines.
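As a rough illustration of the combine, rank, and filter step, consider the following Python sketch. It assumes each evidence statement carries a numeric quality rank (lower is stronger), which is our own simplification; the EvidenceStatement class, the select_evidence function, and the sample rows are hypothetical placeholders and not real evidence records.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

CATEGORIES = ("Drug Sensitivity", "Drug Resistance", "Prognostic", "Diagnostic")


@dataclass(frozen=True)
class EvidenceStatement:
    source: str      # e.g. "DrugBank", "PMKB", "CIViC"
    biomarker: str
    tumor_type: str
    category: str    # one of CATEGORIES
    rank: int        # hypothetical quality rank; lower means stronger


def select_evidence(
    statements: Iterable[EvidenceStatement],
    biomarker: str,
    tumor_type: Optional[str] = None,
) -> Dict[str, List[EvidenceStatement]]:
    """Group matching statements by category, strongest first.

    Passing tumor_type=None relaxes the disease filter to all cancers.
    """
    grouped: Dict[str, List[EvidenceStatement]] = {c: [] for c in CATEGORIES}
    for s in statements:
        if s.biomarker != biomarker:
            continue
        if tumor_type is not None and s.tumor_type != tumor_type:
            continue
        if s.category in grouped:
            grouped[s.category].append(s)
    for rows in grouped.values():
        rows.sort(key=lambda s: s.rank)
    return grouped


# Made-up placeholder rows for illustration only.
statements = [
    EvidenceStatement("CIViC", "BRAF V600E", "Melanoma", "Drug Sensitivity", rank=2),
    EvidenceStatement("DrugBank", "BRAF V600E", "Melanoma", "Drug Sensitivity", rank=1),
    EvidenceStatement("PMKB", "BRAF V600E", "Colorectal Cancer", "Prognostic", rank=1),
]

melanoma_only = select_evidence(statements, "BRAF V600E", tumor_type="Melanoma")
all_cancers = select_evidence(statements, "BRAF V600E")
print([s.source for s in melanoma_only["Drug Sensitivity"]])
print(len(all_cancers["Prognostic"]))
```

Treating the tumor type as an optional filter makes it straightforward to relax it later, for example when looking for evidence in other tumor types.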

Each row of the evidence table can be selected and reviewed in detail alongside the draft interpretation for use in the clinical report. Any citations referenced by PMID will automatically be extracted and result in the full title and author attribution listed below the interpretation as well as in the collated list of references in the final clinical report. Along with the interpretation, specific drugs or drug combinations can be specified as well as the tier of the evidence according to the AMP guidelines. In Figure 16, the BRAF V600E interpretation for drug sensitivity concludes there is Tier 1 – Level A evidence for a number of drug combinations.
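The PMID extraction can be approximated with a regular expression. The following is a minimal sketch, assuming citations are written as "PMID" followed by an optional colon and the identifier; the pattern and the extract_pmids helper are our own assumptions, and looking up titles and authors for each identifier is out of scope here.

```python
import re
from typing import List

# Assumed citation style: "PMID: 12345", "PMID 12345", or "PMID:12345".
PMID_PATTERN = re.compile(r"\bPMID:?\s*(\d{1,8})\b")


def extract_pmids(text: str) -> List[str]:
    """Return the PMIDs cited in the text, in order of first appearance."""
    found: List[str] = []
    for match in PMID_PATTERN.finditer(text):
        pmid = match.group(1)
        if pmid not in found:  # keep the collated reference list free of duplicates
            found.append(pmid)
    return found


draft = "See PMID: 28132688 and PMID 28132688; also PMID:12345."
print(extract_pmids(draft))
```

Preserving the order of first appearance keeps the collated reference list aligned with the order in which the interpretation cites its sources.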

Report Review and Finalizing

Figure 17: Reviewing the Biomarker Results Selected To Be Reported

Figure 17 shows the reportable details of the BRAF V600E biomarker along with its interpretations. Once each variant has been classified and each reportable biomarker interpreted, the full content of the analysis can be reviewed on the Report screen. Along with any final changes, a results summary can be prepared to describe the key findings of the clinical report.

At any time, the report can be rendered into its final form as a Word document and converted or saved as a PDF. These documents are preserved as part of the project record. The generated PDF can be previewed inside the workflow, as shown in Figure 18.

Figure 18: Previewing the Resulting Report

Once reviewed, the report can be Signed Off and Finalized, after which no more changes can be made to it. The finalized report can then be rendered for consideration of actionable decisions.

Figure 19: Interpretation of an ERBB2 Amplification

NGS CNV Example: ERBB2 Amplification

Along with small variants, CNVs are often reported as relevant biomarkers on the clinical report. NGS gene panels can be used to detect CNVs down to the single-exon level, given a sufficiently advanced CNV detection algorithm. Even if CNVs are detected using a secondary assay, they can be manually added and included in the patient’s mutation profile.

The same clinical evidence sources and AMP Tier system can be used to evaluate and interpret CNVs and prepare them for clinical reporting. Figure 19 provides a sense of the Biomarker interpretation screen for an ERBB2 amplification. While the clinical evidence for therapeutic response and for diagnostic and prognostic outcomes is the focus of the report, background information on a gene and its role in cancer provides critical context for the oncologist or other report recipients.

Fusion Example: BCR-ABL1

Gene fusions occur commonly in certain cancers and create hybrid proteins that often activate the function of the primary gene. Common fusions can be detected by specialized kits that look for the presence of DNA sequences that span the fusion junctions. RNA sequencing also efficiently detects hybrid gene fusions.

A detected fusion can be added to the mutation profile for the patient and interpreted alongside the other Biomarkers. The interpretation of a fusion Biomarker focuses on the primary gene that defines the functional behavior of the fusion product. The second gene in the fusion pair enables the activation of the primary gene’s function. Figure 20 shows the interface for adding a new fusion to the project and the detection of the primary gene.

Figure 20: Adding BCR-ABL1 Detected ABL1 as the Primary Gene

Once added, a fusion can be interpreted using the same Biomarker interface as small variants and CNVs. In the case of BCR-ABL1, there are multiple FDA-approved therapies that list it in their indications for use, but not for the current tumor type of Melanoma. In Figure 21, the Disease filter has been broadened to “All” cancers to show this evidence.

Figure 21: Viewing BCR-ABL1 Evidence for All Cancers


Detecting cancer at an early stage can improve the likelihood of response to an appropriate treatment regimen (if available). Developing tests and making them clinically actionable is crucial to beat this disease. This article has covered the key concepts involved in the clinical interpretation of somatic variants. We looked at the following topics:

  • Clinical Reporting: The need to standardize
  • Annotation Sources and Functional Prediction Algorithms
  • The AMP Guidelines
  • Examples of how to apply the AMP guidelines leveraging Golden Helix’s Diagnostic Platform for Cancer

The key factors in realizing the utility of Golden Helix and other platforms in this context are:

  • Delivery of consistent, high-quality interpretations
  • Increased lab throughput
  • Providing a framework for newer, less experienced clinicians
  • Staying abreast of new developments

We have reached a level of complexity in the available data, information, and knowledge at which the manual development of a defensible clinical report is extremely difficult. If not already, then very soon, software-aided decision making will be the only viable option for dealing with these complex matters.

About Golden Helix

Golden Helix has been delivering industry-leading bioinformatics solutions for the advancement of life science research and translational medicine for over a decade. Its innovative technologies and analytic services empower scientists and healthcare professionals at all levels to derive meaning from the rapidly increasing volumes of genomic data produced from next-generation sequencing. With its solutions, hundreds of the world’s hospitals and testing labs are able to harness the full potential of genomics to identify the cause of disease, develop genomic diagnostics, and advance the quest for personalized medicine. Golden Helix products and services have been cited in thousands of peer-reviewed publications. Golden Helix is also on the Inc. 5000 list of the fastest-growing private companies in the US.

Andreas Scherer, Ph.D., is CEO of Golden Helix. He is also Managing Partner of Salto Partners, a management consulting firm headquartered in the DC metro area. He has extensive experience successfully managing growth as well as orchestrating complex turnaround situations. His company, Salto Partners, advises on business strategy, financing, sales, and operations. Its clients operate in the high tech and life sciences space. Dr. Scherer holds a Ph.D. in computer science from the University of Hagen, Germany, and a Master of Computer Science from the University of Dortmund, Germany. He is author and co-author of over 20 international publications and has written books on project management, the Internet, and artificial intelligence. His latest book, “Be Fast Or Be Gone”, is a prizewinner in the 2012 Eric Hoffer Book Awards competition and has been named a finalist in the 2012 Next Generation Indie Book Awards.


Ding L, Getz G, Wheeler DA, Mardis ER, McLellan MD, Cibulskis K, Sougnez C, Greulich H, Muzny DM, Morgan MB, Fulton L, Fulton RS, Zhang Q, Wendl MC, Lawrence MS, Larson DE, Chen K, Dooling DJ, Sabo A, Hawes AC, Shen H, Jhangiani SN, Lewis LR, Hall O, Zhu Y, Mathew T, Ren Y, Yao J, Scherer SE, Clerc K, Metcalf GA, Ng B, Milosavljevic A, Gonzalez-Garay ML, Osborne JR, Meyer R, Shi X, Tang Y, Koboldt DC, Lin L, Aboot SD, Sawyer CS, Vickery T, Sander S, Robinson J, Winckler W, Baldwin J, Chrieac LR, Dutt A, Fennel T, Hanna M, Johnson BE, Onofrio RC, Thomas RK, Tonon G, Weir BA, Zhao X, Ziaugra L, Zody MC, Giordano T, Orringer MB, Roth JA, Spitz MR, Wisuba II, Ozenberger B, Good PJ, Chang AC, Beer DG, Watson MA, Ladanyi M, Broderick S, Yoshiazwa A, Travis WD, Pao W, Province MA, Weinstock GM, Varmus HE, Gabriel SB, Lander ES, Gibbs RA, Meyerson M, Wilson SK. “Somatic mutations affect key pathways in lung adenocarcinoma”. Nature 455(7216) (2008): pp 1069-1075.

Ding, Jiarui and Bashashati, Ali and Roth, Andrew and Oloumi, Arusha and Tse, Kane and Zeng, Thomas and Haffari, Gholamreza and Hirst, Martin and Marra, Marco A and Condon, Anne and others (2012). “Feature-based classifiers for somatic mutation detection in tumour–normal paired sequencing data”. Bioinformatics (Oxford University Press) 28 (2): 167–175. doi:10.1093/bioinformatics/btr629

Ding, L., Ley, T.J., Larson, D.E., Miller, C.A., Koboldt, D.C., Welch, J.S. et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature. 2012; 481: 506–510

Larson, David E and Harris, Christopher C and Chen, Ken and Koboldt, Daniel C and Abbott, Travis E and Dooling, David J and Ley, Timothy J and Mardis, Elaine R and Wilson, Richard K and Ding, Li (2012). “SomaticSniper: identification of somatic point mutations in whole genome sequencing data”. Bioinformatics (Oxford University Press) 28 (3): 311–317.

Leighl et al. “Molecular Testing for Selection of Patients With Lung Cancer for Epidermal Growth Factor Receptor and Anaplastic Lymphoma Kinase Tyrosine Kinase Inhibitors: American Society of Clinical Oncology Endorsement of the College of American Pathologists/International Society for the Study of Lung Cancer/Association of Molecular Pathologists Guideline” J Oncol Pract. (Oct 13, 2014)

Li, Q., Wang, K., Clinical Interpretation of Genetic Variants by the 2015 ACMG-AMP Guidelines, Am J Hum Genet. 2017 Feb 2; 100(2): 267–280. PMID: 28132688, Published online 2017 Jan 26. doi: 10.1016/j.ajhg.2017.01.004

Richards, S., Aziz, N., Bale, S., Bick, D., Das, S., Gastier-Foster, J., Grody, W.W., Hegde, M., Lyon, E., Spector, E., Voelkerding, K., and Rehm, H.L. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015; 17: 405–423

Roth, Andrew and Ding, Jiarui and Morin, Ryan and Crisan, Anamaria and Ha, Gavin and Giuliany, Ryan and Bashashati, Ali and Hirst, Martin and Turashvili, Gulisa and Oloumi, Arusha and others (2012). “JointSNVMix: a probabilistic model for accurate detection of somatic mutations in normal/tumour paired next-generation sequencing data”. Bioinformatics (Oxford University Press) 28 (7): 907–913.

Xu et al. “Comparison of somatic mutation calling methods in amplicon and whole exome sequence data”. BMC Genomics 2014, 15:244.

Figure 1: Population database to exclude common variants

Figure 2: Cancer-specific variant databases

Figure 3: Sequence Repositories

Figure 4: Other useful databases

Figure 5: The interpretations for BRAF V600E for Melanoma in this draft clinical report preview are all provided by Golden Helix CancerKB

Figure 6: Splice Site Prediction Algorithms

Figure 7: Functional Prediction Algorithms

Figure 8: Evidence-based variant categorization

Figure 9: Impact of vemurafenib

Figure 10: Selecting the Tumor Type for the Current Patient

Figure 11: Reviewing the NGS Sequencing Summary and Coverage

Figure 12: Reviewing Failed Target Regions and Must-call Sites

Figure 13: The BRAF V600E will be Reported, while a Germline Variant and Variant of Uncertain Significance are Also Present

Figure 14: The Oncogenicity Scoring Algorithm Results for BRAF V600E

Figure 15: The BRAF V600E Clinical Evidence Table

Figure 16: The BRAF V600E Interpretation for Drug Sensitivity

Figure 17: Reviewing the Biomarker Results Selected To Be Reported

Figure 18: Previewing the Resulting Report

Figure 19: Interpretation of an ERBB2 Amplification

Figure 20: Adding BCR-ABL1 Detected ABL1 as the Primary Gene

Figure 21: Viewing BCR-ABL1 Evidence for All Cancers