The Burden of Proof – What Laboratories Need to Know About Evidence Development Before Launching a New Test
by Lori Anderson, Gabriel Bien-Willner, and Patricia Goede
Introduction and Background
The essence of precision medicine is the notion of delivering targeted therapies predicted to have a high likelihood of therapeutic response based on an individual patient’s biomarker characteristics. The overall success of precision medicine programs relies upon the successful interactions of several healthcare stakeholder groups, including patients, treating physicians, laboratory test providers, pharmaceutical manufacturers, and payors.
Since the early 20th century, clinical laboratories have played a significant role in medicine, especially in the discovery and classification of human ailments and in guiding clinical treatment. The sequencing of the human genome in 2003 has had a profound impact on the practice of modern-day medicine. Since 2003, gene-based laboratory testing has grown exponentially. With precision medicine, labs now find themselves at the center of a paradigm shift in the practice of medicine, with an explosion of molecular tests at the forefront of translating scientific innovation into clinical practice. In 2018, Phillips et al. reported that there were 75,000 genetic tests on the market, with approximately 10 new tests being added each day.1 Over the past decade, however, labs have faced difficulties in communicating with payors and establishing coverage or equitable reimbursement for these services. While payors may indeed have a role in this process and its failures, laboratories must engage with payors and learn how they make decisions regarding coverage and reimbursement to be successful.
Across the laboratory test provider industry, there often appear to be gaps in understanding of, or confusion about, the evidentiary requirements that payors set for demonstrating clinical value. When a lab does not understand these requirements and consequently fails to establish the requisite supporting evidence, its tests will not be reimbursed. Often when reimbursement is denied, the billing laboratory initiates a laborious and costly appeals process that may or may not be successful. If the denial is due to a lack of evidence (i.e., the lab test is unproven), the appeals process will most often fail. The laboratory may then choose to write off the test bill as bad debt or to bill the patient to recoup some of the unrealized revenue. Neither option is ideal.
To minimize the risk of insurance denials due to tests being considered unproven, experimental, or investigational, labs must have a well-thought-out reimbursement strategy before the launch of any new test. Successful reimbursement depends on labs understanding the evidentiary requirements and assessment processes that payors employ, as well as having the necessary resources available to perform the types of studies required.
This paper focuses on the elements of evidence that labs need to consider when seeking coverage. Unlike many other countries with centralized health technology assessment (HTA) agencies, the decentralized nature of medical insurance within the U.S. requires each payor to perform an independent, objective evidence assessment of medical interventions, including lab tests. While there is some sharing of evidence reviews between payors, and the models and judgments used for performing those reviews differ, the basic premise is similar. Each published study applicable to a particular lab test is evaluated for quality, including internal and external validity, representativeness, relevance, credibility, and bias. Next, the entire body of evidence is evaluated for quality, consistency, strength, and weight.
The concept of evidence-based reimbursement is relatively new to lab testing compared with the pharmaceutical industry.2 Many labs are only beginning to put in place the rigorous processes needed to develop the evidence base necessary for favorable coverage decisions. With the recent explosion of genomic tests, payors have also become more diligent in requiring lab test developers to provide evidentiary support by demonstrating that their tests are analytically sound, medically necessary, and have clinical utility.
Between 2002 and 2004, the Office of Public Health Genomics (OPHG) of the Centers for Disease Control and Prevention (CDC) developed a model framework for evaluating the scientific data on emerging genetic tests.3 The CDC model, known as ACCE, consists of four domains: Analytical Validity (AV), Clinical Validity (CV), Clinical Utility (CU), and Ethical, Legal, and Social Implications (see Figure 1).3 While developed specifically for genetic tests, the model is generally applicable to other types of testing and has been broadly adopted by payors within the U.S., including the Centers for Medicare and Medicaid Services (CMS) and the Molecular Diagnostics Program (MolDX®),4 and by health insurers in other countries. The model consists of 44 questions that address each of the four domains as they relate to the clinical use of the test in the context of a particular disorder.3 The ACCE framework is also referenced for evidence presentation of companion diagnostics within the Academy of Managed Care Pharmacy (AMCP) Format 4.1 for Formulary Submissions, the guidance for product dossier development used by pharmaceutical and medical device manufacturers for scientific evidence presentation to health care decision-makers.5
Evidence of AV, CV, CU, and economic value is most often presented in the form of a comprehensive product dossier such as that defined by the AMCP Format 4.1 guidance. Payors that follow MolDX evidence criteria will only provide coverage for tests when AV, CV, and CU have been demonstrated. To meet CMS’s criteria of “reasonable and necessary” (the scientific and medical precondition to be met for coverage within the defined categories of the Medicare program) the test must demonstrate CU in addition to meeting standards for AV and CV as part of the Technical Assessment process.4
Analytical validity addresses the technical performance characteristics of the test, such as accuracy, reproducibility, sensitivity, specificity, cutoff values, and meaningful ranges of detection. In layman’s terms, AV answers the question, “Does this test accurately and reliably detect or measure the analyte it claims to detect or measure?” Clinical validity addresses the relationship between the test result and the condition with which it is associated. To establish CV, evidence from rigorous scientific study must be presented in the published literature to support a convincing link between the analyte or test result and the relevant disease or disease state. Clinical utility is the domain that defines medical necessity. For a lab test to have CU, the test results must be actionable, whether by guiding therapeutic decision making, altering patient lifestyle, or ending the diagnostic odyssey.
Next, there must also be proof that patients have improved outcomes when the appropriate intended action is taken on the test results. In this way, the test demonstrates comparative effectiveness. The PICO (Population, Intervention, Comparator, Outcomes) model is a helpful tool for labs to use when designing clinical studies, organizing literature searches, and describing the intended use of a test.6 The PICO model has been adopted by some payors’ evidence assessment groups, including the Blue Cross Blue Shield Association, as a framework for evaluating the comparative effectiveness and clinical utility of drugs, devices, and lab tests compared to the current standard of care within a specific patient population.
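As a concrete illustration of the analytical performance characteristics named above, the following minimal Python sketch computes sensitivity, specificity, and accuracy from a 2x2 comparison of a test against a reference method. The class and function names and the specimen counts are hypothetical and invented for illustration; they do not come from any payor's requirements. Note also that "analytical sensitivity" is sometimes used to mean limit of detection, which this sketch does not address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 counts from comparing a test against a reference method."""
    true_pos: int
    false_pos: int
    true_neg: int
    false_neg: int


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of reference-positive specimens the test calls positive."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of reference-negative specimens the test calls negative."""
    return c.true_neg / (c.true_neg + c.false_pos)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all specimens on which test and reference agree."""
    total = c.true_pos + c.false_pos + c.true_neg + c.false_neg
    return (c.true_pos + c.true_neg) / total


# Hypothetical validation run of 200 specimens (numbers invented for illustration).
counts = ConfusionCounts(true_pos=90, false_pos=5, true_neg=95, false_neg=10)
print(f"sensitivity = {sensitivity(counts):.3f}")  # 0.900
print(f"specificity = {specificity(counts):.3f}")  # 0.950
print(f"accuracy    = {accuracy(counts):.3f}")     # 0.925
```

Figures like these speak only to AV. They say nothing about whether the analyte is linked to disease (CV) or whether acting on the result improves outcomes (CU).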
Defining Intended Use
Laboratory tests fall into six broad categories: screening, risk, diagnostic, prognostic, predictive, and monitoring. Companion diagnostics are a type of predictive test used to identify eligible patients for specific therapies and are essential for the approved administration of those therapies. All categories of tests can lead to clinical actions that result in better patient outcomes. However, a test can only perform reliably under the same conditions in which it was developed, tested, and validated, and for the specimen type and disease state for which it was intended. Laboratories need to understand and articulate the intended use of their tests. The PICO model is a useful tool to help labs define the role of their tests within the healthcare continuum. The model should be used for defining, documenting, and organizing clinical studies into a body of evidence.
Every lab test is intended to measure a specific analyte or analytes within a defined population. Populations may be broad, such as when a hemoglobin test is used as a measure of anemia in males and females of any age group, or extremely narrow, such as an EGFR molecular variant test intended to guide tyrosine kinase inhibitor therapy in patients with non-small cell lung cancer. It is important that the lab precisely define the target population of the test and that the evidence developed represents that population. Sociodemographic differences make it difficult to generalize results from narrowly defined groups to broader populations. Some tests have multiple narrowly defined populations, such as BRCA1 and BRCA2 variant testing in several distinct groups of individuals (e.g., patients with specific cancer types).
In the context of a PICO framework, the laboratory test is the “intervention” – not to be confused with an intended medical intervention undertaken to help treat a condition under the guidance of a medical professional. It is important for the lab to describe the test and how it fits into the current standard of care. It is also necessary to describe the test in the context of the comparator testing strategy. For some new tests, the comparator may be “no test” or “no testing strategy” or the comparator may be an existing test or testing strategy.
While there are many examples of newer tests demonstrating value compared to older ones, this is not always the case. For example, the added benefit of molecular testing for variants in the MTHFR gene, which are responsible for elevated homocysteine levels (a known risk factor for cardiovascular disease), is uncertain compared to simply testing homocysteine levels. In this example, MTHFR testing is considered the “intervention” and homocysteine testing is considered the “comparator” in the PICO framework. Not surprisingly, MTHFR testing in patients being evaluated for cardiovascular disease risk (the “population”) is considered unproven and is not covered by most payors because CV has not been established.
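The MTHFR example can be written down as a structured PICO statement. The Python sketch below is one hypothetical way a lab might record it; the `PicoStatement` class and its `summary` method are illustrative names, not part of any published PICO tooling, and the outcome entry is a placeholder because the example above does not name a specific outcome measure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PicoStatement:
    """Population, Intervention, Comparator, Outcomes for one intended use."""
    population: str
    intervention: str  # in this framework, the laboratory test itself
    comparator: str    # an existing test or strategy, or "no test"
    outcomes: Tuple[str, ...]

    def summary(self) -> str:
        """Render the statement as a single answerable research question."""
        return (
            f"In {self.population}, does {self.intervention} "
            f"compared with {self.comparator} improve "
            f"{', '.join(self.outcomes)}?"
        )


mthfr = PicoStatement(
    population="patients being evaluated for cardiovascular disease risk",
    intervention="MTHFR variant testing",
    comparator="homocysteine level testing",
    # Placeholder: an assumed outcome, not one stated in the article.
    outcomes=("cardiovascular outcomes",),
)
print(mthfr.summary())
```

Writing the question out this way makes it easier to see which element a given study actually addresses and where the body of evidence has gaps.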
Clinical utility is established when the lab can unequivocally demonstrate that utilization of the test contributes to better patient outcomes. Measured outcomes are a direct result of an intended medical intervention, treatment, or therapy. While a laboratory test is not itself a medical intervention, the test results may directly or indirectly guide the use of a medical intervention. If the CU of the medical intervention is uncertain or unproven, i.e., it does not demonstrate better outcomes or clinical benefit, then the CU of the test is also uncertain. Thus, the test and the medical intervention are linked in a chain of evidence where CU for the medical intervention is a prerequisite for establishing CU for a laboratory test.
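The chain of evidence described above can be sketched as a simple dependency check. The Python below is a deliberately simplified illustration: the `EvidenceStatus` fields and both function names are hypothetical, and real payor assessments weigh the quality and strength of evidence rather than yes/no flags.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EvidenceStatus:
    """Yes/no summary of where a test's evidence stands (a simplification)."""
    analytical_validity: bool
    clinical_validity: bool
    results_actionable: bool        # results guide a clinical action
    intervention_has_utility: bool  # that action is shown to improve outcomes


def has_clinical_utility(e: EvidenceStatus) -> bool:
    """CU for the test requires CU for the intervention it guides."""
    return e.results_actionable and e.intervention_has_utility


def missing_links(e: EvidenceStatus) -> List[str]:
    """List the ACCE evidence domains that are not yet demonstrated."""
    gaps = []
    if not e.analytical_validity:
        gaps.append("AV")
    if not e.clinical_validity:
        gaps.append("CV")
    if not has_clinical_utility(e):
        gaps.append("CU")
    return gaps


# Hypothetical test: technically sound and linked to disease, but the
# therapy it would guide has no demonstrated benefit.
status = EvidenceStatus(
    analytical_validity=True,
    clinical_validity=True,
    results_actionable=True,
    intervention_has_utility=False,
)
print(missing_links(status))  # ['CU']
```

The point of the sketch is the dependency: however strong the AV and CV evidence, the test cannot show CU while the downstream intervention lacks it.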
The improvement of a patient’s medical condition is one example of a better outcome. Another may be the avoidance of a toxic therapy that the test result indicates is unlikely to be effective. Many payors have broader definitions of outcomes that include cost-effectiveness, as well as other value the intervention brings to the health system compared to the current standard of care. Laboratories may find the PICO framework an excellent tool to leverage as they design their evidence development strategy in support of gaining coverage for their test portfolio.
Types of Evidence and Evidence Assessment
Payors consider and evaluate several sources of evidence, including peer-reviewed published literature, clinical practice guidelines such as the National Comprehensive Cancer Network (NCCN) guidelines for cancer, grey literature (non-peer-reviewed whitepapers, marketing materials, product inserts, etc.), and the opinions of subject matter experts, to establish whether a test reaches the threshold of validity and utility necessary for coverage.
High-quality, peer-reviewed publications are the gold standard. If the test is proprietary, then the lab test developer should participate in the design and execution of the clinical studies as warranted to complete a chain of evidence that satisfies AV, CV, and CU. If the test is a “me too” test, then the lab test developer may piggyback off the body of existing peer-reviewed publications if it can demonstrate CV and CU equivalence. Some tests, such as gene classifiers for disease risk stratification, may have an intended use similar to the currently accepted standard of care but claim to classify patients more accurately into risk groups. In this scenario, the CU of the stratification approach is already established. Thus, the lab only needs to prove that the results of the classifier are accurately associated with the correct risk group; in other words, it needs to demonstrate CV.
There is no magic formula for how many published studies are necessary to persuade a payor that a test is worthy of coverage; rather, the quality and strength of the evidence must speak for itself. The U.S. Preventive Services Task Force (USPSTF) Procedure Manual outlines the task force’s logic for evaluating the adequacy of the evidence, including the appropriateness of the study design, the internal validity (or quality) of the studies, the relevance of the study population, the precision and consistency of study results, and the appropriateness of the conclusions drawn from the studies.7 Other tools for assessing studies and evidence include the online CER Collaborative Comparative Effectiveness Research tool8 and the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) process.9
Clinical studies are often required to demonstrate the CV and CU of a medical service. There are two broad categories of study designs and methods, explanatory and pragmatic, that may be appropriate in different settings, and each carries different levels and types of bias. Much attention is paid to double-blinded randomized clinical trials (RCTs), a type of explanatory trial design considered by many to be a gold standard but one that can be both lengthy and costly. In the highly controlled environment of an RCT, the efficacy of an intervention (a drug, a device, or a diagnostic) can be demonstrated by limiting confounding variables as much as possible to focus on the impact on patient outcomes in the presence or absence of the medical service under investigation. In practice, RCTs have built-in comparators: either the standard of care or criteria set upon approval to proceed with a trial.
Explanatory studies generally have highly selective patient enrollment criteria, use experienced investigators, involve extensive and often cumbersome monitoring, and are designed to confirm a physiological or clinical hypothesis. Therefore, results from such studies are typically much less generalizable, whereas pragmatic trials are designed to inform clinical practice.10 Pragmatic trials are accessible to more patients, including patients with comorbidities; engage a larger cross-section of practitioners as investigators; prescribe less cumbersome monitoring; and include outcomes that are more relevant to patients.10 Pragmatic trials demonstrate the real-world effectiveness of a medical intervention, which differs from the efficacy established by well-designed and well-executed explanatory trials. Real-world evidence and pragmatic studies that demonstrate effectiveness have historically been perceived to have less internal validity but more external validity than RCTs. These perceptions of study quality factor into the weight and strength assigned to each study. However, it should be noted that RCTs also have drawbacks: they are limited by the quality of study design (as are all studies) and rely on relatively large patient populations, which is not feasible or reasonable in all clinical scenarios.11
The 21st Century Cures Act of 2016 was a shot in the arm for real-world evidence (RWE) and pragmatic studies. The Act mandates the use of real-world data (RWD) as a way of bringing (primarily) pharmaceuticals and medical devices to market faster and more efficiently.12 Real-world data is derived from several sources, including claims databases, biosensors, medical and health record systems (EMR/EHR), and patient registries. One of the challenges of current sources of RWD is that the systems where the data resides were designed for specific purposes that most often did not include robust RWD collection for use in clinical studies and evidence development. Thus, the data sets are often incomplete. New approaches to structuring and sharing RWE are demonstrating effectiveness13 and this has likely contributed to positive coverage determinations,14,15 though the need for establishing data standards and incentivizing data sharing remains. Other authors have written about these needs in a recent issue of this journal.16
Regardless of study design chosen, the onus is on the test developer to convince the scientific community and subsequently the payor through documentation that their test is accurate and precise, that it measures a clinically-relevant analyte, and that its findings are necessary for treating clinicians to leverage its results for the proper treatment of their patients.
Laboratory test developers must gain a proper understanding of, and commit the necessary resources to developing, the evidence needed to demonstrate the AV, CV, and CU required to gain payor coverage. Undertaking the types of clinical studies needed to build this evidence base, while entrenched practice in the pharmaceutical and medical device industries, is a relatively new concept for the laboratory industry (parallel drug-companion diagnostic release being a good example). Until recently, the fields of health economics and outcomes research (HEOR) and market access have primarily focused on the pharmaceutical industry, which, unlike the lab industry, has relatively long product life cycles and intellectual property protections. Thus, while the understanding of market access is emerging within the lab industry, there is a critical shortage of resources, both personnel and financial, to meet current needs. Labs that have invested the resources (time, money, and expertise) and that use available tools such as the ACCE and PICO models to frame their presentation of scientific data to payors can be successful. The 21st Century Cures Act opens the door for laboratories to leverage new and potentially less expensive evidence development models.
Patricia Goede is VP Clinical Informatics at XIFIN, Inc., where she brings 22 years’ experience developing biomedical imaging informatics solutions and technology to facilitate multi-modality and multispecialty image-based exchange, collaboration, and management in distributed environments. Goede founded VisualShare and served as CEO until its acquisition by XIFIN in 2015. Previously, Goede was at the University of Utah, where she pioneered a number of image, visualization, and collaboration tools. She was the co-founder of the Electronic Medical Education Resource Group (EMERG) and, as its director, established the Utah Center of Excellence for Electronic Medical Education. Goede holds an MS in Computational Visualization and a PhD in Biomedical Imaging Informatics.
Lori Anderson has over 25 years of experience in the diagnostic and life sciences laboratory industries. As a laboratory scientist, Lori has developed numerous clinical and translational research assays, predominantly in the cardiology and oncology space. She has coauthored numerous publications and presentations. Lori has extensive experience in defining the evidentiary requirements necessary for insurance coverage of diagnostic tests. She comes to XIFIN from Quest Diagnostics, where she was a Director of Health Economics and Outcomes Research focused on reimbursement for diagnostic testing. Lori graduated as a medical technologist from Fanshawe College and holds a BA in Administrative and Commercial Studies from Western University in London, Canada.
Gabriel A. Bien-Willner, MD, PhD, FACP
Medical Director, MolDX; Chief Medical Officer, Palmetto GBA
Dr. Bien-Willner is the Medical Director of the MolDX program at Palmetto GBA, a Medicare Administrative Contractor (MAC). MolDX seeks to understand the molecular testing landscape in order to implement payor controls and coverage and to set policy for affiliated MACs, which currently cover 28 states. He is a leader in the Precision Medicine space and practices as a Board-certified Anatomic Pathologist and Molecular Genetic Pathologist. Throughout his career, he has been active in research, development, and advancement of molecular diagnostic services, specifically next generation sequencing. He has worked closely with clinicians to develop clear clinical diagnostic and treatment pathways directing Precision Medicine programs for community cancer centers. Dr. Bien-Willner received his MD and PhD degrees from Baylor College of Medicine, with a PhD in Human Molecular Genetics. He completed his residency and fellowship and attained a faculty appointment at Washington University in St. Louis, then held leadership roles in laboratory and biotech companies before joining Palmetto GBA.
- Phillips KA, Deverka PA, Hooker GW, Douglas MP. Genetic Test Availability And Spending: Where Are We Now? Where Are We Going? Health Aff. 2018 May;37(5):710-716. doi: 10.1377/hlthaff.2017.1427.
- Ramsey SD, Veenstra DL, Garrison LP, et al. Toward Evidence-based Assessment for Coverage and Reimbursement of Laboratory-based Diagnostic and Genetic Tests. Am J Manag Care. 2006 Apr;12(4):197-202. https://www.ajmc.com/journals/issue/2006/2006-04-vol12-n4/apr06-2281p197-202 Accessed May 1, 2020.
- CDC ACCE (https://www.cdc.gov/genomics/gtesting/acce/index.htm)
- Molecular Diagnostic Program (MolDX®) Coverage, Coding, and Pricing Standards and Requirements (M00106). Palmetto GBA®, A CELERIAN GROUP COMPANY, A CMS Medicare Administrative Contractor, Version 26, Revised 12/16/19. https://palmettogba.com/Palmetto/moldx.Nsf/files/MolDX_Manual.pdf/$File/MolDX_Manual.pdf Accessed May 1, 2020.
- AMCP Format for Formulary Submissions: Guidance on Submissions of Pre-approval and Post-approval Clinical and Economic Information and Evidence. Academy of Managed Care Pharmacy, Version 4.1, 2019. http://www.amcp.org/sites/default/files/2019-12/AMCP_Format%204.1_1219_final.pdf Accessed May 6, 2020.
- Chang SM, Matchar DB, Smetana GW, Umscheid CA. Methods Guide for Medical Test Reviews. Agency for Healthcare Research and Quality (AHRQ). Publication No. 12-EHC017, June 2012. https://effectivehealthcare.ahrq.gov/sites/default/files/pdf/methods-guidance-tests_overview-2012.pdf Accessed May 1, 2020.
- Siu AL. U.S. Preventive Services Task Force Procedure Manual, December 2015. https://www.uspreventiveservicestaskforce.org/uspstf/procedure-manual Accessed May 1, 2020.
- CER Collaborative – Comparative Effectiveness Research https://www.cercollaborative.org/global/default.aspx? Accessed May 1, 2020
- Guyatt G, Oxman AD, Akl EA, et al. GRADE Guidelines: 1. Introduction-GRADE Evidence Profiles and Summary of Findings Tables. J Clin Epidemiol. 2011 Apr;64(4):383-94. doi: 10.1016/j.jclinepi.2010.04.026
- Ford I, Norrie J. Pragmatic Trials. NEJM. 2016 August 4; 375:454-463. DOI: 10.1056/NEJMra1510059
- Sanson-Fisher RW, Bonevski B, Green LW, D’Este C. Limitations of the Randomized Controlled Trial in Evaluating Population-Based Health Interventions. Am J Prev Med. 2007 Aug;33(2):155-61. doi: 10.1016/j.amepre.2007.04.007
- 21st Century Cures Act. U.S. Food and Drug Administration https://www.fda.gov/regulatory-information/selected-amendments-fdc-act/21st-century-cures-act Accessed May 7, 2020
- Agarwala V, Khozin S, Singal G, et al. Real-World Evidence In Support Of Precision Medicine: Clinico-Genomic Cancer Data As A Case Study. Health Aff. 2018 May;37(5):765-772. doi:10.1377/hlthaff.2017.1579.
- NCD 90.2, CMS: National Coverage Determination (NCD) for Next Generation Sequencing (NGS) (90.2) https://www.cms.gov/medicare-coverage-database/details/ncd-details.aspx?NCDId=372&ncdver Accessed June 1, 2020
- L38238, MolDX: Predictive Classifiers for Early Stage Non-Small Cell Lung Cancer. https://www.cms.gov/medicare-coverage-database/details/lcd-details.aspx?LCDId=38238&ver Accessed May 1, 2020
- Goede P, Anderson L, Borsato E. How Data and Informatics Intersect to Enable Precision Medicine to Reach its Full Potential. The Journal of Precision Medicine. December 2019. https://www.thejournalofprecisionmedicine.com/wp-content/uploads/2019/12/jpm419-Goede.pdf Accessed May 1, 2020
The views expressed in this article are generic in nature and do not necessarily reflect the coverage process under the MolDX Program.