Clinical Diagnostic Gene Expression Thyroid Testing




Thyroid fine-needle aspiration biopsies are cytologically indeterminate in 15% to 30% of cases. When cytologically indeterminate thyroid nodules undergo diagnostic surgery, approximately three-quarters prove to be histologically benign. The Afirma Gene Expression Classifier (GEC) achieves a negative predictive value of 94% or higher for indeterminate nodules. Most Afirma GEC benign nodules can be clinically observed, as suggested by the National Comprehensive Cancer Network Thyroid Carcinoma Guideline. More than half of the benign nodules with indeterminate cytology (Bethesda categories III/IV) can be identified as GEC benign and removed from the surgical pool to prevent unnecessary diagnostic surgery.


Key points








  • Fifteen to 30% of thyroid fine-needle aspiration biopsies are cytologically indeterminate.



  • When cytologically indeterminate thyroid nodules undergo diagnostic surgery, approximately three-quarters prove to be benign.



  • The Afirma Gene Expression Classifier (GEC) achieved a risk of malignancy of 6% or less on an independent set of 265 prospectively collected cytologically indeterminate nodules when the molecular results were compared with the blinded gold standard central expert histopathology diagnosis.



  • The National Comprehensive Cancer Network Thyroid Carcinoma Guideline states that cytologically indeterminate thyroid nodules determined to have a malignancy risk of ∼5% or less with a molecular test can be clinically observed.



  • For GEC-tested patients, published clinical utility studies demonstrate that approximately half of those with indeterminate cytology (Bethesda III/IV) avoid diagnostic thyroid surgery.






Thyroid cancer multigene expression classifiers: what the surgeon should know


Introduction


Before the advent of thyroid nodule fine-needle aspiration biopsy (FNAB), thyroid nodules were routinely referred for diagnostic surgery because of their 5% to 15% risk of malignancy (ROM). FNAB decreased diagnostic thyroidectomies by one-half because most FNABs are diagnosed as cytologically benign. Still, 15% to 30% of thyroid FNABs are cytologically indeterminate (ie, neither clearly benign nor clearly malignant). When cytologically indeterminate thyroid nodules undergo diagnostic surgery, approximately three-quarters prove to be histologically benign. Therefore, patient care could be significantly improved with genomic diagnostic technologies that accurately reclassify these samples as benign with high enough negative predictive value (NPV) to safely avoid the costs and risks of diagnostic thyroid surgery. In choosing which genomic test to order, the surgeon should ensure that peer-reviewed publications exist that define the test’s clinical and analytical validity, and most importantly, its clinical utility.


Currently, the Afirma gene expression classifier (GEC) (Veracyte Inc, South San Francisco, CA, USA) is used in cytologically indeterminate nodules (Bethesda III and IV) to reclassify them as benign nodules and to avoid diagnostic surgery. Table 1 lists the Bethesda cytologic category definitions. By accurately excluding malignancy when the test result is benign, the Afirma GEC is known as a “rule-out” test. In addition, it identifies rare neoplasms that are often difficult to diagnose accurately with cytology, such as medullary thyroid cancer (MTC), parathyroid neoplasms, and certain metastases to the thyroid. Given the wealth of published data regarding the Afirma GEC’s clinical validity, analytical validity, and clinical utility, patients should not undergo thyroid surgery for solely diagnostic reasons for lower risk cytologically indeterminate thyroid nodules (Bethesda III and IV) without the physician and patient considering the role of Afirma GEC testing. In the surgical author’s practice, approximately half of the patients with cytologically indeterminate nodules chose to pursue surgery over additional testing. Younger patients, and those with a higher ROM based on cytology (Bethesda V vs Bethesda III/IV), were more likely to elect surgery. For those who chose GEC testing, half avoided thyroid surgery, similar to what was found in 2 multicenter clinical utility studies of Afirma.



Table 1

Performance of the Afirma GEC

Bethesda Categories III–V (n = 265)
  GEC result    Malignant reference standard (n = 85)    Benign reference standard (n = 180)
  Suspicious    78                                       87
  Benign        7                                        93
  Sensitivity, 92% [84–97]; Specificity, 52% [44–59]; PPV, 47% [40–55]; NPV, 93% [86–97]; %FN results, 2.6%; ROM, 32%

Bethesda Category III: Atypia of undetermined significance/Follicular lesion of undetermined significance (n = 129)
  GEC result    Malignant reference standard (n = 31)    Benign reference standard (n = 98)
  Suspicious    28                                       46
  Benign        3                                        52
  Sensitivity, 90% [74–98]; Specificity, 53% [43–63]; PPV, 38% [27–50]; NPV, 95% [85–99]; %FN results, 2.3%; ROM, 24%

Bethesda Category IV: Follicular or Hürthle cell neoplasm/Suspicious for follicular neoplasm (FN/SFN) (n = 81)
  GEC result    Malignant reference standard (n = 20)    Benign reference standard (n = 61)
  Suspicious    18                                       31
  Benign        2                                        30
  Sensitivity, 90% [68–99]; Specificity, 49% [36–62]; PPV, 37% [23–52]; NPV, 94% [79–99]; %FN results, 2.5%; ROM, 25%

Bethesda Category V: Suspicious for malignancy (n = 55)
  GEC result    Malignant reference standard (n = 34)    Benign reference standard (n = 21)
  Suspicious    32                                       10
  Benign        2                                        11
  Sensitivity, 94% [80–99]; Specificity, 52% [30–74]; PPV, 76% [61–88]; NPV, 85% [55–98]; %FN results, 3.6%; ROM, 62%

Bethesda Category II: Cytopathology benign (n = 47)
  GEC result    Malignant reference standard (n = 3)     Benign reference standard (n = 44)
  Suspicious    3                                        13
  Benign        0                                        31
  Sensitivity, 100% [29–100]; Specificity, 70% [55–83]; PPV, 19% [5–46]; NPV, 100% [86–100]; %FN results, 0%; ROM, 6%

Abbreviations: ROM, risk of malignancy (malignancy prevalence); %FN, percentage false negative.

Adapted from Alexander EK, Kennedy GC, Baloch ZW, et al. Preoperative diagnosis of benign thyroid nodules with indeterminate cytology. N Engl J Med 2012;367(8):710.
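The summary statistics in Table 1 follow directly from the 2 × 2 counts in each panel. The short Python sketch below recomputes them for the pooled Bethesda III–V panel; the function name and dictionary keys are illustrative choices, not part of the source, and the confidence intervals are not reproduced.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return standard 2x2 diagnostic metrics as fractions.

    tp: GEC suspicious, reference malignant
    fp: GEC suspicious, reference benign
    fn: GEC benign, reference malignant
    tn: GEC benign, reference benign
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "fn_rate": fn / total,        # %FN: false negatives among all tested
        "rom": (tp + fn) / total,     # risk of malignancy (prevalence)
    }


# Counts from Table 1, Bethesda categories III-V (n = 265)
overall = diagnostic_metrics(tp=78, fp=87, fn=7, tn=93)
for name, value in overall.items():
    print(f"{name}: {value:.1%}")
```

Rounded to whole percentages, these values agree with the first panel of Table 1; the same function can be applied to the counts of any other panel.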


Cost, Morbidity, and Risk of Mortality from Surgery


The average direct costs of hemithyroidectomy and total thyroidectomy are conservatively estimated at greater than $6000 and $11,000, respectively. However, the range of costs for these procedures at various inpatient facilities exceeds $20,000 and $25,000, respectively. Moreover, the costs of diagnostic surgery include more than just the direct cost of the procedure. Estimates of the costs of diagnostic thyroidectomy should include surgical complications, as well as indirect costs due to time lost from work and responsibilities of daily living (eg, child care, cooking, cleaning) and impaired quality of life (the fear of potentially having cancer, the anxiety of undergoing diagnostic surgery, postoperative pain and recovery, and the potentially impaired quality of life from iatrogenic hypothyroidism with, or without, a normal serum TSH).


Although thyroid lobectomy is increasingly performed as an outpatient procedure with excellent outcomes in experienced hands, the outcomes from thyroidectomy overall can be sobering. Thyroid surgery is associated with a perioperative mortality of 0.1% to 0.2%, with rates as high as 0.5%. Serious or permanent nonlethal complications of thyroidectomy include hypocalcemia, recurrent and/or superior laryngeal nerve damage, rebleeding, and wound infection. The frequency of these complications is underappreciated and is strongly related to surgeon experience (volume) and expertise. One study reported that 11% of patients undergoing thyroidectomy or parathyroidectomy required a visit to the emergency room at least once within 30 days of surgery, and nearly one-quarter of these patients required hospitalization. Complication rates from thyroid surgery may be much higher in population-based series than in high-volume single academic center series. For example, complications in the state of Maryland were found to be 10.1% for surgeons who did between 1 and 9 cases per year, and 5.9% for surgeons who did more than one hundred cases per year. In fact, more than 50% of thyroid surgeries in the United States are performed by surgeons who do 5 or fewer cases per year, placing many patients at increased risk of complications.


Given the costs and potential complications of diagnostic surgery, efforts to avoid unnecessary surgery are in the best interests of the patient, the health care system, and the physician.


Regulation of Molecular Diagnostic Tests


Laboratory developed tests are currently regulated by the Clinical Laboratory Improvement Amendments (CLIA). Congress passed CLIA in 1988 to establish quality standards for all laboratory testing to ensure the accuracy, reliability, and timeliness of patient test results, regardless of where the test was performed.


In 2004, the Office of Public Health Genomics (OPHG) of the Centers for Disease Control and Prevention (CDC) recognized a critical need to provide guidance to health care providers and patients on the appropriate use of genomic tests. In response, the OPHG launched Evaluation of Genomic Applications in Practice and Prevention (EGAPP), the first federal evidence-based initiative to address genomic testing specifically. The EGAPP Working Group was established by the National Office of Public Health Genomics at the CDC to standardize evaluation of the rapidly emerging array of genomic diagnostic tests. EGAPP adapts existing evidence review methods to the systematic evaluation of genomic tests and links scientific evidence to clinical recommendations for the use of genomic tests.


In 2009, EGAPP developed a set of methods based on the evaluation of analytical validity, clinical validity, clinical utility, and, to some extent, the ethical, legal, and social implications of each test. Analytic validity includes analytic sensitivity (detection rate of a known positive), analytic specificity (1 − false positive rate), reliability (eg, repeatability of test results), and assay robustness (eg, resistance to small changes in preanalytic or analytic variables such as reagent variability or interfering substances). EGAPP defines the clinical validity of a genetic test as its ability to predict accurately and reliably the clinically defined disorder or phenotype of interest (eg, benign vs malignant nodule). Clinical validity encompasses clinical sensitivity and specificity, and predictive values of positive and negative tests that take into account the disorder prevalence. Finally, the clinical utility of a genetic test is the evidence of improved measurable clinical outcomes, and its usefulness and added value to patient management decision-making compared with current management without genetic testing. For example, a test can have excellent analytical and clinical validity, but if patient outcome is not improved, then clinical utility is not established.


Unfortunately, most physicians and specialty societies do not rigorously evaluate tests as proposed by EGAPP. They often assume that if a test is available, then its analytic and clinical validity, as well as its clinical utility, have been established. However, most suppliers of genomic tests fail to perform the robust analytical validation studies or the clinical utility studies necessary to demonstrate improved health care outcomes. Insurance payers regard clinical utility studies as a key determinant of whether a test is medically necessary, and lack of demonstrated clinical utility is their most common reason for denying payment for new genomic tests. Of the commercially available products for molecular testing of thyroid nodules, only the Afirma GEC has published evidence of analytical validity, clinical validity, and clinical utility (Fig. 1), which has resulted in widespread payer coverage for the Afirma GEC.




Fig. 1


Published results for Bethesda III/IV FNAB molecular diagnostics. Clinical validity requires performance characteristics in the intended use samples (eg, Bethesda III and IV cytology specimens). The EGAPP Working Group developed a set of methods based on the evaluation of analytical validity, clinical validity, clinical utility, and, to some extent, the ethical, legal, and social implications of each test. a Clinical utility here refers to the decision to elect clinical observation of the nodule in lieu of diagnostic surgery.

( Data from Refs. )


Veracyte Afirma GEC


The Afirma GEC is based on the measurement of messenger RNA (mRNA) expression. There are 2 key advantages to using RNA instead of DNA for test development. First, although there are only approximately 23,000 known protein-coding DNA genes, each of these may be transcribed into multiple alternatively spliced variants, with more than 240,000 known mRNA isoforms. Disease-causing alterations in the DNA generally exert their effects, at least partially, on the transcriptome. Therefore, mRNA transcript measurement provides an amplification of the effects caused by upstream changes in the DNA blueprint that are quite a bit more difficult to identify without large-scale de novo sequencing. Second, gene expression may be impacted by lifestyle and environmental factors so mRNA gene expression reflects additional information not discernible from DNA analysis alone. The quantification of mRNA expression captures upstream DNA point mutations and gene rearrangements, as well as the actions of microRNAs, which may regulate gene expression. This approach avoids the limitation that common DNA mutations are not present in many cytologically indeterminate nodules. In fact, the most common DNA mutations are so low in frequency in Bethesda III/IV nodules that 7 individuals must be tested to obtain one gene mutation–positive result. Similarly, benign thyroid nodules may carry DNA mutations. Transcriptional analysis assists in identifying gene signatures that reflect whole patterns of pathway activation versus analysis of a small number of genes.


The GEC was developed and validated clinically to identify preoperatively histologically benign nodules among those with indeterminate cytology. Instead of relying on genes previously identified in the literature, analysis of the whole genome (transcriptome) was used to identify candidate genes, and support vector machine learning methods were used to develop the classifier algorithm. By preoperatively identifying patients with cytologically indeterminate nodules who are at low risk of having cancer, clinical and sonographic follow-up may be recommended in lieu of diagnostic surgery, thus ending the diagnostic odyssey ( Fig. 2 ). This approach answers the question of whether one should operate or observe an indeterminate nodule, as opposed to “rule-in” tests, which may be used to answer the question of whether a total versus hemithyroidectomy should be performed. Gene mutation testing and gene expression profiles have been developed to answer the latter question. The Afirma GEC analysis is indicated only for nodules with indeterminate cytology and is not performed on cytologically benign, malignant, or nondiagnostic (insufficient) FNAB samples.




Fig. 2


Implementing the Afirma GEC into clinical practice. ∗Cytologically indeterminate nodules are Bethesda categories III and IV.

( Data from Refs. )


The Afirma GEC test is performed in Veracyte’s CLIA-certified clinical laboratory. The molecular classifier proceeds in a stepwise fashion, first applying 6 cassettes before applying the final benign versus malignant classifier. These cassettes differentiate specific uncommon neoplasm subtypes that are often missed by cytology and act as filters that halt further sample processing if any cassette returns a “suspicious” result. These cassettes classify samples representing (1) malignant melanoma, (2) renal cell carcinoma, (3) breast carcinoma, (4) parathyroid tissue, and (5) medullary thyroid carcinoma. A final cassette (6) was also trained using Hürthle cell adenomas and carcinomas to identify Hürthle cell neoplasms versus Hürthle cell changes or features related to thyroiditis or hyperplasia. Failing to trigger one of these cassettes, the GEC evaluates the expression of 142 genes that are used in a proprietary mathematical algorithm to classify indeterminate thyroid nodule FNABs as either “benign” or “suspicious.” The genes used in the cassettes and main GEC classifier are published.
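The stepwise control flow described above (cassettes applied first as filters, with the main classifier reached only if none is triggered) can be sketched as follows. This is purely an illustration of the ordering: the gene names, thresholds, and scoring rule are made up, and nothing here reflects the proprietary Afirma algorithm or its actual gene set.

```python
from typing import Callable, Dict, List, Tuple

Sample = Dict[str, float]                        # hypothetical: gene name -> expression level
Cassette = Tuple[str, Callable[[Sample], bool]]  # (label, True when the cassette flags the sample)


def run_stepwise_classifier(
    sample: Sample,
    cassettes: List[Cassette],
    main_classifier: Callable[[Sample], str],
) -> Tuple[str, str]:
    """Apply cassettes in order; fall through to the main classifier if none fires."""
    for label, flags_sample in cassettes:
        if flags_sample(sample):
            # A triggered cassette halts processing with a "suspicious" result.
            return "suspicious", label
    return main_classifier(sample), "main classifier"


# Toy stand-ins: made-up gene names and thresholds, for illustration only.
toy_cassettes: List[Cassette] = [
    ("medullary thyroid carcinoma", lambda s: s.get("toy_mtc_marker", 0.0) > 5.0),
    ("parathyroid tissue", lambda s: s.get("toy_parathyroid_marker", 0.0) > 5.0),
]


def toy_main(sample: Sample) -> str:
    """Hypothetical final call: negative toy score means benign."""
    return "benign" if sample.get("toy_score", 0.0) < 0.0 else "suspicious"


result_filtered = run_stepwise_classifier({"toy_mtc_marker": 9.0}, toy_cassettes, toy_main)
result_main = run_stepwise_classifier({"toy_score": -1.2}, toy_cassettes, toy_main)
print(result_filtered)
print(result_main)
```

The first toy sample is stopped by a cassette and never reaches the main classifier; the second passes every cassette and receives the main classifier's call.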


Analytical Validity


The GEC performance was evaluated in a series of 43 individual reagent and analytical verification studies. Extensive reagent and analytical performance studies were conducted to evaluate the reliability and reproducibility of the GEC under a variety of experimental and clinical conditions, with robust and highly reproducible results. Interfering substances, including human blood and genomic DNA, were not found to interfere with extraction or amplification steps of the assay. Analytical sensitivity studies demonstrated tolerance to variations in RNA input across the range of 5 ng to 25 ng, as well as to dilution of malignant FNAB material down to 20% with FNAB material from lymphocytic thyroiditis and nodule hyperplasia. Analytical sensitivity and specificity studies with blood (up to 83%) and genomic DNA (30%) demonstrated negligible assay interference, although false positive results could result from very bloody FNABs. FNAB preservative solution maintained high quality and quantity of RNA material under various stressed time, temperature, and shipping conditions with no significant effect on GEC scores, or “benign” versus “suspicious” calls (100% concordance). Based on these data, room-temperature storage at the clinical site and chilled-box shipping were verified for routine practice.


Clinical Validity and Clinical Practice Experience


The initial clinical validation of the Afirma GEC was performed on an independent sample set of cytologically indeterminate thyroid nodule FNABs in a prospective, multicenter, double-blind study. The Afirma GEC achieved high sensitivity and NPV. After further optimization, the GEC was validated on a second, larger independent sample set in a prospective multicenter validation study. Using independent test sets is essential to demonstrating that the GEC algorithm is not overtrained. The second study included the largest ever prospectively collected set of thyroid FNABs, from 3789 unique patients. Based on the expected 24% prevalence of malignancy in cytologically indeterminate samples in clinical practice, the Afirma GEC achieved a 95% NPV on an independent sample set of 265 cytologically indeterminate nodules when the molecular results were compared with the blinded gold standard central expert histopathology diagnosis.


Analysis of atypia of undetermined significance versus follicular lesion of undetermined significance samples, both of which are Bethesda category III lesions, found no difference in sensitivity or specificity (RT Kloos, unpublished data, 2013). Overall, the ROM for a thyroid nodule with Bethesda category III or IV indeterminate cytology and an Afirma GEC benign result is about 5% (see Table 1). This risk is comparable to the 6% to 8% cancer risk for an operated thyroid nodule with a benign cytology diagnosis (see Figs. 2 and 3, Table 1), which demonstrates that cytologically indeterminate nodules (Bethesda categories III and IV) with an Afirma GEC benign diagnosis can be managed as a cytologically benign nodule would be, as suggested by the National Comprehensive Cancer Network Thyroid Carcinoma Guideline.




Fig. 3


Afirma GEC reclassifies cytologically indeterminate thyroid nodules with a benign genomic signature to GEC benign. ROM is 1 − NPV.

( Adapted from Alexander EK, Kennedy GC, Baloch ZW, et al. Preoperative diagnosis of benign thyroid nodules with indeterminate cytology. N Engl J Med 2012;367(8):705–15.)


Unlike sensitivity and specificity, which are unaffected by the prevalence of cancer, positive predictive values (PPV) and NPV are influenced by the ROM in the cohort being evaluated. This influence complicates comparing the NPVs of different tests when they are described on different cohorts with different prevalences of cancer. One statistic to simplify such a comparison is the likelihood ratio. The formula for the likelihood ratio of a negative test (LR−) is (1 − sensitivity)/specificity. The result should be between 0 and 1. A result of 1 indicates that the result is just as likely in those with the disease as in those without the disease, and it adds no value. Conversely, the test closest to 0 has the greater resolving power to exclude a condition. For the GEC, the likelihood ratio of a negative test in Bethesda III + IV is 0.19, whereas the comparable LR− of mutational markers reported by Nikiforov and colleagues is 0.42.
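As a worked check of the LR− formula, the sketch below pools the Bethesda III and IV counts from Table 1 and computes (1 − sensitivity)/specificity. The variable and function names are illustrative; only the counts come from the table.

```python
def negative_likelihood_ratio(sensitivity, specificity):
    """LR- = (1 - sensitivity) / specificity."""
    return (1.0 - sensitivity) / specificity


# Bethesda III + IV counts pooled from Table 1
tp = 28 + 18   # GEC suspicious, malignant
fn = 3 + 2     # GEC benign, malignant
fp = 46 + 31   # GEC suspicious, benign
tn = 52 + 30   # GEC benign, benign

sensitivity = tp / (tp + fn)   # 46/51
specificity = tn / (tn + fp)   # 82/159
lr_negative = negative_likelihood_ratio(sensitivity, specificity)
print(round(lr_negative, 2))   # 0.19
```

The result reproduces the 0.19 quoted for the GEC in Bethesda III + IV.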


Specificity is the percentage of truly benign nodules identified by the test as benign. Cytologically indeterminate thyroid nodules have 0% specificity because they are not identified as benign with the microscope. Thus, the GEC raises the pretest specificity from 0% for cytologically indeterminate categories to 52% posttest, indicating that over half of the benign nodules from Bethesda categories III and IV can be identified and removed from the surgical pool. The sensitivity and specificity performance of the GEC were comparable across Bethesda III–V cytologies (see Table 1 ). However, given the higher prevalence of malignancy in Bethesda category V nodules (suspicious for malignancy), the NPV is lowered to 85% (see Fig. 3 , Table 1 ). Thus, although the ROM is reduced from an initial 62% based on the cytologic category to 15% when the GEC is benign, surgery may not be avoidable based on the residual ROM. These Bethesda V nodules are therefore not routinely tested with the GEC. However, some physicians specifically request the GEC be performed in this cytologic category to screen the sample for rare neoplasms, such as medullary thyroid carcinoma. In addition, with an NPV of 85% on Bethesda category V nodules when the GEC is benign, clinicians may use the GEC information to offer a hemithyroidectomy (with possible completion total thyroidectomy) as opposed to an up-front total thyroidectomy.
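The dependence of NPV on prevalence can be made explicit with Bayes' rule: NPV = specificity × (1 − prevalence) / [specificity × (1 − prevalence) + (1 − sensitivity) × prevalence]. The sketch below applies this to the Bethesda V counts from Table 1 and then, as a hypothetical comparison, to the same sensitivity and specificity at a 24% prevalence. Function and variable names are illustrative.

```python
def npv_at_prevalence(sensitivity, specificity, prevalence):
    """Negative predictive value for a given disease prevalence (Bayes' rule)."""
    true_negative_mass = specificity * (1.0 - prevalence)
    false_negative_mass = (1.0 - sensitivity) * prevalence
    return true_negative_mass / (true_negative_mass + false_negative_mass)


# Bethesda V counts from Table 1: 32 and 10 suspicious, 2 and 11 benign
sens_v = 32 / 34
spec_v = 11 / 21
prev_v = 34 / 55                      # ROM of about 62%

npv_v = npv_at_prevalence(sens_v, spec_v, prev_v)
residual_rom_v = 1.0 - npv_v          # risk remaining after a GEC benign result

# Hypothetical: identical test performance in a cohort with 24% prevalence
npv_low_prev = npv_at_prevalence(sens_v, spec_v, 0.24)

print(f"NPV at {prev_v:.0%} prevalence: {npv_v:.0%}")
print(f"Residual ROM: {residual_rom_v:.0%}")
print(f"NPV at 24% prevalence: {npv_low_prev:.0%}")
```

With the Bethesda V prevalence, the formula returns the same value as the direct count (11 of 13 GEC benign nodules were histologically benign), and the NPV rises when the same sensitivity and specificity are applied to a lower-prevalence cohort.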


Several groups have now reported their clinical experience with the Afirma GEC in routine clinical practice (B Michael, personal communication to RT Kloos, 2013). In the 2 largest series, the GEC result was benign just over half the time, and in this case, patients were managed with observation in lieu of operation 92% to 94% of the time. Defining the number needed to test (NNT) as the number of tests needed to be performed to change the clinical outcome of one patient (NNT = 1/[%GEC benign]), then the NNT of these series is 2. Thus, one patient avoids surgery for every 2 patients tested.
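The NNT definition above reduces to a one-line calculation. In the sketch below, the input of 0.5 is an illustrative stand-in for the "just over half" GEC benign rate described in the text, not a figure taken from a specific series.

```python
def number_needed_to_test(fraction_gec_benign):
    """NNT = 1 / (fraction of tested patients with a GEC benign result)."""
    if not 0.0 < fraction_gec_benign <= 1.0:
        raise ValueError("fraction_gec_benign must be in (0, 1]")
    return 1.0 / fraction_gec_benign


# Illustrative input: half of tested nodules are GEC benign
nnt = number_needed_to_test(0.5)
print(nnt)  # 2.0
```

Note that this definition treats every GEC benign result as a changed outcome; the text reports that such patients were observed rather than operated on 92% to 94% of the time.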


Most GEC benign patients in the clinical series reported to date did not undergo surgery, consistent with the purpose of the test (B Michael, personal communication to RT Kloos, 2013). Attempts to validate the GEC test performance on a small set of patients will necessarily result in very wide confidence intervals that are therefore uninterpretable. However, performance can be evaluated among these 654 GEC-tested patients by pooling them together and considering as malignant (false negatives) those GEC benign patients with malignancy found at surgery (4 patients), and as benign (true negatives) those GEC benign patients that underwent surgery and were histologically benign, or were GEC benign and not operated (305 patients). Among these GEC-tested patients across multiple clinical practices, the pooled accuracy of a GEC benign result (NPV) was 99% (95% confidence interval [CI] 96%–100%) ( Fig. 4 ) (B Michael, personal communication to RT Kloos, 2013). Furthermore, the prevalence of malignancy in the GEC suspicious patients (PPV) was 37% (95% CI 32%–43%). Only an adequately trained classifier would demonstrate such reproducible results across 2 prospective independent validation studies and these multiple retrospective analyses of clinical experiences that span both academic-based and community-based practices. Given these consistent results, performance remains high after pooling these clinical practice experiences with the Alexander and colleagues prospective clinical validation trial to evaluate the combined experience across these 864 patients (NPV 98% [95% CI 96%–99%], PPV 37% [95% CI 33%–42%]) (see Fig. 4 ). These data demonstrate a very low prevalence of malignancy (1 − NPV) in patients with cytologically indeterminate thyroid nodules that are Afirma GEC benign and support clinical observation in lieu of diagnostic surgery for most GEC benign patients. 
Of note, without surgical “truth,” a limitation of ascertaining true negative status is the variable length of nodule follow-up in the clinical practice studies reported in this pooled analysis. Thus, the meta-analytic NPV may be lower than 98% and clinical observation along with periodic ultrasound (US) assessment of unoperated GEC benign nodules is warranted. Such clinical observation is warranted as well for unoperated cytologically benign nodules given their residual ROM. However, Lee and colleagues found no improved detection of malignancies among cytologically benign nodules when followed longer than 3 years and suggested that stopping routine follow-up after this duration of time should be considered. This consideration may also apply to cytologically indeterminate nodules that are GEC benign. Last, because NPV is impacted by the prevalence of malignancy, and the ROM in the pooled analysis was only 21.3%, the authors calculated the NPV for a higher ROM of 25%. This meta-analytic NPV for the pooled analysis adjusted to 25% ROM remained high at 97% (95% CI 95%–99%).
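The pooled NPV from the clinical practice series follows from the counts given above (305 true negatives, 4 false negatives). The sketch below also computes a Wilson score interval, which is one of the methods reviewed by Newcombe; the source does not state which variant was used, so the bounds computed here are an assumption and need not match the published interval exactly.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


true_negatives = 305   # GEC benign: histologically benign or unoperated
false_negatives = 4    # GEC benign: malignant at surgery
gec_benign_total = true_negatives + false_negatives

pooled_npv = true_negatives / gec_benign_total
lower, upper = wilson_interval(true_negatives, gec_benign_total)
print(f"Pooled NPV: {pooled_npv:.1%} (Wilson interval {lower:.1%} to {upper:.1%})")
```

Because unoperated nodules are counted as true negatives, this estimate inherits the follow-up limitation discussed above.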




Fig. 4


Consistent Afirma GEC results in real-world clinical experiences recapitulate the clinical validation experience. Outside of the clinical validation experience, the NPV calculation here defines false negatives as GEC benign and histologically malignant, and true negative as GEC benign and histologically benign or unoperated. Statistical calculation method per Newcombe. a Data include Bethesda III and IV.

( Data from Refs. ; B Michael, personal communication to RT Kloos, 2013.)


For the surgeon, in addition to the risk factors of the individual patient, these data inform their clinical decisions for both GEC benign and suspicious patients. For GEC suspicious patients, the ultimate decision is typically between a hemithyroidectomy and a total thyroidectomy. This decision takes into account multiple factors, including imaging findings in the contralateral lobe, the potential value to spare the contralateral thyroid lobe, and the patient’s preferences regarding the possible need for a completion thyroidectomy should malignancy be found after hemithyroidectomy. Given the higher risks of laryngeal nerve injury and hypoparathyroidism with a total thyroidectomy versus hemithyroidectomy, and the PPV of the GEC suspicious result, hemithyroidectomy is often preferred when the Bethesda cytology risk is lower (categories III and IV). In contrast, total thyroidectomy is often advised for the higher-risk Bethesda category V.


Optimally, physicians routinely collect 2 extra passes for potential molecular testing with the Afirma GEC on every FNAB they perform, or have on-site rapid cytologic assessment so that the GEC can be collected on every patient with indeterminate cytology during one patient visit (see Fig. 2 ). This patient-centric approach avoids the inconvenience, delayed diagnosis, and costs associated with repeating the FNAB when the first FNAB comes back indeterminate. In addition, it is well known that cytologically indeterminate nodules may not be categorized as indeterminate if they undergo a repeat FNAB. At first glance, a repeat FNAB might seem like a good idea to potentially restratify cytologically indeterminate patients as either cytologically benign or malignant. Unfortunately, this creates a clinical problem for patients whose repeat FNAB is cytologically benign, because their ROM may not be fully reduced to the same risk as if their first FNAB had been cytologically benign. The conundrum is accentuated by the fact that the GEC is not indicated for cytologically benign material because this is not cost-effective due to the low PPV that results from the low prevalence of malignancy in this setting, and because the GEC specificity of 70% for cytologically benign nodules will predictably result in a false positive GEC suspicious call in too many cytology-benign nodules. For these reasons, it is recommended that the GEC specimen be collected at the same time as the cytology sample during the first thyroid FNAB. When GEC testing is desired in a patient for whom only cytology was previously collected, the cytology must be repeated along with the GEC collection.



