Key Findings and Strength of Evidence

For single-test accuracy, we summarized results from relevant systematic reviews to estimate the accuracy of each imaging modality. With regard to diagnosis and judging resectability in patients with unstaged disease, we drew the following conclusions:

  • For diagnosis using multidetector computed tomography (MDCT), one systematic review yielded a sensitivity estimate of 91 percent (95% confidence interval [CI], 86% to 94%) and a specificity estimate of 85 percent (95% CI, 76% to 91%). (Strength of evidence from published systematic reviews was not graded.)
  • For diagnosis using endoscopic ultrasound with fine-needle aspiration (EUS-FNA), four high-quality and recent systematic reviews yielded sensitivity estimates ranging from 85 percent to 93 percent and specificity estimates ranging from 94 percent to 100 percent. (Strength of evidence from published systematic reviews was not graded.)
  • For diagnosis using magnetic resonance imaging (MRI), three systematic reviews yielded sensitivity estimates of 84 percent to 86 percent and specificity estimates of 82 percent to 91 percent. (Strength of evidence from published systematic reviews was not graded.)
  • For diagnosis using positron emission tomography–computed tomography (PET/CT), three systematic reviews yielded sensitivity estimates of 87 percent to 90 percent and specificity estimates of 80 percent to 85 percent. (Strength of evidence from published systematic reviews was not graded.)
  • In assessing the resectability of the cancer in patients with unstaged disease using MDCT, one systematic review yielded a sensitivity estimate of 81 percent (95% CI, 76% to 85%) and a specificity estimate of 82 percent (95% CI, 77% to 97%). (Strength of evidence from published systematic reviews was not graded.)
  • In assessing the resectability of cancer in patients with unstaged disease using MRI, one systematic review yielded a sensitivity estimate of 82 percent (95% CI, 69% to 91%) and a specificity estimate of 78 percent (95% CI, 63% to 87%). (Strength of evidence from published systematic reviews was not graded.)

Also for single-test accuracy, regarding staging and judging resectability in patients whose disease has been staged, we drew the following conclusions:

  • Two systematic reviews that were not assessed as high quality reported on computed tomography (CT) for assessing vascular invasion. Both concluded that sensitivity and specificity were worse for the subset of studies using older or single-slice CT scanners than for the studies using newer multi-slice CT. Summary sensitivity values for the newer scanners ranged from 80 percent to 85 percent and summary specificity ranged from 82 percent to 97 percent. The evidence base in both reviews was small: four or five studies each. (Strength of evidence from published systematic reviews was not graded.)
  • One systematic review that was not high quality reported on magnetic resonance (MR) for assessing vascular invasion, concluding it had sensitivity of 63 percent and specificity of 93 percent. The evidence base was only four studies. (Strength of evidence from published systematic reviews was not graded.)
  • One review of PET/CT included only a single study, which had reported 82 percent sensitivity and 97 percent specificity for detecting liver metastasis. (Strength of evidence from published systematic reviews was not graded.)

For comparative accuracy, our conclusions appear in Table 15. For diagnosis, we found evidence to support the claim that MDCT and MRI are similarly accurate. We also concluded that MDCT and EUS-FNA are similarly accurate when determining whether an unstaged tumor can be resected. This is an important finding, because MDCT is the standard method of evaluating suspected pancreatic cancer and is often used to guide the key clinical decision of whether or not to operate. Using EUS-FNA instead of MDCT for this purpose would have no impact on the rates of appropriate resection, but it could alter other aspects such as procedural harms (fewer iatrogenic cancers, more iatrogenic pancreatitis and postprocedural pain). We note that surgical planning clearly requires more than just EUS-FNA, and so surgery would not be performed based on EUS-FNA alone.

Turning to staging, we found T (tumor) staging was better with EUS-FNA than MDCT. This may mean more accurate planning of neoadjuvant therapy if EUS-FNA is used. However, as above, this is less important since the key clinical issue is resectability. Two key factors in determining resectability are involvement of blood vessels and the presence of metastatic disease. Regarding the involvement of blood vessels, we found that MDCT and MRI are similarly accurate. Regarding detection of metastatic disease, we found that PET/CT has a slight advantage over MDCT (statistically significant advantage in specificity and a slight advantage in sensitivity). We note that both technologies had poor sensitivity for detecting metastases (57% for MDCT and 67% for PET/CT) but were quite good at correctly identifying patients without metastases (specificities of 91% for MDCT and 100% for PET/CT).

Table 15. Summary of conclusions on comparative diagnostic and staging accuracy.

The literature describes different procedural harms associated with each of the common modalities examined in the diagnosis and staging of pancreatic adenocarcinoma. MDCT and PET/CT both involve radiation and, therefore, can cause cancer. However, it is not possible to quantify the risk specific to the use of these tests for the diagnosis or staging of pancreatic adenocarcinoma. Pancreatitis, postprocedural pain, and puncture, perforation, and bleeding are all associated with EUS-FNA and stem from the physical invasiveness of this procedure. There is a paucity of data regarding patient tolerance. One study of screening found that about 10 percent of patients state that EUS-FNA and MRI are very uncomfortable.

The literature on screening studies did not provide comparative accuracy data for single-test or multiple-test strategies. Studies addressed different populations of high-risk individuals (HRIs). While screening imaging studies did identify some HRIs with pancreatic adenocarcinoma, there were also false positive and false negative results. A major barrier to effectively defining an optimal pancreatic cancer–screening approach is the evolving understanding of the unique biology of pancreatic cancers among HRIs, particularly those with strong genetic predispositions. Rapid progression and cancer development occurred in some HRIs despite aggressive screening approaches, showing that the natural history of some lesions in HRIs (i.e., familial pancreatic neoplasia) can be aggressive and is still not well understood. The appropriate high-risk populations for screening also need to be better defined and characterized.

The development of optimal screening algorithms for HRIs is further complicated by evolution in the understanding of precursor lesions such as intraductal papillary mucinous neoplasia (IPMN) and pancreatic intraepithelial neoplasia (PanIN) lesions. Current imaging technologies are insufficient to differentiate between low-grade and high-grade dysplasia in IPMNs and PanINs. One consensus-based guideline published in 2012 suggested that main duct IPMN should be resected, whereas branch duct IPMN without high-risk pathology and imaging features (i.e., high-grade dysplasia, increasing size) should be monitored. This reliance on both pathology and radiology to distinguish between high-grade and low-grade lesions creates a difficult situation in cases in which an IPMN or PanIN is suspected on imaging. Specifically, although surgical resection is the main treatment for precursor lesions, the timing of surgery versus continued imaging surveillance requires further study, particularly given the potential morbidity and mortality associated with pancreatic surgery.

Findings in Relationship to What is Already Known

We identified five reviews whose purpose was to compare different imaging modalities for the diagnosis and/or staging of pancreatic adenocarcinoma. One28 required that all studies make direct comparisons (as we did in this report for KQ1b through 1g, and KQ2b through 2g), whereas the others did not set that requirement; instead, the reviewers performed an indirect comparison of studies of one modality to studies of another modality. The next five paragraphs discuss these five reviews and how they relate to our conclusions on comparative accuracy (see previous section). Then, we discuss a single identified systematic review of morbidity after EUS-FNA, and how it relates to our findings for KQ3.

Wu et al. (2012)48 indirectly compared PET/CT with diffusion-weighted MRI, and included 16 studies. Authors concluded that PET/CT was highly sensitive and diffusion-weighted MRI was highly specific, and that “enhanced PET/CT seems to be superior to unenhanced PET/CT.” The data they analyzed, however, do not support any assertions of reliable differences among the modalities. The sensitivity of PET/CT was 87 percent, with a reported confidence interval from 81 percent to 82 percent, which must be a typographical error because the interval does not contain the point estimate (we contacted the author for a correction, but received no reply). For diffusion-weighted MRI, the sensitivity was 85 percent with a confidence interval from 74 percent to 92 percent, so the sensitivity of MRI could actually have been higher than for PET/CT. Specificities for PET/CT and MRI were 83 percent and 91 percent, respectively, but the estimates were too imprecise to establish a reliable difference. The only comparative statement involves different forms of PET/CT, and we did not include any studies making such a comparison. The authors’ conclusion was based on indirect comparisons, which we chose not to make in this review.

Tang et al. (2011)50 indirectly compared PET/CT, PET alone, and EUS with or without FNA. Some of the EUS studies may not have permitted FNA (even if a lesion had been seen), thus those data are outdated. Authors included 51 studies published up to April 2009 and concluded that for diagnosis, PET/CT was the most sensitive of the three modalities (90%, vs. 88% for PET alone and 81% for EUS), whereas EUS was the most specific (93%, vs. 80% for PET/CT and 83% for PET alone). The authors concluded, based on these results, that PET/CT and EUS could play different clinical roles (e.g., the more sensitive PET/CT for ruling out disease, and the more specific EUS for ruling in disease). These authors did not compare technologies to MDCT, whereas all of our conclusions about comparative accuracy involved MDCT, so their conclusions neither conflict with nor confirm ours.

Dewitt et al. (2006)28 directly compared CT (either single detector or multidetector) to EUS (either with or without the ability to perform FNA). Thus, some of the included studies used modalities that are outdated. Authors included 11 pre-2005 studies, each comparing the two technologies, and found several methodological flaws, such as retrospective design and unrepresentative study populations. Despite these flaws, the authors concluded that EUS is more sensitive than CT for diagnosis; for staging and vascular invasion, no conclusion can be reached; and for resectability assessment, the data suggest equivalence. Our review reached the same conclusion about resectability, but did not conclude that EUS is more sensitive (or more accurate in general) than MDCT. In comparing EUS-FNA to MDCT for diagnosis, we performed a meta-analysis of three studies. This evidence suggested a slight advantage of EUS-FNA, but the difference was not statistically significant and was too imprecise to permit a conclusion of similar accuracy. The difference may involve the inclusion of single-slice CT by Dewitt (which we excluded because it is an outdated technology).

Bipat et al. (2005)53 indirectly compared “conventional” CT, helical CT, MRI, and transabdominal ultrasound for diagnosis and resectability of pancreatic cancer. The 68 included studies had been published between January 1990 and December 2003; thus, the imaging technologies assessed are outdated (e.g., single-detector CT). For diagnosis, helical CT dominated the other techniques (highest sensitivity and highest specificity). For determining resectability, the technologies had similar sensitivities (81% to 83%); however, helical CT had slightly better specificity at 82 percent as compared with 78 percent for MRI, 76 percent for conventional CT, and 63 percent for transabdominal ultrasound. In terms of correspondence to this review, we concluded similarity between MDCT and MRI, which is largely consistent.

Li et al. (2013)76 compared CT, MR, and EUS without FNA for assessing vascular invasion in primary pancreatic adenocarcinoma. They analyzed four MR studies and twelve CT studies, with a subgroup of four of the latter studying MDCT. They concluded that CT was the most sensitive modality, while all three modalities had similar specificity. Most of the direct comparisons of CT and MR involved single-slice CT, so the authors’ conclusions are based on indirect comparisons of CT to MR. The summary sensitivity for MDCT reported in the review was 80 percent (95% CI, 70% to 89%) and the specificity was 97 percent (CI 93% to 100%). The summary sensitivity for MR was 63 percent (CI 48% to 77%) and the specificity was 93 percent (CI 86% to 98%). By contrast, this review found two studies directly comparing MDCT and MRI for assessing vascular invasion, and we found similar sensitivity (68% for MDCT and 62% for MRI) and specificity (97% for MDCT and 96% for MRI). The reason for the different sensitivity conclusion is unclear, but may be due to their use of an indirect comparison (i.e., perhaps the vessel involvement of patients in their four MDCT studies was easier to detect than the vessel involvement of patients in their four MRI studies).

Regarding procedural harms, one systematic review summarized data on EUS-FNA.178 The authors included 51 articles, and among these studies a total of 8,246 patients had received the procedure for pancreatic indications. Using non-meta-analytic techniques (dividing the total number of incidents by the total number of patients in the studies), they estimated rates of 0.44 percent for pancreatitis (36/8,246), 0.38 percent for postprocedural pain (31/8,246), 0.08 percent for fever (7/8,246), 0.1 percent for bleeding (8/8,246), 0.02 percent for perforation (2/8,246), and 0.01 percent for infection (1/8,246). The authors also investigated whether the observed rates differed between prospective and retrospective studies. For pancreatitis, they found rates of 0.67 percent in prospective studies but only 0.37 percent in retrospective studies. For postprocedural pain, they found rates of 1.4 percent in prospective studies but only 0.09 percent in retrospective studies. The authors did not report the statistical significance of these differences, so we performed the chi-square test and found that the difference for pancreatitis was not statistically significant (χ²(1)=2.95, p=0.09), but the difference for postprocedural pain was (χ²(1)=64.1, p<0.05).
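The pooled rates quoted above come from dividing total events by total patients. A minimal Python sketch of that arithmetic follows. The event counts and the denominator of 8,246 are taken from the text; the function name `crude_rate_percent` and the dictionary layout are illustrative assumptions, and this is not the review authors' own code.

```python
# Crude (non-meta-analytic) pooled rates: total events / total patients.
TOTAL_PATIENTS = 8246  # patients who received EUS-FNA for pancreatic indications

# Event counts as quoted in the text.
EVENTS = {
    "pancreatitis": 36,
    "pain": 31,
    "fever": 7,
    "bleeding": 8,
    "perforation": 2,
    "infection": 1,
}


def crude_rate_percent(events: int, patients: int) -> float:
    """Return the crude event rate as a percentage (hypothetical helper)."""
    if patients <= 0:
        raise ValueError("patients must be positive")
    return 100.0 * events / patients


rates = {name: crude_rate_percent(n, TOTAL_PATIENTS) for name, n in EVENTS.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.2f}%")
```

Crude pooling of this kind treats all patients as if they came from one study, so it does not account for differences among the contributing studies.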

Our review found similarly low rates of procedural harms of EUS-FNA and that the most commonly reported harms are pancreatitis and postprocedural pain. We did not attempt to estimate rates because of the wide variation in study methods and reporting. Because of the finding regarding prospective/retrospective studies, however, we investigated whether the finding was apparent in the studies we reviewed for KQ3. It was not. For pancreatitis, findings were in the opposite direction (0.39% for prospective studies, 0.46% for retrospective studies). For pain, findings were in the same direction, but the difference was smaller (1% in prospective studies, 0.7% in retrospective studies). The reason for the difference may involve our more-stringent inclusion criteria. We had required that studies included for harms state in their Methods sections a plan to measure harms; this was intended to exclude studies that reported harms data only anecdotally. If such anecdotal reports are more common among retrospective studies (a reasonable supposition), then our criteria may explain the difference.

Implications for Clinical and Policy Decisionmaking

Pancreatic adenocarcinoma carries a poor prognosis, in part due to advanced-stage presentation and diagnosis. While the incidence of pancreatic cancer is relatively low, it appears to be rising, increasing by 1.5 percent per year, which is beyond the rate expected based on aging of the population. Some predictions suggest that pancreatic adenocarcinoma will be the second highest incident cancer by 2020. This evidence review compares and summarizes current evidence on the effectiveness of imaging modalities (MDCT, MRI, EUS-FNA, and PET/CT) commonly used for diagnosing, staging, and determining the resectability of pancreatic cancer. In this report, the evidence was usually too imprecise to permit conclusions, but we found sufficient evidence for some tentative evidence-based conclusions, outlined next.

For diagnosis, we found MDCT and MRI have similar accuracy for diagnosing pancreatic adenocarcinoma. Specifically, we estimated a positive predictive value (PPV) of 90 percent and a negative predictive value (NPV) of 88 percent for both of these imaging procedures. In other words, a patient with a positive test result (on either MDCT or MRI) has approximately a 90 percent chance of having pancreatic adenocarcinoma, whereas a patient with a negative test result (on either MDCT or MRI) has only a 12 percent chance of having pancreatic adenocarcinoma. Examination of studies comparing PET/CT versus MDCT or EUS-FNA did not allow us to draw conclusions regarding comparative accuracy for diagnosis, because of low-quality or limited evidence.

For staging, we found that EUS-FNA is more accurate than MDCT for T staging (tumor size). The accuracy of EUS-FNA relative to other technologies for diagnosis and staging was mostly unclear, although for resectability we did find it was similar to MDCT (detailed below).

In the staging assessment of metastases (M staging), PET/CT was more accurate than MDCT. A positive MDCT result indicates an 80 percent chance of actually having metastases (i.e., PPV of 80%), whereas a positive PET/CT result indicates a near 100 percent chance (i.e., PPV of 100%). A negative MDCT scan indicates a 23 percent chance of having metastases (i.e., NPV of 77%), whereas a negative PET/CT scan indicates a 17 percent chance of having metastases (i.e., NPV of 83%). M staging was the only area in which PET/CT was found superior to other imaging modalities.
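To make the link between these predictive values and the underlying accuracy estimates explicit, here is a hedged Python sketch applying Bayes' rule. The sensitivities and specificities are those reported for detecting metastases earlier in this report (57% and 91% for MDCT, 67% and 100% for PET/CT). The prevalence of metastases is not stated in this section, so the value of 0.39 below is an assumption chosen only because it approximately reproduces the figures quoted above; the function name is illustrative.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at the given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


ASSUMED_PREVALENCE = 0.39  # assumption for illustration; not reported in the text

mdct_ppv, mdct_npv = predictive_values(0.57, 0.91, ASSUMED_PREVALENCE)
petct_ppv, petct_npv = predictive_values(0.67, 1.00, ASSUMED_PREVALENCE)

for label, ppv, npv in [("MDCT", mdct_ppv, mdct_npv), ("PET/CT", petct_ppv, petct_npv)]:
    # 1 - NPV is the chance that metastases are present despite a negative scan.
    print(f"{label}: PPV={ppv:.0%}, NPV={npv:.0%}, chance after negative={1 - npv:.0%}")
```

Under this assumed prevalence, the chance of metastases after a negative scan comes out at roughly 23 percent for MDCT and 17 percent for PET/CT. A different prevalence would shift all of these values, which is why predictive values should not be carried over to populations with a different baseline risk.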

In assessing vessel involvement, MDCT and MRI had similar accuracy. We estimate that a positive test result (on either MDCT or MRI) indicates a 73 percent chance of vessel involvement (i.e., PPV of 73%), whereas a negative test result (on either test) indicates only a 5 percent chance (i.e., NPV of 95%).

For determining resectability of lesions in patients whose disease is not staged, MDCT and EUS-FNA were found to be similar in accuracy. Those whose cancer is deemed unresectable by either MDCT or EUS-FNA have about an 88 percent chance of it actually being unresectable (i.e., PPV of 88%), and those whose cancer is deemed resectable by either test have about a 70 percent chance of it actually being resectable (i.e., NPV of 70%). This is important because upfront determination by imaging or endoscopy that an individual’s tumor is unresectable spares him/her surgery and its associated morbidity. It should be noted that using EUS-FNA instead of MDCT would have little impact on the rates of appropriate resection, but could alter other aspects such as procedural harms (fewer iatrogenic cancers, more iatrogenic pancreatitis and postprocedural pain).

MDCT angiography with 3D reconstruction is a newer technology, for which no conclusions could be drawn in this review because of limited evidence. One study, performed by the developers of the software, suggested that MDCT angiography with 3D reconstruction assesses resectability more accurately than MDCT without reconstruction. Additional research would help verify and further elucidate the role of this imaging study in diagnosis and management of pancreatic cancer.

One of the practical challenges for this review is that although our key questions looked separately at the comparative effectiveness of imaging procedures for diagnosis, staging, and resectability, generally these determinations occur simultaneously or in rapid succession. So, the question naturally arises, do our findings mean that all four imaging modalities should be used in the evaluation of patients with suspected pancreatic adenocarcinoma? Specifically, should an individual have an MDCT or MRI for diagnosis, assessment of vessel involvement, and potential resectability determination, followed by an EUS-FNA for tumor staging, followed by a PET/CT for metastatic staging? Although our results did not permit determination of the optimal sequencing of imaging tests, they do suggest that MDCT or MRI, plus EUS-FNA, plus PET/CT may all be appropriate for the diagnosis, staging, and resectability determination of suspected pancreatic adenocarcinoma.

The choice of which of these modalities an individual patient receives, however, is likely to be influenced by institutional availability, the risks of harms associated with each modality, and patient preferences and tolerances. MDCT is the most widely available (12,700 scanners in 2012),179 and although it is associated with the least operator/interpreter dependence, it does carry the potential for harms from radiation exposure and administration of contrast dye. MRI (10,815 units in 2012)179 and PET/CT (2,000 scanners in US in 2009)180 are the next most available. These examinations are associated with slightly more operator/interpreter dependence; PET/CT exposes patients to radiation (predominantly through radioactive isotopes as opposed to CT technique), but MRI does not. Finally, EUS-FNA is a highly specialized procedure that is currently less widely available than the other modalities examined. Unlike the other imaging modalities, this procedure provides direct tissue sampling. However, it is associated with the greatest operator dependence and the most harms, including postprocedure pancreatitis, pain, gastrointestinal perforation, and bleeding. Patient perspectives identified in the literature were from studies screening high-risk populations for pancreatic cancer, in which 10 percent and 11 percent of the population found EUS-FNA and MRI, respectively, to be “very uncomfortable.” However, given the poor prognosis of this disease, a “very uncomfortable” study that could provide significant information for diagnosis and management might be tolerable to an individual potentially facing such a grave diagnosis.

Existing practice follows a multi-modality paradigm that is largely institution-specific, based on technology and resource availability and institution and provider preference. This report sheds additional light on which imaging modalities are more accurate or roughly equivalent for some aspects of diagnosis and staging of pancreatic adenocarcinoma and could be incorporated into additional guidance developed for clinicians. Additional research, particularly on newer technologies such as MDCT angiography with 3D reconstruction or MRI angiography, may be useful. However, it is uncertain whether the improved resolution associated with newer imaging procedures will replace existing imaging or will simply add to the repertoire of preoperative evaluation.

Similarly, there is no widely accepted uniform approach for pancreatic cancer screening among asymptomatic HRIs. The U.S. Preventive Services Task Force recommends against screening for pancreatic cancer among the general population (i.e., average-risk persons), because of the low incidence of this disease. Consensus statements on risk-based approaches to screening exist181 but, again, are not supported by high-grade evidence. Some cost-effectiveness studies have even suggested that “doing nothing” or not screening is the most appropriate, cost-effective approach for HRIs at this time.182 Others have suggested that imaging and genetic and tumor marker evaluation should be restricted to the context of research.183 Our goal was not to determine whether screening HRIs for pancreatic cancer was appropriate or effective, but rather to determine which imaging modalities might be more accurate for screening. Unsurprisingly, the literature on screening in HRIs includes multiple imaging procedures and in some cases includes genetic (e.g., p16, BRCA2) and/or tumor marker (e.g., CEA, CA19-9) testing in addition to imaging. Differences in the populations, differences in choice of imaging modalities and protocols, differences in use of genetic and tumor markers, and a limited number of studies create significant difficulty in comparing various imaging modalities within a study and between studies. Thus, the studies examined provide no evidence for conclusions regarding comparative accuracy of imaging modalities for screening. At this time, further research is needed to elucidate the benefit of pancreatic screening among HRIs, including preferred imaging modalities and time intervals between imaging.

Reimbursement policies for these imaging modalities vary by payer. We examined the policies of a representative sample of payers to ascertain the status of MR, CT, PET/CT, and EUS-FNA. The most detailed policies were for PET/CT. While specific language varied, the plans we reviewed will reimburse for PET/CT only when CT and/or MR diagnosis and staging results are equivocal or inconclusive, or if those tests are contraindicated for a particular patient. Some policies had additional restrictions on use of PET/CT for restaging or follow-up of patients who have undergone treatment for pancreatic cancer. The policies we reviewed routinely covered CT, MRI, and EUS-FNA for diagnosis, staging, and restaging of pancreatic cancer.

Applicability

We judged the applicability of the evidence based on the PICO framework (patients, interventions, comparisons, and outcomes). Regarding patients, the typical age of patients in the included studies was 60–65 years. By contrast, in the National Cancer Institute’s Surveillance Epidemiology and End Results (SEER) database,5 the median age at diagnosis of pancreatic cancer from 2006 to 2010 was 71 years. The extent to which the accuracy of imaging tests varies by patient age is unclear. Resection may be more appropriate for younger patients (due to fewer comorbidities), but the comparative accuracy of different tests (e.g., MDCT vs. EUS-FNA) may not vary by age. In terms of gender, the typical percentage of patients who were female was 40 percent to 50 percent, and the SEER database reported annual incidence rates of 13.9/100,000 for men and 10.0/100,000 for women; these incidence rates suggest that approximately 42 percent of newly diagnosed cases of pancreatic cancer are in women. Thus, the gender ratio in the studies we included seems typical.
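The 42 percent figure follows from the two incidence rates if one assumes roughly equal numbers of men and women in the underlying population. That equal-population assumption is ours, made for illustration, and is not stated in the text. A small Python sketch with an illustrative function name:

```python
# SEER annual incidence rates per 100,000, as quoted in the text.
RATE_MEN = 13.9
RATE_WOMEN = 10.0


def share_of_cases_in_women(rate_men: float, rate_women: float,
                            pop_men: float = 1.0, pop_women: float = 1.0) -> float:
    """Fraction of new cases in women; populations default to equal sizes (assumption)."""
    cases_men = rate_men * pop_men
    cases_women = rate_women * pop_women
    return cases_women / (cases_men + cases_women)


share = share_of_cases_in_women(RATE_MEN, RATE_WOMEN)
print(f"Share of new cases in women: {share:.0%}")
```

Passing actual population sizes for `pop_men` and `pop_women` would refine the estimate where the sex distribution of the population is uneven, as it is at older ages.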

Regarding tests and comparisons, we included data only on imaging technologies that are currently in wide use for the diagnosis and staging of pancreatic adenocarcinoma. MDCT is widely used as the first imaging test for suspected pancreatic cancer, and most studies used CT. Specific test protocols, however, may differ between the studies we included and the typical test parameters used outside the context of a research study.

Regarding settings, most studies were conducted in university-based academic or teaching hospitals, which may limit the applicability of the results to community hospitals. Community hospitals may differ from the settings in the included studies with respect to the experience of the technicians administering the imaging test or the interpretation skills of those reading the imaging results. Bilimoria et al. (2007)184 found that among 35,009 patients treated for pancreatic cancer, 54 percent were at community hospitals whereas only 38 percent were at academic hospitals (another 2% were at Veterans Administration hospitals). For the pancreas-specific studies we included, 77 percent were at academic hospitals. Thus, academic settings were overrepresented in the evidence we reviewed. The implication of this is unclear, but possibly the test readers or practitioners may be more experienced than at nonacademic centers.

Limitations of the Comparative Effectiveness Review Process

This section discusses problems that we encountered conducting this systematic review and how we addressed them. These problems included: (1) whether to include EUS-FNA as a technology of interest; (2) whether to address the issue of single-test accuracy; (3) how to assess the risk of bias of comparative accuracy studies; and (4) how to conceptualize study design and data abstraction for studies of screening.

A first challenge concerned EUS-FNA. Before our involvement, another Evidence-based Practice Center had recommended that this technology not be included in a comparative effectiveness review (CER) of “imaging tests” for pancreatic adenocarcinoma. The reason was that, unlike comparison technologies such as MDCT, EUS-FNA involves more than just imaging, because a biopsy can be performed. Thus, the concern was that any comparison would unfairly favor EUS-FNA. When we scanned the literature, it became clear that EUS-FNA for suspected pancreatic adenocarcinoma is very common and therefore was mentioned by numerous studies of diagnosis and staging. In order to maximize the relevance of our report, we decided to include it, and our Technical Expert Panel supported this decision.

A second challenge involved whether this CER should not only compare different imaging technologies but should also assess test performance data on each modality in isolation (i.e., noncomparative). Strictly interpreted, a “comparative” effectiveness review would only involve comparisons among modalities. However, we were aware of several systematic reviews providing some information about each test in isolation, and as long as the assessment was confined to these reviews, the report would not be unduly distracted from the main comparative questions. Thus, we decided to include two questions (KQ1a and KQ2a) on single-test accuracy, limiting our sources to systematic reviews. These systematic reviews resulted in estimates for a subset of the information desired. However, several accuracy estimates have not been addressed by systematic reviews, and may potentially be addressed by de novo analyses of primary studies.

A third challenge involved assessing the risk of bias of comparative accuracy studies. The basic target for this assessment is whether a study comparing the accuracy of test A to that of test B (measuring both against a common gold standard) was biased in favor of one of the two tests. Ideally, we could have used an existing off-the-shelf assessment instrument. Current risk-of-bias instruments for diagnostic studies (e.g., QUADAS-2) do not sufficiently address this topic because they were designed for single-test accuracy studies (e.g., did this study provide unbiased estimates of test accuracy). We thought carefully about potential areas of bias and devised our own instrument for this purpose. The instrument has not been tested by others, and its appropriateness should be verified.

A fourth challenge concerned how to conceptualize study design and data abstraction for studies of screening. Screening for pancreatic precursor lesions is, by its very nature, a different clinical process from diagnosis and staging in symptomatic patients. The idea is not just to find pancreatic adenocarcinomas earlier, but to identify any precursor lesions, determine whether they should be resected, perform the necessary resections, and perform continued surveillance on those resected as well as those deemed lesion-free by initial screening. Thus, we faced challenges in categorizing the lesions found in the included screening studies and in synthesizing the data reported.

Limitations of the Evidence Base

Current evidence is limited in several ways. Below we discuss the two most important limitations: risk of bias and imprecision. We also discuss reporting bias in the context of our searches of clinicaltrials.gov and our relevant quantitative analyses.

The first limitation concerns the risk of bias in the included studies. We judged most studies to be at moderate risk of bias, owing to several types of concerns:

  • One concern is test timing: many studies did not report how many days, weeks, or months had elapsed between the two imaging tests. Given the relatively fast progression of pancreatic cancer, a long interval could cause an apparent difference in test accuracy even between two identically accurate tests.
  • Another concern is an unbalanced availability of information: many studies did not report whether the readers of one test had the same information available as the readers of the other test. Differential information could cause differential accuracy results.
  • A third concern is the prior expertise of the readers: few studies reported that readers had similar levels of prior experience with the two tests under consideration. Greater experience with one test than the other could bias study results in favor of the first test. This could have resulted in a finding of a difference when in fact the tests are similarly accurate, or it could have resulted in a finding of no difference when in fact the second test is more accurate.

The other major limitation of the evidence is imprecision. In several instances regarding comparative test accuracy, the evidence was too imprecise to conclude that one test is better than another, or that the tests are similarly accurate. We performed meta-analyses to maximize the precision of the data, but we still often judged the resulting summary statistics too imprecise to determine the direction of effect. For example, an ongoing question in the literature is whether MDCT and MRI are similarly effective in detecting metastases of pancreatic cancer. Our Technical Expert Panel had expressed the general belief that MRI can be better for detecting metastases to the liver. We performed a meta-analysis of five studies comparing the accuracy of these imaging technologies for detecting metastases. Both tests performed poorly, with pooled sensitivities of about 50 percent (MDCT: 48%; 95% CI, 31% to 66%; MRI: 50%; 95% CI, 19% to 82%). The confidence intervals are wide because these five studies had enrolled a total of only 54 patients with metastases from pancreatic cancer.
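To illustrate why so few patients with metastases lead to wide intervals, the sketch below computes a Wilson score interval for a single proportion at several sample sizes. It is only an approximation of the idea: the pooled estimates in this review came from meta-analysis, not from a single binomial sample, so the numbers will not reproduce the intervals reported above. The function name `wilson_interval` and the sample sizes other than 54 are assumptions chosen for illustration.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Hypothetical scenario: a test detects half of the patients who truly have
# metastases, at different numbers of such patients.
for n_diseased in (10, 54, 500):
    low, high = wilson_interval(n_diseased // 2, n_diseased)
    print(f"n={n_diseased:>3}: sensitivity 50%, 95% CI {low:.0%} to {high:.0%}")
```

With an observed sensitivity near 50 percent, the interval narrows only slowly as the number of patients with the condition grows, which is why pooling five small studies still left the direction of effect undetermined.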

Regarding potential publication bias, we looked for evidence that earlier publications were more likely to report positive findings. We performed three quantitative analyses to investigate the correlation between the dates on which recruitment ended and the observed findings (in the 3 analyses containing 5 or more studies), but we did not find any reliable trends. We also searched clinicaltrials.gov and did not identify any older trials that were unpublished. We identified eight relevant records (status as of March 2014):

  • One (NCT00920023) was last updated in March 2013 and involved only a single imaging test (MRI); therefore, it would be included only for our KQ3 on harms. Few MRI studies report procedural harms, however, so this study is unlikely to have been included.
  • Another (NCT00885248), with unknown recruiting status, will compare the accuracy of MDCT to PET/CT; it may be published in the future, but the entry has not been updated since 2009.
  • A third (NCT00816179) was still recruiting and involves only EUS-FNA; such studies sometimes meet our inclusion criteria for harms, so it should be considered during updates.
  • A fourth (NCT01717196) was ongoing but not recruiting and compares different aspirate volumes with EUS-FNA with respect to accuracy and complications. The complications data should be considered for updates.
  • A fifth (NCT01662609) was still recruiting and has a purpose of determining “whether Endoscopic Ultrasound (EUS) can detect early stage pre-cancerous or cancerous changes in the pancreas in patients at high-risk for the development of pancreatic cancer.” The estimated study completion date is September 2017.
  • A sixth (NCT00714701), denoted the CAPS4 study, was still recruiting at Johns Hopkins University. It includes five different high-risk groups and two controls, one of which is a group of patients with chronic pancreatitis. There may be overlap with patients in the CAPS studies included for KQ4.
  • A seventh (NCT01997476) was ongoing but not recruiting and involves the diagnosis of chronic pancreatitis (a differential diagnosis) using three imaging modalities.
  • An eighth (NCT00548626) was ongoing but not recruiting and involves the use of multiple needles compared with a single needle for EUS-FNA for the diagnosis of pancreatic neoplasms.
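
The time-trend check described before the list of trial records (whether the date recruitment ended is associated with the observed finding) can be sketched as a rank correlation. The review does not state which correlation statistic was used, so Spearman's rho here is an assumption, and the years and effect values are hypothetical placeholders, not data from the included studies.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical inputs: year recruitment ended and the difference in
# sensitivity between two tests reported by each of five studies.
end_years = [2001, 2004, 2006, 2009, 2011]
sensitivity_differences = [0.05, -0.02, 0.08, 0.01, 0.03]
print(round(spearman_rho(end_years, sensitivity_differences), 2))
```

With only five studies per analysis, a check of this kind can detect only a very strong trend, so the absence of a reliable trend is weak evidence against publication bias.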

Research Gaps

For characterizing gaps, we used the Hopkins EPC framework proposed by Robinson et al. (2011).185 That system suggests that reviewers identify a set of important gaps and determine the most important reason for each gap. Each gap should be assigned one of the following reasons for the inability to draw conclusions:

  1. Insufficient or imprecise information: no studies, limited number of studies, sample sizes too small, estimate of effect is imprecise
  2. Information at risk of bias: inappropriate study design; major methodological limitations in studies
  3. Inconsistency or unknown consistency: consistency unknown (only 1 study); inconsistent results across studies
  4. Not the right information: results not applicable to population of interest; inadequate duration of interventions/comparisons; inadequate duration of followup; optimal/most important outcomes not addressed; results not applicable to setting of interest

The first important gap concerns the general lack of specific evidence on MDCT angiography. This newer technology had been suggested by one of our Technical Experts as a key technology of interest in the context of the diagnosis and staging of pancreatic adenocarcinoma. Our review included only a single study of this technology; thus, the primary reason for the inability to draw conclusions is insufficient or imprecise information.

The second important gap concerns the lack of evidence on comparative longer-term outcomes such as how patients were managed differently after different tests, the length of survival after undergoing different imaging tests, and the quality of patients’ lives after different tests. No studies have provided comparative health outcome or quality of life information in the context of diagnosis and staging of pancreatic adenocarcinoma; thus, the reason for this gap is insufficient or imprecise information.

The third important gap concerns the lack of evidence on important factors that could influence comparative accuracy, such as the prior experience of test readers (e.g., 2 tests may have similar accuracy if readers are very experienced, but 1 may be much better if readers are less experienced), patient factors (e.g., for patients with jaundice, 1 test may be better, but for patients without jaundice that same test is worse), and tumor characteristics (e.g., for staging small tumors, 1 test is best, but for large tumors, another test is best). Again, no studies provided pertinent data, so the reason for this gap is insufficient or imprecise information.

The fourth important gap concerns the screening of asymptomatic high-risk people. No studies have reported test-specific screening accuracy (insufficient or imprecise information). This is an important gap in the literature because there is little evidence to justify the choice of one screening test over another.

Future research could address these gaps through studies specifically designed to fill them. For example, to determine whether patients live longer after undergoing MDCT for diagnosis as compared with undergoing EUS-FNA for diagnosis, a future study could randomly assign patients suspected of having pancreatic adenocarcinoma to receive only one of the two modalities. All patients should be followed long enough to determine which group lives longer. This would represent direct evidence on the most important outcome, survival.

Randomized trials may never be performed, but existing study designs (e.g., studies comparing the diagnostic performance of different modalities) could be analyzed more comprehensively to address other identified gaps. For example, symptoms vary greatly from patient to patient (degree of jaundice, weight loss, abdominal pain). One key gap involved the absence of information on whether comparative test accuracy was influenced by such patient factors. Addressing this gap would not require novel study designs; it would simply require additional analyses of data already being collected in the field.
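
As a minimal sketch of such an additional analysis, the code below computes the sensitivity of two tests separately within strata defined by a patient factor. The record layout, the field names (`jaundice`, `test_a_positive`, `test_b_positive`), and all values are hypothetical; a real analysis would also need confidence intervals and a formal test of interaction.

```python
from collections import defaultdict


def sensitivity_by_stratum(records, test_key, stratum_key):
    """Sensitivity of one test within each stratum.

    `records` must contain only patients whose disease was confirmed by the
    reference standard, so every positive test result is a true positive.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [true positives, total]
    for record in records:
        tally = counts[record[stratum_key]]
        tally[1] += 1
        if record[test_key]:
            tally[0] += 1
    return {stratum: tp / total for stratum, (tp, total) in counts.items()}


# Hypothetical patients with confirmed disease, each imaged with both tests.
patients = [
    {"jaundice": True, "test_a_positive": True, "test_b_positive": True},
    {"jaundice": True, "test_a_positive": True, "test_b_positive": False},
    {"jaundice": True, "test_a_positive": True, "test_b_positive": True},
    {"jaundice": True, "test_a_positive": False, "test_b_positive": True},
    {"jaundice": False, "test_a_positive": True, "test_b_positive": True},
    {"jaundice": False, "test_a_positive": False, "test_b_positive": True},
    {"jaundice": False, "test_a_positive": False, "test_b_positive": True},
    {"jaundice": False, "test_a_positive": False, "test_b_positive": False},
]

for test in ("test_a_positive", "test_b_positive"):
    print(test, sensitivity_by_stratum(patients, test, "jaundice"))
```

In this made-up example the two tests look equivalent in patients with jaundice but differ in patients without it, which is the kind of pattern a single overall comparison would hide.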