The electronic search of MEDLINE and CENTRAL identified 64 systematic review titles and abstracts (Figure 1). The Li 2010 database included 105 additional systematic review titles and abstracts (Figure 1). We excluded 167 of the 169 systematic review titles and abstracts for the following reasons: did not address any of the key questions, narrative summary only, could not retrieve full text to assess, similar inclusion criteria but date of search for studies older than another included systematic review on the same topic, and duplicate reference to an included systematic review. We identified two systematic reviews for inclusion.7,16 One systematic review (Burr 2007)7 addressed the diagnostic test accuracy of candidate screening tests for the detection of open-angle glaucoma (Key Question 3) and the second review (Hatt 2006)16 addressed the question of whether screening-based programs prevent optic nerve damage due to open-angle glaucoma when compared to no screening (Key Question 5) (Evidence Tables 1 and 2 in Appendix C).

The electronic searches conducted for concurrent comparative effectiveness reviews of screening and treatment for OAG identified a total of 4960 primary study titles and abstracts. After removing duplicate citations, conference abstracts and book chapters (N = 1083), we reviewed 3877 titles and abstracts. We retrieved the full text of 652 articles and assessed the studies for inclusion in this review. We included 83 primary studies that addressed the diagnostic accuracy of candidate screening tests for the detection of OAG that were not included in the Burr 2007 systematic review (Key Question 3 - Evidence Tables 2 to 5 in Appendix C) because the investigators examined newer technologies or the manuscript was published after 6 December 2005 (See Figures 2a and 2b). We did not identify any primary studies eligible for inclusion for any other key question. A listing of the 558 excluded studies, with reason(s) for exclusion, is included in Appendix E.
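The counts reported in the two paragraphs above can be cross-checked with simple arithmetic. The following is a minimal sketch; the variable names are illustrative and are not taken from the report.

```python
# Counts as reported in the text; variable names are illustrative only.
sr_medline_central = 64      # systematic review records from MEDLINE and CENTRAL
sr_li_2010 = 105             # additional records from the Li 2010 database
sr_excluded = 167            # systematic review records excluded

sr_total = sr_medline_central + sr_li_2010
sr_included = sr_total - sr_excluded

primary_identified = 4960    # primary study titles and abstracts identified
primary_removed = 1083       # duplicates, conference abstracts, book chapters
primary_screened = primary_identified - primary_removed

print(sr_total, sr_included, primary_screened)
```

The three derived values agree with the 169 records, two included systematic reviews, and 3877 screened titles and abstracts stated in the text.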

Our search included 165 systematic reviews. We identified two systematic reviews for inclusion. One systematic review addressed the diagnostic test accuracy of candidate screening tests for the detection of OAG (KQ3), and the second review addressed the question of whether screening-based programs prevent optic nerve damage due to OAG when compared to no screening (KQ5).

Figure 2a

Summary of the literature search: Systematic reviews literature search.

The electronic searches conducted identified a total of 4,960 primary study titles and abstracts. After removing duplicate citations, conference abstracts, and book chapters, we reviewed 3,877 abstracts. We retrieved the full text of 652 articles and assessed the studies for inclusion in this review. We included 83 primary studies that addressed the diagnostic accuracy of candidate screening tests for the detection of OAG (KQ3). We did not identify any primary studies eligible for inclusion for any other Key Question.

Figure 2b

Summary of the literature search: Primary studies literature search. * Total may exceed number in corresponding box, as articles excluded by two reviewers at this level.

A listing of devices from the primary studies is included in Appendix F. In summary, the devices covered in this comparative effectiveness review were examined in the following numbers of diagnostic studies:

  • Tests of optic nerve structure—
    • Cup to disc ratio measurement by examination (1 study)
    • Heidelberg retina tomograph (HRT) II (17 studies)
    • HRT III (11 studies)
    • Optic disc photography (2 studies)
    • Optical coherence tomography (OCT) (47 studies)
    • Retinal nerve fiber layer photography (2 studies)
    • Scanning laser polarimetry (GDx device) (27 studies)
  • Tests of optic nerve function—
    • Frequency doubling technology (FDT) 24-2 perimetry (5 studies)
    • FDT 30-2 (2 studies)
    • FDT C-20 (4 studies)
    • FDT N-30 (4 studies)
    • Goldmann applanation tonometry (2 studies)
    • Humphrey Visual Field Analyzer (HFA) (10 studies)
    • Noncontact tonometry (1 study)
    • Octopus 301 perimeter (1 study)

Because there was appreciable variability in devices, parameters, thresholds, and measurement of outcomes reported in the primary studies of interest, we did not combine the results using meta-analysis and instead present a narrative summary with particular emphasis on studies that identified early disease and/or examined newer and more frequently reported technologies. As we are unable to determine which parameters are most important for identifying persons with OAG, and as our reported results would have been limited to a few parameters in a subset of studies, we chose to include in the evidence tables (Appendix C) and discuss as appropriate the full complement of device parameters and thresholds as reported in the included studies. We summarize, where possible, the magnitude of validity across all parameters of interest for devices considered in this report.

Of the devices that were included in the Burr 2007 review, the following were also identified from the search of the literature conducted for this report: FDT, GAT, HFA, HRT II, non-contact tonometry, optic disc photography, and retinal nerve fiber layer photography. As there are differences in the eligibility criteria for the current report and the Burr 2007 review, including the devices, outcomes, and comparisons of interest, we chose not to undertake an update of the quantitative estimates of sensitivity and specificity from the Burr review for the devices that were common to both reviews.

This comparative effectiveness review also includes discussion of newer technologies, including FDT 24-2, FDT 30-2, FDT N-30, GDx, and HRT III.

Key Question 1

We did not identify any study that addressed whether participation in an OAG screening-based program leads to less visual impairment when compared to another screening-based program or no screening.

Key Question 2

We did not identify any study that addressed whether participation in an OAG screening-based program leads to improvements in patient-reported outcomes when compared to another screening-based program or no screening.

Key Question 3

Summary

Burr 2007 reviewed studies that compared standard automated perimetry (SAP) with several candidate tests.6 Results indicated that the sensitivity of SAP was higher than Goldmann tonometry, similar to Heidelberg retina tomography (HRT), and lower than the evaluation of optic nerve head (optic disc) photographs or frequency doubling technology (FDT). Results also indicated that the specificity of SAP was higher than disc photographs and FDT, similar to HRT, and lower than Goldmann tonometry.

Despite improvements in technology, including newer imaging and functional technologies, it is still unclear whether any one test or combination of tests is suitable and sufficient for use in glaucoma screening.

The lack of a definitive diagnostic reference standard for glaucoma and the need for more homogeneity in the design and conduct of diagnostic test accuracy studies prevent a coherent synthesis of data and therefore limit conclusive statements regarding these tests.

Evidence From Systematic Reviews

Burr (2007) conducted a diagnostic test accuracy review of candidate diagnostic and screening tests for OAG.6 Study inclusion criteria and pooled outcomes of sensitivity, specificity, and diagnostic odds ratios based on a common cut-off or threshold are listed in the evidence table (Evidence Table 1 in Appendix C). In summary, the investigators included 40 studies totaling more than 48,000 participants 40 years of age and older and those at high risk for the development of OAG based on demographic characteristics or comorbidities. The focus was on studies of participants likely to be encountered in a routine screening setting. Tests of optic nerve structure, optic nerve function, and intraocular pressure were included and compared to other individual or combination tests. The primary reference standard was confirmation of OAG at follow-up examination. Also considered was diagnosis of OAG requiring treatment. Prespecified outcomes were measures related to sensitivity, specificity, harms, acceptability, and reliability. There was significant statistical heterogeneity among the included studies for the majority of the tests, with the exception of FDT C-20-1 (sensitivity), HRT II (sensitivity and specificity), and optic disc photography (sensitivity). The authors also note that there were no studies that were at low risk of bias for all of the modified QUADAS domains examined. A small subset of eight studies was judged to have higher quality because the study investigators enrolled participants who were representative of a screening/diagnostic setting (low risk of spectrum bias). These studies were also at low risk of verification bias (both partial and differential) and of test and diagnostic review bias.

We include a narrative summary of the devices and tests included in this review with corresponding pooled estimates of sensitivity and specificity (Table 1). Summary sensitivity and specificity and diagnostic odds ratio (DOR) estimates were reported “as median and 95% credible interval (CrI). Credible intervals are the Bayesian equivalent of confidence intervals.”7 (A listing of devices is included in Appendix F.)

Table 1. Summary of Burr 2007 systematic review.

Tests of Optic Nerve Structure

Heidelberg Retina Tomograph (HRT) II

HRT II was a diagnostic test of interest in three studies, all with a common cutoff point and two of which were judged to be of higher quality than the third. One study specifically recruited high-risk populations (family history of OAG, African or Caribbean descent, aged 50 years or older). Using the common criterion of one or more results that are borderline or outside normal limits, the pooled sensitivity was 86 percent (95% CrI, 55 to 97%) and the pooled specificity was 89 percent (95% CrI, 66 to 98%).

Ophthalmoscopy

Burr (2007) included seven studies addressing the diagnostic accuracy of ophthalmoscopy including slit-lamp biomicroscopy (two studies) and direct ophthalmoscopy (five studies).6 Using a common cut-off point of a vertical cup-to-disc ratio greater than or equal to 0.7 (also defined as gradings of “normal” and “suspicious” or other subjective criteria as defined by consultant ophthalmologists), pooled sensitivity and specificity for the five studies with common criterion were 60 percent (95% CrI, 34 to 82%) and 94 percent (95% CrI, 76 to 99%), respectively. The diagnostic odds ratio was 25.7 (95% CrI, 5.79 to 109.50) suggesting a 26-fold higher odds of a positive test among those with glaucoma when compared to those without glaucoma.
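To illustrate how a diagnostic odds ratio relates to sensitivity and specificity, the sketch below computes a plug-in DOR from the pooled point estimates quoted above. The function name is illustrative. The result is only an approximation of the reported figure: the 25.7 in the text is a pooled median from the review's Bayesian model, so it need not equal the value obtained by plugging in the pooled sensitivity and specificity.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Odds of a positive test among diseased divided by the odds among non-diseased."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must be strictly between 0 and 1")
    odds_positive_diseased = sensitivity / (1.0 - sensitivity)
    odds_positive_healthy = (1.0 - specificity) / specificity
    return odds_positive_diseased / odds_positive_healthy


# Pooled point estimates for ophthalmoscopy as quoted in the text.
dor = diagnostic_odds_ratio(sensitivity=0.60, specificity=0.94)
print(round(dor, 1))
```

With these inputs the plug-in value is about 23.5, of the same order as the reported pooled DOR and well inside its credible interval.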

Optic Disc Photography

There were six studies of optic disc photography with five using a common criterion of a vertical cup-to-disc ratio greater than 0.59 to greater than or equal to 0.7. The range of sensitivity was 65 to 77 percent and the range of specificity was 59 to 98 percent. The authors noted that some photographs were taken with pupils dilated (three of six studies) while the remaining did not specify whether dilation was used.

Retinal Nerve Fiber Layer (RNFL) Photography

The common cut-off point for the four included studies was diffuse and/or localized defect observed on RNFL photographs. Among these studies, two were described as including participants “representative of a screening or diagnostic setting.” The pooled diagnostic odds ratio was 23.1 (95% CrI, 4.41 to 123.50), and the pooled sensitivity and specificity were 75 and 88 percent, respectively.

Tests of Optic Nerve Function

FDT (C-20-1) Perimetry

Three studies of FDT (C-20-1) were considered, all of which used the common diagnostic criterion of one abnormal test point. The pooled sensitivity and specificity results for this test were high (92 and 94% respectively).

FDT (C-20-5) Perimetry

Five studies of FDT (C-20-5) with significant heterogeneity were included using the common cut-off point of one abnormal test point. The range of sensitivity was 7 to 100 percent; the specificity range was 55 to 89 percent.

Goldmann Applanation Tonometry (GAT)

At the common cut-off point of intraocular pressure greater than 20.5–22 mm Hg, nine studies with significant heterogeneity reported sensitivity in the range of 10 to 90 percent and specificity in the range of 81 to 99 percent.

Noncontact Tonometry

One study with an inappropriate reference standard reported a sensitivity of 92 percent and a specificity of 92 percent using the criterion of intraocular pressure greater than 21 mm Hg.

Oculokinetic Perimetry

Four studies were included that examined the diagnostic accuracy of oculokinetic perimetry. Three were studies of participants who may be encountered in a screening setting; one was judged to be of higher quality (based on modified QUADAS domains). The common criterion varied in description, but is best described as one or more points missing. The odds of a positive test were 57 times higher (DOR, 57.54) for those with glaucoma when compared to those without glaucoma (95% CrI, 4.42 to 1585.00). The pooled sensitivity and specificity were 86 and 90 percent respectively.

SAP Suprathreshold Test

Nine studies, including the Baltimore Eye Survey and the Blue Mountains Eye Study, were included in the analysis. Although the sensitivity and specificity were similar among the Baltimore and Blue Mountains studies, there was significant heterogeneity among the included studies. The range in sensitivity was 25 to 90 percent; the range in specificity was 67 to 96 percent.

SAP Threshold Test

Among the five studies analyzed for SAP threshold, the Humphrey 30-2 and 24-2 threshold tests and the Octopus 500 were evaluated. The pooled sensitivity was 88 percent and specificity was 80 percent for the common cut-off point (the definition of the common cut-off point differed by included study, but is defined in Burr [2007]).

Direct Comparisons of Candidate Tests

Six studies included comparisons of standard automated perimetry (SAP) to optic disc photography, HRT II, FDT, and/or Goldmann applanation tonometry. Burr 2007 concluded that sensitivity results at the common cut-off point for each test revealed that SAP performed better than Goldmann applanation tonometry. One of the two studies that addressed the comparison of SAP to Goldmann applanation tonometry (GAT) reported estimates of sensitivity of 89 percent and 3–14 percent respectively. Specificity values were 73 percent for SAP and 98–99 percent for Goldmann applanation tonometry. Burr 2007 also concluded that SAP was similar to HRT II. The sensitivity of SAP and HRT II was 72 percent and 69 percent respectively in one of the two included studies; the specificity for both tests was 95 percent. There was one included study in which the investigators compared SAP to optic disc photography. Optic disc photographs had sensitivity (73 to 77 percent) and specificity (59 to 62 percent) similar to those of SAP (sensitivity 50 to 71 percent; specificity 58 to 83 percent). In the two studies that included comparisons of SAP to frequency doubling technology (FDT), one study reported similar sensitivity estimates (SAP 63 to 90 percent; FDT C-20-5 68 to 84 percent) and similar specificity values (SAP 58 to 74 percent; FDT C-20-5 55 to 76 percent).

Based on analyses of the common criterion for each test, test accuracy, combination tests, tests for glaucoma at specific stages, and direct and indirect comparisons of tests, Burr (2007) concluded that optic disc photography, HRT II, FDT, SAP and Goldmann applanation tonometry were candidates for use in a screening-based program.

Detailed Analysis of Primary Studies

We undertook a search for additional primary studies, as described in the Methods section, to address the diagnostic accuracy of candidate screening tests and identified 83 studies.

With respect to the risk of bias of included primary studies, 68 percent of the included studies were at high risk of spectrum bias as the study investigators enrolled participants who were not representative of those who would receive the test in practice, i.e., healthy volunteers compared to participants with known glaucoma. Six percent of the studies were at high risk of differential verification bias because the study investigators applied a different reference standard to a subset of participants enrolled in the study. A low percentage (2%) were at high risk of incorporation bias, but due to the lack of detail in the descriptions of the reference standard, it was unclear whether the reference standard and candidate tests were independent of each other in 12 percent of the included studies.

With respect to masking of study personnel interpreting the results of the reference standard and candidate tests, the candidate test(s) was/were interpreted without knowledge of the reference standard result in 29 percent of the included studies and the reference test interpreted without knowledge of the candidate test(s) in 44 percent of included studies, but we judged these domains to be unclear in 54 percent and 48 percent of the included studies respectively. Forty-eight percent (48%) of the studies did not include an explanation for withdrawals from the study, while 46 percent of the studies reported the number of uninterpretable test results.

The judgments for the 13 QUADAS risk of bias domains, as well as sensitivity, specificity, and/or area under the ROC curve (AUC) results by device/test parameter, are summarized in the evidence tables (Appendix C).

A narrative summary of the results follows with a particular emphasis on studies that identified early disease, and/or examined newer and more frequently reported technologies.

Tests of Optic Nerve Structure

HRT II

Seventeen studies included measures of diagnostic accuracy for HRT II (Table 2).17–33 Naithani (2007)25 and Uysal (2007),27 which specifically focused on detecting early or moderate glaucoma, are discussed in this narrative section. The populations, devices, and reference standards for all studies, including the remaining 15, are summarized in Table 2, and the estimates of diagnostic accuracy are detailed in the evidence tables (Appendix C).

Table 2. Characteristics of included HRT II studies.

Naithani (2007) enrolled 60 participants with glaucoma (30 with early and 30 with moderate visual field defects) and 60 healthy volunteers.25 Values of the area under the receiver operating characteristic curve (AUC) ranged from 0.474 (disc area ratio parameter) to 0.852 (vertical cup-to-disc ratio parameter).
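As a reading aid for the AUC values reported throughout this section, the sketch below computes an AUC as the proportion of diseased/healthy pairs in which the diseased eye has the more abnormal score, counting ties as one half. The scores are hypothetical and are not data from any included study; the function name is illustrative.

```python
def auc_from_scores(diseased, healthy):
    """AUC as the fraction of (diseased, healthy) pairs ranked correctly.

    Assumes a higher score means a more abnormal result; ties count as 0.5.
    """
    if not diseased or not healthy:
        raise ValueError("both groups must contain at least one score")
    correctly_ranked = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                correctly_ranked += 1.0
            elif d == h:
                correctly_ranked += 0.5
    return correctly_ranked / (len(diseased) * len(healthy))


# Hypothetical parameter values, for illustration only.
glaucoma_scores = [0.72, 0.65, 0.58, 0.81, 0.49]
healthy_scores = [0.41, 0.52, 0.38, 0.60, 0.45]

auc = auc_from_scores(glaucoma_scores, healthy_scores)
print(auc)
```

Under this definition an AUC of 0.5 corresponds to chance-level discrimination, so a parameter with an AUC near 0.47 does not separate the two groups, whereas one near 0.85 ranks most pairs correctly.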

Uysal (2007) enrolled 70 participants with early or moderate glaucoma and 70 healthy volunteers.27 The range of sensitivity across 12 parameters was 47.1 percent (RNFL cross-sectional area) to 74.3 percent (linear cup/disc area ratio) and the range of specificity was 47.1 percent (mean RNFL thickness) to 71.4 percent (cup shape measure). The investigators concluded that some parameters have better sensitivity and specificity than others including cup/disc area ratio, linear cup/disc area ratio, and mean cup depth. In addition, subgroup analysis by disc size revealed that it was more difficult to distinguish glaucoma in participants with smaller discs.

The remaining 15 studies explored comparisons of HRT II with other devices such as the GDx VCC, OCT, HRT III, and FDT. Overall, HRT II was found not to perform as well as GDx VCC, OCT, and FDT. HRT II and HRT III were found to have a similar diagnostic profile. Three of the included studies concluded that HRT II was not an appropriate tool for population-based glaucoma screening studies.

HRT III

Eleven studies examined the diagnostic accuracy of HRT III (Table 3).23,24,28,34–41 Reddy (2009) identified 81 participants with early visual field loss (out of 247 participants with glaucoma) and 142 healthy volunteers. Early visual field loss was defined as a mean deviation less than 5 dB.36 The sensitivity of the Glaucoma Probability Score for distinguishing eyes with early field loss from healthy eyes was 67.9 percent and that of the Moorfields Regression Analysis was 71.9 percent (at a fixed specificity of 92%). The investigators concluded that, “Moorfields Regression Analysis and Glaucoma Probability Score have similar ability to detect glaucomatous changes, and typically agree. The relative ease and sensitivity of the operator-independent Glaucoma Probability Score function of the HRT III may facilitate glaucoma screening.”

Table 3. Characteristics of included HRT III studies.

Badala (2007) compared four imaging methods for their ability to distinguish early glaucoma from healthy eyes.40 Forty-six eyes from 46 participants with early OAG and 46 eyes from healthy volunteers were enrolled. At a fixed specificity of 95 percent, sensitivity ranged from 4 percent (parameter: reference height) to 70 percent (Frederick S. Mikelberg discriminant function and Reinhard O. W. Burk discriminant function).

Optical Coherence Tomography (OCT)

Forty-eight included studies investigated the diagnostic accuracy of OCT (Table 4).18,25,26,29–32,34,40,42–79,83 Of these, 34 considered the Stratus OCT, 10 included the Cirrus OCT, 6 considered the RTVue OCT, 2 included the Spectralis OCT, 2 examined the OTI OCT, and 1 included the OTI Spectral OCT/SLO. Across the 34 studies that examined the Stratus OCT, all were at high risk of spectrum bias because those with known disease along with healthy eyes were enrolled in the studies. The sample size ranged from 26 to 95 participants with glaucoma or suspected glaucoma and 37 to 128 healthy volunteers, with one study also enrolling 130 participants with ocular hypertension. For the parameter average RNFL thickness, the range of sensitivity was 24 to 96 percent, suggesting appreciable heterogeneity among the studies. The range of the specificity was 66 to 100 percent. The evidence table for this report (Appendix C) includes diagnostic test accuracy outcomes for more than 25 additional parameters.

Table 4. Characteristics of included OCT studies.

Optic Disc Photography

We included two studies of the diagnostic accuracy of optic disc photography33,80 and one study of cup to disc ratio measurement as measured by an ophthalmologist using a slit lamp biomicroscope and 78 Diopter lens (Table 5).81 Danesh-Meyer (2006) included participants with OAG (n=42), as well as glaucoma suspects (n=23) and healthy volunteers (n=45).33 Investigators took optic disc photographs using the Canon CF60U camera (30-degree setting) with Kodak Ektachrome EPR 150 film, and two investigators graded the photographs. Two investigators determined the Disc Damage Likelihood Score by using a Nikon 60 diopter fundus lens with a slit-lamp. The AUC (comparison of those deemed to have glaucoma and borderline disease versus normal) was 0.84 (95% CI, 0.74 to 0.92) for the cup-to-disc ratio and 0.95 (95% CI, 0.80 to 0.98) for the Disc Damage Likelihood Score, suggesting that the Disc Damage Likelihood Score is a more effective means of discriminating people with and without disease. The diagnostic accuracy of cup to disc ratio measurement from the Francis (2011) study is described in the section on FDT C-20 perimetry.

Table 5. Characteristics of included optic disc photography studies.

Scanning Laser Polarimetry (GDx)

Twenty-seven studies included an investigation of the GDx with variable corneal compensation (VCC) (Table 6).18,20,22,26,29–32,40,45,58,59,61,64,71,77,78,80,82–90 The aim of eight studies was to discriminate early glaucoma from no disease.18,20,40,45,83,85,88 In the studies that focused on early OAG, the range of sensitivity across all comparisons and cut-offs for the most frequently reported parameter, Temporal, Superior, Nasal, Inferior, Temporal average, was 29.8 to 81.63 percent. Specificity was fixed at 80, 90, or 95 percent in three studies, and the lowest reported specificity was 66.36 percent. The range in sensitivity for the nerve fiber indicator parameter across all comparisons and cut-offs was 28.3 to 93.3 percent. Specificities ranged from 52.9 percent to a fixed cut off of 80, 90, and 95 percent.

Table 6. Characteristics of included scanning laser polarimetry studies.

Three studies examined the GDx with enhanced corneal compensation (ECC).59,86,87 The sample sizes of the included studies ranged from 63 to 92 glaucoma participants and 41 to 95 healthy volunteers. Medeiros (2007) compared the AUCs for GDx with variable corneal compensation and GDx with enhanced corneal compensation and reported that GDx with enhanced corneal compensation performed significantly better than GDx with variable corneal compensation for the parameters Temporal, Superior, Nasal, Inferior, Temporal average, Superior average, and Inferior average (p ≤ 0.01).86 Sehi (2007)59 and Mai (2007)87 concurred with Medeiros (2007) that imaging with enhanced corneal compensation appears to improve the ability to diagnose OAG.

RNFL Photography

Two studies examined the accuracy of retinal nerve fiber layer (RNFL) photography (Table 7).82,83 Hong (2007b) analyzed RNFL photographs of 72 glaucoma and 48 healthy participants taken with the Heidelberg retina angiograph 1.83 Two investigators, unaware of the participant’s diagnosis, reviewed the photographs. A third investigator served as an adjudicator to resolve any disagreements. Results showed the RNFL defect score II, with an AUC of 0.75 (p < 0.001), was the best parameter for discriminating early glaucoma and healthy eyes (sensitivity 58.3% and specificity 95.8%).

Table 7. Characteristics of included RNFL photography studies.

Medeiros (2004) compared RNFL photography to the GDx with variable corneal compensation in 42 participants with OAG, 32 OAG suspects, and 40 healthy volunteers.82 Investigators photographed one eye of each participant using the Topcon TRC-50VT camera and Kodak Kodalith high-contrast film (red-free filter). Two investigators used a set of 25 reference photos to score photographs and a third investigator adjudicated disagreements. The sensitivities of the global RNFL score were 36 and 81 percent respectively for fixed specificities of 95 and 80 percent. At a fixed specificity of 95 percent, the sensitivity of the Nerve Fiber Indicator was 71 percent versus the 36 percent reported above for red-free photos. Overall, the global RNFL score determined from red-free photos did not perform as well as scanning laser polarimetry. The area under the ROC curve was 0.91 for the GDx with variable corneal compensation Nerve Fiber Indicator versus 0.84 for the global RNFL score.
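Several studies in this section report sensitivity at a fixed specificity. One common way to obtain such a figure is to set the cut-off from the distribution of scores in healthy eyes and then count how many diseased eyes exceed it. The sketch below follows that approach under stated assumptions: higher scores are more abnormal, the data are hypothetical, and the function name is illustrative. It is not the procedure used by any particular included study.

```python
import math


def sensitivity_at_fixed_specificity(diseased, healthy, target_specificity):
    """Return (cutoff, sensitivity, specificity) for a target specificity.

    Assumes higher scores are more abnormal. The cutoff is the smallest
    healthy score at or below which at least the target fraction of healthy
    scores fall; a score strictly above the cutoff is a positive test.
    """
    if not diseased or not healthy:
        raise ValueError("both groups must contain at least one score")
    if not (0.0 < target_specificity <= 1.0):
        raise ValueError("target_specificity must be in (0, 1]")
    ordered = sorted(healthy)
    # Rounding guards against floating-point noise before taking the ceiling.
    needed_negatives = math.ceil(round(target_specificity * len(ordered), 9))
    cutoff = ordered[needed_negatives - 1]
    specificity = sum(h <= cutoff for h in healthy) / len(healthy)
    sensitivity = sum(d > cutoff for d in diseased) / len(diseased)
    return cutoff, sensitivity, specificity


# Hypothetical scores, for illustration only.
healthy_scores = [12, 15, 18, 20, 22, 25, 27, 30, 34, 40]
glaucoma_scores = [19, 28, 33, 36, 41, 45, 47, 52, 55, 60]

at_80 = sensitivity_at_fixed_specificity(glaucoma_scores, healthy_scores, 0.80)
at_90 = sensitivity_at_fixed_specificity(glaucoma_scores, healthy_scores, 0.90)
print(at_80, at_90)
```

In this toy example, raising the required specificity from 80 to 90 percent moves the cut-off upward and lowers sensitivity, the same direction of trade-off seen in the sensitivities reported at fixed specificities of 80 and 95 percent.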

Tests of Optic Nerve Function

FDT 24-2 Perimetry

Five studies examined the diagnostic accuracy of FDT 24-2 threshold tests using the Humphrey Matrix Perimeter (Table 8).64,93–96 All studies included participants with known glaucoma and healthy volunteers and we judged these studies to be at high risk of spectrum bias. The range of sample size was 25 to 174 glaucomatous eyes and 15 to 164 healthy eyes. Sensitivities and specificities were reported for the parameters mean deviation, pattern standard deviation (PSD), and glaucoma hemifield test outside of normal limits. There was appreciable heterogeneity in the estimates of sensitivity at 80, 90 and 95 percent specificity that may be attributed to a number of factors including different patient populations and variations in cut-off points. The sensitivity for the mean deviation was 55 and 94 percent at 80 percent fixed specificity.94,95 Tafreshi (2009) and Leeprechanon (2007) reported 39 and 87 percent at 90 percent fixed specificity, and 32 and 82 percent at fixed 95 percent specificity respectively.93,95 Sensitivity and specificity for PSD and glaucoma hemifield test are reported with respective cut-off points in the evidence tables in Appendix C.

Table 8. Characteristics of included FDT 24-2 perimetry studies.

Bagga (2006)64 and Burgansky-Eliash (2007)96 reported the AUC for the mean deviation parameter (0.69 for both studies with p < 0.04 and 95% CI, 0.564 to 0.815 respectively). The AUCs for PSD were 0.66 (p = 0.09)64 and 0.733 (95% CI, 0.618 to 0.848).96

FDT 30-2 Perimetry

Two studies discussed the detection of early glaucoma using the FDT 30-2 threshold test with the Humphrey Matrix Perimeter (Table 9).60,83 Both Hong (2007a)60 and Hong (2007b)83 enrolled OAG participants with early visual field loss and healthy controls. The mean deviation and PSD were judged to be good parameters for distinguishing between eyes with early disease and eyes with no known defects. The AUCs were 0.795 and 0.750 for the mean deviation and 0.808 and 0.934 for the PSD in Hong (2007a) and Hong (2007b) respectively. Both study groups, however, determined that the best parameter for distinguishing eyes with early glaucoma from healthy eyes was the number of points with p less than 5 percent in the pattern deviation plot, with AUCs of 0.985 (95% CI, 0.943 to 0.998) and 0.990 (p < 0.001) in Hong (2007a) and Hong (2007b) respectively.

Table 9. Characteristics of included FDT 30-2 perimetry studies.

FDT C-20 Perimetry

Four studies discussed the accuracy of FDT C-20 perimetry (Table 10).18,81,91,92 Pueyo (2009) enrolled 130 participants with ocular hypertension and 48 healthy volunteers.18 Using a cut-off of a cluster of at least four points with a sensitivity outside 95 percent normal limits, or three points outside 98 percent, or at least one point outside 99 percent, investigators determined the sensitivity of FDT to be 31.25 percent and specificity 72.9 percent among the subset of 32 participants with glaucomatous optic neuropathy (of the 130 with ocular hypertension). The investigators concluded that FDT might not be an ideal test for participants with early defects.

Table 10. Characteristics of included FDT C-20 perimetry studies.

Salim (2009) enrolled 35 participants with known OAG and 35 age- and sex-matched controls with no evidence of glaucoma. Investigators used FDT, non-contact tonometry, and a questionnaire individually and in all possible combinations to determine the accuracy of single and combination tests.91 Sensitivity of FDT was 58.1 percent and specificity 98.6 percent. Overall, FDT was determined to be the best among the candidate single and combination tests in the study, despite fair sensitivity for detecting OAG.

Pierre-Filho (2006) enrolled glaucoma patients who had never experienced perimetry prior to the study.92 The investigators reported that 21 (32.8%) of the 64 participants with glaucoma were identified as having early disease, but data were not provided for this subgroup. Sensitivity and specificity were 85.9 and 73.6 percent for the presence of at least one abnormal location and 82.8 and 83 percent respectively for two or more abnormal locations regardless of severity.

Francis (2011) conducted a population-based screening of 6,082 Latinos aged 40 years and older as a part of the Los Angeles Latino Eye Study (LALES) to determine the diagnostic accuracy of candidate screening tests performed alone or in combination.81 Participants completed Humphrey Visual Field testing in addition to FDT C-20-1, Goldmann applanation tonometry, and central corneal thickness and cup to disc ratio measurements. Diagnostic test accuracy outcomes were assessed for the general population as well as high risk subgroups defined as persons who were 65 years and older, those with a family history of glaucoma, and persons with diabetes. Of the 6,082 participants screened, 4.7 percent (286) were diagnosed as having open-angle glaucoma. Based on three glaucoma diagnosis definitions (glaucomatous optic nerve appearance, glaucomatous visual field loss, glaucomatous optic nerve and visual field loss) the test parameters vertical cup to disc ratio ≥ 0.8 and Humphrey Visual Field (HVF) false negatives ≥ 33 percent had the highest specificity regardless of the definition of glaucoma (98%). HVF mean deviation < 5 percent had the highest sensitivity (78%) using the definition of optic nerve defects only, while HVF glaucoma hemifield test had the highest sensitivity under the other two definitions (90% for glaucomatous visual field loss and 90% for both field loss and optic nerve damage). Specific results for the FDT C-20-1 were as follows (sensitivity/specificity, definition of glaucoma): 59%/79%, glaucomatous optic nerve appearance only; 68%/80%, glaucomatous visual field loss only; 67%/79%, both glaucomatous optic nerve appearance and visual field loss. The investigators reported similar results when high-risk subgroups were analyzed and concluded that “these results suggest that screening of high-risk groups based on these criteria may not improve over screening of the general population over age 40.”

FDT N-30 Perimetry

Four studies examined the accuracy of the FDT N-30 threshold test (Table 11).20,94,97,98 Zeppieri (2010) focused on the detection of early glaucoma among a sample of 75 participants with OAG, 87 with ocular hypertension, 67 with glaucomatous optic neuropathy and 90 healthy volunteers.20 At the best cut-off of less than −0.78, the sensitivity of the mean deviation parameter was 61.3 percent and the specificity was 73.7 percent for distinguishing early OAG from healthy eyes. At the best cut-off of greater than 3.89, the sensitivity of the PSD was 76.0 percent and the specificity was 87.8 percent. The investigators concluded that “FDT can potentially detect eyes with very early functional defects that do not show structural changes in patients at risk of developing glaucoma.” Salvetat (2010) focused on the detection of early disease among a sample of 52 participants with early OAG and 53 healthy volunteers.98 The sensitivity of mean deviation for distinguishing early OAG from healthy eyes at the best cut-off (less than −1.12) was 67 percent and the specificity was 74 percent. At the best cut-off of greater than 3.97, the sensitivity of the PSD parameter was 96 percent and the specificity was 85 percent.

Table 11. Characteristics of included FDT N-30 perimetry studies.


Humphrey Visual Field Analyzer (HFA)

Ten studies examined the diagnostic accuracy of the HFA. Of these, six examined HFA short wavelength automated perimetry,18,44,64,93,97,99 two tested HFA standard automated perimetry (SAP)-SITA and HFA SAP-Full Threshold (FT),93,97 four examined HFA-SITA-Standard,33,90,92,96 and one tested the HFA SITA-Fast protocol (Table 12).92 The six studies of the HFA short wavelength automated perimetry testing protocol (the most frequently reported) included 25 to 286 participants with glaucoma and 22 to 289 healthy volunteers. Across all comparisons and cutoffs for the mean deviation, sensitivity ranged from 25.9 to 83 percent and specificity ranged from 80 to 95.2 percent. Cutoff points ranged from −5.42 to −11.06 dB.

Table 12. Characteristics of included Humphrey visual field analyzer studies.


Goldmann Applanation Tonometry (GAT)

Two studies64,81 included examination of Goldmann applanation tonometry (GAT) (Table 13). Bagga (2006) compared the ability of various tests of structure and function to discriminate healthy eyes (n = 22) from eyes with known glaucomatous optic neuropathy (n = 25).64 The AUC for intraocular pressure, as measured by GAT, was 0.66 (p = 0.05). The methods of the Francis (2011) study (LALES) are discussed in the FDT C-20 section of this review. The sensitivity and specificity values for GAT using a cutoff of ≥ 21 mm Hg for the three definitions of glaucoma were as follows (sensitivity/specificity, definition of glaucoma): 21%/97%, glaucomatous optic nerve appearance only; 23%/97%, glaucomatous visual field loss only; 24%/97%, both glaucomatous optic nerve appearance and visual field loss.

Table 13. Characteristics of included Goldmann applanation tonometry studies.


Noncontact Tonometry

Salim (2009) included noncontact tonometry, individually and in all possible combinations with other measures of structure and function, to determine the accuracy of single and combination tests (Table 14).91 Intraocular pressure, as measured by noncontact tonometry, had low sensitivity for detecting glaucoma (22.1%). The investigators acknowledged that use of topical medications by the glaucoma participants could limit the ability to identify those with disease.

Table 14. Characteristics of included noncontact tonometry studies.


Tendency-Oriented Perimetry

Pierre-Filho (2006) compared frequency-doubling technology (FDT), tendency-oriented perimetry using the Octopus 301 G1-TOP program, SITA Standard and SITA Fast in 117 eyes (64 with glaucoma and 53 healthy eyes) (Table 15).92 The Octopus 301 perimeter test was considered abnormal under two conditions: when the mean defect was “> 2 dB and/or the loss variance > 6 dB (TOP 1), and… there were at least seven points (three of them contiguous) with a reduction in sensitivity ≥ 5 dB in the corrected comparisons graphic (TOP 2).” The sensitivity using definition TOP 1 was 87.5 percent (95% CI: 76.3–94.1%) and the specificity was 56.6 percent (95% CI: 42.4–69.9%). With definition TOP 2 the sensitivity was 89.1 percent (95% CI: 78.2–95.1%) and the specificity was 62.3 percent (95% CI: 47.9–74.9%).
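As an illustration of how the sensitivity and specificity point estimates for TOP 1 and TOP 2 relate to the sample of 64 glaucomatous and 53 healthy eyes, the following minimal Python sketch recomputes them from counts. The numbers of correctly classified eyes (56 and 30 for TOP 1, 57 and 33 for TOP 2) are not reported in this review; they are assumptions back-calculated from the reported percentages, and the function and variable names are hypothetical.

```python
def sensitivity(true_positives: int, diseased: int) -> float:
    """Proportion of diseased eyes that the test flags as abnormal."""
    return true_positives / diseased


def specificity(true_negatives: int, healthy: int) -> float:
    """Proportion of healthy eyes that the test classifies as normal."""
    return true_negatives / healthy


# Sample sizes stated in the text for Pierre-Filho (2006).
GLAUCOMA_EYES = 64
HEALTHY_EYES = 53

# ASSUMED counts, back-calculated from the reported percentages;
# they are not taken from the study itself.
ASSUMED_COUNTS = {
    "TOP 1": {"true_positives": 56, "true_negatives": 30},
    "TOP 2": {"true_positives": 57, "true_negatives": 33},
}

# Percentages rounded to one decimal place, as reported in the text.
results = {
    name: (
        round(100 * sensitivity(c["true_positives"], GLAUCOMA_EYES), 1),
        round(100 * specificity(c["true_negatives"], HEALTHY_EYES), 1),
    )
    for name, c in ASSUMED_COUNTS.items()
}

for name, (sens, spec) in results.items():
    print(f"{name}: sensitivity {sens}%, specificity {spec}%")
```

Under these assumed counts the sketch gives the same point estimates as those reported for both definitions. It does not attempt to reproduce the confidence intervals, because the interval method used by the investigators is not stated here.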

Table 15. Tendency-oriented perimetry.


Grading of Evidence

The grading of the evidence for this comparative effectiveness review is summarized in Table 16. We judged the overall strength of evidence to be low based on a summary assessment of the risk of bias, consistency, directness, and precision of the included studies. We concluded that the 83 observational studies were at high risk of bias, primarily because a large percentage (68%) enrolled participants who were not representative of those who would receive the test in practice. We determined that the wide variability in effect sizes and significant clinical heterogeneity contributed to inconsistency in the evidence base. The evidence is also indirect, as we did not identify any studies that linked screening to the final health outcomes of interest (Figure 1). The sensitivity and specificity of the candidate screening tests were determined to be imprecise due to the wide confidence intervals accompanying the point estimates.

Table 16. Grading of evidence.


Applicability

The applicability of the evidence is limited by the participants, tests, and setting selected for the included studies (Tables 2–15). Three of the 83 studies included a population-based sample; the remaining studies included healthy participants and those with known or suspected glaucoma at the time of screening. Because the majority of the studies included those with known or suspected disease, the evidence is not applicable to routine screening and primary care settings. The estimates of sensitivity and specificity may also be overestimates of the true effect, given that the spectrum of disease represented in the studies includes more severely affected individuals compared to those who are unaffected (healthy controls). The included tests not only varied with respect to the skill required to operate and interpret the findings, portability, and availability, but were also devices that are almost exclusively found in eye care provider settings. The exceptions are tonometry and ophthalmoscopy, which may be found in primary care settings, but have limited sensitivity to detect persons with glaucoma. Finally, although we intended to include a discussion of the validity of community and non-eye care health provider screenings, the studies that met the inclusion criteria were conducted in eye care provider settings only. As a result, the findings of this comparative effectiveness review are not generalizable to primary care and other non-eye care settings.

Conclusion

Based on the Burr (2007) findings,7 standard automated perimetry was compared with other tests available at the time. SAP had higher sensitivity than Goldmann tonometry, similar sensitivity compared to HRT, and lower sensitivity than disc photos or FDT. In terms of specificity, SAP performed better than disc photos and FDT, similar to HRT, and worse than Goldmann tonometry.

We identified several additional studies assessing the performance of glaucoma screening tests not included in the Burr review. The studies included newer imaging (GDx, HRT III, OCT) and functional (Short Wavelength Automated Perimetry, new FDT patterns) technologies. However, despite improvements in the technology, it is still not clear that there is any one test or combination of tests suitable for use in glaucoma screening in the general population. Significant barriers to identifying and characterizing potential glaucoma screening tests remain, including the lack of a definitive diagnostic reference standard for glaucoma and the heterogeneity in the design and conduct of the studies. Because of these barriers, the ranges of sensitivities, specificities, and areas under the ROC curve are large and prevent a coherent synthesis.

Key Question 4

We did not identify any study that addressed whether participation in an OAG screening-based program leads to reductions in intraocular pressure when compared to another screening-based program or no screening.

Key Question 5

Evidence From Systematic Reviews

Hatt (2006) undertook a systematic review of randomized trials of screening modalities for OAG compared to no screening (including opportunistic case finding and referral). There were no restrictions on included populations.16 The primary outcome of interest was the prevalence of visual field loss, defined as the proportion of participants with a pre-specified severity of visual field loss diagnosed by either manual or automated field assessment. Other primary outcomes included the prevalence of optic nerve damage and visual impairment. Electronic searches of five databases including MEDLINE and CENTRAL were conducted in 2006 and again in January 2009, but none of the studies that were identified were eligible for inclusion. The review authors acknowledged that randomized controlled trials require lengthy follow-up and are predicated on identifying appropriate candidate tests that may be incorporated into a screening-based program.

Detailed Analysis of Primary Studies

We did not identify any primary study that addressed whether participation in an OAG screening-based program leads to reductions in visual field loss or optic nerve damage when compared to another screening-based program or no screening.

Key Question 6

We did not identify any study addressing the harms associated with screening for OAG.
