NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Kemper AR, Coeytaux R, Sanders GD, et al. Disease-Modifying Antirheumatic Drugs (DMARDs) in Children With Juvenile Idiopathic Arthritis (JIA) [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2011 Sep. (Comparative Effectiveness Reviews, No. 28.)

  • This publication is provided for historical reference only and the information may be out of date.

Results

Literature Search and Screening

Searches of all sources identified a total of 4815 potentially relevant citations. Table 3 details the number of citations identified from each source.

Table 3. Sources of citations.

Figure 2 describes the flow of literature through the screening process. Of the 4815 citations identified by our searches, 3998 were excluded at the abstract screening stage. Of the 817 articles that passed the initial abstract screening, 313 were gray literature articles that were excluded from further review. The remaining 504 articles went on to full-text screening. Of these, 306 were excluded, leaving a total of 198 included articles. Appendix F provides a complete list of articles excluded at the full-text screening stage, with reasons for exclusion.

Figure 2 describes the flow of literature through the screening process. Of the 4815 citations identified by our searches, 3998 were excluded at the abstract screening stage. Of the 817 citations that passed the abstract screening, 313 were gray literature articles that were reviewed separately and excluded from further review. 504 articles (including 1 gray literature article) went on to full-text screening. Of these, 306 were excluded for various reasons relating to the specific key question(s) for which they were considered, leaving a total of 198 included articles.
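The screening counts above can be checked with simple subtraction. The following is a minimal sketch that uses only the numbers reported in the text; the variable names are illustrative and do not come from the report.

```python
# Counts reported in the text for each screening stage.
identified = 4815                 # citations found by all searches
excluded_at_abstract = 3998       # excluded at abstract screening
gray_literature_excluded = 313    # gray literature set aside from further review
excluded_at_full_text = 306       # excluded at full-text screening

passed_abstract = identified - excluded_at_abstract
full_text_screened = passed_abstract - gray_literature_excluded
included = full_text_screened - excluded_at_full_text

print(passed_abstract, full_text_screened, included)  # 817 504 198
```

Each intermediate value matches the figure given in the text (817, 504, and 198).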

Figure 2. Literature flow diagram.

Figure 3 summarizes the treatment comparisons evaluated in the included efficacy studies (Key Questions 1, 2, and 4). Six non-biologic DMARDs and seven biologic DMARDs have been compared to conventional treatment with or without methotrexate. Two different sets of non-biologic DMARDs have been directly compared (leflunomide vs. methotrexate and hydroxychloroquine vs. penicillamine), and two biologic DMARDs have been directly compared (etanercept vs. infliximab). Three of the biologic DMARDs that have been compared to conventional treatment were in the same class (TNF inhibitors: adalimumab, etanercept, and infliximab). However, study heterogeneity precluded meta-analysis of this combined class versus conventional treatment. Details on the number of studies describing each treatment comparison are provided under the relevant Key Question, below.

Figure 3 summarizes the treatment comparisons evaluated in the included efficacy studies (Key Questions 1, 2, and 4). Six non-biologic DMARDs (azathioprine, leflunomide, methotrexate, hydroxychloroquine, penicillamine, sulfasalazine) and seven biologic DMARDs (abatacept, anakinra, tocilizumab, intravenous immunoglobulin [IVIG], etanercept, adalimumab, infliximab) have been compared to conventional treatment with or without methotrexate. Two different sets of non-biologic DMARDs have been directly compared (leflunomide vs. methotrexate and hydroxychloroquine vs. penicillamine), and two biologic DMARDs have been directly compared (etanercept vs. infliximab). Three of the biologic DMARDs that have been compared to conventional treatment are in the same class; these are the tumor necrosis factor [TNF] inhibitors adalimumab, etanercept, and infliximab.

Figure 3. Treatment comparisons evaluated in efficacy studies.

Key Question 1. In children with JIA, does treatment with DMARDs, compared to conventional treatment, improve laboratory measures of inflammation or radiological progression, symptoms (e.g., pain, symptom scores), or health status (e.g., functional ability, mortality)?

Key Points

  • Among the non-biologic DMARDs, there is some evidence that methotrexate is superior to conventional therapy and oral corticosteroids.
  • Among children who have responded to a biologic DMARD, randomized discontinuation trials suggest that continued treatment for 4 months to 2 years decreases the risk of having a flare. Although these studies evaluated DMARDs with different mechanisms of action (abatacept, adalimumab, anakinra, etanercept, intravenous immunoglobulin [IVIG], tocilizumab) and used varying comparators, followup periods, and descriptions of flare, the finding of a reduced risk of flare was precise and consistent.
  • Conventional treatment has changed over time (e.g., use of oral corticosteroids in older studies of non-biologic DMARDs versus more frequent use of methotrexate in more recent studies of biologic DMARDs). Comparing the effectiveness of biologic and non-biologic DMARDs is challenging because of variations in comparators and how these comparators are described.
  • There is significant variation in outcome measures and how these outcome measures are reported.

Detailed Analysis

Literature Identified

We identified a total of 20 publications describing 18 unique studies and involving 1532 patients that compared DMARDs to conventional treatments with or without methotrexate. Among these were 10 studies that evaluated seven biologic DMARDs (abatacept, adalimumab, anakinra, etanercept, infliximab, IVIG, and tocilizumab; see Table 4) and eight studies that evaluated five non-biologic DMARDs (azathioprine, penicillamine, hydroxychloroquine, methotrexate, and sulfasalazine; see Table 5).

Table 4. Studies comparing biologic DMARDs versus conventional treatments with or without methotrexate.

Table 5. Studies comparing non-biologic DMARDs versus conventional treatments with or without methotrexate.

There were 10 RCTs, of which four (described in five papers) were of good quality,11-15 four were of fair quality,16-19 and two were of poor quality.20,21 Key problems in the fair- and poor-quality studies included unclear methods of allocating to therapy, questionable blinding, and incomplete followup. There were two open-label comparison studies of poor quality.22,23 Six studies were randomized discontinuation studies, of which three (described in four papers) were of good quality,24-27 two were of fair quality,28,29 and one was of poor quality.30

A detailed summary of these studies, by DMARD evaluated, is provided below.

There were no good-quality RCTs comparing biologic DMARDs to conventional therapy. There were two good-quality RCTs comparing methotrexate, a non-biologic DMARD, to conventional therapy.13,14 However, in both studies, each group could also receive oral corticosteroids, which are not currently considered conventional therapy. A single good-quality trial of sulfasalazine showed better short-term (24-week) outcomes than treatment with NSAIDs.15

Biologic DMARDs Versus Conventional Treatment With or Without Methotrexate

Abatacept

One good-quality randomized discontinuation study evaluated abatacept.24 During the 6-month double-blind period of this study, there was statistically significant improvement compared to placebo in the active joint count (4.4 vs. 6; p = 0.02), CHAQ score (0.8 vs. 0.7; p = 0.04), physician global assessment (14.7 vs. 12.5; p < 0.01), and ACR Pediatric 90 (40 percent vs. 16 percent; p < 0.01). There was no statistically significant improvement in parent/patient global assessment (17.9 vs. 23.9; p = 0.70) or erythrocyte sedimentation rate (ESR; 25.1 vs. 30.7; p = 0.96).

Adalimumab

We found one good-quality randomized discontinuation trial that compared adalimumab to conventional therapy.25 The results were stratified by use of methotrexate. At the end of the 48-week double-blind phase, the proportion of patients who had a flare of disease was lower with adalimumab than with conventional treatment, both among patients who did not receive methotrexate (43 percent vs. 71 percent; p = 0.03) and among those who did (37 percent vs. 65 percent; p = 0.02). The proportion who achieved an ACR Pediatric 50 response was likewise higher with adalimumab than with conventional treatment, both without methotrexate (53 percent vs. 32 percent; p = 0.01) and with methotrexate (63 percent vs. 38 percent; p = 0.03). The proportion who achieved an ACR Pediatric 90 response was also higher with adalimumab, but the difference was not statistically significant either without methotrexate (30 percent vs. 18 percent; p = 0.28) or with methotrexate (42 percent vs. 27 percent; p = 0.17).

Anakinra

One randomized discontinuation trial compared anakinra to conventional therapy.30 This study was rated as poor in quality because it did not have sufficient statistical power to evaluate efficacy and there was insufficient reporting of randomization and concealment. The main goal of the study was to evaluate safety. By week 28 of blinded treatment, 16 percent who received anakinra and 40 percent who received placebo had had a flare (p = 0.11). There was improvement in the CHAQ score in the anakinra group compared to placebo (-0.25 vs. 0.13; no p-value reported). Similarly, there was improvement in the ESR among those who were treated with anakinra (-2.21 vs. 13.73; no p-value reported).

Etanercept

Two studies evaluated etanercept versus placebo. One good-quality randomized discontinuation trial evaluated children with a polyarticular course of JRA.26 In the double-blind component, fewer patients who received etanercept had a flare (28 percent vs. 81 percent; p = 0.003). There was also an improvement in the CHAQ score (-0.8 vs. -0.1). Overall, there was a 54 percent median improvement among those who received etanercept compared to no median change in the placebo group. There was an overall improvement in the number of active joints (7 vs. 13; no p-value reported); physician global assessment (2 vs. 5; no p-value reported); parent global assessment (3 vs. 5; no p-value reported); ESR (18 vs. 30; no p-value reported); and the proportion who achieved ACR Pediatric 50 (72 percent vs. 23 percent; no p-value reported).

The other study of etanercept was a fair-quality RCT that evaluated efficacy for the treatment of uveitis.16 This study had a small sample size. During the study, 6 of 12 in the test treatment arm and 2 of 5 in the conventional treatment arm improved. This was described by study investigators as no apparent difference.

Infliximab

One fair-quality RCT compared infliximab to conventional treatment.17 This study inconsistently and incompletely reported outcomes. The study did not find statistically significant differences between infliximab and conventional treatment in the ACR Pediatric 50 at 14 weeks (50 percent vs. 33.9 percent, respectively; p = 0.13) or the rate of clinical remission at 52 weeks (44.1 percent vs. 43.1 percent, respectively).

IVIG

Three studies compared IVIG to conventional treatment. One small (19 total in the double-blind phase), fair-quality, randomized discontinuation trial28 found a 3 percent decrease in the active joint count among those who were treated compared to a 30 percent increase in the placebo group. Physician global assessment improved for 3 percent of patients in the treatment group and worsened for 91 percent in the placebo group. This study used a main outcome measure that has not been validated and provided no statistical significance testing; there was also a potential conflict of interest with the study sponsor.

Another study22 compared IVIG to methylprednisolone. This study was considered to be of poor quality because it was open-label and non-randomized, analyses were not adjusted for baseline differences, and the sample was not adequately described. Investigators found no statistically significant difference between the IVIG and methylprednisolone groups for ESR (59 at baseline and 21 at 6 months vs. 61 at baseline and 24 at 6 months, respectively).

A small RCT20 found that IVIG compared to conventional therapy was associated with a non-statistically significant improvement in the median change in active joint count (-2 vs. -1) and in physician global assessment of improvement (50 percent improvement vs. 27 percent improvement; p > 0.3). This study was considered to be of poor quality because of the small sample size and high dropout rate.

Tocilizumab

One fair-quality randomized discontinuation trial evaluated tocilizumab.29 The screening and randomization procedures were not described. No p-values were reported for the outcomes of interest in this review. From the RCT component, the active joint count in the tocilizumab group decreased from 3.5 to 0. Similarly, in the conventional treatment group it decreased from 4 to 0. There was improvement in the CHAQ score for each group (-0.5 vs. -0.25). Both physician global assessment (51.0 to 5.5 vs. 51 to 14) and parent global assessment (51.0 to 4.5 vs. 55 to 39) improved. The ESR decreased for both the tocilizumab and conventional treatment group (35 to 0.1 vs. 38 to 15). The ACR Pediatric scores were reported graphically. The ACR Pediatric 70 increased in the tocilizumab group from approximately 70 percent to approximately 80 percent, but decreased in the conventional treatment group from approximately 80 percent to approximately 30 percent.

Meta-Analysis of Randomized Discontinuation Trials

Randomized discontinuation trials include only patients who initially responded to a treatment and primarily assess the risk of worsening when treatment is withdrawn. These studies evaluate sustainability of treatment effects and not the potential treatment effect among those who have not yet begun treatment. The randomized discontinuation trials identified by our search evaluated only biologic DMARDs (abatacept, adalimumab, anakinra, etanercept, IVIG, tocilizumab).

Four of the trials reported flare of arthritis,24-26,30 allowing us to calculate a summary measure of the risk of flare over the 4-month to 2-year durations of the studies. Other outcomes were too heterogeneous or were reported too incompletely to calculate a summary estimate. Although there were differences in the interventions, comparators, and duration of followup among the four studies, we found very little statistical heterogeneity. Figure 4 summarizes the risk ratio (RR) for flare (with 95 percent confidence interval [CI]) based on a random-effects model. Overall, the RR for having a flare among those who continued compared to those who discontinued was 0.48 (95 percent CI 0.36 to 0.63) over 4 months to 2 years. Although there is heterogeneity in study design, the RR for having a flare was similar across all studies (χ² = 3.18, df = 3, p = 0.36; I² = 6 percent). This suggests that among those who respond to a biologic DMARD, there is a significant risk of flare after discontinuation. There was insufficient evidence regarding the efficacy of the biologic DMARDs from the other studies that compared these treatments to conventional therapy with or without methotrexate.
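The reported heterogeneity statistics can be related to one another with a short calculation. The sketch below assumes the standard Higgins definition of I-squared, 100 × (Q − df) / Q truncated at zero, and uses the closed-form upper-tail probability of a chi-squared distribution with 3 degrees of freedom. The function names are illustrative, and the report does not state which software or exact formula its authors used.

```python
import math


def i_squared(q: float, df: int) -> float:
    """Higgins' I-squared (in percent) from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability of a chi-squared variable with exactly 3 df."""
    z = x / 2.0
    return math.erfc(math.sqrt(z)) + 2.0 * math.sqrt(z / math.pi) * math.exp(-z)


q, df = 3.18, 3  # values reported in the text
print(f"I-squared = {i_squared(q, df):.1f} percent")
print(f"p = {chi2_sf_df3(q):.2f}")
```

Under these assumptions, I-squared comes out at roughly 5.7 percent, which rounds to the reported 6 percent, and the p-value is close to the reported 0.36.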

Figure 4 displays a forest plot of the risk ratio (RR) of flare in the randomized discontinuation trials of certain biologic DMARDs (abatacept, adalimumab, anakinra, etanercept, IVIG, and tocilizumab). Overall, the RR for having a flare among those who continued compared to those who discontinued was 0.48 (95 percent CI 0.36 to 0.63) over 4 months to 2 years. Although there is heterogeneity in study design, the RR for having a flare was similar across all studies (chi-squared = 3.18, df = 3, p = 0.36; I-squared = 6 percent).

Figure 4. Comparison of symptomatic flares in children with JIA randomized to continuing a biologic DMARD versus placebo. Flares are listed as “Events” in the figure.
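To give a sense of how a risk ratio for flare is formed, the sketch below divides the flare proportion in the continued-treatment arm by the proportion in the comparison arm, using only the percentages quoted earlier in this section. These are crude ratios of rounded percentages, not the study-level estimates shown in Figure 4, which would be computed from event counts and include confidence intervals. Flare proportions for the abatacept trial are not given in the text, so that trial is omitted.

```python
# Flare percentages quoted in the text: (continued biologic DMARD, comparison arm).
flare_percent = {
    "adalimumab, without methotrexate": (43, 71),
    "adalimumab, with methotrexate": (37, 65),
    "anakinra": (16, 40),
    "etanercept": (28, 81),
}

# Crude risk ratio: risk with continued treatment divided by risk in the comparison arm.
crude_ratio = {
    name: continued / comparison
    for name, (continued, comparison) in flare_percent.items()
}

for name, ratio in crude_ratio.items():
    print(f"{name}: {ratio:.2f}")
```

All four crude ratios fall below 1, which is in the same direction as the pooled RR of 0.48 reported above.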

Non-Biologic DMARDs Versus Conventional Treatment With or Without Methotrexate

Azathioprine

One poor-quality RCT evaluated azathioprine.18 Allocation was not specified; there were baseline differences between those who received and did not receive azathioprine; it was unclear if outcomes were assessed blinded to the intervention status of subjects; and the outcomes were not well described. At 16 weeks of treatment, this study found non-statistically significant improvements with azathioprine in the number of active joints (-7 vs. -1; p = 0.45), physician global assessment (-5 vs. -2; p = 0.12), and the proportion with 50 percent improvement in ESR (4/13 subjects vs. 2/11 subjects; p = 0.36).

Hydroxychloroquine

Two RCTs evaluated hydroxychloroquine. One (described in two publications11,12) found no significant difference in the change in mean active joint count compared to placebo after 12 months (-6.7 [95 percent CI -9.4 to -4] vs. -5.4 [-8 to -2.8]). The physician global assessment appeared slightly better for hydroxychloroquine than for placebo (70 percent better, 26 percent same, 2 percent worse compared to 53 percent better, 41 percent same, 6 percent worse; no p-value reported). There was no difference in the mean ESR decrease at 12 months (10 each).

The other study was an open-label RCT that compared hydroxychloroquine to gold.21 This study was considered to be of poor quality because allocation concealment was not specified, there were important baseline differences between the treatment groups, it was unclear if outcomes were assessed blinded to the intervention, and the outcomes were not well described. At 50 weeks, there were no statistically significant differences in the active joint count (-4 vs. -5), median change in the physician global assessment (-8 vs. -9), or change in the ESR (-12 vs. -11). Similarly, the physician overall assessment of at least 50 percent improvement was not statistically significantly different between the hydroxychloroquine group and the gold group (12 of 17 improved vs. 10 of 15 improved, respectively).

Methotrexate

Three studies compared methotrexate to conventional treatment without methotrexate. One good-quality RCT compared low-dose methotrexate, very low-dose methotrexate, and placebo in a 6-month trial.13 The mean active joint count decreased with low-dose methotrexate (-7.5), very low-dose methotrexate (-5.2), and placebo (-5.2; p > 0.3 overall). Physician global assessment improved with low-dose methotrexate compared to placebo (p = 0.02), but there was no statistically significant difference between the low-dose and very low-dose methotrexate groups for this outcome (p = 0.06). Based on a composite index with at least 25 percent improvement in articular score and improvement according to physicians and parents, 63 percent of those in the low-dose methotrexate group improved, compared to 32 percent in the very low-dose methotrexate group and 36 percent in the placebo group (p = 0.013).

Another good-quality study14 compared methotrexate to placebo among children with extended oligoarticular JIA or systemic JIA in a double-blind RCT with crossover. Among those with oligoarticular JIA, there was statistically significant improvement in physician global assessment (p < 0.001) and ESR (p < 0.001) with methotrexate. The change in the number of joints with synovitis (-3) did not achieve statistical significance (p < 0.1). Similarly, among those with systemic JIA, there was improvement in physician global assessment (p < 0.001), but not in ESR (p = 0.06) or in the number of joints with synovitis (p = 0.06) in patients taking methotrexate.

A poor-quality, non-randomized study compared methotrexate to NSAIDs and to methylprednisolone.23 In this study, the active joint count improved more in the methylprednisolone group than in either the methotrexate or NSAID groups (-7.1 vs. -4 vs. -0.8, respectively; p = 0.008). This study, however, had confounding by indication; the analysis did not adjust for potential confounders; outcomes were not assessed blinded to the treatment condition; and patients were not blinded to their treatment assignments.

Penicillamine

Four publications describing three distinct studies evaluated penicillamine. One good-quality RCT (described in two publications11,12) found no statistically significant effect on the mean active joint count with penicillamine compared to placebo after 12 months (-3 [95 percent CI -4.8 to -1.1] vs. -5.4 [-8 to -2.8]); results were similar for physician global assessment (56 percent better, 28 percent same, 16 percent worse vs. 53 percent better, 41 percent same, 6 percent worse) and mean decrease in ESR (9.4 vs. 10).

A fair-quality RCT19 found no statistically significant effect on ESR in a 6-month study in patients treated with penicillamine compared to conventional treatment (-18 vs. -8). However, this study did find a statistically significant decrease in the number of painful joints in patients taking penicillamine (-3 vs. -1.6; p < 0.04). This study was of fair quality because the patients in the placebo group may have had worse disease.

A poor-quality, open-label RCT21 found no statistically significant effect for penicillamine compared to gold at 50 weeks in the active joint count (-2.5 vs. -5), median change in the physician global assessment (-7.5 vs. -9), change in ESR (-8 vs. -11), or the proportion of patients who had at least a 50 percent improvement based on physician assessment (8/12 vs. 10/15).

Sulfasalazine

One good-quality RCT evaluated sulfasalazine versus placebo.15 In this study, it was unclear which time points were compared. However, there was statistically significant improvement with sulfasalazine in active joint count (-5.54 vs. -0.78; p = 0.005), physician global assessment (-1.95 vs. -0.99; p = 0.0002), patient/parent global assessment (-0.98 vs. -0.44; p = 0.01), and decrease in ESR (-0.74 vs. -0.04; p < 0.001). The number of improved joints by x-ray findings was not statistically significantly different (0.71 vs. 0.53).

Key Question 2. In children with JIA, what are the comparative effects of DMARDs on laboratory markers of inflammation or radiological progression, symptoms (e.g., pain, symptom scores), or health status (e.g., functional ability, mortality)?

Key Point

  • There are few direct comparisons of DMARDs in children with JIA, and insufficient evidence to determine if any specific drug or drug class has greater beneficial effects.

Detailed Analysis

Literature Identified

We identified six reports describing five unique studies and involving 520 patients that directly compared various DMARDs with one another (Table 6). Among these studies were one that compared two biologic DMARDs (etanercept and infliximab) and four that compared various non-biologic DMARDs (penicillamine, hydroxychloroquine, leflunomide, methotrexate, and sulfasalazine). A detailed summary of these studies, by treatment comparison, is provided below. Of the five studies, one was an open-label, non-randomized comparison, and the rest were RCTs. However, only two of the studies were considered to be of good quality (one comparing penicillamine to hydroxychloroquine and another comparing leflunomide to methotrexate in a non-inferiority design study); the rest were poor in quality.

Table 6. Studies comparing various DMARDs with one another.

Comparisons of Biologic DMARDs

Etanercept vs. Infliximab

One poor-quality, non-randomized, open-label study compared etanercept to infliximab.31 This study was considered to be of poor quality because drug switching made it hard to interpret findings, few data were provided about the subjects, and assessment was not blinded to therapy. In addition, a total of 6 of the 24 subjects did not complete the study. Among the 10 receiving etanercept, one was withdrawn for non-compliance. Among the 14 receiving infliximab, 4 withdrew because of adverse events and one withdrew because of failure to reach the ACR Pediatric 50. After 12 months of treatment, the change in active joint count was similar between etanercept (-9.5 [95 percent CI -19 to -3]) and infliximab (-11.5 [95 percent CI -17 to -7.5]). Results were also similar in the two treatment groups for changes in the CHAQ score (-0.81 vs. -0.31; p = 0.12), physician global assessment (-29 vs. -35; p = 0.65), patient/parent global assessment (-24.5 vs. -27.5; p = 0.81), ACR Pediatric 75 (67 percent each), ACR Pediatric 50 (78 percent vs. 89 percent; p-value not reported, but calculated as 0.53) and ESR (28.5 vs. -25; p = 0.37).

Comparisons of Non-Biologic DMARDs

Penicillamine vs. Hydroxychloroquine

Two publications11,12 described a good-quality RCT that compared penicillamine and hydroxychloroquine to placebo (results described above, under Key Question 1) and to one another. At 12 months, neither active drug was superior to the other based on active joint count, ESR, or physician global assessment.

One poor-quality, open-label RCT21 compared hydroxychloroquine and penicillamine to gold (results described above, under Key Question 1) and to one another. At 50 weeks, there were no significant differences between the two DMARDs in active joint count, physician global assessment, or ESR.

Sulfasalazine vs. Hydroxychloroquine

One poor-quality RCT compared sulfasalazine to hydroxychloroquine.32 This study was considered to be of poor quality because there was an inadequate description of the subjects, it was unclear if the study was blinded, and many of the outcomes were not validated. After 6 months, the average number of affected joints decreased by 1.5 in the sulfasalazine group and by 0.6 in the hydroxychloroquine group (no p-value reported). During this time, the ESR decreased in both the sulfasalazine group (52.7 to 36.3; no p-value reported) and hydroxychloroquine group (41.2 to 28.9; no p-value reported). Physician global assessment (9 better, 9 worse, 3 no effect for sulfasalazine vs. 8 better, 3 worse, 7 no effect for hydroxychloroquine; no p-value reported) and patient global assessment (10 better, 7 worse, 3 no effect for sulfasalazine vs. 7 better, 5 worse, 3 no effect for hydroxychloroquine; no p-value reported) were similar in the two groups.

Leflunomide vs. Methotrexate

One good-quality RCT compared leflunomide to conventional treatment with methotrexate.33 This 16-week study with a 32-week blinded extension found improvements in both groups. The active joint count decreased for the leflunomide and conventional treatment groups (-8.1 vs. -8.9; p = not significant). Similarly, in both groups there were improvements in the CHAQ score (-0.44 vs. -0.39; p = not significant), physician global assessment (-31.5 vs. -32.1; p = not significant), parent global assessment (-15.9 vs. -22; p = not significant), and ESR (-6.5 vs. 7.2; p = not significant). As the trial proceeded, the methotrexate group appeared to have a greater improvement in the proportion of patients who had an ACR Pediatric 30, Pediatric 50, or Pediatric 70 response. For example, 70 percent of the leflunomide group and 83 percent of the methotrexate group achieved an ACR Pediatric 70 response at 48 vs. 16 weeks. The improvement was not statistically significant for either the leflunomide (p = 0.01) or methotrexate (p = 0.06) groups. No statistical comparison was made between the two groups.

Key Question 3. In children with JIA, does the rate and type of adverse events differ between the various DMARDs or between DMARDs and conventional treatment with or without methotrexate?

Key Points

  • There are few direct comparisons of DMARDs with one another in children with JIA, and insufficient evidence to determine if there are differential rates of adverse events between specific drugs or drug classes.
  • Reported rates of adverse events are similar between DMARDs and placebo in nearly all published RCTs.
  • Adverse event rates may be underestimated by clinical trials that excluded patients who did not tolerate an intervention during a run-in phase.
  • Our review identified 11 incident cases of cancer among several thousand children treated with one or more DMARDs.
  • Two recently published studies identified 66 cases of malignancy worldwide in children with JIA exposed to a tumor necrosis factor α blocker.
  • The available data on harm must be interpreted with caution because data on adverse events have not been systematically collected or reported across studies.

Detailed Analysis

Literature Identified

Of the 15 eligible RCTs identified by our search strategy, 13 included a placebo comparison and reported adverse events. Eight of these were traditional RCTs and five were randomized discontinuation trials. Because one of these studies included three study arms, a total of 14 DMARDs or DMARD combinations were directly compared to placebo. Anakinra, abatacept, etanercept, infliximab, tocilizumab, azathioprine, hydroxychloroquine, and sulfasalazine were each represented by a single study; etanercept, IVIG, and penicillamine were each represented by two studies; and methotrexate was compared to placebo in one study and was used in combination with infliximab in another study. A total of 914 unique patients were represented in the 13 placebo-controlled trials.

Our wider review of the adverse events literature identified a total of 151 publications that reported adverse events possibly associated with a DMARD among patients with JIA (Appendix E). Of these 151 publications, 19 (13 percent) were RCTs; the remainder were open-label extension phases of previously published RCTs, prospective or retrospective series, or case reports. A total of 4344 patients were represented in these reports, with 2286 patients (53 percent) participating in an RCT. There was insufficient information in these publications to determine whether data from some patients were included in more than one published report. Furthermore, some series included some patients who were either adults or who did not have JIA.
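The percentages quoted for the adverse events literature follow directly from the counts in the text. The sketch below recomputes them as a consistency check; the variable names are illustrative only.

```python
# Counts reported in the text for the adverse events literature.
publications_total = 151
publications_rct = 19
patients_total = 4344
patients_in_rct = 2286

# Share of publications that were RCTs, and share of patients enrolled in an RCT.
rct_publication_share = 100 * publications_rct / publications_total
rct_patient_share = 100 * patients_in_rct / patients_total

print(round(rct_publication_share), round(rct_patient_share))  # 13 53
```

Both rounded values agree with the 13 percent and 53 percent stated in the text.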

An additional two publications34,35 identified 66 (possibly not unique) cases of malignancies diagnosed in children undergoing treatment for JIA with a DMARD; we discuss these two studies separately because they did not include information about the population of patients from which these cases were identified.

Reporting standards for adverse events varied greatly across studies. For the purpose of this report, we consolidated the many different descriptions of reported adverse events into 24 broad categories, which we in turn categorized as involving a primary organ system, being an isolated symptom, or as “other.” We did not include minor or transient events (e.g., rash) that were identified by the authors of the published reports as possibly associated with infusion of the drug.

Placebo-Controlled RCTs of Biologic DMARDs

Safety data from the 13 placebo-controlled trials are summarized in Table 7 (Parts 1-3) and described in greater detail for the specific DMARDs evaluated in the sections that follow.

Table 7. Adverse events reported in RCTs.

Abatacept

One good-quality study24 randomized 62 patients to abatacept in a 6-month RCT that was preceded by an open-label run-in phase. No adverse events associated with abatacept or placebo were reported.

Anakinra

One study, rated as being of fair quality for the purposes of evaluating safety, randomized 25 patients to anakinra in a 16-week RCT that was preceded by an open-label run-in phase.30 Among the patients in the anakinra arm, 6 (24 percent) had gastrointestinal events, 2 (8 percent) had dermatologic events, 8 (32 percent) had respiratory events, 6 (24 percent) had neurologic events, 3 (12 percent) had fever, 2 (6 percent) reported pain, and 7 (28 percent) had other adverse events. None of the adverse events was considered by the authors to be serious. These rates were similar to those observed in the placebo arm, with the exception of dermatologic events, which were reported by 10 patients (40 percent) in the placebo arm.

Etanercept

Two studies compared etanercept to placebo. One26 was a good-quality study that evaluated only children with polyarticular JRA. Of the 25 patients randomized to the etanercept arm after an open-label run-in phase, gastrointestinal and dermatologic events were each reported in one patient (four percent). There were no dropouts due to adverse events. The second study16 was a fair-quality RCT that evaluated the safety and efficacy of etanercept for the treatment of uveitis. Unspecified infections were reported in 5 of the 7 patients (71 percent) in the etanercept arm, and in 3 of the 5 patients (60 percent) in the placebo arm.

Infliximab

Infliximab plus methotrexate was compared to placebo plus methotrexate in one fair-quality RCT.17 This study inconsistently and incompletely reported outcomes, and there was insufficient information to compare adverse event rates in the two study arms over all time periods. Infection was reported in 41 of the 60 patients (68 percent) who received infliximab 3 mg/kg plus methotrexate during the 14 weeks of the RCT phase and the subsequent 38 weeks of the open-label continuation phase, compared to 28 of 62 patients (45 percent) in the placebo plus methotrexate arm during the 14-week RCT phase. Nineteen serious adverse events were reported among the 60 patients (32 percent) in the infliximab plus methotrexate arm over 52 weeks, compared to 3 of 62 patients (5 percent) in the placebo plus methotrexate group over 14 weeks. The nature of the serious adverse events was not reported. Two patients (three percent) in the infliximab plus methotrexate arm and one patient (two percent) in the placebo plus methotrexate arm dropped out because of adverse events.

IVIG

Two studies compared IVIG to placebo. One small, fair-quality study28 reported no adverse events during the course of the 4-month RCT phase, which was preceded by a 3- to 6-month run-in phase, among the 10 patients randomized to IVIG or the 9 patients randomized to placebo. Another study,20 rated poor in quality, reported macrophage activation syndrome in 1 patient (7 percent) and elevated liver enzymes in another (7 percent) among the 14 patients randomized to IVIG, and no similar adverse events among the patients in the placebo arm.

Tocilizumab

One fair-quality study compared tocilizumab to placebo during a 12-week double-blind RCT phase preceded by a 6-week run-in phase.29 One patient in each group (5 percent) dropped out because of adverse events. Of the 20 patients in the tocilizumab arm, 1 (5 percent) reported a gastrointestinal event, 2 (10 percent) reported a respiratory event, and 1 (5 percent) reported a mononucleosis infection. Similar rates of adverse events were reported by patients in the placebo arm.

Placebo-Controlled RCTs of Non-Biologic DMARDs

Azathioprine

One fair-quality study compared azathioprine to placebo in a 16-week RCT.18 Among the 17 patients randomized to azathioprine, 3 (18 percent) dropped out because of adverse events, 3 (18 percent) had an infection, 2 (12 percent) had renal or urologic events, and 2 (12 percent) had a hematologic abnormality. Dermatologic events, fever, nausea/vomiting, pain, alopecia, and bleeding were each reported at a rate of 6 percent among patients in the azathioprine arm. Among the 15 patients randomized to placebo, none dropped out because of adverse events, 2 (13 percent) reported pain, and 1 (7 percent) reported alopecia.

Hydroxychloroquine

One fair-quality RCT compared both hydroxychloroquine and penicillamine to placebo over the course of 12 months.11 Of the 57 patients in the hydroxychloroquine arm, 3 (5 percent) dropped out due to adverse events, 2 (4 percent) had a dermatologic event, 7 (12 percent) had a renal or urologic event, 6 (11 percent) had anemia, 4 (7 percent) had a hematologic abnormality, and 8 (14 percent) had other laboratory abnormalities. Adverse event rates were similar among patients in the placebo arm.

Methotrexate

A single good-quality study compared methotrexate to placebo in a double-blind RCT of 6 months' duration.13 Forty-six patients were randomized to low-dose (10 mg/m²/week) methotrexate, 40 were randomized to very low-dose (5 mg/m²/week) methotrexate, and 41 were randomized to placebo. Of the 86 patients in a methotrexate arm, 3 (3 percent) dropped out due to adverse events, 10 (12 percent) reported a gastrointestinal event, 6 (7 percent) reported pain, and 30 (35 percent) had a laboratory abnormality (compared to 13 percent in the placebo arm). None of the patients in the placebo arm dropped out because of adverse events.

Penicillamine

One fair-quality RCT compared both penicillamine and hydroxychloroquine to placebo over the course of 12 months.11 Of the 51 patients in the penicillamine arm, 2 (4 percent) dropped out due to adverse events, 4 (8 percent) had a dermatologic event, 1 (2 percent) had an ophthalmologic event, 2 (4 percent) had anemia, 4 (8 percent) had a hematologic abnormality, and 9 (17 percent) had other laboratory abnormalities. In another study, a good-quality RCT of 6 months' duration,19 38 patients were randomized to the penicillamine arm. Among those patients, 6 (16 percent) reported a gastrointestinal event, 3 (8 percent) reported a dermatologic event, 2 (5 percent) had an infection, and 1 (3 percent) had a hematologic abnormality. Adverse event rates were similar among the patients in the placebo arms in both studies.

Sulfasalazine

A single good-quality RCT of 6 months' duration compared sulfasalazine to placebo.15 Among the 35 patients randomized to sulfasalazine, 10 (29 percent) dropped out due to adverse events (compared to none in the placebo arm), 24 (69 percent) reported a gastrointestinal event, 9 (26 percent) reported a dermatologic event, 9 (26 percent) reported a neurologic event, 2 (6 percent) had hematologic abnormalities, 2 (6 percent) had elevated liver enzymes, and 4 (11 percent) had other laboratory abnormalities. All of the adverse event rates were higher in the sulfasalazine group than in the placebo group.

Other Studies

The data from our wider review of the literature reporting adverse events among patients with JIA undergoing treatment with a DMARD are summarized in Appendix E. Patients treated with one or more DMARDs in the placebo-controlled RCTs described in the preceding two sections are included in Appendix E; patients in non-DMARD comparison arms of those RCTs are not included. The “other” category of adverse events includes a wide variety of events that were infrequently reported, such as asthenia, malaise, hostility, or taste disturbance.

A single death possibly associated with DMARD use was reported in a girl on immunosuppressive therapy with cyclosporine A and methotrexate who died of Legionella pneumonia at the age of 53 months.36 Autopsy revealed stage IV lymphoma that was not previously diagnosed.

An additional 10 cases of cancer, seven of them lymphomas, were identified: two cases of thyroid carcinoma (one with etanercept,37 the other with etanercept plus methotrexate38); a case of yolk sac carcinoma with etanercept plus methotrexate;38 two cases of lymphoma with etanercept plus methotrexate;38,39 two cases of lymphoma in patients who had received infliximab, etanercept, and methotrexate;39 and three cases of lymphoma with methotrexate alone.40-42 Apart from the 11 cases of cancer among the several thousand patients represented by the publications we reviewed, there was no clear evidence of a high incidence or prevalence of any given serious adverse event associated with DMARDs.

Two studies reported cases of malignancies possibly associated with tumor necrosis factor α blockers in children with JIA. Diak et al.34 searched the U.S. Food and Drug Administration Adverse Event Reporting System through April 2008 to identify reported malignancy among persons aged 22 years or younger who had received treatment with infliximab, etanercept, or adalimumab. The authors identified 48 cases, half of which were lymphomas. The majority of reported cases (88 percent) involved the concomitant use of other immunosuppressants. McCroskery et al.35 searched the etanercept clinical trials database and global safety databases to identify 15 confirmed and 3 potential malignancies in children with JIA who had been treated with etanercept. Seven of the confirmed cases were lymphomas. Neither study reported the size of the population of children from which these cases were identified, thereby precluding accurate estimation of event rates.

Key Question 4. How do the efficacy, effectiveness, safety, and adverse effects of treatment with DMARDs differ among the various categories of JIA?

Key Point

  • Insufficient data are available to evaluate the efficacy, effectiveness, safety, or adverse effects of treatment with DMARDs by category of JIA.

Detailed Analysis

Literature Identified

The studies considered for this question were those identified for Key Questions 1 and 2, which also included the placebo-controlled trials considered for Key Question 3.

Efficacy and Effectiveness

Only one study compared the efficacy of the DMARD studied (methotrexate) across different diagnostic categories of JIA.14 There was no statistically significant difference in the efficacy of methotrexate for oligoarticular JIA versus systemic JIA.

Safety and Adverse Events

The only study we identified that explicitly compared the efficacy of treatment by diagnostic category14 did not report data on safety or adverse events. We did not identify any studies that provided reliable information on the comparative safety or rates or types of adverse events among the various categories of JIA.

Key Question 5. What are the validity, reliability, responsiveness, and feasibility of the clinical outcomes measures for childhood JIA that are commonly used in clinical trials or within the clinical practice setting?

Key Points

  • The CHAQ was the most extensively evaluated instrument of the priority measures we considered. While it demonstrated high reproducibility and internal consistency, it had only moderate correlations with indices of disease activity and quality of life, and poor to moderate responsiveness. The CHAQ is sensitive to the degree of disability at baseline, with higher responsiveness for those with initially worse functional impairment.
  • In general, reliability was moderate to high for measures of physical function for all measures examined, but poor to moderate for psychosocial domains. Similar findings were noted for measures of validity and responsiveness, where measures of psychosocial function and quality of life showed less correlation with disease activity indices and less responsiveness compared to the physical aspects of JIA. These findings are important to consider when discussing risk and benefits of altering treatments, as patients may have different tradeoffs based on the psychosocial aspects of disease.
  • No one instrument or outcome measure appears superior in describing the various aspects of JIA with adequate reliability, validity, and responsiveness.
  • Definitions to describe various disease states including improvement, remission, and flare have been developed, but further studies are needed to better define their psychometric properties.

Detailed Analysis

Measures Evaluated

As described in the Methods section, based on our initial review of the literature identified, and in collaboration with the project's technical expert panel (TEP), we selected seven measures for detailed evaluation for Key Question 5. This section provides basic descriptions of these seven measures. While several other outcome instruments have been developed for JIA, including the Juvenile Arthritis Functional Assessment Scale and Report and the Juvenile Arthritis Functionality Scale, their psychometric properties were not independently examined, as they were not selected as priority measures by the TEP.

Measures of Disease Activity
  • Active joint count (AJC): Standard full joint count assesses 71 possible joints for active disease, defined as joints with swelling or pain/tenderness on range of motion. Limited range of motion may also be assessed, but this is listed as a separate measure from active joint count. This requires a full musculoskeletal exam by a health professional.
  • Physician global assessment of disease activity (PGA): Typically assessed by asking the physician to rate the child's overall disease activity on a visual analog scale (VAS), with higher scores indicating greater disease activity. Most commonly assessed utilizing a 100 mm VAS; representative anchors are “remission” and “very severe.” The same scale is used for all categories of JIA.
  • Parent/patient global assessment of well-being (PGW): Assessed by a VAS, most commonly by asking the parent/caretaker to assess how their child is doing after considering all the ways that arthritis affects their child's life. Representative anchors are “very well” and “very poorly.” While the PGA assesses only disease activity, the PGW is an assessment of overall well-being.
Measures of Functional Status/Disability
  • Childhood Health Assessment Questionnaire (CHAQ): The CHAQ was adapted from the Stanford Health Assessment Questionnaire (HAQ), a validated measure used in adult populations to describe disability quantitatively. The CHAQ focuses on disability and discomfort caused by JIA, which have previously been identified as the major indicators of disease impact. The CHAQ consists of a disability index (CHAQ-DI; 30 items, 8 domains), and two visual analogue scales, one for pain/discomfort (100 mm VAS), and the second for overall well-being (100 mm VAS). The disability index is scored based on the amount of difficulty the child has in completing various tasks. To allow for variation based on the child's age and development, rather than disease status, a “not applicable” category also exists. The instrument is usually completed by parents, although there is a child's form for children over 8 years of age. The CHAQ is scored from 0 to 3, with higher scores indicating greater disability. The CHAQ is widely used and has been validated in multiple languages. A ceiling effect has been noted with the CHAQ, with poor discriminative ability for children with mild functional impairments. Furthermore, it neither distinguishes between nor corrects for impairments due to old damage versus active disease.
Measures of Health-Related Quality of Life
  • Child Health Questionnaire (CHQ): The CHQ is a general quality-of-life questionnaire that has been used in children with JIA. It is a self-administered questionnaire with both a parent form, which is available in two lengths (50 or 28 items), and a child form with 87 items (for children aged > 10 years). Most studies in JIA utilize the 50-item questionnaire for parents. The CHQ addresses multiple domains, including physical functioning, bodily pain or discomfort, general health, change in health, limitations in schoolwork and activities with friends, mental health, behavior, self-esteem, family cohesion, limitations in family activities, and emotional or time impact on parent. Scores range from 0 to 100, with higher scores indicating better well-being. Scores are calculated using equations provided in the CHQ manual. The CHQ is reported as a physical score (CHQ PhS) and a psychosocial score (CHQ PsS), as well as a combined score.
  • Pediatric Quality of Life Inventory (PedsQL) 4.0: The PedsQL is a self-administered questionnaire consisting of generic core questions and disease-specific questions. It applies to children ages 2 to 18 years and includes both a child and parent component. The generic core has 23 items assessing 4 domains: physical, emotional, social, and school functioning.
  • Pediatric Quality of Life Inventory Rheumatology Module (PedsQL-RM): The PedsQL-RM consists of 22 items addressing 5 domains: pain and hurt, daily activities, treatment, worry, and communication. The total score is on a 0 to 100 scale, with higher scores indicating better quality of life. The total score is calculated from the physical score and a psychosocial score (average of emotional, social, and school functioning scores).

The above-listed measures are further described and compared in Table 8.

Table 8. Outcomes measures assessed.

Table 8

Outcomes measures assessed.

Definitions of Treatment Response Now Under Development

In addition to the measures prioritized for detailed evaluation, we identified four developing definitions of treatment response: ACR Pediatric response criteria, a consensus-based definition of remission,43,44 flare,45 and minimal disease activity. These definitions are multi-dimensional, often using data from the measures we evaluated in detail.

Literature Identified

We identified 35 publications describing 34 unique studies and involving 14,831 patients that investigated the psychometrics of the selected outcomes measures or developing definitions of treatment response (see Table 9). Among these were 14 studies that evaluated reliability, 21 studies that evaluated validity, and 9 that evaluated responsiveness for the selected outcomes measures. Overall, there were 3 RCTs, 11 longitudinal non-randomized trials, 16 cross-sectional studies, 3 studies with both a longitudinal arm and cross-sectional component, and 1 study (of a developing definition of treatment response) that involved a consensus-forming process. Of our selected outcomes measures, the CHAQ was most extensively studied, with 23 studies. The overall quality of the studies was fair, with few studies commenting on blinding, and only one46 reporting sample size calculations.

Table 9. Studies of psychometric properties of common JIA outcomes measures and developing definitions of treatment response.


Reliability

Reliability addresses the consistency of the instrument in measuring the construct of interest. We examined three areas of reliability: reproducibility, inter-rater reliability, and internal consistency. Instruments with greater reproducibility and inter-rater reliability may be more feasible to use in clinical trials and require smaller sample sizes to detect clinically important differences between treatment groups. We identified 10 studies examining various aspects of reliability for the CHAQ;46,47,50,52,54,57-60,72 two studies each for the PGA, PGW73,75 and PedsQL;65,72 and one for the CHQ.64

Reproducibility, also called test-retest reliability, measures the extent to which an instrument yields the same score on repeat administration, assuming the patient's status is unchanged. This was assessed for the CHAQ in five studies, all of which demonstrated high correlation between administrations (correlation coefficient range 0.79 to 0.96).47,52,54,57,58 The reliability of the PedsQL and CHQ is less well established in JIA populations. We did not identify any studies reporting reproducibility or internal consistency data in JIA populations for the joint counts, PGA, PGW, CHQ, or PedsQL.

Inter-rater reliability was most commonly explored to determine the correlation between parent and patient scores. Inter-rater reliability was measured for the CHAQ, CHQ, and PedsQL, all of which demonstrated a moderate to strong correlation between parent and child when assessing functional status or disability (CHAQ: 0.54 to 0.84;46,50,57,72 CHQ PhS: 0.69 to 0.87;64 PedsQL: 0.46 to 0.8; and PedsQL-RM: 0.3 to 0.90).65,72 The correlation between parent and child was lower for psychosocial domains in two studies, including the PedsQL-RM worry domain (correlation coefficient 0.3)65 and the CHQ PsS (correlation coefficient range, 0.38-0.53).64

Inter-rater reliability of the global assessment measures (PGA and PGW) was examined through comparisons of the physician and parent assessments, rather than parent/patient. The PGA and PGW were compared in two studies73,75 and were found to have high rates of discordance. The first study focused on discordance between parent- and physician-reported global assessment of 0 (no disease activity/good overall well-being), while the second study examined discordance overall in the rating between parents and physicians across the spectrum of disease activity (as defined by a difference of greater than 1 cm on the VAS). Both studies demonstrated discordance in 60 percent of participants.

Internal consistency, assessed most commonly using Cronbach's alpha, refers to the extent to which all items measure the same construct. Internal consistency was evaluated in four studies for the CHAQ, with all showing high internal consistency (Cronbach's alpha 0.88 to 0.94 for all domains except the domain for “arising” [0.69]).54,57,59,60 In addition, shorter versions of the CHAQ-DI were found to have high internal consistency, with Cronbach's alpha of 0.93 for both the 29-item and 18-item instruments.59
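As an illustration of how the Cronbach's alpha values quoted above are typically obtained, the following minimal sketch applies the standard formula (number of items, item variances, and variance of the summed score). The function name and the item scores are hypothetical and are not drawn from any of the reviewed studies; the included studies may have used different software or handled missing items differently.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    n_items = len(responses[0])
    if len(responses) < 2 or n_items < 2:
        raise ValueError("need at least 2 respondents and 2 items")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent must answer every item")

    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[j] for row in responses]) for j in range(n_items)
    ]
    # Sample variance of the summed score across respondents.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical item scores (0 to 3 difficulty scale) for five respondents.
hypothetical_responses = [
    [0, 0, 1, 0],
    [1, 1, 1, 2],
    [2, 2, 3, 2],
    [3, 2, 3, 3],
    [1, 0, 1, 1],
]
print(round(cronbach_alpha(hypothetical_responses), 2))
```

Values approach 1 when the items rise and fall together across respondents, which is the pattern described above for most CHAQ domains.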

Validity

Validity refers to how well an instrument measures what it claims to measure. For some outcomes, such as joint inflammation, a reference standard is available (e.g., synovial biopsy) but may not be feasible or acceptable to patients. However, for many of the constructs assessed by the clinical outcome instruments we evaluated, there is no reference standard. Therefore, we evaluated construct validity based on how well the measures correlated with other indicators of disease, such as global assessments, articular counts, and scores from other validated instruments. We focused on studies in which the psychometric dimensions of the instrument were specifically evaluated for children with JIA. Validation studies looking at the performance of an instrument among rheumatology patients in general, but not specifically in JIA patients, are not included in this review.

Of the 21 articles that met our inclusion criteria, 17 explored validation of the CHAQ;47,48,51-57,59-61,63,66,67,71,75 four explored validation of the CHQ;62-64,71 and two explored validation of the PGA and PGW.74,75 In addition, one study focused on the correlation of the PedsQL and PedsQL-RM with pain assessments.65

Results are summarized in Table 10. The CHAQ was most strongly correlated with the PGW, with a median correlation of 0.54 (0.44 to 0.7, 6 studies).48,53,54,56,71,75 Of the articular measures of disease, both the AJC and the joints with limited range of motion (LROM) demonstrated moderate correlations with the CHAQ, with a median correlation of 0.45 (0.14 to 0.67, 9 studies48,53-57,60,71,75) and 0.49 (0.3 to 0.76, 7 studies47,48,53,55,63,66,71), respectively. There was considerable variability in these correlations, with the most significant variations among children categorized by disease duration. For children early in the course of disease, the CHAQ correlated less well with AJC than for children later in the course of disease (0.14 and 0.61, respectively). Those with late disease had a strong correlation with LROM (0.76), but lower correlations with PGA (0.51).53 Modified forms of the CHAQ, including reduced-item and digital versions, have been validated as well, although the correlation with articular measures is slightly lower than for the original CHAQ (values of 0.34 to 0.59).47,51,59

Table 10. Validity—correlations of instruments with measures of diseases and other instruments.


While there were no strong correlations between indicators of disease activity and the CHAQ, there were moderately strong correlations with other measures of functional status, including Steinbrocker functional class (Kendall Tau b 0.77).57 There were also moderate correlations with measures of quality of life, including the PedsQL (-0.62) and the PedsQL-RM (-0.63).48 Of interest, while there were moderate correlations between the CHAQ and the physical scale of the CHQ (PhS) (-0.58), there was poor correlation with the psychosocial scale of the CHQ (PsS) (-0.25).64

Studies of the CHQ reported on the physical scale and psychosocial scales separately. The two studies reporting on validity of the CHQ found consistently higher correlations for the physical component than for the psychosocial component on all measures, from physician and parent/patient global assessments to articular indices and functional status.63,64 While the CHQ was found to differentiate healthy children from those with JIA, we did not find any results indicating discriminant validity to accurately classify children with JIA by the extent of their disease.62

The PedsQL and PedsQL-RM have been studied in the general pediatric rheumatology populations, but the only study focusing on JIA evaluated correlations of both instruments with pain assessments. Child-reported pain assessments correlated with all subscales of the PedsQL and PedsQL-RM, and parent pain assessments correlated with three of four subscales for both instruments.65

Responsiveness

Responsiveness is determined by two properties: reproducibility and the ability to register changes in scores when a patient's symptom status shows clinically important improvement or deterioration. Although there is no universally recommended measure of responsiveness, most indices rely on calculation of an effect size. The effect size is a unit-free index that uses the mean change score in the numerator and a measure of variability in the denominator. The standardized response mean (SRM)79 and the responsiveness index80,81 are particularly useful approaches to calculating effect sizes for this application because they incorporate information about the response variance into the denominator. According to Cohen and colleagues,82 an effect size of 0.2 to 0.3 is considered a small effect, around 0.5 (0.4 to 0.7) a medium effect, and 0.8 or above a large effect. Deyo and others argue that the issue is not just sensitivity to change, but the ability to discriminate between those who improve and those who do not.80,83 Receiver operating characteristic (ROC) curves are proposed as an approach for describing how well various changes in scale scores can distinguish between improved and unimproved patients. This approach requires a valid reference standard to make these clinical classifications.
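The two effect-size indices described above can be sketched as follows. This is a minimal illustration under stated assumptions: the standardized response mean is taken as the mean change divided by the standard deviation of the change scores, and the second index divides the mean change by the standard deviation of the baseline scores, which is one common choice for the "measure of variability" but is not specified in this report. Function names and scores are hypothetical.

```python
from statistics import mean, stdev


def change_scores(baseline, followup):
    """Per-patient change (follow-up minus baseline)."""
    if len(baseline) != len(followup):
        raise ValueError("baseline and follow-up must be paired")
    return [f - b for b, f in zip(baseline, followup)]


def standardized_response_mean(baseline, followup):
    """Mean change divided by the standard deviation of the change scores."""
    changes = change_scores(baseline, followup)
    return mean(changes) / stdev(changes)


def baseline_effect_size(baseline, followup):
    """Mean change divided by the standard deviation of baseline scores
    (an assumed choice of denominator, for illustration only)."""
    changes = change_scores(baseline, followup)
    return mean(changes) / stdev(baseline)


# Hypothetical paired scores on an instrument where higher means more disability,
# so improvement shows up as a negative change.
hypothetical_baseline = [1.50, 0.75, 2.00, 1.25, 0.50]
hypothetical_followup = [1.00, 0.75, 1.25, 1.00, 0.50]

print(standardized_response_mean(hypothetical_baseline, hypothetical_followup))
print(baseline_effect_size(hypothetical_baseline, hypothetical_followup))
```

Because both indices are unit-free, their absolute values can be compared against the small, medium, and large thresholds cited above regardless of the instrument's scoring range.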

Responsiveness was assessed in nine studies (Table 11). The responsiveness of the CHAQ was assessed in six studies.46,56,68-71 The results of the six studies were quite variable, with effect sizes ranging from 0 to 0.5. The two studies evaluating responsiveness in oligoarticular populations found the CHAQ was less responsive in patients with oligoarticular disease compared to polyarticular disease, with SRM of 0 to 0.25 for oligoarticular and 0.48 to 0.6 for polyarticular populations.46,56,68-70 This difference in responsiveness by disease category was seen even when the same definition of improvement was used.56,69

Table 11. Responsiveness.


Three studies reported on the responsiveness of the global assessment measures and joint count indices. The most responsive measure was the PGA, with a large effect size, 1.59 (95 percent CI 1.0 to 2.32).68-70 However, in two of these studies, the patients' initial designation as improved or not improved was based on the physician's assessment, either as a categorical assessment on a 5-point scale for the first study,70 or by a definition of flare based on the addition or escalation of therapy in the second.68 Swollen joint count and active joint count were also found to have moderate to high responsiveness (effect sizes 1.3 and 0.7, respectively) and may be appropriate alternative measures.69

The responsiveness of the CHQ was formally evaluated in two studies, both of which demonstrated poor overall responsiveness, with an SRM of 0.23 and an effect size of 0.18 to 0.23.64,70 However, in the study that reported responsiveness separately based on disease state, the responsiveness was high in those designated as improved (0.96), indicating that the CHQ is sensitive to improvement, but the SRM was lower (-0.60) in those with worsening disease.64

The minimum clinically important difference (MCID) was evaluated for the CHAQ in two studies. The MCID helps clinicians interpret study results by estimating the amount of change on an instrument that is associated with a clinically meaningful change in the patient's status. The first study explored the question of minimal clinically important change using a theoretical scenario, and found a mean MCID for improvement of -0.13 in the CHAQ, and 0.75 for worsening.50 The second study evaluated MCID in a JIA population and found that results differed according to which external standard of disease was used: patient, parent, or physician assessment of disease. The mean MCID for improvement was -0.188 to 0 compared to child ratings, and 0 for parent and physician ratings.49 The authors concluded that changes in a patient's condition did not correlate well with the CHAQ, and therefore that the CHAQ is unlikely to be a useful tool when making short-term medical decisions.

The ability of the various outcome measures to differentiate those who improved from those who did not was assessed using ROC curves. In general, an area under the ROC curve of 0.5 indicates that the measure is no better than chance in discriminating between those who improved and those who worsened, while values closer to 1 indicate better discrimination. One study reported on ROC curves for our instruments of interest. The most discriminating of the instruments we examined was the physician global assessment, with an area under the ROC curve of 0.86 (95 percent CI 0.72 to 0.95), compared to the parent global assessment value of 0.63 (0.46 to 0.78) and the CHAQ value of 0.56 (0.41 to 0.71).70
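The ROC values above can be read as areas under the curve, which have a simple pairwise interpretation: the probability that a randomly chosen improved patient shows a larger change score than a randomly chosen unimproved patient, with ties counted as one half. The sketch below computes that quantity directly. The function name and data are hypothetical, and the cited study may have estimated the area and its confidence interval by a different method.

```python
def roc_area(improved_changes, unimproved_changes):
    """Area under the ROC curve via pairwise comparison.

    Each argument is a list of change scores oriented so that larger values
    indicate more improvement. Ties contribute one half.
    """
    if not improved_changes or not unimproved_changes:
        raise ValueError("both groups must contain at least one patient")

    favorable = 0.0
    for improved in improved_changes:
        for unimproved in unimproved_changes:
            if improved > unimproved:
                favorable += 1.0
            elif improved == unimproved:
                favorable += 0.5
    return favorable / (len(improved_changes) * len(unimproved_changes))


# Hypothetical reductions in a 0 to 100 mm global assessment score,
# grouped by an external reference standard for improvement.
hypothetical_improved = [35, 20, 42, 15, 28]
hypothetical_unimproved = [5, 18, 0, 22, 10]
print(roc_area(hypothetical_improved, hypothetical_unimproved))
```

A result near 0.5 means the change score ranks improved and unimproved patients essentially at random; the calculation depends on having a valid reference standard to define the two groups.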

Composite Definitions of Disease Status or Response to Therapy

Because JIA is a complex disorder, several composite definitions have been developed to categorize disease status or response to therapy. We describe these briefly below.

American College of Rheumatology Pediatric Response Criteria (ACR Pediatric 30)

The ACR Pediatric 30 response criteria are based on a core set of six variables: (1) physician global assessment of disease activity; (2) parent/patient global assessment of overall well-being; (3) measure of functional ability (CHAQ or JAFAS); (4) number of joints with active arthritis; (5) number of joints with limited range of motion; and (6) ESR.76-78 This measure is scored on a relative scale, based on percent improvement or worsening, and was developed to assess response to therapy in clinical trials. The initial response criteria were developed using a combination of statistical and consensus formation techniques.77 For each of the 240 definitions of improvement considered, the sensitivity and specificity were calculated using the physicians' consensus rating of improvement as the reference standard. Nine of the definitions with a sensitivity and specificity greater than 80 percent were retained, including the ACR Pediatric 30, which was rated highest based on sensitivity, specificity, measures of agreement, and face validity. The ACR Pediatric 30 is defined as 30 percent or more improvement in three of the six variables, with no more than one variable worsening by more than 30 percent. Similar definitions exist for ACR Pediatric 20, 50, 70, and 90; these differ in the percentage of improvement required, with no more than one variable worsening by 30 percent or more. These scores provide a relative measure of response, but not current disease state.
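The definition above can be expressed as a short rule. The following is a minimal sketch, not an official scoring algorithm: it assumes that all six core variables are oriented so that lower values are better, reads "three of the six variables" as at least three, and treats a variable with a baseline of zero as unable to improve and as worsened if it rises above zero. The variable names, the zero-baseline handling, and the example values are hypothetical.

```python
def acr_pediatric_response(baseline, followup, improvement_threshold=30.0):
    """Return True if the response definition is met.

    baseline and followup map each core variable name to its value.
    All variables are assumed to be scored so that lower is better.
    """
    improved = 0
    worsened = 0
    for name, base in baseline.items():
        follow = followup[name]
        if base == 0:
            # Assumed handling: percent change is undefined at a zero baseline.
            if follow > 0:
                worsened += 1
            continue
        percent_change = 100.0 * (follow - base) / base
        if percent_change <= -improvement_threshold:
            improved += 1
        elif percent_change > 30.0:
            worsened += 1
    return improved >= 3 and worsened <= 1


# Hypothetical values for the six core variables.
hypothetical_baseline = {
    "physician_global": 60, "parent_global": 50, "functional_ability": 1.5,
    "active_joints": 10, "limited_motion_joints": 8, "esr": 40,
}
hypothetical_followup = {
    "physician_global": 30, "parent_global": 45, "functional_ability": 1.0,
    "active_joints": 5, "limited_motion_joints": 8, "esr": 42,
}
print(acr_pediatric_response(hypothetical_baseline, hypothetical_followup))
```

Passing a different `improvement_threshold` (for example 50 or 70) gives the corresponding higher-level response under the same assumptions. The result describes relative change only and says nothing about the absolute level of disease activity at follow-up.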

Juvenile Arthritis Disease Activity Score (JADAS)

The JADAS is a recently developed composite instrument designed to better characterize disease activity in JIA patients. It consists of four measures: (1) physician global assessment of disease activity (10 cm VAS); (2) parent/patient global assessment of overall well-being (10 cm VAS); (3) number of joints with active arthritis; and (4) ESR. While these measures are also included in the ACR Pediatric 30, 50 and 70 core set, the JADAS excludes the measures for “functional assessment” and “number of joints with limited range of motion,” as they were considered to reflect disease damage rather than just disease activity. Furthermore, the JADAS aims to quantify the absolute level of disease activity, rather than relative improvement, as measured by the ACR Pediatric response criteria. While initial validation studies have been performed,84 it is unclear how fully this outcome measure will be adopted in future studies, though its ability to characterize a patient's absolute response to therapy, as well as to describe differences in disease activity between groups of patients, is promising.

Remission

A consensus-based definition of “remission” identifies three categories: inactive disease, remission on medications, and remission off medications.43,44 A Delphi serial questionnaire consensus-formation approach was used to draft the criteria. The criteria for inactive disease include no active arthritis; no fever, rash, splenomegaly, serositis, or generalized lymphadenopathy attributable to JIA; a normal ESR or C-reactive protein; and the best possible score on the physician global assessment of disease activity. In addition, the definition of inactive disease requires there to be no active uveitis. Children with 6 continuous months of inactive disease, as defined above, on medication meet the definition for clinical remission on medication, while 12 months of inactive disease off antirheumatic medications defines clinical remission off medication.43,44 While these definitions have been applied retrospectively to JIA populations, further validation studies are underway.
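The three categories can be expressed as a simple decision rule. The sketch below is illustrative only: the class, field, and function names are hypothetical, each criterion is reduced to a yes/no flag, and `months_inactive` is assumed to count continuous months of inactive disease under the patient's current medication status.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    active_joints: int            # joints with active arthritis
    systemic_features: bool       # fever, rash, splenomegaly, serositis, or
                                  # generalized lymphadenopathy attributable to JIA
    active_uveitis: bool
    esr_or_crp_normal: bool
    physician_global_best: bool   # best possible physician global score


def is_inactive(a):
    """All criteria for inactive disease must hold at the same visit."""
    return (
        a.active_joints == 0
        and not a.systemic_features
        and not a.active_uveitis
        and a.esr_or_crp_normal
        and a.physician_global_best
    )


def remission_category(a, months_inactive, on_medication):
    """Classify disease status from the current visit and its duration."""
    if not is_inactive(a):
        return "active disease"
    if on_medication and months_inactive >= 6:
        return "clinical remission on medication"
    if not on_medication and months_inactive >= 12:
        return "clinical remission off medication"
    return "inactive disease"


# Hypothetical visit, for illustration only.
visit = Assessment(0, False, False, True, True)
print(remission_category(visit, months_inactive=7, on_medication=True))
```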

Flare

A preliminary definition of flare was derived from a cohort of patients with polyarticular JIA using the six core response variables as defined in the ACR Pediatric response criteria.26,45 The authors took treatment with placebo as the reference standard for flare and then examined various candidate definitions of flare using receiver operating characteristic (ROC) analysis. All 25 patients in the etanercept arm were presumed not to flare; therefore, the specificity of a candidate flare definition equals the number in the etanercept group without relapse by that definition divided by the total in the etanercept group. Based on this methodology, a flare was defined as a 40 percent worsening in two of six core set items without improvement in more than one core set variable by 30 percent. This study was based on 51 children, and further validation studies are needed.
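A minimal sketch of the flare rule and of the specificity calculation described above follows. It assumes that higher values are worse for all six core set variables, reads "two of six" as "at least two," and treats the 40 and 30 percent thresholds as inclusive. The function names, the zero-baseline handling, and the example values are hypothetical.

```python
def percent_worsening(baseline, followup):
    """Percent worsening from baseline; negative values mean improvement.

    Assumption: higher values are worse for every core set variable.
    """
    if baseline == 0:
        # Assumption: an unchanged zero is 0 percent change; any rise from
        # zero is treated as unbounded worsening.
        return 0.0 if followup == 0 else float("inf")
    return 100.0 * (followup - baseline) / baseline


def is_flare(baseline, followup):
    """At least two variables worsen by 40 percent or more, and no more than
    one variable improves by 30 percent or more."""
    changes = [percent_worsening(b, f) for b, f in zip(baseline, followup)]
    worsened = sum(1 for c in changes if c >= 40.0)
    improved = sum(1 for c in changes if c <= -30.0)
    return worsened >= 2 and improved <= 1


def specificity(flagged_as_flare):
    """Share of patients presumed not to flare whom the candidate
    definition correctly leaves unflagged."""
    not_flagged = sum(1 for flagged in flagged_as_flare if not flagged)
    return not_flagged / len(flagged_as_flare)


# Hypothetical core set values in a fixed order, for illustration only.
baseline_values = [2.0, 2.0, 0.5, 4, 4, 20]
followup_values = [4.0, 3.0, 0.5, 6, 4, 20]
print(is_flare(baseline_values, followup_values))
```

For example, if a candidate definition flagged 2 of 25 patients in a group presumed not to flare, `specificity` would return 23/25; these counts are invented for illustration and are not results from the study.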

Minimal Disease Activity

The authors who defined minimal disease activity (MDA) developed the definition in acknowledgement that many children with JIA do not achieve full remission with current treatments, and that a more reasonable goal for treatment might be minimally active disease.85 They therefore reviewed patient visits where changes in therapy were initiated versus visits where no change was made or medication was discontinued. They examined measures of disease activity at those visits and established cutoff values that best identified states of MDA. Their results defined MDA as a physician global assessment of < 2.5 cm and swollen joint count of 0 for oligoarticular disease; and a physician global assessment of < 3.4 cm, parent global assessment < 2.1 cm, and a swollen joint count of < 1 for polyarticular disease.86 Validation studies are needed.
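The cutoffs can be written as a small check. The sketch below applies the thresholds exactly as stated in the paragraph above, including the strict inequalities; the function name, argument names, and subtype labels are hypothetical.

```python
def is_minimal_disease_activity(subtype, physician_global_cm, swollen_joints,
                                parent_global_cm=None):
    """Apply the MDA cutoffs as stated in the text (VAS scores in cm)."""
    if subtype == "oligoarticular":
        return physician_global_cm < 2.5 and swollen_joints == 0
    if subtype == "polyarticular":
        if parent_global_cm is None:
            raise ValueError("parent global assessment is required "
                             "for polyarticular disease")
        return (
            physician_global_cm < 3.4
            and parent_global_cm < 2.1
            and swollen_joints < 1
        )
    raise ValueError("subtype must be 'oligoarticular' or 'polyarticular'")


# Hypothetical visits, for illustration only.
print(is_minimal_disease_activity("oligoarticular", 2.0, 0))
print(is_minimal_disease_activity("polyarticular", 3.0, 0, parent_global_cm=2.5))
```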
