NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Chou R, Deyo R, Friedly J, et al. Noninvasive Treatments for Low Back Pain [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2016 Feb. (Comparative Effectiveness Reviews, No. 169.)

Methods

The methods for this Comparative Effectiveness Review (CER) follow the guidance in the Agency for Healthcare Research and Quality (AHRQ) “Methods Guide for Effectiveness and Comparative Effectiveness Reviews.”19

Topic Refinement and Review Protocol

This topic was nominated to AHRQ for a CER through a public process. The Scientific Resource Center developed preliminary Key Questions based on input from the topic nominator. An Evidence-based Practice Center further revised the Key Questions and defined the populations, interventions, comparators, outcomes, timing, and study designs (PICOTS) of interest with input from a group of Key Informants assembled for this purpose. Key Informants disclosed financial and other conflicts of interest prior to participation. The AHRQ Task Order Officer and the investigators reviewed the disclosures and determined that the Key Informants had no conflicts of interest that precluded participation. The provisional Key Questions, PICOTS, and analytic framework were posted on the AHRQ Web site for public comment from December 17, 2013, through January 17, 2014.

After reviewing public comments, the research team at our Evidence-based Practice Center developed the final protocol with input from AHRQ and a Technical Expert Panel (TEP) convened for this report. The TEP consisted of 14 members with expertise in primary care, pain medicine, behavioral sciences, physical medicine and rehabilitation, complementary and alternative therapies, physical therapy, occupational medicine, and pharmacology. TEP members disclosed financial and other conflicts of interest prior to participation. The AHRQ Task Order Officer and the investigators reviewed the disclosures and determined that the TEP members had no conflicts of interest that precluded participation. Some changes were made in response to public comments. The PICOTS were revised to include tai chi as an intervention, and the time between back pain episodes was added as an outcome. The Key Questions and PICOTS were also revised to include combinations of therapies as interventions and comparators. We made additional wording edits to the Key Questions to clarify inclusion of oral and topical pharmacological therapies and to group the nonpharmacological noninvasive therapies into related categories (e.g., exercise and related interventions, complementary and alternative therapies, psychological therapies, and physical modalities). We also revised the PICOTS to make clearer that the population included patients with acute, subacute, or chronic low back pain, and added self-directed care to the setting description.

The final version of the protocol for this CER was posted on the AHRQ Effective Health Care Program web site (www.effectivehealthcare.ahrq.gov) on October 9, 2014. The protocol was registered in the PROSPERO international database of prospectively registered systematic reviews.

Literature Search Strategy

A research librarian conducted searches in Ovid MEDLINE, the Cochrane Central Register of Controlled Trials, and the Cochrane Database of Systematic Reviews through August 2014 (see Appendix A for full search strategies). We restricted search start dates to January 2008 because searches in the prior American Pain Society/American College of Physicians (APS/ACP) review were conducted through October 2008; the APS/ACP review was used to identify studies published prior to 2008.20 For interventions not addressed in the APS/ACP review (electrical muscle stimulation, taping, tai chi), we searched the same databases without a start date restriction.

We also hand-searched the reference lists of relevant studies and searched for unpublished studies in ClinicalTrials.gov.

We conducted an update search in April 2015 using the same search strategy as in the original search.

Study Selection

We developed criteria for inclusion and exclusion of studies based on the Key Questions and PICOTS, in accordance with the AHRQ Methods Guide.21 Inclusion and exclusion criteria are summarized below and described in more detail in Appendix B. Abstracts were reviewed by two investigators, and all citations deemed potentially appropriate for inclusion by at least one of the reviewers were retrieved. Two investigators then independently reviewed all full-text articles for final inclusion. Discrepancies were resolved by discussion and consensus. A list of the included studies can be found in Appendix C; excluded studies and the primary reason for exclusion can be found in Appendix D.

Population and Condition of Interest

This report focuses on adults with low back pain of any duration (categorized as acute [<4 weeks], subacute [4 to 12 weeks], and chronic [≥12 weeks]), including nonradicular and radicular low back pain. Radicular pain was defined as back pain with leg pain, with or without sensory or motor deficits in a nerve root distribution; radicular pain could be based on clinical presentation or require imaging correlation (e.g., due to herniated disc or spinal stenosis). Patients with nonradicular low back pain could have nonspecific imaging findings such as degenerative disc disease, bulging intervertebral disc, or facet joint arthropathy. Patients with low back pain due to cancer, infection, inflammatory arthropathy, high velocity trauma, fracture, low back pain during pregnancy, and low back pain associated with severe or progressive neurological deficits were excluded.
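The duration categories above can be expressed as a small lookup. The following is only an illustrative sketch, not part of the review's methods: the function name `classify_duration` is hypothetical, and because the stated ranges overlap at exactly 12 weeks, the sketch assumes that 12 weeks is treated as chronic.

```python
def classify_duration(weeks: float) -> str:
    """Categorize low back pain by symptom duration, in weeks.

    Cutoffs follow the report's definitions: acute (<4 weeks),
    subacute (4 to 12 weeks), chronic (>=12 weeks).
    Assumption: exactly 12 weeks is classified as chronic.
    """
    if weeks < 0:
        raise ValueError("duration must be non-negative")
    if weeks < 4:
        return "acute"
    if weeks < 12:
        return "subacute"
    return "chronic"
```

For example, a patient with 6 weeks of symptoms would fall in the subacute category under these cutoffs.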

Interventions and Comparisons

We included pharmacological and noninvasive, nonpharmacological therapies for low back pain. Pharmacological therapies were restricted to those administered orally or topically; we evaluated nonsteroidal anti-inflammatory drugs, acetaminophen, opioids, tramadol and tapentadol, antidepressants, skeletal muscle relaxants, benzodiazepines, corticosteroids, anti-epileptic medications, capsaicin, and lidocaine. We excluded studies of medications administered intravenously but included studies of medications administered intramuscularly. Nonpharmacological therapies were multidisciplinary rehabilitation (also known as interdisciplinary rehabilitation), which we defined as a coordinated program with both physical and biopsychosocial treatment components (e.g., exercise therapy and cognitive behavioral therapy) provided by professionals from at least two different specialties; psychological therapies; exercise and related interventions (e.g., yoga and tai chi); complementary and alternative therapies (spinal manipulation, acupuncture, and massage); passive physical modalities (heat, cold, ultrasound, transcutaneous electrical nerve stimulation [TENS], electrical muscle stimulation [EMS], interferential therapy [IFT], short-wave diathermy, low level laser therapy [LLLT], and lumbar supports or braces); and taping. Although we placed nonpharmacological interventions into broad groupings for the purpose of organizing the report, this was not meant to imply that they are associated with similar effectiveness or are necessarily based on similar mechanisms of action, and the benefits and harms of each intervention were evaluated separately. Interventional therapies involving injections to the spine, ablative therapies, and surgical therapies were excluded. For opioids, we excluded propoxyphene, a weak analgesic associated with risk of cardiac arrhythmia that is no longer available in the United States or Europe. For skeletal muscle relaxants and benzodiazepines, we included drugs available in Europe but not in the United States, and noted such instances.

Comparisons were of an included therapy versus placebo (drug trials), sham (functionally inert) treatments (nonpharmacological intervention), no treatment, wait list, or usual care (usually defined as care as typically provided at the discretion of the clinician, though components of usual care varied across studies and settings), as well as comparisons of one included therapy versus another. We also evaluated comparisons of the combination of one included therapy plus another included therapy, versus one of the therapies alone. We excluded comparisons involving multicomponent therapy that did not meet the definition for multidisciplinary rehabilitation and did not compare the effects of the multicomponent therapy versus individual components, because it is not possible to determine the incremental benefits of multicomponent therapy over its individual components from such comparisons.

Outcomes, Timing, and Setting

We evaluated effects of interventions on reduction or elimination of low back pain, including related leg symptoms, improvement in back-specific and overall function, improvement in health-related quality of life (HRQOL), reduction in work disability/return to work, global improvement, number of back pain episodes or time between episodes, and patient satisfaction. Of these outcomes, pain and function were the most consistently reported, and we designated them as priority outcomes for the purpose of reporting results. We also evaluated adverse effects, including serious adverse events (e.g., anaphylaxis with medications, neurological complications, and death) and less serious adverse events. When possible, timing of outcomes was stratified as long term (at least 1 year) and short term (up to 6 months); we also noted outcomes assessed immediately after the completion of a course of treatment. We included studies conducted in inpatient or outpatient settings.

Study Designs

Given the large number of interventions and comparisons addressed in this review, we included systematic reviews of randomized trials.22, 23 For each intervention, we selected the systematic review that was the most relevant to our Key Questions and scope (as defined in the PICOTS), had the most recent search dates, and was of highest quality based on assessments using the AMSTAR tool.24 We included nonoverlapping reviews of the same intervention that addressed specific outcomes, populations, or interventions, and in some cases included more than one overlapping review that was similar in terms of search dates and quality, if we could not identify a single best “match.” If good-quality systematic reviews were not available, we included fair-quality systematic reviews only if we could address the methodological shortcomings of the review (e.g., if a review reported overall risk of bias of studies but did not report details regarding specific methodological shortcomings, we assessed the risk of bias in the primary studies ourselves). We preferentially selected good-quality systematic reviews that were more comprehensive (e.g., a systematic review on exercise therapy in general, versus a specific type of exercise therapy) or were updates of reviews included in the APS/ACP review. In the discussion, we compared the results of our report with the findings from systematic reviews that were not included.

We supplemented systematic reviews with randomized trials that were not included in the reviews. We did not include systematic reviews identified in update searches, but checked reference lists for additional randomized trials. For harms, we included cohort studies for interventions and comparisons when randomized trials were sparse or unavailable. We excluded case-control studies, case reports, and case series.

We included non-English language articles only if they were included in English-language systematic reviews. We noted English-language abstracts of non-English language articles to identify studies that would otherwise meet inclusion criteria, in order to help assess the likelihood of language bias. Studies published only as conference abstracts were excluded, but we noted those that otherwise met inclusion criteria, to help assess for potential publication bias.

Data Extraction and Data Management

For systematic reviews we abstracted the following data: inclusion criteria, search strategy, databases searched, search dates, the number of included studies, study characteristics of included studies (e.g., sample sizes, interventions, duration of treatment, duration of followup, comparison, and results), methods of quality assessment, quality ratings for included studies, methods for synthesis, and results.

We did not abstract data for primary studies included in systematic reviews. Rather, we relied on the information provided in the review. For primary studies not included in systematic reviews, we abstracted the following data: study design, year, setting, country, sample size, eligibility criteria, population and clinical characteristics, intervention characteristics, and results. Information relevant for assessing applicability was also abstracted, including the characteristics of the population, interventions, and care settings; the use of run-in or washout periods; and the number of patients enrolled relative to the number assessed for eligibility.

All study data were verified for accuracy and completeness by a second team member. See Appendix E for evidence tables with extracted data.

Assessing Methodological Quality of Individual Studies

Two investigators independently assessed quality (risk of bias) of systematic reviews and primary studies not included in systematic reviews using predefined criteria, with disagreements resolved by consensus. Randomized trials were evaluated using criteria and methods developed by the Cochrane Back Review Group,25 and cohort studies were evaluated using criteria developed by the US Preventive Services Task Force.26 Systematic reviews were assessed using the AMSTAR quality rating instrument.23 These criteria and methods were used in conjunction with the approach recommended in the AHRQ Methods Guide.22 Studies were rated as “good,” “fair,” or “poor.” We re-reviewed the quality ratings of studies included in the prior American Pain Society review to ensure consistency in quality assessment.24

For primary studies included in systematic reviews, we relied on the quality ratings or risk of bias assessments as performed in the systematic reviews, as long as they used a standardized method for assessing quality (e.g., Cochrane Back Review Group, Cochrane Risk of Bias tool, PEDro tool). We used the overall grade (e.g., good, fair, or poor; or high or low) as presented in the systematic review, and provided details about the methods used to categorize studies (e.g., “higher quality” defined as meeting more than 6 of 11 Cochrane Back Review Group criteria). If we were uncertain about the methods used to assess risk of bias, or quality, we assessed the quality of individual studies ourselves, using the methods described above. In some cases, we supplemented the quality ratings from the reviews with additional methodological considerations.
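As an illustration of the kind of categorization rule mentioned above, the example threshold (“higher quality” defined as meeting more than 6 of 11 Cochrane Back Review Group criteria) can be sketched as follows. This is a hypothetical helper, not the procedure used by any particular review; the function name and default parameters are assumptions, and the included reviews varied in the rules they applied.

```python
def quality_category(criteria_met: int,
                     total_criteria: int = 11,
                     threshold: int = 6) -> str:
    """Return "higher" or "lower" quality from a count of criteria met.

    Defaults mirror the example in the text: "higher quality" means
    meeting more than 6 of 11 criteria. Other reviews may use other
    cutoffs, so both values are parameters (an assumption of this sketch).
    """
    if not 0 <= criteria_met <= total_criteria:
        raise ValueError("criteria_met must be between 0 and total_criteria")
    return "higher" if criteria_met > threshold else "lower"
```

Under this example rule, a trial meeting exactly 6 criteria is “lower” quality, because the definition requires more than 6.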

Primary studies rated “good” are considered to have the least risk of bias, and their results are generally considered valid. Good-quality studies use valid methods to select patients for inclusion and allocate patients to treatment; report similar baseline characteristics in different treatment groups; clearly report attrition and have low attrition; use appropriate methods to reduce performance bias (e.g., blinding of patients, care providers, and outcome assessors); and use appropriate analytic methods (e.g., intention-to-treat analysis; for cohort studies, adjustment for potential confounders).

Studies rated “fair” are susceptible to some bias, though not enough to necessarily invalidate the results. These studies may not meet all the criteria for a rating of good quality, but no flaw is likely to cause major bias. The study may also be missing information, making it difficult to assess limitations and potential problems. The fair-quality category is broad, and studies with this rating will vary in their strengths and weaknesses. The results of some fair-quality studies are likely to be valid, while others may be only possibly valid.

Studies rated “poor” have significant flaws that imply biases of various types that may invalidate the results. They have a serious or “fatal” flaw in design, analysis, or reporting, such as inadequate methods for allocating patients to treatment; large amounts of missing information; discrepancies in reporting; or serious problems in the delivery of the intervention. The results of these studies are at least as likely to reflect flaws in the study design as the true difference between the compared interventions. We did not exclude studies rated poor quality a priori, but such studies were considered to be less reliable than higher-quality studies when synthesizing the evidence, particularly when discrepancies among studies were present.

For systematic reviews that classified studies as “higher” versus “lower” quality, we considered “higher” to incorporate good-quality and better fair-quality studies, and “lower” to include poor-quality studies and fair-quality studies with more methodological shortcomings.

Systematic reviews rated “good” had to use multiple sources in the literature search, apply predefined inclusion and exclusion criteria, assess quality using an appropriate tool, use methods to reduce errors in data abstraction and quality rating (e.g., multiple independent reviewers), use appropriate methods for evidence synthesis (qualitative or quantitative), and use an explicit system for considering the body of evidence that includes the major domains of strength of evidence (risk of bias, consistency, precision, and directness). As noted above, we included systematic reviews that had shortcomings in one or more of these areas only if we could address the shortcomings (e.g., by assessing quality of the primary studies ourselves or independently determining strength of evidence from the information provided in the review).

For further details about the quality of included studies see Appendix F.

Assessing Applicability

We recorded factors important for understanding the applicability of studies, such as whether the publication adequately described the study sample, the country in which the study was conducted, the characteristics of the patient sample (e.g., age, sex, race, duration and severity of pain, presence of radicular symptoms, medical comorbidities, and psychosocial factors), the characteristics of the interventions used (e.g., specific intervention, dose or intensity, duration of treatment), the clinical setting (e.g., primary care or specialty setting), and the magnitude of effects on clinical outcomes, as well as timing of assessments.27 We classified the magnitude of effects for pain and function using the same system as in the APS/ACP review.14, 28 A small/slight effect was defined for pain as a mean between-group difference following treatment of 5 to 10 points on a 0- to 100-point visual analogue scale (VAS), 0.5 to 1.0 points on a 0- to 10-point numerical rating scale, or equivalent; for function as a mean difference of 5 to 10 points on the 0- to 100-point Oswestry Disability Index (ODI) or 1 to 2 points on the 0- to 24-point Roland-Morris Disability Questionnaire (RDQ), or equivalent; and for any outcome as a standardized mean difference (SMD) of 0.2 to 0.5. A moderate effect was defined for pain as a mean difference of 10 to 20 points on a 0- to 100-point VAS, for function as a mean difference of 10 to 20 points on the ODI or 2 to 5 points on the RDQ, and for any outcome as an SMD of 0.5 to 0.8. Large/substantial effects were defined as greater than moderate. Proposed thresholds for minimum clinically important changes in studies of low back pain are 15 points on a 0- to 100-point VAS for pain, 5 points on the RDQ, or 10 points on the ODI, roughly correlating with the “moderate” classification.28 However, the clinical relevance of effects classified as small/slight might vary for individual patients depending on preferences, baseline symptom severity, harms, cost, and other factors. We also recorded the funding source and role of the sponsor.
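The magnitude-of-effect thresholds described above lend themselves to a simple lookup. The sketch below is illustrative only and makes several assumptions that are not in the report: the names `THRESHOLDS` and `classify_effect` are hypothetical; a value on a shared boundary (e.g., 10 points on the VAS) is assigned to the higher category; differences below the small/slight range are labeled "below small"; and the 0- to 10-point numerical rating scale is omitted because the text gives only its small/slight range.

```python
# (small_low, moderate_low, moderate_high) for each scale, from the text.
# Keys are hypothetical labels chosen for this sketch.
THRESHOLDS = {
    "vas_0_100": (5.0, 10.0, 20.0),  # pain, visual analogue scale
    "odi_0_100": (5.0, 10.0, 20.0),  # function, Oswestry Disability Index
    "rdq_0_24": (1.0, 2.0, 5.0),     # function, Roland-Morris questionnaire
    "smd": (0.2, 0.5, 0.8),          # any outcome, standardized mean difference
}


def classify_effect(scale: str, difference: float) -> str:
    """Classify the magnitude of a mean between-group difference.

    Assumptions of this sketch: the sign of the difference is ignored,
    shared boundary values go to the higher category, and anything
    above the moderate range is "large".
    """
    small_low, moderate_low, moderate_high = THRESHOLDS[scale]
    magnitude = abs(difference)
    if magnitude < small_low:
        return "below small"
    if magnitude < moderate_low:
        return "small"
    if magnitude <= moderate_high:
        return "moderate"
    return "large"
```

For instance, a 7-point difference on the 0- to 100-point VAS would be classified as small/slight, and a 3-point difference on the RDQ as moderate.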

Applicability depends on the particular question and the needs of the user of the review. There is no generally accepted universal rating system for applicability. In addition, applicability depends in part on context. Therefore, a rating of applicability (such as “high” or “low”) was not assigned because applicability may differ based on the user of this report.

Evidence Synthesis and Rating the Body of Evidence

We synthesized data qualitatively (see Grading the Strength of Evidence, below). Results are organized by Key Question and intervention, and then according to the duration of symptoms (acute, subacute, or chronic), type of low back pain (nonradicular or radicular low back pain), and type of comparison (e.g., versus placebo or sham, versus usual care, or versus another active intervention), with prioritized outcomes (pain, function) presented first. Synthesis was based on the totality of evidence (i.e., evidence included in the prior APS/ACP review plus new evidence). We synthesized results for continuous as well as dichotomous outcomes. We reported binary outcomes based on the proportion of patients achieving successful pain, function, or some composite overall measure of success as defined in the trials, which varied in how they categorized successful outcomes (e.g., >30% improvement in pain score vs. >50% improvement vs. “good” or “excellent” outcomes on a categorical scale). See Appendix G for descriptions of the outcome measures used in the included studies.

In addition, we reported meta-analyses from systematic reviews that reported pooled estimates from studies that were judged to be homogeneous enough to provide a meaningful combined estimate and used appropriate pooling methods (e.g., random effects model in the presence of statistical heterogeneity). When statistical heterogeneity was present, we examined the type of inconsistency present (e.g., did some trials find that an intervention was more effective than placebo and others find no effect, or did most trials find that the intervention was more effective, but vary in the strength of the estimate) and evaluated subgroup and sensitivity analyses based on study characteristics, intervention factors, and patient factors.

We did not conduct updated meta-analysis with new studies. Rather, we qualitatively examined whether results of new studies were consistent with pooled or qualitative findings from prior systematic reviews.

When we included more than one systematic review for a particular intervention and comparison, we evaluated the consistency of results among reviews. When findings among reviews were discordant, we evaluated potential sources of discordance, such as differential inclusion of studies, differences in ratings for risk of bias, or differences in methods used to synthesize evidence.

Grading the Strength of Evidence for Each Key Question

We assessed the strength of evidence for each Key Question and outcome using the approach described in the AHRQ Methods Guide,21 based on the overall quality of each body of evidence (graded good, fair, or poor); the consistency of results across studies (graded consistent, inconsistent, or unable to determine when only one study was available); the directness of the evidence linking the intervention and health outcomes (graded direct or indirect); the precision of the estimate of effect, based on the number and size of studies and confidence intervals (CI) for the estimates (graded precise or imprecise); and reporting bias (suspected or undetected). The strength of evidence was based on the totality of evidence (i.e., evidence in prior reviews as well as new evidence).

Assessments of reporting bias were based on whether studies defined and reported primary outcomes, identification of relevant unpublished studies, and when available, by comparing published results to results reported in trial registries.

We graded the strength of evidence for each Key Question using the four key categories recommended in the AHRQ Methods Guide.21 A “high” grade indicates high confidence that the evidence reflects the true effect and that further research is very unlikely to change our confidence in the estimate of effect. A “moderate” grade indicates moderate confidence that the evidence reflects the true effect and further research may change our confidence in the estimate of effect and may change the estimate. A “low” grade indicates low confidence that the evidence reflects the true effect and further research is likely to change the confidence in the estimate of effect and is likely to change the estimate. An “insufficient” grade indicates evidence either is unavailable or is too limited to permit any conclusion, due to the availability of only poor-quality studies, extreme inconsistency, or extreme imprecision.

See Appendix H for the strength of evidence table.

Peer Review and Public Commentary

Peer reviewers with expertise in primary care and back pain have been invited to provide written comments on the draft report. The AHRQ Task Order Officer and an Evidence-based Practice Center Associate Editor will also provide comments and editorial review. The draft report will be posted on the AHRQ Web site for 4 weeks for public comment. A disposition of comments report with authors' responses to the peer and public review comments will be posted after publication of the final CER on the public Web site.
