Testimony of Peter Lurie, M.D., M.P.H., and Sylvia Park, M.D., M.P.H.
Public Citizen’s Health Research Group
before the Neurological Devices Panel of the
Medical Devices Advisory Committee Meeting on
Repetitive Transcranial Magnetic Stimulation, Duraseal and Vagus Nerve Stimulation
Although the three devices to be discussed at today’s meeting either treat different conditions or use significantly different mechanisms of action to treat the same condition, they all share one characteristic: their path to approval is indicative of the lax standards currently employed by the Center for Devices and Radiological Health (CDRH). We are also mystified as to why the FDA has only asked for a discussion of the issues related to rTMS, rather than a formal vote, and why some material provided on the FDA’s website for this device was redacted (e.g., Figure B, patient disposition, and Figure E1, sponsor summary of Study 01 and ECT results).
1. Repetitive Transcranial Magnetic Stimulation (rTMS)
The fundamental problem with this application dates to a meeting between Neuronetics, the manufacturer of rTMS, and the FDA, at which the FDA mistakenly allowed this application to proceed through the 510(k) shortcut rather than through the more rigorous Premarket Approval (PMA) process. The 510(k) route allows a company to market a new device based on its similarity to an existing (predicate) device, in this case electroconvulsive therapy (ECT), rather than by directly proving that it is efficacious for the indicated condition. To qualify for 510(k), a device that has the same intended use as the predicate device but different technological characteristics (both the case here) must (1) not raise new types of questions of safety and effectiveness and (2) have a risk-benefit profile comparable to the predicate device.
Inasmuch as this device bears only limited resemblance to ECT (the differences, particularly in safety, are the basis for its attractiveness) and fundamental questions remain about how to administer rTMS (left vs. right side of the brain, high frequency vs. low, etc.), we cannot see how Criterion 1 is met. And since the company’s three principal studies do not even include one that compares rTMS to ECT, Criterion 2 also cannot be fulfilled. Indeed, we are perplexed as to why the principal study proffered in support of a device that would be approved based on its similarity to an existing device would compare the device to sham therapy rather than the existing device. It is no small irony, therefore, that the sponsor has been unable to even demonstrate that its device is superior to sham therapy.
Study 01

This was a nine-week randomized, sham-controlled clinical trial designed to examine the safety and effectiveness of rTMS for subjects with major depression who had not benefited from oral antidepressants. The primary, prespecified effectiveness endpoint was the change in Montgomery Asberg Depression Rating Scale (MADRS) from baseline to Week 4. Secondary outcome measures included a large number of clinician- and patient-rated measures. Concomitant antidepressant medications were not permitted. This study is the primary basis for deciding on the approval of rTMS.
The following figure, taken directly from the FDA reviewer’s Executive Summary, depicts data for the 301 subjects (155 rTMS and 146 sham) with evaluable data. It indicates that the difference in least squares mean MADRS score relative to baseline between the rTMS group (-5.6 points) and the sham group (-3.5 points) was not statistically significant (p= 0.057). Furthermore, even this non-statistically significant difference in mean MADRS score between the two groups, 2.1 points on a 60-point scale, is hardly a benefit of great clinical significance.
[Figure: Least squares mean MADRS score at baseline (0 weeks), 2 weeks, and 4 weeks, with 95% confidence intervals.]
Not content with this clearly unfavorable outcome, Neuronetics embarked on a strategy of torturing the data till they confessed. Relying upon baseline differences between the treatment and sham groups that were defined post hoc (and were in any case trivial), it removed six treatment and two sham patients to produce a statistically significant finding for the primary outcome variable. The focus then turned to the 26 secondary outcome variables, 13 of which provided statistically significant findings favoring treatment. The FDA reports that after a conservative correction for multiple comparisons, none of the 26 secondary endpoints was statistically significant.
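The arithmetic behind a multiplicity correction makes clear why so many nominally significant secondary findings can evaporate. As a minimal sketch, assuming a Bonferroni correction (the FDA summary describes the correction only as "conservative," so the specific method and the p-values below are hypothetical), each of the 26 endpoints must clear a far stricter threshold than the usual 0.05:

```python
# Illustrative only: Bonferroni correction across 26 secondary endpoints.
# The p-values below are hypothetical; the FDA summary does not list them.
alpha = 0.05
n_endpoints = 26
threshold = alpha / n_endpoints  # each test must clear 0.05/26 ~ 0.0019

hypothetical_p_values = [0.011, 0.024, 0.049, 0.003, 0.21]
surviving = [p for p in hypothetical_p_values if p < threshold]
print(f"Bonferroni threshold: {threshold:.4f}")
print(f"Endpoints surviving correction: {len(surviving)}")
```

On this assumption, a secondary endpoint with a nominal p-value of, say, 0.02 — "significant" in isolation — falls an order of magnitude short of the corrected threshold.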
Application site pain occurred at a much greater frequency in the rTMS group than it did in the sham group (35.8% vs. 3.8%, respectively), as did muscle twitching (20.6% vs. 3.2%), calling into question whether the patients in the rTMS group were really blinded to their treatment. The FDA requested that Neuronetics perform further analyses to examine the adequacy of the patient blind. There was a correlation between the presence and severity of “any pain/discomfort” and the mean change in MADRS scores (p= 0.034). The FDA reviewer stated, “This observation suggests that a placebo effect may have occurred during the Study 01.” In multivariate analyses adjusted for any pain/discomfort, the non-significant finding for the MADRS score became even less significant (p=0.057 to p=0.227) and two of the secondary outcomes switched from statistically significant to not significant. The reviewer concluded, “Based on these findings, the report of any pain in aggregate could account for some of the observed treatment effect.”
In sum, Study 01 did not demonstrate a statistically significant benefit for rTMS over sham therapy. To the extent that a (non-statistically significant) benefit was detected, this benefit is of a magnitude not likely to be clinically meaningful and is likely explained in part by patient unblinding.
Study 02

This was a nine-week, open-label, uncontrolled clinical trial in which 131 subjects who did not respond to either rTMS or sham treatment during Study 01 were treated with rTMS. The primary effectiveness endpoint was the change in the total MADRS score from the end of Study 01 to the sixth week of Study 02. The sponsor reported substantial changes: drops of 12.5 points on the MADRS score for those who had not responded to rTMS in Study 01 and of 17.0 points for those who had “not responded” to sham treatment in that study. The fact that both groups’ MADRS scores dropped in both Studies 01 and 02 is partially explained by regression to the mean: patients with relapsing conditions tend to be selected into studies when they are near their worst clinically and are thus likely to improve, on average, with or without further treatment. In addition, the medical reviewer pointed out that even though the rTMS-treated patients in Study 01 had a decrease in MADRS score of 5.6 points, the non-responders in Study 01, when retreated in Study 02, had a decrease of 12.5 points. The reviewer concluded, “This may indicate a placebo effect in the open label study.” No efficacy conclusions can thus be drawn from this uncontrolled study.
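The regression-to-the-mean phenomenon can be demonstrated with a toy simulation (every parameter below is invented, not drawn from the trial data): if patients whose measured severity is a stable trait plus day-to-day noise are enrolled only when they score above an entry cutoff, their scores fall at follow-up even with no treatment at all:

```python
# Toy simulation of regression to the mean (all parameters invented).
import random

random.seed(0)
N = 100_000
TRUE_MEAN, TRAIT_SD, NOISE_SD = 30.0, 5.0, 6.0  # hypothetical MADRS-like scale
ENTRY_CUTOFF = 35.0  # enroll only patients scoring high (worse) at screening

enrolled_baseline, enrolled_followup = [], []
for _ in range(N):
    trait = random.gauss(TRUE_MEAN, TRAIT_SD)      # stable underlying severity
    screening = trait + random.gauss(0, NOISE_SD)  # noisy baseline measurement
    if screening >= ENTRY_CUTOFF:                  # selected near their worst
        followup = trait + random.gauss(0, NOISE_SD)  # re-measure, no treatment
        enrolled_baseline.append(screening)
        enrolled_followup.append(followup)

mean_b = sum(enrolled_baseline) / len(enrolled_baseline)
mean_f = sum(enrolled_followup) / len(enrolled_followup)
print(f"Mean at entry: {mean_b:.1f}; at follow-up: {mean_f:.1f}")
# The mean drops several points purely from the selection rule.
```

The apparent "improvement" here is produced entirely by enrolling patients at an unusually bad moment, which is exactly the hazard in interpreting uncontrolled re-treatment data.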
Study 03

This 24-week open-label, uncontrolled clinical trial evaluated the effectiveness of maintenance oral antidepressant monotherapy among responders to any arm of Studies 01 and 02. Among those who had responded to rTMS in Study 01, 36.4% required reintroduction of the device by Week 24. However, the study was still ongoing at the time of application, so the data reported are interim results and lack information on concomitant medication use.
Whatever the deficiencies of its studies, we do wish to compliment Neuronetics for providing us with a handy acronym for explaining, in part, the apparent improvements observed in rTMS-treated patients. To paraphrase James Carville, it’s Regression To the Mean, Stupid.
Historical Comparisons to ECT
Neuronetics is seeking approval for its rTMS system by claiming it is substantially equivalent to ECT. Yet the sponsor’s main studies do not include a head-to-head comparison with ECT that would permit an actual weighing of risks and benefits. Instead, Neuronetics relies on historical data on ECT to serve as its control, an approach that would never be countenanced as the basis for approval in the FDA’s Center for Drug Evaluation and Research (CDER). Historical control data are plagued by possible differences between the compared groups in patient demographics, disease severity, concomitant therapies, and study design, as well as by secular trends in treatment practices, among other problems. For example, over 50 percent of subjects in Study 01 had failed therapy with only a single antidepressant in their current episode; those exposed to ECT are likely considerably more treatment-resistant than that. We are therefore unable to make any reliable assessment of the comparative risk-benefit profile of the two treatments based on these data.
However, the data that Neuronetics does provide inspire little confidence that rTMS approaches the efficacy of ECT. The standardized effect size for ECT vs. sham therapy in the six studies comprising the UK ECT Review Group report was -0.91, whereas the standardized effect sizes for the MADRS, 24-item Hamilton Depression Rating Scale (HAMD24) and HAMD17 in Study 01 were -0.39, -0.48, and -0.55, respectively. Moreover, the studies comprising the UK report most commonly used the HAMD17 as their primary outcome measure, and in that review ECT reduced this index by 9.7 points compared to sham treatment; in Study 01, rTMS reduced HAMD17 scores by a meager 1.9 points on a 54-point scale compared to sham treatment. Incidentally, the studies in the UK report were conducted between 1963 and 1981; many refinements/improvements in the administration of ECT have occurred since then.
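For readers unfamiliar with the metric, a standardized effect size (Cohen's d) is simply the between-group difference in mean change divided by a pooled standard deviation, which is what makes results on different rating scales comparable. The sketch below shows the computation using the Study 01 group sizes and mean MADRS changes quoted earlier; the standard deviations are assumed for illustration, since the testimony reports the resulting effect sizes rather than the underlying SDs:

```python
# Cohen's d sketch: standardized effect size = mean difference / pooled SD.
# Group sizes and mean changes are from Study 01; the SDs are assumptions.
import math

def cohens_d(mean_treat, mean_control, sd_treat, sd_control, n_treat, n_control):
    """Between-group standardized effect size using a pooled SD."""
    pooled_var = (((n_treat - 1) * sd_treat**2 + (n_control - 1) * sd_control**2)
                  / (n_treat + n_control - 2))
    return (mean_treat - mean_control) / math.sqrt(pooled_var)

# Mean MADRS changes of -5.6 (rTMS, n=155) and -3.5 (sham, n=146),
# with a hypothetical SD of 9.0 points in each group:
d = cohens_d(-5.6, -3.5, 9.0, 9.0, 155, 146)
print(f"d = {d:.2f}")  # here a more negative d favors treatment (larger drop)
```

Whatever SD is assumed, the point stands: an rTMS-vs.-sham effect in the -0.4 to -0.55 range is roughly half the -0.91 reported for ECT vs. sham.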
Thus, the only adequately controlled study submitted is one that fails to establish that rTMS is superior to sham treatment. How, then, can one conclude that it is “substantially equivalent” to ECT, as the law requires? The FDA has told the company that rTMS could be considered substantially equivalent to ECT if rTMS is not as effective as ECT but has commensurately lower toxicity (i.e., its risk-benefit ratio would be similar). But, if this device is approved based on the current data, the 510(k) standard will literally have been diluted to the point that if a device can demonstrate that it is less toxic than a predicate device (using historical controls no less), it can be declared “substantially equivalent” to the predicate device, and thus approvable, even if it is no better than nothing. We hope that the committee will not endorse this outcome.
2. Duraseal Dura Mater Sealant
The preliminary data from the post-approval randomized, controlled trial of Duraseal do not yet permit assessment of whether this device is associated with increased rates of cerebrospinal fluid leaks or infections, the concerns raised in the uncontrolled trial which was the basis for the device’s approval on April 7, 2005. But they do show that a randomized, controlled trial, which could have answered these questions definitively, could have been done prior to approval. As Willie Nelson once sang, “It’s Not Supposed to Be That Way” (Phases and Stages, Atlantic Records, 1974). The regulatory history of this device is literally upside-down: a randomized, controlled trial is being used post-approval to support an approval decision that was based on uncontrolled data. In the meantime, patients are being exposed to this inadequately tested device.
3. Vagus Nerve Stimulation (VNS)
The postapproval data on VNS presented to the committee today are largely irrelevant to the basic issue surrounding VNS: Does the device actually work? The very need for a study comparing different amounts of electrical charge indicates just how inadequate the data in the approval package were. But, because the registry, whose early findings are being presented today, is uncontrolled and unblinded, this post-approval study will continue to leave VNS’ efficacy unresolved.
The history of the approval of this device remains an embarrassment to the FDA and to this committee specifically. As documented in the Senate Finance Committee’s February 2006 report on the approval process for VNS, the CDRH director, who typically does not make device approval decisions, overruled at least 20 staff members who recommended against approval of this device on the grounds that efficacy had not been demonstrated. Not one staff member recommended approval. This committee came in for specific criticism in that report when the committee’s Executive Secretary described the committee’s June 15, 2004, meeting on VNS as “very unusual, emotional, not data driven.”
Public Citizen opposed the approval of VNS on the grounds that the device had not been proved to work. We have since petitioned the FDA to reverse its approval and have also asked the Centers for Medicare & Medicaid Services to deny Cyberonics’ application for a favorable National Coverage Determination. We expect they will. As of September 6, 2006, 10 individual CMS contractors in 19 separate applications had turned down the company’s application for a favorable Local Coverage Determination. None had issued a favorable Determination.
Incidentally, both the VNS approval and the rTMS application indicate the existence of a dangerous double standard within the FDA. Whereas CDER requires randomized, placebo-controlled trials to approve antidepressants, the laxer approval standard for devices (“reasonable assurance that the device is safe and effective,” compared to “substantial evidence of effectiveness for the claimed indications” for drugs) has been interpreted by CDRH to allow liberal use of historical controls. At least for devices that make a disease claim (e.g., “treats depression”), it defies logic and endangers patients to have a lower approval standard for a device than for a drug making a similar claim.