Basis for the Interpretation of Noninferiority Studies: Considering the ROCKET–AF, RE-LY, and ARISTOTLE Studies

doi:10.1016/j.rec.2013.10.018

INTRODUCTION

Noninferiority randomized clinical trials (RCTs) are usually performed when an experimental treatment is not expected to be more effective than the standard treatment but offers additional benefits. These advantages could consist of a better safety profile, fewer adverse effects, easier administration, less need for laboratory monitoring, or a lower overall cost.1 Noninferiority RCTs vs warfarin include the RE-LY (Randomized Evaluation of Long-Term Anticoagulation Therapy), ROCKET-AF (Rivaroxaban Once Daily Oral Direct Factor Xa Inhibition Compared With Vitamin K Antagonism for Prevention of Stroke and Embolism Trial in Atrial Fibrillation), and ARISTOTLE (Apixaban for Reduction in Stroke and Other Thromboembolic Events in Atrial Fibrillation) studies. This article will review various concepts that are relevant for the interpretation of these studies and all noninferiority RCTs.

WHAT ARE NONINFERIORITY CLINICAL TRIALS?

In RCTs, attempts are made to answer research questions with a reasonable degree of certainty. Whereas superiority RCTs aim to determine whether a new treatment is superior to the best available treatment, noninferiority RCTs concentrate on showing that the new treatment is not inferior to the standard one. Thus, the nature of the research question and the possible answers are different. In the specific case of the RE-LY, ROCKET-AF, and ARISTOTLE RCTs, the initial question of interest was: is the new treatment at least as effective as warfarin in reducing thromboembolic events? The 2 possible answers, mutually exclusive and in the form of hypotheses, are as follows:

• H0 (the null hypothesis): the new treatment is less effective than vitamin K antagonists in reducing thromboembolic events (it is inferior).
• H1 (the alternative hypothesis): the new treatment is at least as effective as vitamin K antagonists in reducing thromboembolic events (it is not inferior).

Adoption of the H0 or H1 “answer” as true involves a decision rule based on the statistical significance of the P value. However, the P value that is calculated in noninferiority RCTs is a specific one, called the P value for noninferiority. Let us suppose that the rate of thromboembolic events with the new treatment is lower than with warfarin at a P value for noninferiority of less than .001. In this case, the alternative hypothesis H1 is accepted, because if the new treatment were actually inferior to vitamin K antagonists, the probability of obtaining such a result would have been less than .001.

In noninferiority RCTs, what is considered “at least as effective as” or “not inferior to” the conventional treatment must be defined a priori. Accordingly, a minimum noninferiority margin or threshold has to be selected. Noninferiority RCTs aim to show that the effect of the experimental treatment is not inferior to that of the standard treatment to “a certain extent”, which is termed the noninferiority threshold or noninferiority margin or delta (δ). The δ value represents the maximum difference tolerated between the effect of the control and the experimental treatment, favoring the former, for the experimental treatment to still be considered noninferior to the control.2

HOW IS THE MINIMUM NONINFERIORITY THRESHOLD δ CHOSEN?

The noninferiority margin δ has to be selected according to the best available evidence on the efficacy of the standard treatment compared with placebo,2 taking into account the uncertainty surrounding the estimated effect of the standard treatment; for this reason, the choice tends to be conservative. The noninferiority margin cannot exceed the smallest effect that the standard treatment, in our case warfarin, could plausibly have vs placebo.

In the case of the new anticoagulants, the minimum noninferiority threshold was selected based on a meta-analysis published in 1999, which quantified the effect of warfarin on the prevention of thromboembolic events vs placebo or absence of treatment, at a relative risk (RR) of 0.38 (95% confidence interval [95%CI], 0.28-0.52).3 The procedure for selecting the threshold is as follows: first, the reference category is changed, as if the effect of “placebo or absence of treatment” were being calculated with respect to that of warfarin. In our case, this effect is the inverse of 0.38, which corresponds to an RR of 2.63 (95%CI, 1.92-3.57). The lower limit of this confidence interval (1.92) could be considered the minimum noninferiority threshold for the new anticoagulants. However, the regulatory agencies were more demanding, and chose a noninferiority threshold that assumes that warfarin has a hypothetical effect that is just 50% of that conservative estimate of its effect. Accordingly, the minimum noninferiority threshold was set at 1.46, which means that, to conclude that the new treatment is not inferior to the standard one, the upper limit of the 95%CI of the effect of the new treatment compared with that of warfarin cannot exceed 1.46. The possible scenarios that could be obtained in comparisons between the new anticoagulants and warfarin are outlined in Figure 1.
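The arithmetic behind these figures can be retraced in a few lines. The following is only a sketch: the function names are hypothetical, and the step from 1.92 to 1.46 is assumed here to mean keeping half of the excess risk above 1 on the arithmetic scale, which reproduces the reported threshold but is not spelled out in the text.

```python
# Sketch of the margin arithmetic; all names are illustrative, not from the trials.

def invert_rr(rr, lower, upper):
    """Change the reference category by inverting an RR and its confidence limits."""
    # Inversion swaps the limits: 1/upper becomes the new lower limit.
    return 1 / rr, 1 / upper, 1 / lower

def margin_half_excess(lower_limit, fraction=0.5):
    """Assumed rule: keep `fraction` of the excess risk above 1 (arithmetic scale)."""
    return 1 + fraction * (lower_limit - 1)

def is_noninferior(ci_upper, margin):
    """Noninferiority is concluded when the upper CI limit stays below the margin."""
    return ci_upper < margin

# Warfarin vs placebo: RR 0.38 (95%CI, 0.28-0.52), re-expressed as placebo vs warfarin.
rr_inv, lower_inv, upper_inv = invert_rr(0.38, 0.28, 0.52)
lower_rounded = round(lower_inv, 2)          # rounded as in the text
margin = margin_half_excess(lower_rounded)   # half of the excess risk above 1

print(round(rr_inv, 2), lower_rounded, round(upper_inv, 2))
print(round(margin, 2))
```

With this rule, a trial whose 95%CI upper limit is 1.03, for example, would satisfy `is_noninferior(1.03, margin)`.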

Figure 1.

Various possible scenarios of the results of a noninferiority study.

a Noninferiority threshold.

b If the experimental treatment is shown to be superior, it is automatically demonstrated that it is not inferior.


DOES THE STATISTICAL ANALYSIS IN NONINFERIORITY RANDOMIZED CLINICAL TRIALS HAVE ANY SPECIAL FEATURES?

Statistical analysis of noninferiority RCTs generally follows a similar methodology to that of superiority RCTs, except that a noninferiority threshold-related P value for noninferiority is calculated, which differs from the P value for “superiority”. Thus, for example, the effect of the treatment on the primary outcome variable “stroke or systemic embolism” in the intention-to-treat (ITT) analysis of the ROCKET-AF trial had a hazard ratio (HR) of 0.88 (95%CI, 0.75-1.03). The P value for superiority was .12, whereas the P value for noninferiority was less than .001, which is unsurprising, given that the noninferiority threshold is to the right of the threshold for the absence of an effect.
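The difference between the two P values can be approximated from the published HR and its confidence interval. The sketch below assumes a normal approximation on the log scale, a standard error recovered from the 95%CI, a one-sided test against the 1.46 threshold, and hypothetical function names; it is not the analysis performed in the trial.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def se_from_ci(lower, upper, z_crit=1.96):
    """Approximate standard error of log(HR) from a 95%CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)

def p_superiority(hr, lower, upper):
    """Two-sided P value against the no-effect value HR = 1."""
    z = math.log(hr) / se_from_ci(lower, upper)
    return 2 * normal_cdf(-abs(z))

def p_noninferiority(hr, lower, upper, margin):
    """One-sided P value against the noninferiority threshold HR = margin."""
    z = (math.log(hr) - math.log(margin)) / se_from_ci(lower, upper)
    return normal_cdf(z)

# ROCKET-AF intention-to-treat result quoted above: HR 0.88 (95%CI, 0.75-1.03).
p_sup = p_superiority(0.88, 0.75, 1.03)
p_ni = p_noninferiority(0.88, 0.75, 1.03, margin=1.46)
print(f"P for superiority ~ {p_sup:.2f}; P for noninferiority < .001: {p_ni < 0.001}")
```

Under these assumptions the P value for superiority comes out at about .11, close to the reported .12, while the P value for noninferiority is far below .001, because the estimate is compared with 1.46 rather than with 1.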

An important point for the interpretation of the results of noninferiority RCTs involves the type of analysis performed: ITT analysis, per-protocol analysis, or safety analysis. In the ITT approach, all patients randomized to either treatment arm are analyzed, regardless of whether they actually received the treatment or whether there were protocol violations. Safety analyses include patients who received at least 1 treatment dose, regardless of whether there were protocol violations. Per-protocol analyses include patients who received at least 1 treatment dose without detected protocol violations. Although the regulatory agencies consider ITT analysis to be obligatory in superiority RCTs, they prioritize the use of per-protocol or safety analysis in the investigation of noninferiority hypotheses.2 In noninferiority studies, ITT analysis is usually more “liberal”. In other words, the inclusion of patients with protocol violations or treatment interruptions tends to bias the results toward showing the absence of differences between the treatments, favoring the demonstration of noninferiority.

Finally, can both noninferiority and superiority hypotheses be tested in a single RCT? Yes, as long as the α risk is controlled. The α risk value refers to the probability of rejecting H0 when it is actually true. The greater the number of hypotheses that are tested in a study with the same data, the greater the likelihood that a statistical association will be found by chance. Accordingly, to avoid false positives, the α error should be “distributed” among the hypotheses, meaning that the P value has to be lower to find statistically significant results. Appropriate adjustments to the α risk were made in the 3 studies, although the ROCKET-AF analysis had certain peculiarities.
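To illustrate why the α risk has to be “distributed”, the sketch below shows how the chance of at least one false positive grows with the number of hypotheses, and how an equal split of α lowers the threshold each P value must meet. It assumes independent tests and a simple Bonferroni-style split purely for illustration; the function names are hypothetical, and this is not necessarily the adjustment procedure used in any of the 3 trials.

```python
def familywise_error(alpha, n_tests):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

def split_alpha(alpha, n_tests):
    """Equal (Bonferroni-style) split of the overall alpha among the tests."""
    return alpha / n_tests

overall_alpha = 0.05
for n in (1, 2, 5):
    unadjusted = familywise_error(overall_alpha, n)
    per_test = split_alpha(overall_alpha, n)
    adjusted = familywise_error(per_test, n)
    print(f"{n} tests: unadjusted risk {unadjusted:.3f}, "
          f"per-test threshold {per_test:.3f}, adjusted risk {adjusted:.3f}")
```

With 2 hypotheses each tested at .05, the overall risk of a false positive under these assumptions is close to .10; splitting α brings it back to just under .05.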

DISTINCTIVE FEATURES OF THE ANALYSIS PLAN OF THE ROCKET-AF TRIAL

In the ROCKET-AF trial, ITT, safety, and per-protocol analyses were performed for the main outcome measure (stroke or systemic embolism), with one peculiarity: both the per-protocol and the safety analyses included only the period when the patient was receiving the experimental drug or the placebo and until 48 hours after discontinuation (the as-treated population). This point is important, as the survival analysis method considers the “time-to-event”, and not the proportion of events. Thus, if a patient received the study medication (experimental or placebo) for 730 days (2 years) and was subsequently followed up for a further 90 days (approximately 0.25 years), this patient would contribute 732 days (730 days on medication + 2 days after discontinuation ≈2 patient-years) to the per-protocol and safety analyses, but 820 days (≈2.25 patient-years) to the ITT analysis (Figure 2).
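The contribution of this hypothetical patient can be recomputed as follows. This is a minimal sketch: the function name, the 365.25 days-per-year conversion, and the expression of the 48-hour window as 2 days are assumptions made for illustration.

```python
DAYS_PER_YEAR = 365.25  # assumed conversion factor

def follow_up_days(days_on_drug, days_after_stop, window_days=None):
    """Days at risk counted for one patient.

    window_days=None counts all follow-up (ITT); otherwise only the first
    `window_days` after discontinuation are counted (as-treated).
    """
    if window_days is None:
        return days_on_drug + days_after_stop
    return days_on_drug + min(days_after_stop, window_days)

# Hypothetical patient from the text: 730 days on medication, 90 days afterwards.
as_treated_days = follow_up_days(730, 90, window_days=2)  # 48 hours = 2 days
itt_days = follow_up_days(730, 90)

print(as_treated_days, round(as_treated_days / DAYS_PER_YEAR, 2))
print(itt_days, round(itt_days / DAYS_PER_YEAR, 2))
```

Any event occurring between day 733 and day 820 would therefore count in the ITT analysis but not in the per-protocol or safety analyses.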

Figure 2.

Contribution to the survival analysis of a hypothetical patient in the ROCKET-AF study. Per-protocol analysis: 2 patient-years. Safety analysis: 2 patient-years. Intention-to-treat analysis: 2.25 patient-years.


The main noninferiority analysis of ROCKET-AF was performed in the per-protocol and ITT populations. Moreover, safety superiority analysis and various sensitivity analyses were performed to evaluate the noninferiority and superiority of the ITT population, with the appropriate α risk adjustments. Because the main hypothesis is of noninferiority, the appropriate principal analysis is that of the per-protocol population.2 On the other hand, as the international and national agencies acknowledge, analysis of the safety population is suitable for evaluating clinical efficacy, because this population excludes those patients who do not receive the experimental treatment or change to the control treatment. In the ROCKET-AF study, the superiority hypothesis was evaluated in these safety and ITT populations (for the primary outcome: safety population, HR = 0.79; 95%CI, 0.65-0.95; P for superiority = .02; ITT population, HR = 0.88; 95%CI, 0.75-1.03; P for superiority = .12).4,5

ARE THE RESULTS OF THE THREE TRIALS COMPARABLE?

Because the 3 trials assess the same noninferiority hypothesis and all use warfarin as control, it is tempting to compare their results. However, any comparison made among them would be an indirect comparison, and the differences in the study populations, the control intervention, and the design are potential sources of bias.6 For example, the baseline risk of thromboembolism differed: the mean CHADS2 score was 3.47 in ROCKET-AF compared with 2.1 in both RE-LY and ARISTOTLE. The higher risk in ROCKET-AF was largely due to the greater inclusion of patients with a history of stroke (55% in ROCKET-AF vs 20% in RE-LY and ARISTOTLE). Moreover, the mean time in therapeutic range of the international normalized ratio (INR) also differed considerably among the studies (55% in ROCKET-AF vs 65% and 62.2% in RE-LY and ARISTOTLE, respectively). Although various analytical techniques have been developed for indirect comparisons, such as network meta-analysis, adjusted indirect comparisons, and the Bucher method,7,8 direct comparison is the only trustworthy method for determining differences in efficacy between drugs. Thus, although some indirect comparisons between studies have been published,9 the differences mentioned in population type, controls, and study design make these comparisons subject to certain biases, and caution must be exercised in their interpretation.
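For readers unfamiliar with the mechanics of an adjusted indirect comparison, the following sketch shows the basic calculation usually attributed to the Bucher method: the log effect of each drug vs the common comparator is subtracted, and the variances are added. The function names and input values are hypothetical and do not correspond to any of the 3 trials; the calculation also does nothing to correct for the differences in populations and design described above.

```python
import math

def log_effect_and_se(hr, lower, upper, z_crit=1.96):
    """Log hazard ratio and its approximate standard error from a 95%CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    return math.log(hr), se

def indirect_comparison(effect_a, effect_b, z_crit=1.96):
    """Indirect HR of drug A vs drug B through a common comparator.

    Each argument is a tuple (hr, lower, upper) for that drug vs the comparator.
    """
    log_a, se_a = log_effect_and_se(*effect_a)
    log_b, se_b = log_effect_and_se(*effect_b)
    log_ab = log_a - log_b                   # difference of the log effects
    se_ab = math.sqrt(se_a**2 + se_b**2)     # variances add up
    return (math.exp(log_ab),
            math.exp(log_ab - z_crit * se_ab),
            math.exp(log_ab + z_crit * se_ab))

# Hypothetical inputs: two drugs with identical results vs the same comparator.
hr_ab, lower_ab, upper_ab = indirect_comparison((0.80, 0.64, 1.00), (0.80, 0.64, 1.00))
print(round(hr_ab, 2), round(lower_ab, 2), round(upper_ab, 2))
```

Because the variances add, the indirect confidence interval is always wider than that of either trial, which is one reason such comparisons rarely yield firm conclusions.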

CONCLUSIONS

The use of new oral anticoagulants certainly represents a significant advance in the prevention of thromboembolic phenomena in nonvalvular atrial fibrillation. The RCTs discussed here studied the efficacy and safety of the 3 new drugs, examining both noninferiority (as the primary hypothesis) and superiority hypotheses. Although there is an understandable eagerness to identify the most efficacious, effective, and efficient drug, appropriate direct comparisons are required to reliably obtain this information. It is likely that, as time passes and additional data are obtained from observational studies, the characteristics of the disease, the environment, and, above all, the patient (eg, comorbidities, hemorrhagic risk, psychosocial factors) will continue to be defined, enabling the establishment of the precise indications of each drug for each specific patient group.

CONFLICTS OF INTEREST

The author has received honoraria for teaching courses and giving academic talks from Boehringer-Ingelheim, Bayer, and Pfizer.

ACKNOWLEDGMENTS

The author thanks Content Ed Net for assisting in the editing of the manuscript.

References

[1] H.M. James Hung, S.J. Wang, Y. Tsong, J. Lawrence, R.T. O’Neil. Some fundamental issues with non-inferiority testing in active controlled trials. Stat Med, (2003), 22 pp. 213-225. http://dx.doi.org/10.1002/sim.1315

[2] R.B. D’Agostino Sr., J.M. Massaro, L.M. Sullivan. Non-inferiority trials: design concepts and issues - the encounters of academic consultants in statistics. Stat Med, (2003), 22 pp. 169-186. http://dx.doi.org/10.1002/sim.1425

[3] R.G. Hart, O. Benavente, R. McBride, L.A. Pearce. Antithrombotic therapy to prevent stroke in patients with atrial fibrillation: a meta-analysis. Ann Intern Med, (1999), 131 pp. 492-501

[4] National Institute for Health and Clinical Excellence (NICE). Rivaroxaban for the prevention of stroke and systemic embolism in people with atrial fibrillation [accessed 2012 May 21]. Available at: www.nice.org.uk/guidance/ta256

[5] Rivaroxaban en la prevenció de l’ictus i l’embòlia sistèmica en pacients amb fibril·lació auricular no valvular i com a mínim un factor de risc. Barcelona: Agència d’Informació, Avaluació i Qualitat en Salut, Servei Català de la Salut, Departament de Salut, Generalitat de Catalunya; 2013.

[6] Canadian Agency for Drugs and Technologies in Health. Indirect evidence: indirect treatment comparisons in meta-analysis [accessed 2012 May 21]. Available at: www.cadth.ca/media/pdf/H0462_itc_tr_e.pdf

[7] F. Song, D.G. Altman, A.M. Glenny, J.J. Deeks. Validity of indirect comparison for estimating efficacy of competing interventions: empirical evidence from published meta-analyses. BMJ, (2003), 326 pp. 472. http://dx.doi.org/10.1136/bmj.326.7387.472

[8] H.C. Bucher, G.H. Guyatt, L.E. Griffith, S.D. Walter. The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. J Clin Epidemiol, (1997), 50 pp. 683-691

[9] G.Y. Lip, T.B. Larsen, F. Skjøth, L.H. Rasmussen. Indirect comparisons of new oral anticoagulant drugs for efficacy and safety when used for stroke prevention in atrial fibrillation. J Am Coll Cardiol, (2012), 60 pp. 738-746. http://dx.doi.org/10.1016/j.jacc.2012.03.019

REVISTA ESPAÑOLA DE CARDIOLOGÍA

Editorial