Most clinical trials go like this: you take a group of people with some condition, say lower back pain, and you randomly assign each of those people to one of two (or more) treatment groups. For the purposes of illustration, let’s say there are three: group 1 receives a new experimental painkiller; group 2 receives an older, more established painkiller; group 3 receives placebo. No one – not the researcher, not the patients – knows who is in which group until the end of the study.

The Limitations of Parallel-Group Trials

The study I’ve just described is called a parallel-group trial, and the vast majority of randomized controlled trials (RCTs) follow this design. In 2010, researchers analyzed 616 clinical trials published within a single month. Of these, 477 (77.4%) were parallel-group trials.

Participating in a parallel-group trial requires considerable altruism on the part of the patient. Assuming the treatment groups above are of equal size, a third of patients will receive no drug whatsoever to alleviate their pain, and another third will receive something they easily could have obtained without the hassle of a clinical trial. For this reason, patients are normally required to sign forms stating that they expect to receive no benefit from a trial and are participating purely out of a desire to contribute to scientific knowledge.

Ethical considerations aside, it is also unclear how evidence from parallel-group trials should be applied to the treatment of individual patients. A trial may conclude that one treatment is superior to another on average, but this result may not apply to every individual patient in the trial. For example, consider the two distributions of outcome scores shown in Figure 1. Although the experimental agent performed better on average, some individual patients taking placebo did better than others who were on the experimental agent. It’s often impossible for a physician to estimate the most likely outcome for a specific patient based on trial data, especially if the patient has particular features that do not resemble the study population (for example, if the study population is mostly of European descent and the patient is not).

Figure 1: Hypothetical parallel-group result that would lead to a conclusion of superiority for the test agent over placebo, despite the fact that some of the patients receiving placebo actually did better than patients receiving the test agent. This plot is based on simulated data.
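
To make the overlap concrete, here is a minimal simulation sketch. It is not the code behind Figure 1; the group size, means, and standard deviation are made-up numbers, and it assumes outcome scores are normally distributed with higher scores being better.

```python
import random
import statistics


def simulate_parallel_group(n_per_arm=200, placebo_mean=50.0, drug_mean=55.0,
                            sd=10.0, seed=0):
    """Draw hypothetical outcome scores for a placebo arm and a drug arm.

    All parameter values are illustrative assumptions, not trial data.
    """
    rng = random.Random(seed)
    placebo = [rng.gauss(placebo_mean, sd) for _ in range(n_per_arm)]
    drug = [rng.gauss(drug_mean, sd) for _ in range(n_per_arm)]
    return placebo, drug


def fraction_placebo_beats_drug(placebo, drug):
    """Fraction of (placebo patient, drug patient) pairs in which the
    placebo patient had the higher (better) score."""
    wins = sum(p > d for p in placebo for d in drug)
    return wins / (len(placebo) * len(drug))


placebo, drug = simulate_parallel_group()
mean_difference = statistics.mean(drug) - statistics.mean(placebo)
overlap = fraction_placebo_beats_drug(placebo, drug)
print(f"mean difference (drug - placebo): {mean_difference:.2f}")
print(f"pairs where placebo patient did better: {overlap:.0%}")
```

Under these assumptions the drug wins on average, yet a sizable share of placebo patients still outscore patients on the drug, which is exactly the situation the figure illustrates.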

Researchers have been aware of the parallel-group design’s limitations for decades. The second most popular study design, representing about 16% of trials, was meant to address some of these issues. It is called the crossover design.

In the simplest variant of a crossover design [1], each patient receives all treatments. Because each patient encounters every treatment, all trial participants are, in an important sense, equal [2]. Randomization comes into play only when deciding treatment order (see Figure 2). The treatment ordering may be individually randomized, or there may be fixed treatment orderings (e.g. ABBA and BAAB) to which patients are randomly assigned.

Figure 2: Comparing parallel-group and crossover study designs with two treatments. The number of people shown for each is the average sample size of studies with that design from Hopewell 2010. The gray arrows shown for the crossover design refer to “washout” periods, which are necessary to avoid carryover effects between treatment blocks.
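
As a rough sketch of the fixed-ordering variant, the snippet below randomly assigns patients to the ABBA and BAAB sequences mentioned above. The function name, the patient IDs, and the choice to balance the two sequences equally are my own assumptions for illustration; real trials typically use more elaborate schemes such as stratified or blocked randomization.

```python
import random


def assign_treatment_orders(patient_ids, sequences=("ABBA", "BAAB"), seed=0):
    """Randomly assign each patient to one fixed treatment ordering.

    The sequences are dealt out as evenly as possible and then shuffled,
    so each ordering is used by (nearly) the same number of patients.
    """
    rng = random.Random(seed)
    pool = [sequences[i % len(sequences)] for i in range(len(patient_ids))]
    rng.shuffle(pool)
    return dict(zip(patient_ids, pool))


# Hypothetical patient identifiers.
orders = assign_treatment_orders([f"patient-{i}" for i in range(1, 9)])
for patient, order in orders.items():
    print(patient, order)
```

Every patient still receives both A and B; only the order in which they encounter them is left to chance.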

Crossover designs have a significant methodological advantage. The goal of a clinical trial is to estimate the individual treatment effect, which is defined as “the difference between a person’s outcome on treatment A and [that same person’s] outcome on treatment B”. In a crossover design, this effect can be measured directly since the same patients experience both A and B. In a parallel-group design, this is not the case; the treatment groups contain different patients who may vary based on factors like age, gender, ethnicity, comorbidities, etc. Randomization is supposed to balance this out, but in practice the groups usually differ, and the effects of these additional factors must be accounted for in subsequent statistical analyses. This has the practical effect of driving up the number of patients needed for a parallel-group study (the sample size) relative to a similar study that employs a crossover approach, with a corresponding increase in study cost. The average sample size of studies employing each design is shown in Figure 2.
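
The sample-size argument can be illustrated with a small simulation. This is a sketch under strong simplifying assumptions that I am introducing here: a constant additive treatment effect, normally distributed between-patient and within-patient variation, and no carryover or period effects. All numeric values are hypothetical.

```python
import random
import statistics


def simulate_standard_errors(n_patients=40, true_effect=5.0, between_sd=10.0,
                             within_sd=3.0, n_trials=2000, seed=0):
    """Compare how precisely each design estimates the A-vs-B effect.

    Returns the empirical standard error of the estimated effect for a
    parallel-group trial and for a crossover trial, each run with the
    same number of patients.
    """
    rng = random.Random(seed)
    half = n_patients // 2
    parallel_estimates, crossover_estimates = [], []
    for _ in range(n_trials):
        # Each patient has a personal baseline (age, comorbidities, etc.).
        baselines = [rng.gauss(0.0, between_sd) for _ in range(n_patients)]

        # Parallel-group: half the patients get A, the other half get B.
        arm_a = [b + true_effect + rng.gauss(0.0, within_sd)
                 for b in baselines[:half]]
        arm_b = [b + rng.gauss(0.0, within_sd) for b in baselines[half:]]
        parallel_estimates.append(statistics.mean(arm_a) - statistics.mean(arm_b))

        # Crossover: every patient gets both, so the baseline cancels out
        # of each within-patient difference.
        diffs = [(b + true_effect + rng.gauss(0.0, within_sd))
                 - (b + rng.gauss(0.0, within_sd)) for b in baselines]
        crossover_estimates.append(statistics.mean(diffs))
    return statistics.stdev(parallel_estimates), statistics.stdev(crossover_estimates)


se_parallel, se_crossover = simulate_standard_errors()
print(f"parallel-group standard error: {se_parallel:.2f}")
print(f"crossover standard error:      {se_crossover:.2f}")
```

With these inputs the crossover estimate is much less noisy for the same number of patients, because between-patient differences drop out of each within-patient comparison. A parallel-group trial would need many more patients to reach the same precision.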

Not So Fast: The Limitations of Crossover Designs

Crossover designs are a valuable component of the clinical study design arsenal. They are especially relevant to the study of chronic conditions such as neuropsychiatric disorders (depression, ADHD), arthritis, hypertension, asthma, heart disease, obesity, and type II diabetes, which affect over half of all U.S. adults (117 million people as of 2012).

However, they do not work in all cases. Here are some examples of situations where crossover designs fail:

  • Acute conditions. Because crossover studies include multiple treatment blocks and the disease state should remain relatively consistent from block to block, they are generally not appropriate for studying diseases with a short time course. This includes, for example, colds and many other infectious diseases.
  • Outcomes like “death” or “first heart attack”. An outcome that takes a long time to observe or that can occur only once is inappropriate for a crossover design, for obvious reasons (e.g. a patient who dies during treatment 1 probably will not respond well to treatment 2, no matter how well it works).
  • Long-term carryover effects. Washout periods are normally included between treatment blocks to reduce the probability that an earlier treatment will impact the perceived effect of a later treatment (a “carryover effect”). If a treatment permanently modifies a patient’s state, it’s impossible to eliminate its effect to enable fair testing of a different treatment on the same patient. Surgery, for example, is generally not a good candidate treatment for a crossover study.

A crossover trial will also take considerably longer, from the perspective of a patient, than a corresponding parallel-group trial, since each patient needs to encounter every treatment (Figure 2).

One-Size-Fits-All Evidence in the Era of Precision Medicine

Modern clinical evidence depends on a mixture of parallel-group (majority) and crossover (minority) trials, with a few other designs (factorial, cluster, split-body, etc.) thrown in. Regardless of design specifics, however, all of this evidence suffers from the same key limitation: treatment effects are estimated at the population level, leading to one-size-fits-all recommendations that are not necessarily appropriate for individual patients. The extent to which results from clinical trials are relevant to “real-life” patient populations is an open question.

As the dialogue within the U.S. government and medical community increasingly shifts toward precision medicine, the limitations of population-based studies become increasingly difficult to ignore. Restricting a trial to patients with a particular genetic mutation, clinical history, or set of demographic features dramatically escalates the time and resources necessary for recruitment, and renders many trials impossible due to limited patient numbers.

Here are some situations where both parallel-group and crossover trials often fail:

  • Rare diseases. If a disease affects only a few hundred people per year, it may take too long to recruit enough patients to adequately power a study.
  • Patients with complex treatment histories. Due to the unpredictable effects of earlier treatments, patients with chronic illness who have already received treatment are often excluded from trials.
  • Diseases that differ substantially from case to case, such as brain injuries, chronic pain, and mental illness. It is difficult to make a claim about the success or failure of a treatment at the population level when a disease varies considerably from patient to patient.
  • Patient demographics that don’t match study populations. For example, a Southeast Asian patient with high cholesterol may want to know if he should take a statin. Unfortunately, most relevant studies are based on parallel-group trials in people of European descent.
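
To give a feel for what “adequately power a study” means in the rare-disease case, here is a sketch of the standard normal-approximation sample-size formula for comparing two means in a parallel-group trial. The effect size, standard deviation, significance level, and power below are placeholder values I chose for illustration, not figures from any particular study.

```python
import math
from statistics import NormalDist


def patients_per_arm(effect, sd, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect a mean difference
    `effect` between two arms whose outcomes share standard deviation `sd`,
    using a two-sided test at level `alpha` with the requested power.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd / effect) ** 2
    return math.ceil(n)


# Hypothetical inputs: a 5-point difference on a scale with SD 10.
n_needed = patients_per_arm(effect=5.0, sd=10.0)
print(n_needed)  # 63 per arm under these assumed inputs
```

Halving the detectable effect roughly quadruples the requirement, so a disease with only a few hundred new patients per year can quickly run out of eligible participants.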

Precision medicine demands that we reexamine our standards for clinical evidence, finding ways to produce robust, trustworthy clinical data outside the bounds of traditional, population-based RCTs.

Edit: The thrilling conclusion to this story can be found here!

Notes and Further Reading

The statistics about the relative frequencies of different study designs in the clinical trials literature came from this paper (Hopewell 2010), which also includes information about the average numbers of treatment arms, average patients per study, etc.

The statistics on chronic diseases and health care spending came from the Centers for Disease Control.

  1. The design and analysis of crossover studies is a field with a rich history, most of which we will gloss over here. If you’re interested in a more complete discussion, a good reference is Jones, B., & Kenward, M. G. (2014). Design and analysis of cross-over trials. CRC Press. 

  2. Although this may seem to imply that crossover designs demand less patient altruism than parallel-group designs, the treatment effect is still determined at a population level and patients do not typically learn if a particular treatment is better for them, personally.