This appendix briefly describes the complex issue of selecting a comparison group using a randomized or non-experimental design. The goal is to identify a group of comparison practices that are as similar as possible to the intervention practices. To better understand your design options, it is often most efficient to consult with an experienced evaluator. You can also obtain background from a good textbook (such as Orr14 or Shadish, Cook, and Campbell15).
When Possible, Use a Randomized Design
The most rigorous and credible way to develop a counterfactual is to randomly assign practices interested in participating in the intervention to either an intervention or a control group.b The control group then provides a good proxy for what would have happened to intervention practices had they not adopted the model. However, many stakeholders believe they cannot conduct a randomized trial for ethical or fairness reasons. In such cases, a key question is: Are there more practices interested in transforming than resources to transform them? If the answer is yes, there are two pragmatic ways to randomize practices, both of which provide a strong randomized design for studying the effects of a primary care intervention.
The first approach is to conduct a lottery among all practices that volunteer to participate. A lottery is simply a randomized controlled trial: practices selected by lottery receive the intervention, and practices that are not selected serve as the control group.
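To illustrate, here is a minimal sketch of how such a lottery could be run in Python; the practice IDs, pool size, and number of intervention slots are all hypothetical.

```python
import random

# Hypothetical pool of volunteer practices (IDs are illustrative).
volunteers = [f"practice_{i:03d}" for i in range(1, 41)]
n_intervention = 20  # assumed number of intervention slots available

# A fixed seed makes the lottery auditable and reproducible.
rng = random.Random(20240101)

# Draw the intervention group at random; all other volunteers
# form the control group.
intervention = set(rng.sample(volunteers, n_intervention))
control = [p for p in volunteers if p not in intervention]

print("Intervention:", sorted(intervention))
print("Control:     ", control)
```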
Another approach is to allow all practices that volunteer to participate, but to stagger the rollout of implementation across them. This is called a staggered randomized or stepped-wedge design. The late starters serve as a control group for the early starters until the late starters begin the intervention themselves.16 The advantages of this design are that (1) all interested practices have the opportunity to participate, and (2) operational support can be provided to small groups of practices at a time, reducing resource demands on the system. The disadvantage is that the late starters serve as a pure control group only until they begin the intervention. For example, if they begin 1 year later than the early starters, your evaluation will have only 1 year of data for comparing outcomes between the intervention and control groups, which might be too short a period to realize many of the potential improvements associated with primary care transformation.17 However, you can also use a staggered randomized design to examine outcomes at different stages, such as comparing practices with 2 years of experience with the intervention to practices with only 1 year of experience.
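A staggered rollout can be implemented by randomly ordering the volunteer practices and then dividing them into waves. The sketch below assumes two hypothetical waves starting a year apart; the practice IDs and start dates are illustrative.

```python
import random

# Same hypothetical pool of volunteer practices as above.
volunteers = [f"practice_{i:03d}" for i in range(1, 41)]
start_dates = ["2024-01", "2025-01"]  # assumed rollout waves, 1 year apart

rng = random.Random(20240102)
rng.shuffle(volunteers)  # a random ordering determines wave membership

# Split the shuffled list into equal waves (assumes the pool divides
# evenly). Later waves serve as controls for earlier waves until
# their own start date arrives.
wave_size = len(volunteers) // len(start_dates)
waves = {date: volunteers[i * wave_size:(i + 1) * wave_size]
         for i, date in enumerate(start_dates)}

for date, practices in waves.items():
    print(date, practices)
```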
If your evaluation uses a randomized design, whether by lottery or by staggered rollout, it is critical that assignment truly be random. Picking a practice for the intervention group because it seemed to have the strongest physician commitment, or because it had better or worse patient outcomes, makes it difficult to disentangle the effects of the intervention from those of the practice’s existing performance or motivation. Similarly, in a staggered randomized design, be sure to randomize practices into rollout periods, resisting the urge to start with practices that are more sophisticated or more eager to begin the intervention.
If stakeholders want to introduce the intervention in all practices, another option is to analyze the effectiveness of different approaches to implementing the components of the intervention within the practices. At the outset of the study, each practice could be randomized to receive a combination of different approaches to implementing the intervention. For example, the practices could be randomly assigned to use either a social worker or a nurse to coordinate care, and randomly assigned to follow up with patients within 2 days of a hospital discharge either in person or by telephone. This approach, called an orthogonal design, enables every practice to test at least some of the components (that is, no practice would be a pure control), while generating important operational lessons about the best ways to deliver the different components.18,19
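A sketch of the 2×2 factorial (orthogonal) assignment described above, using the same hypothetical practice IDs; dealing practices into cells round-robin keeps the four factor combinations roughly balanced.

```python
import itertools
import random

volunteers = [f"practice_{i:03d}" for i in range(1, 41)]

# The two hypothetical factors from the example: who coordinates care,
# and how post-discharge follow-up is delivered.
coordinator = ["social_worker", "nurse"]
followup = ["in_person", "telephone"]

# All four factor combinations (the "cells" of the orthogonal design).
cells = list(itertools.product(coordinator, followup))

rng = random.Random(20240103)
rng.shuffle(volunteers)

# Deal the shuffled practices into cells round-robin so each
# combination receives roughly the same number of practices.
assignment = {p: cells[i % len(cells)] for i, p in enumerate(volunteers)}

for practice, (coord, follow) in sorted(assignment.items()):
    print(practice, coord, follow)
```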
When Randomized Designs Are Not Feasible, Use a Strong Comparison Group Design
Sometimes randomized designs are not feasible. In this case, it is critical to determine how the participating practices chose (or were chosen) to participate in the intervention and to mimic those factors to the extent possible when selecting a non-experimental comparison group. The factors driving participation include the organization’s formal and informal selection criteria and the decisions made by practices themselves. For example, if the organization selects all practices in a particular city to test the intervention, the comparison group should contain practices in a city with a comparable market and patient mix. If only practices that had certain health IT in place were chosen, the comparison group should contain practices with similar health IT, as well as similar size, patient mix, and outcomes, all measured before the intervention. Ideally, the comparison practices should have the same characteristics as the intervention practices. Two popular options for selecting a comparison group are regression discontinuity (RD) designs and propensity score matching (PSM) designs. However, neither may have sufficient statistical power for interventions with a small number of practices.
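As an illustration of the PSM idea only, the sketch below fits a logistic regression for participation and performs 1:1 nearest-neighbor matching on the estimated propensity score. The data frame, column names, and synthetic covariates are all hypothetical; a real evaluation would use pre-intervention practice characteristics and check covariate balance after matching.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical practice-level data: covariates measured BEFORE the
# intervention, plus a 0/1 flag for intervention participation.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "n_clinicians": rng.integers(1, 15, 200),
    "medicaid_share": rng.uniform(0, 0.6, 200),
    "has_health_it": rng.integers(0, 2, 200),
    "intervention": rng.integers(0, 2, 200),
})
covariates = ["n_clinicians", "medicaid_share", "has_health_it"]

# Step 1: model the probability of participation (the propensity score).
model = LogisticRegression().fit(df[covariates], df["intervention"])
df["pscore"] = model.predict_proba(df[covariates])[:, 1]

# Step 2: for each intervention practice, find the comparison practice
# with the closest propensity score (1:1 nearest-neighbor matching,
# with replacement, for simplicity).
treated = df[df["intervention"] == 1]
comparison = df[df["intervention"] == 0]
matches = {
    t_idx: (comparison["pscore"] - row["pscore"]).abs().idxmin()
    for t_idx, row in treated.iterrows()
}
print(f"Matched {len(matches)} intervention practices to comparisons.")
```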