See the Resource Collection for resources on designing and conducting an evaluation.
You've planned your evaluation, and now the work of design, data collection, and analysis begins. There are two broad categories of evaluations: studies of implementation and studies of impact. Using both types of studies together provides a comprehensive set of findings. However, if the number of practices is small, it is difficult to detect a statistically significant "impact" effect on outcomes such as cost and utilization. In such cases, you could review larger, published studies to learn about the impact of the intervention, and focus on designing and conducting an implementation study to understand how best to implement the intervention in your setting.
Design and conduct a study of implementation. Some evaluators may want to focus exclusively on measuring an intervention's effects on cost, quality, and experiences of patients, families, clinicians, and staff. However, you can learn a great deal from a study of how the intervention was implemented, the degree to which it was implemented according to plan in each practice, and the factors explaining both purposeful and unintended deviations. This includes collecting and analyzing information on: the practices participating in and patients served by the intervention; how the intervention changed the way that practices delivered care, how this varied in intended and unintended ways, and why; and any barriers and facilitators to successful implementation and achieving the outcomes of interest.
Whenever possible, data collection should be incorporated into existing workflows at the practice to minimize the burden on clinicians and other staff. You should consider the cost of collecting each data source in terms of burden on respondents and cost to the evaluation.
We recommend including both quantitative and qualitative data when studying how your intervention is being implemented. Although implementation studies tend to rely heavily on qualitative data, using some quantitative data sources (a mixed-methods approach) can amplify the usefulness of the findings.10,11,12
An implementation study can provide invaluable insights and often can be done inexpensively.
If resources do not permit intensive data collection, a streamlined approach to studying implementation might rely on discussions with practice clinicians, staff, and patients and their families involved in or affected by the intervention; analysis of any data already being collected in tracking systems or medical charts; and review of available documents, as follows:
- Interviews and informal discussions with patients and their families provide their perceptions of and experiences with care.
- Interviews and discussions over the course of the intervention with practice clinicians and staff (including care managers, nurses, medical assistants, and front office staff), using semi-structured discussion guides, will provide information on how (and how consistently) they implemented the intervention, their general perceptions of it, how it changed their interactions and work with patients, whether they think it improved patient care and other outcomes, whether it gained buy-in from practice leadership and staff, its financial viability, and its strengths and areas for improvement.
- Data from a tracking system are typically inexpensive to gather and analyze if a system is already in place for operational reasons. The tracking system might document whether commonly used approaches in new primary care models, such as care management, patient education, and transitional support, are implemented as intended (for example, after patients are discharged from the hospital or diagnosed with a chronic condition). If so, the system can often be modified relatively easily to greatly enhance its usefulness for research.
- Medical record reviews can be inexpensive to conduct for a small sample. Such reviews can illustrate whether and how well certain aspects of the primary care intervention have been implemented. For example, you might review electronic charts to determine the proportion of patients for whom the clinician provided patient education or developed and discussed a care plan. You could also look more broadly at the effects of various components of the intervention on patients with different characteristics. You could select cases to review randomly or focus on patients with specific characteristics, such as those who have chronic illness; no health problems; or a need for education about weight, smoking, or substance abuse. Unwanted variation in care for patients, as well as differences in care across providers, might also be of interest.
- Review of documents, including training manuals, protocols, feedback reports to practices, and care plans for patients, among others, can provide important details on the components of the intervention.
This information can be relatively inexpensive to collect and can provide insights about how to improve the intervention and why some outcome goals were achieved but others were not.
With more resources, your implementation study might also collect data from the following data sources:
- Surveys with patients and their families can be used to collect data from a large sample of patients. The surveys might ask about the care patients receive in their primary care practices (including accessibility, continuity, and comprehensiveness); the extent to which it is patient-centered and well coordinated across the medical neighborhood of other providers; and any areas for improvement.
- Focus groups with patients and families allow for active and engaged discussion of perspectives, issues, and ideas, with participants building on one another’s thinking. These can be particularly useful for testing out hypotheses and developing possible new approaches to patient care challenges.
- Site visits to practices can enable you to directly observe team functioning, workflow, and interactions with patients to supplement interviews with practice staff.
- Surveys of practice clinicians and staff can provide data from a large number of clinicians and staff about how the intervention affects the experience of providing care.
- Medical record reviews of a larger sample of patients can provide a more comprehensive assessment of how the team provided care.
Your analysis should synthesize data from multiple sources to answer each research question. Comparing and contrasting information across sources strengthens the findings considerably and yields a more complete understanding of implementation. Organizing the information by the question you are trying to answer rather than by data source will be most useful for stakeholders.
Depending on the duration of the intervention, you may be able to use interim findings from an implementation study to improve and refine the intervention. Although such refinements allow for midcourse corrections and improvements, they complicate the study of the intervention's impact, because the intervention itself is changing over time.
Most small pilots include too few practices to detect effects on cost. Devoting resources to an impact study with a small number of practices is not a good investment.
Design and conduct a study of impacts. Driving questions for most stakeholders are: What are the intervention's impacts on health care cost; quality; and patient, family, clinician, and staff experience? These are critical questions for a study of impacts. Unfortunately, because of the statistical challenges inherent in evaluating practice-level interventions implemented in a small number of practices, most such evaluations would be wise not to invest resources in answering them. If your organization can support a large-scale test of this kind, or has sufficient statistical power with fewer practices because their practice patterns are very similar, this section of the Guide provides some pointers. We begin by explaining how to assess whether you are transforming enough practices to be able to conduct a study of impacts.
Assess whether the sample size is adequate to detect effects that are plausible to generate and substantial enough to encourage adoption. If you are considering conducting a study of impacts, you should first calculate whether the sample is large enough to detect effects that are moderate enough in size to be plausible, but large enough that stakeholders would consider adopting the intervention if the effects were demonstrated. Your assessment of statistical power must account for clustering of patient outcomes within practices. In most cases, evaluations of primary care interventions require surprisingly large numbers of practices, regardless of how many patients are served, to be confident that plausible and adequate-sized effects on cost and utilization measures will be shown to be statistically significant (described in more detail in Appendix A). Your evaluation would likely need to include more than 50 intervention practices (unless the practice patterns are very similar) to be confident that observed differences in outcomes are true effects of the intervention.7
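To make the role of clustering concrete, the sketch below computes a minimum detectable effect (MDE) in standard-deviation units for a two-arm comparison. It is a minimal illustration, not the method described in Appendix A, and the intraclass correlation (ICC), practice counts, and panel sizes are assumed values chosen only for demonstration.

```python
# Minimum detectable effect (MDE) for a practice-level intervention, accounting
# for clustering of patient outcomes within practices. All inputs are
# illustrative assumptions, not recommended planning values.
from scipy.stats import norm

def mde_sd_units(practices_per_arm, patients_per_practice, icc,
                 alpha=0.05, power=0.80):
    """MDE, in standard-deviation units, for a difference in means between arms."""
    deff = 1 + (patients_per_practice - 1) * icc               # design effect
    n_eff = practices_per_arm * patients_per_practice / deff   # effective n per arm
    z_alpha = norm.ppf(1 - alpha / 2)                           # two-sided test
    z_power = norm.ppf(power)
    return (z_alpha + z_power) * (2 / n_eff) ** 0.5

# Example: 10 vs. 50 intervention practices, 500 patients each, assumed ICC = 0.05
for k in (10, 50):
    print(k, "practices per arm -> MDE =", round(mde_sd_units(k, 500, 0.05), 2))
```

Under these assumed values, moving from 10 to 50 practices per arm shrinks the MDE from roughly 0.29 to 0.13 standard deviations, which illustrates why so many practices are needed before plausible-sized effects become detectable.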
For most studies, power estimates will show that it will not be possible to detect the effects of an intervention in a small number of practices unless the effects are much larger than could plausibly be generated. (Exceptions are practices with very similar practice patterns.) In these cases, we advise against estimating program effects (impacts). Doing so is likely to lead to erroneous conclusions that the intervention did not work (if the analysis accurately accounts for clustering of patient outcomes within practices), when the evaluation may simply not have had enough practices to differentiate between real program effects and natural variation in outcomes. In those cases, we recommend that the evaluation focus on conducting an implementation study.
Comparing the intervention group to a comparison group that is similar before the intervention is critical. The evaluation can select the comparison group using a randomized or non-experimental design. If possible, try to use a randomized design.
Consider these pointers for your impact study. If you have sufficient statistical power to include an impact study, here are things to consider.
- Have a method for estimating the outcomes patients would have experienced in the absence of the intervention. Merely looking at changes in trends over time is unlikely to correctly identify the effects of the intervention because trends and external factors unrelated to the intervention affect outcomes. Skeptics will find such studies dubious because changes over time in health care costs may have affected all practices. For example, if total costs for patients treated by a PCMH declined by 5 percent, but health care costs of all practices in that geographic region declined by 5 percent over the same period, the evaluation should conclude that the PCMH is unlikely to have had a meaningful effect on costs.
You should, therefore, consider what would have happened to the way intervention practices delivered care and to patients' outcomes if the practices had not adopted the intervention (that is, the "counterfactual"). Comparing changes in outcomes between the intervention practices and a group of comparable practices helps to isolate the effect of the intervention from the effects of other factors. If you can, select the comparison group of practices using a randomized (experimental) design; it will give you more confidence in your results and should be used whenever possible. Appendix B contains a few additional details on different approaches to selecting a comparison group, but we caution that it does not cover the many considerations involved.
- Make sure comparison practices are as similar as possible to intervention practices before the intervention begins. If the intervention and comparison groups are similar before the intervention begins, you can be more confident that the intervention caused any subsequent differences in outcomes between the two groups. Where possible, your evaluation should select comparison practices with similar patient panels, including age, gender, and race; insurance source; chronic conditions; and prior expenditures and use of hospitalizations, ER visits, and skilled nursing facility stays. Ideally, practice-level variables such as practice size; whether the practice is independent or part of a larger system; the number, types, and roles of non-physician staff; and urban/rural location should also be similar. To improve confidence further, if data are available, you should examine how similar outcomes were in both groups for several years before the intervention began to ensure patients in the two groups had a similar trajectory of costs. Moreover, if there are preexisting differences in cost trends, you can control for them. You can examine the comparability of the intervention and comparison practices along as many of these dimensions as possible even if you cannot use all of them to select the comparison group.
- Use solid analytical methods to estimate program impacts. If you have selected a valid comparison group and included enough practices, appropriate analytical methods will generate accurate estimates of program effects. These include using a difference-in-differences approach (which compares changes in outcomes before and after the intervention began for the intervention group to changes in outcomes over the same period for the comparison group), controlling for patient- and practice-level variables ("risk adjustment"), and adjusting standard errors for clustering and multiple comparisons (see Appendix C). A model sketch appears after this list.
- Conserve resources by using different samples to measure different outcomes. Calculating statistical power for each outcome can help you decide which sample to use to collect data on various outcomes. Survey data are generally costly to collect. For most survey-based outcomes, evaluations typically need data from 20 to 100 patients per practice to be confident that they can detect a meaningful effect. Collecting survey data from more patients might increase the precision of estimates of a practice's average outcome for its own patients, but it will only slightly improve the precision of the estimated effect for the intervention as a whole (that is, your ability to detect a small effect); the design-effect sketch after this list illustrates why. It typically costs relatively little more to analyze data from claims or electronic health records (EHRs) on all of the practices' patients rather than just a sample, so we recommend analyzing claims- and EHR-based outcomes using data for as many patients as you can. On the other hand, some interventions can be expected to generate bigger effects for high-risk patients, so knowing how you will define "high-risk" and separately analyzing outcomes for patients who meet those criteria may improve your ability to detect specific effects.13
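As a concrete illustration of the analytical methods noted above, the following is a minimal sketch of a difference-in-differences regression with patient-level risk adjustment and standard errors clustered at the practice level, written in Python with statsmodels. The file name and column names (total_cost, intervention, post, practice_id, and the risk adjusters) are hypothetical, and a real analysis would also address multiple comparisons as described in Appendix C.

```python
# Sketch of a difference-in-differences estimate with risk adjustment and
# practice-level clustered standard errors. Column and file names are
# illustrative assumptions, not a prescribed data layout.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("patient_year_outcomes.csv")  # hypothetical analytic file

model = smf.ols(
    "total_cost ~ intervention * post + age + female + hcc_risk_score",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["practice_id"]})

# The coefficient on intervention:post is the difference-in-differences estimate:
# the change in outcomes for intervention practices relative to the change for
# comparison practices over the same period.
print(model.summary())
```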
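The diminishing returns from surveying more patients per practice follow from the design effect, 1 + (m - 1) x ICC, where m is the number of patients surveyed per practice and ICC is the intraclass correlation. The values below are assumptions used only to illustrate the pattern.

```python
# Why adding survey respondents within a practice yields diminishing returns:
# with a nonzero ICC, the effective sample size plateaus as patients per
# practice grow. The ICC and practice count are assumed values.
icc = 0.05
practices_per_arm = 25
for m in (20, 50, 100, 500):
    deff = 1 + (m - 1) * icc              # design effect
    n_eff = practices_per_arm * m / deff  # effective n per arm
    print(f"{m:>3} patients/practice -> effective n per arm = {n_eff:.0f}")
```

With these assumed values, quintupling the survey sample from 100 to 500 patients per practice raises the effective sample size per arm only from about 420 to about 480, whereas adding practices increases it proportionally.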
The Patient Centered Outcomes Research Institute also provides useful recommendations for study methodology.
Synthesize findings from the implementation and impact analyses. Most evaluations generate a lot of information. If your evaluation includes both implementation and impact analyses, using both types of findings together will provide a considerably more sophisticated understanding about the effects of the model being tested than either alone. Studying the connections between findings from both, arrayed according to their appearance in the logic model, can help illuminate how a primary care intervention is working, suggest refinements to it, and, if it is successful, inform how to spread it to other practices.
Ideally, you will be able to integrate your implementation and impact work so that they will inform one another on a regular and systematic basis. This type of integrated approach can provide insights about practice operations, and barriers and facilitators to success. It can also help generate hypotheses to test with statistical models of impact, as well as explanations for differences in impacts across geographic areas or types of practices or patients. This information, in turn, can be used to improve the interventions being implemented and inform practices about the effectiveness of changes they are making. If you collect implementation and impact results at the same time, you can use them to validate findings and strengthen the evidence for the evaluation's conclusions. Moreover, information from implementation and impact analyses is useful for understanding how to refine and spread successful interventions.