Ongoing program evaluation is just as important for self-management support programs as it is for other service delivery programs. Purchasers and builders will want to routinely receive information that allows them to assess the program's operation and performance, especially whether it ultimately benefits patients with chronic illnesses. Yet there is at present no standard format for such information, and a purchaser or builder of a self-management support program will find a broad array of possible evaluation measures. To some extent, the choice depends on the program's main goals, but a selection of endpoints may lead to the best understanding of how the program is working. In the following section, "Evaluation Measures," we describe the range of evaluation measures revealed by our literature review and expert interviews and discuss the key issues that were raised. Additional methodological issues are discussed in the next section, "Evaluation Methodology."
Evaluation Measures
Program success may be assessed at many points along the chain of effects presented in Figure 1. One can examine whether:
- Program structure matches what was called for in the contract.
- Coaches are engaging eligible patients and performing the self-management support activities.
- Patients' knowledge and self-efficacy have increased.
- Patients' health-related behaviors have changed.
- Rates of provider adherence to guidelines have increased.
- Disease control has improved.
- Patient health outcomes have improved.
- Patient satisfaction has improved.
- Utilization has declined, and patient productivity has improved.
- Health care costs have declined.
Each of these interim and long-term goals is important and provides a possible endpoint for evaluation. They are discussed in more detail below.
Tables 3a, 3b, and 3c provide an overview of the endpoints examined in the recent literature on existing self-management support programs. The columns in the tables parallel the boxes in Figure 1. Researchers have used endpoints all along the chain of effect (following program structure), and our interviewees reported some use of each category. Neither our literature review nor our interviews, however, identified research on programs that measured endpoints in all categories. Utilization, costs, provider behavior, and disease control endpoints received the greatest emphasis, most likely reflecting considerations of data availability, reliability, and other tradeoffs associated with different data sources.48
Measures of Program Structure
Commonly, purchasers of external services look for accreditation of a program as a measure of the program's structural soundness. While few studies in the literature used structural measures for program evaluation, a purchaser or developer might want to monitor whether the program has the components and features that were called for in the original plan or contract. One might also ask whether the features are plausibly capable of supporting the kinds and extent of self-management support activities desired. Are the staff and caseload as expected? Do staff members have the qualifications and training to perform their duties? Can they reasonably be expected to support the kinds of activities envisioned for the intended number of patients? Are procedures and protocols in place to ensure that coaching tasks, as well as education, are performed? Structural measures also may be used to assess whether claims about the program's success based on other measures are plausible.
Most structural measures rely on information from the program's management. Patient self-report is another data source to consider. The Disease Management Association of America (DMAA) Participant Satisfaction Survey, for example, asks patients if their program has a toll-free number they can call.49
Self-management Support Process Measures
Another approach to evaluation is to monitor the performance of the program staff by examining the extent to which they perform the tasks and activities intended; i.e., how well the process of providing support to chronically ill patients actually works. Not only are program process measures critical for program supervision and management, they also can tell purchasers and developers if the program is being implemented according to plan. Measures of reach and implementation can help reveal factors that contribute to success or failure and be useful for monitoring of staff performance and program improvement. As shown in Table 3, process measures in the program literature have focused on the reach of the program, processes for assessing patients' self-management skills and needs, education processes, and coaching processes.
Measures of "reach" are intended to assess the extent to which the self-management support program reaches the people it is intended to serve. Examples include the percent of the eligible population successfully contacted with an offer of self-management support services, enrollment rates (or opt-in or opt-out rates), completion rates, and dropout rates. As an example, the study of the asthma call center program described earlier reported that 474 of the 1,303 members with asthma actually enrolled in the program, and 196 of these enrollees were stratified to the high-risk subset for telephone support.50 An intermediate implementation measure of the program's attempts to engage patients (e.g., the number of attempted calls that resulted in the reach figure) also may be helpful for monitoring program performance in relation to reach.
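The reach figures in the asthma call center example can be expressed as rates with simple arithmetic. The following is a minimal sketch in Python; the function name `reach_rates` and the choice of denominators are illustrative assumptions, not a reporting format used by the study.

```python
def reach_rates(eligible, enrolled, high_risk):
    """Return enrollment and high-risk shares as fractions.

    eligible:  members identified with the condition (denominator for reach)
    enrolled:  members who actually enrolled
    high_risk: enrollees stratified to the high-risk subset
    """
    if eligible <= 0 or enrolled <= 0:
        raise ValueError("eligible and enrolled must be positive")
    return {
        "enrollment_rate": enrolled / eligible,
        "high_risk_share_of_enrolled": high_risk / enrolled,
        "high_risk_share_of_eligible": high_risk / eligible,
    }


# Counts reported for the asthma call center program
rates = reach_rates(eligible=1303, enrolled=474, high_risk=196)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
```

With these counts, roughly 36 percent of members with asthma enrolled, and roughly 41 percent of enrollees were placed in the high-risk subset. The result depends on how the eligible population is defined.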
One informant emphasized that it is important for all aspects of evaluation to take into account at the beginning the number of people who are eligible for the program—that is, the opportunity for making an impact based on the total number of people the program might serve and what portion of the total the program plans to enroll or engage. Different data sources may lead to different definitions of the total population. Programs that rely on claims data will limit the total to individuals who have been diagnosed (or received some specific treatment). Yet, a significant proportion of individuals with some conditions go undiagnosed. If the intent is to capture all people with the condition, as advocated by many proponents of the chronic care model, then health risk assessments or other screening mechanisms may be necessary to identify the total population.
Who is being reached is another consideration. It is important that the people reached were, in fact, part of the target population. Another critical issue is whether the people reached are disproportionately those who already are most likely to self-manage, a difficult question to assess from most datasets. Some evaluation research has utilized propensity scores based on analysis of predictors of program enrollment,43,50 but it is not clear if propensity scores have been incorporated into program performance reporting.
In addition to reach, other program process measures assess the extent to which the self-management support interventions were implemented as intended. They essentially assess the program's performance of the self-management support processes called for in the program protocols. Examples of implementation measures reported in the program literature include the:
- Number of education sessions provided in person or by telephone.
- Frequency of coaching telephone calls.
- Duration of the telephone calls.
- Content of the telephone calls.
An evaluation of a depression telecare program sponsored by an employee coalition, for example, reported that 100 of 102 eligible enrollees received at least one nurse call. They averaged 11.1 calls per patient (with a range of 0-22), and calls lasted 6.5 minutes on average.51 An evaluation of a diabetes disease management program reported the number of educational mailings distributed, the number and average duration of telephone interactions, the average number of telephone interactions with individuals in the highest severity category, and the number of patients who could not be reached.52 Many studies assessed implementation by measuring the documentation of self-management support processes, such as patient education provided, action plan completed, patient goals collaboratively agreed on, smoking cessation counseled, referrals suggested, blood glucose self-monitoring training provided, and spacers and peak flow meters distributed. The asthma program described as a call-center model regularly reported to its sponsors the percentage of patients with a care plan, but the program did not use this measure in its study.50
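Implementation measures of this kind (share of eligible patients receiving at least one call, calls per patient, and call duration) could be derived from a simple call log. The sketch below uses Python with an entirely hypothetical log; the patient identifiers, durations, and the function name `implementation_summary` are assumptions for illustration and do not come from any cited evaluation.

```python
from statistics import mean

# Hypothetical call log: (patient_id, call_minutes); not data from any cited study
call_log = [
    ("p1", 5.0), ("p1", 8.0), ("p1", 6.5),
    ("p2", 7.0),
    ("p3", 4.0), ("p3", 9.0),
]
eligible_patients = ["p1", "p2", "p3", "p4"]  # p4 was never reached


def implementation_summary(eligible, log):
    """Summarize calls per eligible patient and call duration."""
    calls_per_patient = {pid: 0 for pid in eligible}
    durations = []
    for pid, minutes in log:
        if pid in calls_per_patient:  # ignore calls to people outside the eligible list
            calls_per_patient[pid] += 1
            durations.append(minutes)
    counts = list(calls_per_patient.values())
    return {
        "share_with_any_call": sum(c > 0 for c in counts) / len(counts),
        "mean_calls_per_patient": mean(counts),
        "min_calls": min(counts),
        "max_calls": max(counts),
        "mean_call_minutes": mean(durations) if durations else None,
    }


summary = implementation_summary(eligible_patients, call_log)
print(summary)
```

Keeping patients with zero calls in the denominator, as here, makes the calls-per-patient average reflect everyone the program was meant to serve rather than only those it reached.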
Staff also may be surveyed about the performance of self-management support processes. One study53 reported the results of a survey that asked staff to report the frequency with which the following self-management support processes occurred:
- Support was promoted through problem-solving and empowerment methodologies.
- Patient self-management needs were assessed.
- Individualized written care plans were prepared.
- A written care plan was made available to primary care and urgent care staff.
- Spacer techniques were taught.
- Low-cost peak flow meters, spacers, and nebulizers (self-care tools) were made available to patients.
Alternatively, similar measures may be based on reports by patients that they have received specified self-management support services. The DMAA Participant Satisfaction Survey, for example, includes items soliciting patients' reports of the frequency of different types of contact with program staff (such as receipt of educational materials, scheduled calls, and face-to-face meetings), which lifestyle changes have been emphasized by program staff (e.g., improving diet, taking medications as prescribed, getting annual check-ups, weight management), and whether specific biometric monitoring devices were provided (such as glucometers, peak flow meters, digital weight scales, or home cholesterol screening monitors).49 Inventory counts also may be used to measure the distribution of patient education materials or self-management support tools.
Measures of Patient Self-efficacy and Knowledge
The principal goal of a self-management support program is to increase patients' knowledge and self-efficacy for self-management, and it makes sense to measure its ability to do so. Measures of self-efficacy assess people's confidence in their ability to perform or adhere to specific behaviors such as exercise, diet, or stress management or to overcome obstacles to the performance of these behaviors. Measures of knowledge and self-efficacy were used in some of the studies we reviewed. For example, when an independent delivery system implemented a 7-week, small-group self-management support program for patients with one or more chronic diseases, the evaluation included a self-efficacy measure of perceived adaptability to manage pain, fatigue, emotional distress, and other aspects of chronic illness.45,46 The patient's readiness to change also was assessed in a number of studies. Self-management, problem-solving skills, and self-management barriers were also assessed.
It is important to use validated measures, but these may be harder to find for this program area. More than one informant mentioned patient activation measures currently being developed and validated. According to interviewees, some programs use patient surveys to collect information on the patients' knowledge and self-efficacy. For several programs, routinely collecting such information is part of their program, e.g., the responses are recorded in the self-management support software or entered into the interactive computer programs. Patient surveys often are expensive, but relying on patients' responses to their coaches raises the potential for added bias, since patients may be less likely to be truthful with their coaches. One administrator (in a primary care model program) said they anonymously administer a small written survey to a portion of patients each month to measure their confidence in their ability to self-manage. They switched to the anonymous questionnaire after becoming suspicious of the very high levels of confidence reported by patients in response to in-person queries by coaches.
Measures of Patient Behavior
The American Association of Diabetes Educators (AADE) considers changes in patient behavior to be the outcome most sensitive to its diabetes self-management support. Diabetes educator-researchers recommend that measures of seven self-care behaviors be used to determine the effectiveness of self-management education at the individual and population levels. These behaviors include monitoring blood glucose, problem solving, taking medicine, psychosocial adaptation, reducing risks of complications, being active, and eating. Table 4 shows the AADE's specific recommendations for measures and methods of measurement for assessing these intermediate outcomes.
The DMAA recommends that disease management programs evaluate change in medication adherence and lifestyle behaviors (diet, exercise, and smoking status, at a minimum). Its recent Outcomes Guidelines Report states:
Disease management programs frequently measure whether patients receive prescriptions for medications identified in evidence-based guidelines for specific conditions (e.g., beta-blockers for patients who have had an acute myocardial infarction), but the prescription will not promote better health outcomes or reduced costs unless the patient takes the medication as prescribed. Accordingly, medication adherence is a component of patients' self-management of their chronic conditions, an important target for disease management patient education efforts and thus, an important metric to be assessed in evaluation of these programs.54
In the literature on real-world programs, evaluations have measured a variety of behaviors. Examples have included measuring self-monitoring of dietary intake and physical activity, attendance at self-efficacy classes, patient-initiated telephone contact, foot care, glucose monitoring, self-management strategy use, use of self-management tools, insulin dose adjustment, medication compliance, controller use, physician visits, communication with the physician, diabetes health exams, eating and dietary behaviors, physical activity, frequency of exercise, and tobacco use. In the example of a call center asthma program described earlier, pharmacy data were used to create measures of use of beta agonists, inhaled corticosteroids, leukotriene modifiers, and oral steroids.50 An evaluation of a disease management program sponsored by a pharmacy benefit management company used measures of medication compliance (84 days of therapy in a 114-day period in the acute phase and 180 days of therapy during a 231-day period in the continuation phase), persistency of medication therapy (lack of a 90-day or longer gap in prescription refills during the 7-month observation period), and patient refill timeliness (time to first refill).55 In the evaluation of a 7-week, small group self-management support program implemented within an independent delivery system, the measures of patient behavior focused on exercise (e.g., minutes per week of aerobic exercise and range-of-motion exercise), cognitive symptom management, and communication with physician.46 Smoking behavior, quit rates, and daily weighing were some of the behavior measures mentioned in the key informant interviews.
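The compliance and persistency thresholds described above can be illustrated with pharmacy refill records. The following is a minimal sketch in Python, assuming each record is a fill date plus a days-supply count. The day-counting rules, the 210-day observation window standing in for "7 months," the treatment of a trailing uncovered period as a gap, and all function names are assumptions for illustration; the cited evaluation's actual algorithm is not described here.

```python
from datetime import date, timedelta


def covered_set(fills, start, length_days):
    """Distinct days in [start, start + length_days) covered by any fill."""
    end = start + timedelta(days=length_days)
    return {
        fill_date + timedelta(days=offset)
        for fill_date, days_supply in fills
        for offset in range(days_supply)
        if start <= fill_date + timedelta(days=offset) < end
    }


def is_compliant(fills, start, window_days=114, required_days=84):
    """Acute-phase rule from the text: at least 84 covered days in 114."""
    return len(covered_set(fills, start, window_days)) >= required_days


def longest_gap(fills, start, length_days):
    """Longest run of consecutive uncovered days in the window."""
    covered = covered_set(fills, start, length_days)
    longest = current = 0
    for offset in range(length_days):
        if start + timedelta(days=offset) in covered:
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest


def is_persistent(fills, start, observation_days=210, max_gap_days=90):
    """Persistent if no uncovered run reaches 90 days (assumed definition)."""
    return longest_gap(fills, start, observation_days) < max_gap_days


# Hypothetical patient: three 30-day fills, then no further refills
start = date(2005, 1, 1)
fills = [(date(2005, 1, 1), 30), (date(2005, 2, 5), 30), (date(2005, 3, 10), 30)]

print(len(covered_set(fills, start, 114)))  # covered days in the acute phase
print(is_compliant(fills, start))
print(is_persistent(fills, start))
```

In this hypothetical case the patient meets the acute-phase compliance threshold but is not persistent, because no refill follows the third fill. The two measures can therefore diverge for the same patient.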
Most of these measures rely on pharmacy data or patient self-report. If use of pharmacy data is feasible, then it is sensible to use these data for evaluating the program's effect on patients' medication behavior. Administrative data might provide a similar source of data on clinic visit behavior, although attention should be given to the extent of the time lag evident in the data reporting. Reliable sources of data on other aspects of patient behavior (e.g., physical activity or diet) are less available. As with measures of patient knowledge and self-efficacy, patient self-report may be the only feasible source of data on many of the patient behaviors targeted by the self-management support. To the extent possible, care should be taken to minimize potential bias from a patient's recall difficulties or the desire to please.
Not only is changing certain behavior important for many patients with chronic conditions, but sustaining behavior change is critical as well. Several experts stressed the need to evaluate whether behavior change is sustained over time.
Measures of Provider Behavior and Guideline Conformance
A number of the studies that we reviewed assessed programs by investigating changes in provider behavior or conformance with guidelines. Many studies examined physicians' medication prescribing, diagnosis documentation, referrals, and rates at which they performed various procedures (HbA1c tests, eye exam rates, foot exams, allergen immunotherapy, and pulmonary lab procedures). One study of an asthma program, for example, compared the performance of allergen immunotherapy, pulmonary lab procedure, ventilation and perfusion imaging, influenza immunization, and pneumococcal immunization.50 In the interviews, respondents mentioned using such clinical process indicators as performance of clinical diagnostic tests, lab tests, medication prescribing, or, more generally, processes called for in clinical practice guidelines. Some respondents said their organizations use measures from the Health Plan Employer Data and Information Set (HEDIS) developed by the National Committee for Quality Assurance for use in accreditation and certification of health care organizations.57
To the extent that changing provider behavior is a target of the self-management support program (e.g., if coaches assist and encourage diabetes patients to remind clinicians that they are in need of a foot exam), these measures may be appropriate for evaluating effectiveness. Many of the measures are based on administrative data and may be readily accessible for numerous programs. Much of self-management support, however, targets patient behavior, and patient behavior alone does not determine whether these clinical processes are performed.
Measures of Disease Control
Researchers have used measures of HbA1c, lipids, blood pressure, weight gain, chest pain, cough, dizziness, shortness of breath, peak flow readings, asthma symptom scores, nighttime symptoms, self-reported severity of symptoms, and body mass index to assess disease or symptom control. Several of these measures also were mentioned in the interviews. Symptom control measures, along with clinical process measures, are emphasized by major national measure sets. The National Committee for Quality Assurance, American Diabetes Association Provider Recognition Program (ADA PRP), Diabetes Quality Improvement Project (DQIP), and National Quality Forum (NQF) diabetes measurement sets all include HbA1c and lipid control indicators as well as HbA1c, lipid, urine protein, and eye testing.56 Clinical data (laboratory or medical record data) are needed for a number of these measures, however, and such data are difficult to collect, particularly for external model providers of self-management support, unless an electronic medical record is available. While patient self-report is reasonable for a number of these measures, such as chest pain or shortness of breath, it is unlikely to be reliable for other disease control measures such as cholesterol levels or other lab values.
Health Outcome Measures
Researchers have used a variety of health outcome measures, including functional status, complications such as organ damage or lower extremity amputations, physical and mental functioning, quality of life, mortality, disability, pain, restricted activity days, days in bed, and self-reported health status. Fewer outcome measures were mentioned by the interview respondents. These measures included global health scores, days sick at home, quality of life, and measures of physical functioning. The DMAA recommends use of one of the short-form health status surveys (SF-8, SF-12, or SF-36) to evaluate change in patients' health status.54
Improved health outcomes are unquestionably a prime goal for self-management support programs; however, a serious problem with using health outcomes for evaluation purposes is that it may take years for many of these outcomes to show the effects of improved self-management.58 An assessment that uses a relatively short followup period (a year, for example), as most do, is unlikely to be able to detect improvement in such outcomes.
Patient Satisfaction Measures
Measures of patient satisfaction with care and quality of life were utilized in research and mentioned by a number of interview respondents. The DMAA recently released a new assessment tool for measuring participant satisfaction with disease management. This tool includes a number of items designed to evaluate patients' experience with the program staff, the usefulness of the services received, access to program services, and satisfaction with the information received.49
Utilization and Productivity Measures
Measures of health care utilization included hospital admissions, emergency room visits, inpatient days, lengths of stay, outpatient visits, readmissions, and cardiac procedure rates. As an example, the asthma call center study used inpatient admissions, inpatient bed days, emergency room visits, asthma inpatient admissions, asthma inpatient bed days, and asthma emergency room visits.50 Utilization may be impacted by a self-management support program if the patient's health outcomes improve or if he or she feels more confident and able to handle an exacerbation of symptoms without using clinical services such as emergency rooms.
With most of these measures, program success is assessed in terms of reduced utilization. However, in some cases, outpatient visits may be expected to increase as a result of better self-management. Utilization measures frequently are used to evaluate self-management support programs, partly because they rely on readily accessible administrative data. To the extent that the reduced utilization is expected to result from an outcome that improves over a long time period, these utilization measures will miss detecting benefits in a short followup period.
Measures of productivity included days lost from work, days absent from school, and days less productive. Patient-reported productivity items included in the DMAA Participant Satisfaction Survey, for example, focus on days missed from work and normal activities due to health problems related to the medical condition being managed and health-related limitations affecting work (e.g., overall effectiveness, ability to concentrate, ability to handle the workload).49
Measures of Cost
The literature reported that various financial variables were used, including the dollar amount of claims in 1 year per patient, encounter costs, pharmacy costs, inpatient costs, outpatient costs, emergency room visit costs, radiology costs, home health care costs, charges for health care services, and costs for the program. An article on a plan's diabetes self-management support program reported per member per month paid claims, inpatient admissions per-patient per-year, inpatient days per patient per year, emergency room visits per patient per year, primary care visits per patient per year, and HEDIS scores for HbA1c tests and lipid, eye, and kidney screenings.59
Stakeholders generally are interested in financial outcomes. Most interviewees focused on return on investment, and many mentioned the need for a standard methodology for calculating return on investment. Most interviewees also reported utilization statistics, such as emergency room visits, hospital admissions, hospital days, length of stay, neonatal intensive care unit days, readmissions, and/or prescription drug use. While utilization data often are used to project savings, at least one expert argued that actual changes in utilization costs should be reported. Actuarial models for evaluating cost savings from disease management programs have been utilized in the disease management field.60-62 In its recently released consensus guidelines for measuring disease management outcomes, the DMAA recommended that financial impact be assessed in terms of health care cost outcomes, and that such outcomes be measured by changes in total dollars (or per-member-per-month charges) using medical and pharmacy claims data.54 Possible cost measures for which some benchmarks exist include hospital claims (total dollar amount of hospital claims paid), pharmacy claims (total dollar amount of pharmacy claims paid), and total expenditures (total dollar amount in claims paid).48
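As a rough illustration of measuring cost outcomes as a change in per-member-per-month (PMPM) paid claims, consider the Python sketch below. All dollar figures, the member-month count, and the program cost are hypothetical. The final ratio of savings to program cost is only one simple convention, included because interviewees noted that no standard return-on-investment methodology exists.

```python
def pmpm(total_paid_claims, member_months):
    """Per-member-per-month paid claims."""
    if member_months <= 0:
        raise ValueError("member_months must be positive")
    return total_paid_claims / member_months


# Hypothetical figures for illustration only
member_months = 2_400  # e.g., 200 members observed for 12 months
baseline_pmpm = pmpm(1_200_000.0, member_months)
followup_pmpm = pmpm(1_104_000.0, member_months)

pmpm_change = followup_pmpm - baseline_pmpm    # negative means lower costs
gross_savings = -pmpm_change * member_months   # total dollars over the period

program_cost = 48_000.0                        # hypothetical program fees
savings_per_program_dollar = gross_savings / program_cost

print(f"Baseline PMPM:  ${baseline_pmpm:,.2f}")
print(f"Followup PMPM:  ${followup_pmpm:,.2f}")
print(f"Change in PMPM: ${pmpm_change:,.2f}")
print(f"Gross savings:  ${gross_savings:,.2f}")
print(f"Savings per program dollar: {savings_per_program_dollar:.2f}")
```

A simple before-and-after comparison like this one does not by itself show that the program caused the change; the attribution problems discussed under "Evaluation Methodology" still apply.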
Evaluations using short followup timeframes will miss savings that result from long-term benefits in health outcomes or utilization.
Combination of Measures
When selecting measures, it is important to consider their sensitivity to the changes targeted by the program goals. A recent review of disease management program indicators found that, in a substantial portion of studies, the indicators used did not conceptually link to the aims of the program as described in the articles. The authors recommended that selection of evaluation indicators be based on their expected sensitivity to the specific design and goals of the intervention. For intermediate endpoints, such as patient knowledge and self-efficacy, patient behavior change, and improved disease control, indicators should be ones that might plausibly be expected to be influenced by the program components and that are associated with the expected changes in outcomes.63
Purchasers and providers of new programs will want to be sure that the structure of the program and the services actually provided match what the contract stipulates. In the mid-term, providers of the programs will want to know if patient self-efficacy and provider behavior have changed. In the longer term, it will be important to evaluate whether the program has improved patient health outcomes and well-being and whether it has reduced costs.
Outcome measures alone should be interpreted with caution, particularly given the usual methodological constraints in real-world program evaluation. The absence of significant change in outcomes may not indicate program failure, for example, if the followup period necessary to show improvement in outcomes is longer than the evaluation timeframe. Monitoring change in multiple dimensions (including intermediary links in the chain of effect, such as change in patient behavior) offers more opportunity to assess the plausibility of assumptions about effect and to increase confidence in judging program success or failure. Analysis of these data can lead to better understanding of whether and how different dimensions mediate outcomes. Measuring multiple dimensions also provides more comprehensive data for improving the program and providing performance feedback to staff.
Evaluation Methodology
Measures are just one component of a program evaluation. In evaluations of program impact, a number of other methodological issues, such as the overall evaluation design and sample selection, require careful consideration because they affect the ability to attribute any changes found to the program itself. Self-management support program evaluation and disease management program evaluation share many of these methodological issues, as well as some of the same challenges, such as selection bias and regression to the mean.
One controversy in disease management evaluation has resulted from measuring performance based on biased samples of patients.64 Bias occurs if an unobservable or unmeasured characteristic of a patient makes it more or less likely that this patient has a positive (or negative) treatment effect and a higher (or lower) probability of responding well to the program intervention. A sample that only includes patients who agreed to participate in the program risks selection bias, for example, because these participants are likely to have greater motivation to take care of their health and, therefore, are more likely to benefit from the program intervention than the overall patient population. If so, analysis of the difference between that group and a comparison group will overstate the true effect of the intervention.65
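A small simulation can make the selection-bias argument concrete. In the Python sketch below, an unmeasured "motivation" trait raises both the chance of enrolling and the outcome itself, so a naive comparison of participants with nonparticipants overstates the program's true effect. The effect size, enrollment probabilities, and outcome model are all invented for illustration and are not estimates from any study.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the run is repeatable
TRUE_EFFECT = 0.5       # hypothetical true improvement in an outcome score

participants, nonparticipants = [], []
for _ in range(20_000):
    motivation = rng.gauss(0, 1)  # unmeasured patient characteristic
    # Assumption: motivated patients are more likely to agree to participate
    enrolls = rng.random() < (0.7 if motivation > 0 else 0.2)
    # Assumption: motivation also improves the outcome on its own
    outcome = 2.0 * motivation + rng.gauss(0, 1)
    if enrolls:
        participants.append(outcome + TRUE_EFFECT)
    else:
        nonparticipants.append(outcome)

naive_estimate = mean(participants) - mean(nonparticipants)
print(f"True effect:    {TRUE_EFFECT:.2f}")
print(f"Naive estimate: {naive_estimate:.2f}")
```

Under these assumptions the naive estimate is several times the true effect, because participants would have done better than nonparticipants even without the program.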
Another evaluation problem results if disease management programs concentrate their efforts on patients whose health costs were high in the baseline period (e.g., those recently hospitalized) and only include such patients in their analysis. Since high-cost events in medical care tend to be non-recurring, a certain proportion of those patients would end up with lower costs in the next period without program intervention, and the analysis will again overstate the effect of the intervention.65
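Regression to the mean can be shown with a stylized simulation in which no intervention occurs at all. In the Python sketch below, each patient has a stable underlying cost level plus period-specific noise; the cost distribution and its parameters are hypothetical and chosen only to make the pattern visible.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable
n_patients = 10_000

# Stable underlying cost level plus period-specific noise (stylized model)
underlying = [rng.gauss(5_000, 1_000) for _ in range(n_patients)]
baseline = [u + rng.gauss(0, 3_000) for u in underlying]
followup = [u + rng.gauss(0, 3_000) for u in underlying]  # no program effect

# Select the 10 percent of patients with the highest baseline costs
cutoff = sorted(baseline)[int(0.9 * n_patients)]
selected = [i for i in range(n_patients) if baseline[i] >= cutoff]

high_cost_baseline = mean(baseline[i] for i in selected)
high_cost_followup = mean(followup[i] for i in selected)
all_baseline = mean(baseline)
all_followup = mean(followup)

print(f"High-cost group: {high_cost_baseline:,.0f} -> {high_cost_followup:,.0f}")
print(f"All patients:    {all_baseline:,.0f} -> {all_followup:,.0f}")
```

The high-cost group's average falls sharply in the followup period even though nothing was done, while the average across all patients barely moves. An analysis restricted to the high-cost group would credit that drop to the program.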
Including all eligible patients in the analysis ("intent to treat" analysis) addresses these problems. For further information on these methodological issues and guidance on how to address them, see Arnold et al., 2007.48