Overview of Data Infrastructure Development
The data infrastructure includes:
- The data mart, which contains the actual data files created for analyses;
- The Data Infrastructure Development Manual, which describes the development of the data mart;
- The User Manual, which contains information required to conduct data analyses (e.g., names and definitions of variables); and
- The SAS statistical software programs used to develop the data mart.
The data mart was created using data extracts from three types of IHS electronic data that were stored on different platforms:
- National Data Warehouse (NDW),
- Contract Health Services (CHS), and
- CMS Cost Reports (Cost Reports).
These data were supplemented by information from IHS/tribal personnel on project site health service provision, data quality, and use of some NDW codes, as well as by data on county characteristics from the U.S. Census Bureau and IHS data on the costs of purchasing medications and supplies. Below we describe these data sources.
National Data Warehouse
The project’s primary data source, the NDW contains administrative and clinical information on I/T-provided services. The NDW contains four registration tables, which include data on demographic characteristics, and 28 service use tables. Separated by content focus, the service tables include data on provider characteristics, use of services (e.g., inpatient, outpatient, pharmacy) and diagnostic codes.
Contract Health Services
CHS services include specialty services not offered by I/T providers. CHS data from the IHS fiscal intermediary included information for services provided by non-IHS providers but paid for by IHS (e.g., service and provider type, service dates, diagnostic codes, paid amounts).
CMS Cost Reports
Since the NDW data do not include the costs of IHS service provision, IHS Cost Report data were used to estimate the costs of providing I/T services. The IHS Cost Reports were prepared by a financial consulting firm, the Eighteen Nineteen Group, Inc., using a cost accounting process developed by the U.S. Office of Management and Budget to establish Medicare and Medicaid reimbursement rates for I/T-provided services.3 The Cost Report data extracts included data on all costs for providing services, such as those for all types of personnel, ancillary services, pharmacy services, supplies, capital and operating expenditures, and administrative services.
U.S. Census Bureau
American Community Survey (ACS) five-year county estimates for 2006-2010 on AI/AN educational attainment and income4 were included in the data mart. Data on county population density were from Census 2010.5 These county data were merged with population data by means of county identifiers.
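The county merge described above amounts to a keyed join on the county identifier. The following is a minimal Python sketch of that idea, not the project's SAS code; the field names (`county_id`, `pct_hs_or_more`, `density`) and all values are hypothetical placeholders chosen for illustration.

```python
def merge_county_measures(persons, county_measures):
    """Left-join county-level measures onto person records by county identifier.

    Persons whose county has no matching measures keep their original fields only.
    """
    merged = []
    for person in persons:
        county = county_measures.get(person["county_id"], {})
        merged.append({**person, **county})
    return merged


# Made-up records; identifiers and values are placeholders, not project data.
persons = [
    {"person_id": 1, "county_id": "C001"},
    {"person_id": 2, "county_id": "C002"},
    {"person_id": 3, "county_id": "C999"},  # no county measures available
]
county_measures = {
    "C001": {"pct_hs_or_more": 80.0, "density": "nonurban"},
    "C002": {"pct_hs_or_more": 85.0, "density": "urban"},
}
merged = merge_county_measures(persons, county_measures)
```

A left join is used here so that no person record is dropped when a county lacks measures; whether the project handled unmatched counties this way is an assumption.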
Other IHS Data
IHS provided data on the number of hospital beds in each project site. The IHS Procurement Center has electronic data on IHS payments for prescribed and over-the-counter medications and health supplies. We obtained a data extract with information on IHS costs for the purchase of medications and supplies from the IHS National Supply Service Center at the end of the project period. We did not obtain the extract in time to link the cost data with utilization data and formally include the cost data in the data mart. However, this information may be used in future analyses.
Data were for AI/AN children and adults (adults being persons aged 18 years and older) who were active users, as defined by IHS, during each fiscal year. An AI/AN active user is a person whose community of residence was in one of the 14 project sites and who used services in at least one of the past three fiscal years (e.g., an FY2010 active user accessed services at least once during FY2008-2010). The FY2010 project population included almost 440,000 AI/ANs.g The entire data mart includes data for multiple fiscal years and for more than 540,000 persons.
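The active-user rule can be expressed as a three-year lookback window. The sketch below is illustrative only; the function name and arguments are hypothetical, and it simplifies residence to a single yes/no flag.

```python
def is_active_user(service_fiscal_years, fiscal_year, resides_in_project_site, lookback=3):
    """Return True if the person meets the active-user rule for `fiscal_year`.

    service_fiscal_years: set of fiscal years in which the person used services.
    The window covers `fiscal_year` and the two preceding fiscal years.
    """
    if not resides_in_project_site:
        return False
    window = range(fiscal_year - lookback + 1, fiscal_year + 1)
    return any(fy in window for fy in service_fiscal_years)
```

For example, a resident whose only service use was in FY2008 counts as an FY2010 active user, while one whose only use was in FY2007 does not.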
The NDW extracts did not include each person’s chart number(s) or other identifiable information such as names and addresses. OIT provided CAIANH a computer-generated number that was assigned to each person based on OIT’s assigned Integrity Identification number. OIT has an algorithm for assigning each person an Integrity Identification number; IHS uses this number to determine the unique number of active users in each Service Unit. To validate our population counts for each Service Unit, we compared our population numbers for each Service Unit to the OIT population counts. The identified differences were minimal and were primarily due to registration updates that occurred over time, and to mortality. For persons assigned to more than one Service Unit by IHS, we reviewed data on their use of primary care and pharmacy services and assigned them to the Service Unit where they obtained the greater number of services.
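The assignment rule for persons linked to more than one Service Unit can be sketched as a simple count comparison. This is a hedged illustration: the service-type labels and unit names are hypothetical, and the tie-breaking rule (alphabetical order) is an assumption, since the text does not say how ties were resolved.

```python
from collections import Counter


def assign_service_unit(encounters):
    """Pick the Service Unit with the most primary care and pharmacy encounters.

    encounters: iterable of (service_unit, service_type) pairs for one person.
    Returns None when the person has no qualifying encounters.
    """
    qualifying = {"primary_care", "pharmacy"}
    counts = Counter(unit for unit, service_type in encounters if service_type in qualifying)
    if not counts:
        return None
    top = max(counts.values())
    # Assumption: ties are broken alphabetically so the result is deterministic.
    return min(unit for unit, n in counts.items() if n == top)
```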
Development of Specific Data Files for the Data Mart
The data mart includes a set of files for each fiscal year. The annual data include encounter files for different types of health services and a summary file for each person.
Examples of encounter files include:
- Hospital inpatient services (one record per admission),
- Outpatient services (one record per visit),
- ECM services (one record per visit), and
- Pharmacy services (one record per medication or supply).
The summary person file includes demographic, third-party health coverage, health status, and summary health service utilization measures derived from the encounter files, and IHS total treatment costs data for each person.
Due to the project timeline, we were not able to fully address two issues in creating the data mart. First, OIT provided each project site a data linking file that included the computer-generated identification number and chart number for each person in the project site. Project sites could use the data linking file to provide data to CAIANH if the NDW data for a project site did not include data for specific types of health services (e.g., behavioral health, home visits). Although efforts were made to obtain such data from the project sites, time constraints limited our ability to finalize these efforts during the project timeframe. Second, as mentioned earlier, we did not have time to process the pharmacy cost data. In lieu of using data on the actual cost of specific medications, we used data on project site pharmacy costs and the number of medications prescribed to calculate the average cost of a dispensed medication in each project site.
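The average-cost substitute described above is a single ratio per project site. A minimal sketch follows, with a hypothetical function name and made-up inputs; how the project handled sites with no recorded medications is not stated, so returning `None` is an assumption.

```python
def average_medication_cost(site_pharmacy_costs, medications_prescribed):
    """Average cost of a dispensed medication in one project site.

    Returns None when no medications were recorded (assumed handling).
    """
    if medications_prescribed <= 0:
        return None
    return site_pharmacy_costs / medications_prescribed
```

With illustrative inputs of 50,000 in pharmacy costs and 2,000 medications, the function returns 25.0 per medication.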
Data Measures
In this section, we describe the data measures included in the data mart files. Additional information on the measures is provided in Appendix A.
Demographic characteristics:
NDW demographic information included each person’s age, gender, and geographic location (i.e., IHS Area, project site/IHS Service Unit, county).
Health coverage:
NDW information on third-party health coverage included information on Medicaid, Medicare, and private insurance coverage. Data on Medicare, Medicaid, and private insurance coverage start dates and end dates were used to determine eligibility during each fiscal year. Persons were classified as having Medicaid or private insurance coverage during a fiscal year if they had at least one day of that coverage during the year. Those with Medicare coverage during one year were considered to be enrolled in Medicare during subsequent years.
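The coverage rules can be sketched as a date-overlap check plus a carry-forward for Medicare. This illustration assumes the federal fiscal year (October 1 through September 30) and treats a missing end date as open-ended coverage; both are assumptions, as are the function names.

```python
from datetime import date


def fiscal_year_bounds(fy):
    """Assumed federal fiscal year: Oct 1 of the prior year through Sep 30."""
    return date(fy - 1, 10, 1), date(fy, 9, 30)


def covered_in_fy(start, end, fy):
    """True if the coverage span overlaps the fiscal year by at least one day."""
    fy_start, fy_end = fiscal_year_bounds(fy)
    end = end or date.max  # assumption: no end date means coverage is ongoing
    return start <= fy_end and end >= fy_start


def medicare_in_fy(start, end, fy, first_fy):
    """Medicare is carried forward: coverage in any year from first_fy through fy counts."""
    return any(covered_in_fy(start, end, year) for year in range(first_fy, fy + 1))
```

Under these assumptions, a Medicare span that ended in calendar 2008 would not overlap FY2010 directly but would still be flagged for FY2010 through the carry-forward rule.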
Health status measures:
Three different types of measures were used to describe the health status of the project population. They included:
- prevalence of specific conditions;
- health risk; and
- clinical measures of glycemic, blood pressure, and cholesterol control.
A nationally recognized risk-adjustment software program (Risksmart™) was used to identify conditions for which patients were treated and to assign persons a health risk score.6 The software classifies the ICD-9 diagnostic codes recorded in the IHS and CHS utilization records into categories called Diagnostic Cost Groups (DCGs). Persons may be identified as having one or more health conditions (e.g., diabetes, hypertension, different types of CVD, renal disease or failure) based on their assigned DCGs. The health risk score is an indicator of morbidity burden and summarizes a person’s risk for health resource consumption based on age, gender, and the presence of acute and chronic conditions. The score is a continuous variable typically ranging from a very small positive value for the healthiest individuals to more than 100 for the sickest. A higher risk score indicates higher morbidity burden or expected health resource use. The risk score correlates with a person’s expected health service utilization and health spending for a 12-month period. The health risk score is benchmarked to a U.S. commercial population for whom the average health risk is 1.0. This software is described in greater detail in Appendix A.
Although optimal levels for glycemic control (HbA1c), blood pressure, and low-density lipoprotein (LDL) cholesterol are determined by a patient’s health status, age, and other factors, IHS general treatment guidelines were used to report on these clinical measures using data on blood pressure and laboratory outcomes.
Health service utilization:
Measures were created to assess inpatient and outpatient service use.
Inpatient services:
We developed NDW and CHS inpatient service measures, including the number of admissions, length of stay, and type of admission (e.g., obstetric, general). The NDW includes records for obstetrical admissions and newborn admissions. We defined a normal newborn admission as an admission with a length of stay of three days or less and excluded such admissions from analysis, counting only the obstetrical admissions, similar to non-IHS administrative data. Admissions that occurred within 30 days of a previous inpatient discharge were identified as readmissions. A limited number of CHS admissions were to non-acute stay hospitals. These admissions were included with other CHS admissions.
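The newborn exclusion and the 30-day readmission flag can be sketched as follows. This is an illustrative reading of the rules above, not the project's code: the function names are hypothetical, the gap is measured from the latest prior discharge date, and admissions that begin before a prior discharge are not flagged (an assumption).

```python
from datetime import date


def is_normal_newborn(is_newborn_admission, length_of_stay_days):
    """Normal newborn admission: a newborn stay of three days or less."""
    return is_newborn_admission and length_of_stay_days <= 3


def flag_readmissions(stays, window_days=30):
    """Flag each stay that begins within `window_days` of a previous discharge.

    stays: list of (admit_date, discharge_date) for one person.
    Returns a list of booleans in admit-date order.
    """
    flags = []
    latest_discharge = None
    for admit, discharge in sorted(stays):
        is_readmit = (
            latest_discharge is not None
            and 0 <= (admit - latest_discharge).days <= window_days
        )
        flags.append(is_readmit)
        if latest_discharge is None or discharge > latest_discharge:
            latest_discharge = discharge
    return flags
```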
An AHRQ algorithm for identifying hospital admissions that evidence suggests are sensitive to ambulatory services was used to identify such admissions. According to AHRQ, these admissions are sensitive to access and use of outpatient care and may be considered preventable.7 The algorithm is described in greater detail in Appendix A.
Outpatient services:
The 116 NDW outpatient clinic codes were categorized into 18 outpatient Service Categories to report on I/T outpatient utilization, except for I/T ECM services. The categories included Emergency department (ED), Urgent, Primary/general, Diabetes clinic, Specialty (i.e., Endocrinology, Cardiology, Nephrology, Other specialty services), Dental, Eye care (i.e., Optometry, Ophthalmology, Diabetic retinopathy and Retinopathy services), Podiatry/diabetes foot clinic, Behavioral health, Physical therapy, Other rehabilitation services, Home, Public health nursing, and Other office visits. Several project sites had a Diabetes clinic that provided primary care for persons with diabetes; the clinic is one of the service categories. A patient may use more than one type of service during a day (e.g., primary and dental services). For CHS services, we counted the number of dental and all other outpatient visits. The full list of outpatient clinic codes can be seen in Appendix A.
Although IPC was not fully implemented in FY2010, one IPC aim is to provide coordinated primary care services. Primary care services, as compared to specialty care services, include services provided during general office or diabetes clinic visits with physicians, physician assistants, and nurse practitioners; ECM visits by nutritionists, nurse educators, case managers, and clinical pharmacy specialists; and home visits. For this project, we define ECM services as visits conducted specifically for ECM, while recognizing that education and case management are also provided during general primary care and diabetes clinic visits, and by telephone.
The provision and documentation of ECM services varied across the project sites. In collaboration with the project’s Steering and Health Information Committees, an algorithm was developed to identify five types of I/T-provided ECM visits:
- diabetes education,h
- nutrition education,
- clinical pharmacy (i.e., visits conducted by clinical pharmacy specialists who provide services using an advanced practice pharmacy model),8
- case management, and
- other types of health education (e.g., obesity, smoking cessation).
It is important to note that ECM visits in categories 2-4 may have been provided for individuals or patient groups in a diabetes education clinic. For this reason, we also report utilization of diabetes education clinic services regardless of provider type. The process used to classify ECM visits is described in greater detail in Appendix A.
Before the project was initiated, there was limited information on pharmacist documentation of provided ECM services. Based on data analyses and information obtained from the 14 project sites, we concluded that the provision of clinical pharmacy services varied among facilities and ranged from the basic provision of education related to medication management to a focused disease management program provided by clinical pharmacy specialists using an advanced practice pharmacy model. The coding and documentation of these services were found to be variable, and the data footprint left by pharmacists could include any combination of the three IHS pharmacist provider codes, 116 IHS clinic codes, three pharmacy-specific CPT codes, and 1100 IHS patient education codes. This variation and lack of standardization among the levels of service made it difficult to define pharmacy management ECM services. Due to project time constraints, we developed an algorithm to identify clinical pharmacy services but recognize that the algorithm most likely undercounts the provision of these services. Thus, its use may not accurately reflect the total impact of advanced pharmacy services on health outcomes and disease management.
Pharmacy data on dispensed prescribed and over-the-counter medications and supplies included the medication or supply name, date dispensed, and National Drug Code (NDC) or IHS supply code. The data were used to create three pharmacy measures:
- the total number of dispensed medications and supplies;
- the number dispensed by Veterans Administration Therapeutic Medication Class; and
- the number dispensed that were diabetes-related (e.g., insulin), blood pressure-related, and cholesterol-related.
NDC codes were used to create the last two measures.
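The three pharmacy measures are counts over the dispensing records. The sketch below shows one way to compute them, assuming two lookup tables keyed by product code; the codes, class names, and group labels are hypothetical placeholders, not actual NDC values or VA class names.

```python
from collections import Counter


def pharmacy_measures(dispensed_codes, class_by_code, group_by_code):
    """Compute the three pharmacy measures for one person.

    dispensed_codes: one product code per dispensed medication or supply.
    class_by_code: product code -> therapeutic class (hypothetical lookup).
    group_by_code: product code -> 'diabetes', 'blood_pressure', or 'cholesterol'.
    """
    total = len(dispensed_codes)
    by_class = Counter(class_by_code.get(code, "unclassified") for code in dispensed_codes)
    by_group = Counter(group_by_code[code] for code in dispensed_codes if code in group_by_code)
    return total, by_class, by_group
```

Codes missing from the class lookup are counted as "unclassified" here; how the project treated unmatched codes is not stated.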
IHS treatment costs:
IHS treatment costs included costs associated with providing I/T services and CHS.
I/T cost estimates:
Algorithms were developed using the Cost Report data, NDW utilization data, project site fiscal information and utilization data not in the NDW, and expert opinion to estimate site-specific I/T service costs for 13 Cost Service Categories.i Each Service Category maps to one Cost Service Category. The process is described in greater detail in Appendix A. The average IHS cost for providing I/T services was estimated for each person in the project population based on his or her service utilization and the estimated average cost of providing those services in the project site.
CHS service costs:
IHS-paid amounts for CHS services were used to estimate these costs. We recognize that CHS may pay only a percentage of the service costs.
IHS total treatment cost estimates:
We estimated IHS total treatment costs for each person by summing his or her cost estimates for I/T-provided and CHS-provided services.
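The cost arithmetic can be sketched as utilization multiplied by site-level average unit costs, plus CHS paid amounts. The category names and dollar values below are made-up placeholders, and the actual site-specific algorithms in Appendix A are more involved than this illustration.

```python
def it_cost_estimate(units_by_category, unit_cost_by_category):
    """Estimated I/T cost: units of service times the site's average unit cost."""
    return sum(
        units * unit_cost_by_category[category]
        for category, units in units_by_category.items()
    )


def total_treatment_cost(units_by_category, unit_cost_by_category, chs_paid_amounts):
    """IHS total treatment cost: I/T cost estimate plus IHS-paid CHS amounts."""
    return it_cost_estimate(units_by_category, unit_cost_by_category) + sum(chs_paid_amounts)


# Made-up example: 4 outpatient visits and 2 inpatient days, plus two CHS payments.
units = {"outpatient_visit": 4, "inpatient_day": 2}
unit_costs = {"outpatient_visit": 100.0, "inpatient_day": 1000.0}
total = total_treatment_cost(units, unit_costs, [250.0, 50.0])
```

In this example the I/T estimate is 2,400.0 and the CHS amounts add 300.0, for a total of 2,700.0.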
Health service system:
Project site health system indicators include organizational type (i.e., IHS or tribal), size of the population living in the project site, and number of I/T hospital beds in the project site. Population size and number of beds serve as indicators of the range of available I/T services, as sites with more people or hospital beds may provide a wider array of services.
County statistics:
U.S. Census data were used to create county measures of AI/AN educational attainment (e.g., percentage of adults with a high school degree or more years of education), AI/AN income (e.g., percentage of households living at or below the poverty level), and population density (e.g., urban, nonurban).
g. There were 119 FY2010 active users who were excluded because their date of birth was missing or inaccurate. The FY2010 project population excluded persons who died during FY2008 and FY2009.
h. Diabetes education visits are visits that occurred in diabetes education clinics and are not counted in categories 2-4. The majority of these visits were conducted by nurses and health educators.
i. In two of the 14 Service Units, other fiscal data were included in the estimation process. These Service Units do not include IHS or Tribally operated inpatient services and Cost Reports were not compiled for the I/T facilities in the Service Units.