Health outcomes research relies on clinical registries or administrative databases. The aim of this work was to evaluate the concordance of the Minimum Basic Data Set (MBDS) with the DIOCLES (Descripción de la Cardiopatía Isquémica en el Territorio Español) registry and to analyze the implications of using the MBDS to study acute coronary syndrome in Spain.
Methods
Through indirect identifiers, DIOCLES was linked with the MBDS, and unique matches were selected. Some of the most relevant variables for risk adjustment of in-hospital mortality due to acute myocardial infarction were considered. The kappa coefficient was used to evaluate concordance; sensitivity, specificity, and positive and negative predictive values to measure the validity of the MBDS; and the area under the ROC (receiver operating characteristic) curve to assess its discrimination. The results were compared among hospital quintiles according to their contribution to DIOCLES. The influence of unmatched episodes on the results was assessed by a sensitivity analysis using looser linkage criteria.
Results
Overall, 1539 unique matches (60.85%) were achieved. The prevalence of every condition studied was higher in DIOCLES (acute myocardial infarction: 71.09%; Killip 3-4: 9.17%; cerebrovascular accident: 0.97%; thrombolysis: 8.64%; angioplasty: 61.92%; coronary bypass: 1.75%) than in the MBDS (P < .05 for all). The observed agreement was almost perfect (κ = 0.863). The MBDS showed a sensitivity of 85.10% and a specificity of 98.31%. Most results were confirmed by the sensitivity analysis (79.95% of episodes matched).
Conclusions
The MBDS can be a useful tool for outcomes research on acute coronary syndrome in Spain. Comparing DIOCLES and the MBDS with medical records could verify their validity.
Cardiovascular disease causes more than 4 million deaths annually in Europe, mostly due to coronary heart disease,1 and although the mortality rate from ischemic heart disease has declined in recent decades in developed countries, it is still responsible for about one-third of all deaths in the population older than 35 years.2 In Spain, a considerable increase has been predicted in the incidence of acute coronary syndrome (ACS) over the next 35 to 40 years in parallel with population aging.3
Because the prevalence of ACS and the costs involved in its care represent a substantial economic and health care burden, it is important to evaluate clinical practice outcomes and investigate the factors involved. In Spain, the study of ACS has been addressed using clinical registries (CRs)4–9 and administrative databases,10–14 mainly the Minimum Basic Data Set (MBDS) of the Spanish National Health System.15 However, the concordance between the 2 sources has not been examined.
The advantages and limitations of using administrative databases to evaluate health outcomes have been extensively analyzed.16–18 These databases are relatively easy to obtain19 and offer long-term, uniform information on large populations.22 They are also significantly less expensive than collecting primary data through medical record review or through registries specifically designed and maintained for secondary uses,20 which can suffer from systematic biases due to irregular patient inclusion.21 However, their quality depends on the accuracy of diagnostic and procedural coding, usually performed from discharge reports,23 and different types of bias can compromise their use: some databases do not distinguish between complications and comorbidities24 or between chronic and acute diseases, and others lack relevant clinical information, such as the medications administered or laboratory test results.25 Nonetheless, numerous studies have confirmed their reliability,26–28 and others have found no significant differences in risk-adjusted mortality and readmission rates for acute myocardial infarction (AMI) or heart failure when results obtained from administrative databases and CRs were compared.29
Although both sources are subject to strict anonymization rules and include no direct patient identifiers, robust linkage methods based on indirect identifiers have been developed that allow the records corresponding to the same episode in each data source to be matched.30–33 Using such procedures, this study aimed to evaluate the agreement between the DIOCLES (Descripción de la Cardiopatía Isquémica en el Territorio Español [Description of Ischemic Heart Disease in the Spanish Territory]) registry and the MBDS, and thereby to analyze the implications of using CRs and administrative databases to study ACS in Spain. DIOCLES is the most recent large registry performed in Spain with random hospital selection; it studied the in-hospital and 6-month mortality of patients admitted for suspected ACS and characterized their management.9
METHODS
Data Sources
DIOCLES is a multicenter, cross-sectional, quality-controlled, observational study conducted in the first half of 2012 with the participation of 44 centers from 13 autonomous communities in Spain. The study prospectively collected data from patients aged ≥ 18 years consecutively admitted with suspected ACS. Demographic variables, risk factors, medical history, clinical presentation, complications, in-hospital mortality, and prehospital, in-hospital, and discharge management were recorded. At 6 months, the occurrence of death and its date and cause were determined by telephone interview. The study population comprised 2557 of the 3059 patients evaluated, after 502 were excluded for various reasons.9
The 2012 MBDS contains 400 861 hospitalization episodes recorded in hospitals of the Spanish National Health System with a primary diagnosis of cardiovascular disease or, in the absence of that diagnosis, with discharge from a cardiology or cardiac surgery department. The MBDS includes information on patients’ demographic characteristics; on variables related to health care and the clinical process; and on patients’ diseases and conditions and the procedures performed during their care, coded with the International Classification of Diseases, Ninth Revision, Clinical Modification.15
Linkage Procedure
The DIOCLES registry and the MBDS differ in scope (DIOCLES includes 6-month follow-up after discharge, whereas the MBDS covers only the hospitalization episode) and in their data models. Because these models lack common attributes that would allow the records of the same episode to be matched unequivocally (direct identifiers), different combinations of indirect identifiers were tested to link them.32 The identifier yielding the most unique matches was selected. It comprised the hospital code, the dates of admission and discharge, and patient age and sex; age was used because date of birth was not recorded in DIOCLES.
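The linkage step can be illustrated with a minimal sketch of exact matching on the composite indirect identifier, keeping only one-to-one ("unique") matches. The record layout (the hypothetical `LinkKey` tuple, string episode IDs, ISO date strings) and the rule that a key must occur exactly once in each source are assumptions for illustration; the article does not describe its implementation.

```python
from collections import Counter
from typing import NamedTuple


class LinkKey(NamedTuple):
    """Hypothetical composite indirect identifier (no direct identifiers)."""
    hospital: str
    admission: str  # ISO date, e.g. "2012-03-14"
    discharge: str
    age: int        # age in years; DIOCLES does not record date of birth
    sex: str


def unique_matches(registry: dict[str, LinkKey],
                   mbds: dict[str, LinkKey]) -> dict[str, str]:
    """Link registry IDs to MBDS IDs, keeping only one-to-one key matches.

    A registry record is linked only if its key occurs exactly once in the
    registry and exactly once in the MBDS; ambiguous or absent keys are dropped.
    """
    registry_counts = Counter(registry.values())
    mbds_counts = Counter(mbds.values())
    mbds_id_by_key = {key: mbds_id for mbds_id, key in mbds.items()}
    return {
        reg_id: mbds_id_by_key[key]
        for reg_id, key in registry.items()
        if registry_counts[key] == 1 and mbds_counts[key] == 1
    }


registry = {
    "D1": LinkKey("H01", "2012-01-10", "2012-01-15", 67, "M"),
    "D2": LinkKey("H01", "2012-02-01", "2012-02-04", 54, "F"),
    "D3": LinkKey("H02", "2012-03-05", "2012-03-09", 71, "M"),
}
mbds = {
    "M1": LinkKey("H01", "2012-01-10", "2012-01-15", 67, "M"),
    "M2": LinkKey("H01", "2012-02-01", "2012-02-04", 54, "F"),
    "M3": LinkKey("H01", "2012-02-01", "2012-02-04", 54, "F"),  # duplicate key
    "M4": LinkKey("H02", "2012-03-05", "2012-03-10", 71, "M"),  # discharge differs
}
print(unique_matches(registry, mbds))  # {'D1': 'M1'}
```

D2 is dropped because two MBDS episodes share its key, and D3 because no MBDS key coincides exactly, which mirrors how exact-date linkage leaves some registry episodes unmatched.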
Statistical Analysis
A descriptive analysis of the variables studied was performed; discrete variables are expressed as No. (%), and quantitative variables, whose normality was assessed by the Shapiro-Wilk test, as mean ± standard deviation. Discrete variables were compared with the chi-square test or Fisher exact test, as appropriate, and quantitative variables with the Student t test or Mann-Whitney U test.
The Cohen kappa coefficient was used to evaluate interobserver agreement, and sensitivity, specificity, and positive and negative predictive values were calculated to measure the validity of the MBDS. Among the variables common to both sources, the analysis included some of those most relevant for adjusting the risk of in-hospital mortality due to AMI12 (Table 1). To estimate the discriminative power of the MBDS considering all the analyzed variables together, the area under the ROC (receiver operating characteristic) curve was used; the homogeneity of κ and of the area under the ROC curve was examined with the chi-square test. The areas under the ROC curve were compared among hospitals grouped into quintiles according to the number of patients they included in DIOCLES. Dependence between observations in the same patient was analyzed with the intraclass correlation coefficient (ρ) and the variance inflation factor.34
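The agreement and validity measures above all derive from a 2 × 2 cross-tabulation of each condition, with DIOCLES as the reference and the MBDS as the test. The sketch below is illustrative, not the authors' Epidat or Stata code. The coronary bypass counts in the usage line are derived from Tables 1 and 2 (27 positives in DIOCLES, 25 in the MBDS, sensitivity 88.89%, PPV 96.00%, hence 24 concordant positives) and reproduce the published κ of 0.922. The ROC area uses the fact that, for a single dichotomous rating, the trapezoidal area equals (Se + Sp)/2; whether the article computed it this way is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cross-tabulation with DIOCLES as reference and the MBDS as test."""
    tp: int  # condition present in both sources
    fp: int  # present in the MBDS only
    fn: int  # present in DIOCLES only
    tn: int  # absent in both sources

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def kappa(self) -> float:
        """Cohen kappa: (observed - chance agreement) / (1 - chance agreement)."""
        p_obs = (self.tp + self.tn) / self.n
        p_yes = ((self.tp + self.fn) / self.n) * ((self.tp + self.fp) / self.n)
        p_no = ((self.fp + self.tn) / self.n) * ((self.fn + self.tn) / self.n)
        p_exp = p_yes + p_no
        return (p_obs - p_exp) / (1 - p_exp)

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def auc(self) -> float:
        """Trapezoidal ROC area of a single binary rating: (Se + Sp) / 2."""
        return (self.sensitivity() + self.specificity()) / 2


bypass = TwoByTwo(tp=24, fp=1, fn=3, tn=1511)  # 1539 matched episodes
print(round(bypass.kappa(), 3))  # 0.922
print(round(bypass.auc(), 3))    # 0.944
```

Applying the same identity to the pooled estimates (Se 85.10%, Sp 98.31%) gives about 0.917, in line with an area under the ROC curve above 0.9.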
Diagnoses and Procedures Analyzed. Prevalence According to the Linkage Result and Data Source
Diagnoses and procedures | ICD-9-CM codes | DIOCLES, matched | DIOCLES, unmatched | P (matched vs unmatched) | MBDS, matched | P (DIOCLES vs MBDS) |
---|---|---|---|---|---|---|
Age, y | | 66.95 ± 12.99 | 66.58 ± 13.03 | .486 | | |
Women | | 375 (24.37) | 271 (27.37) | .099 | | |
AMI | 410.*1 | 1094 (71.09) | 652 (65.86) | .006 | 1038 (67.45) | .032 |
Worse Killip (3 or 4) | 427.41, 427.42, 427.5, 518.4, 518.5, 518.51, 518.52, 518.53, 518.81, 518.82, 518.83, 518.84, 785.50, 785.51, 798.0, 798.1, 798.2, 798.9, 799.01, 799.02, 998.01 | 141 (9.17) | 99 (10.00) | .527 | 110 (7.15) | < .001 |
Stroke | 094.87, 430, 431, 432.0, 432.1, 432.9, 433.01, 433.11, 433.21, 433.31, 433.81, 433.91, 434.01, 434.11, 434.91, 436 | 15 (0.97) | 11 (1.11) | .896 | 12 (0.78) | < .001 |
Thrombolysis | V45.88, 99.10 | 133 (8.64) | 89 (8.99) | .818 | 106 (6.89) | < .001 |
Angioplasty | 00.66, 36.01, 36.02, 36.05, 36.06, 36.07 | 953 (61.92) | 542 (54.75) | .004 | 836 (54.32) | < .001 |
Coronary bypass | 36.1* | 27 (1.75) | 40 (4.04) | .008 | 25 (1.62) | < .001 |
Episodes, No. | | 1539 | 990 | | | |
AMI, acute myocardial infarction; DIOCLES, Descripción de la Cardiopatía Isquémica en el Territorio Español (Description of Ischemic Heart Disease in the Spanish Territory); ICD-9-CM, International Classification of Diseases, Ninth Revision, Clinical Modification; MBDS, Minimum Basic Data Set; Worse Killip (3 or 4), severe heart failure with acute pulmonary edema or cardiogenic shock during admission.
Unless otherwise indicated, the data represent No. (%) or mean ± standard deviation.
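The ICD-9-CM code lists in Table 1 can be turned into condition flags per episode. The sketch below transcribes a subset of the table and assumes that "*" is a wildcard (so 410.*1 matches AMI codes with fifth digit 1 and 36.1* matches the coronary bypass codes 36.10-36.19); that reading of the notation, the `CONDITION_CODES` mapping, and the dotted code strings are assumptions, not the authors' extraction rules. In a real MBDS extract, diagnosis and procedure fields would normally be checked separately.

```python
from fnmatch import fnmatchcase

# Subset of the code lists in Table 1; "*" is treated as a wildcard (assumption).
CONDITION_CODES: dict[str, tuple[str, ...]] = {
    "AMI": ("410.*1",),
    "Thrombolysis": ("V45.88", "99.10"),
    "Angioplasty": ("00.66", "36.01", "36.02", "36.05", "36.06", "36.07"),
    "Coronary bypass": ("36.1*",),
}


def flag_conditions(codes: list[str]) -> set[str]:
    """Return the conditions whose patterns match any of an episode's codes."""
    return {
        condition
        for condition, patterns in CONDITION_CODES.items()
        if any(fnmatchcase(code, pattern) for code in codes for pattern in patterns)
    }


print(sorted(flag_conditions(["410.71", "36.06", "401.9"])))  # ['AMI', 'Angioplasty']
```

`fnmatchcase` is used rather than `fnmatch` so that matching does not depend on the operating system's case conventions.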
The degree of agreement was interpreted according to the Landis and Koch scale.35 Statistical significance was defined by P < .05 and analyses were performed with Epidat v3.1 and STATA 13.0.
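The Landis and Koch bands used to label κ can be written as a small lookup. This is a sketch of the commonly cited bands (below 0, poor; up to 0.20, slight; 0.21-0.40, fair; 0.41-0.60, moderate; 0.61-0.80, substantial; 0.81-1.00, almost perfect); treating each upper bound as inclusive is an assumption, chosen so that κ = 0.8 for angioplasty falls in "substantial", as in the results.

```python
def landis_koch(kappa: float) -> str:
    """Descriptive Landis and Koch band for a kappa value."""
    if kappa > 1:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0:
        return "poor"
    bands = ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect"))
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise AssertionError("unreachable")  # every value in [0, 1] falls in a band


# Kappa values from Table 2
for name, k in [("AMI", 0.814), ("Worse Killip", 0.45), ("Angioplasty", 0.8), ("Total", 0.863)]:
    print(name, landis_koch(k))
```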
Sensitivity Analysis
The impact of match failures on the results was evaluated through a sensitivity analysis that allowed differences of up to 2 days in the admission and discharge dates between DIOCLES and the MBDS and that selected, among multiple matches, the episode showing the fewest discrepancies in the conditions studied.
In addition, in an alternative linkage procedure, MBDS episodes with ACS as the primary diagnosis and an admission date in the first half of 2012 were selected and matched with the episodes recorded in DIOCLES, using the hospital code, age, sex, and dates of admission and discharge as indirect identifiers. A sensitivity analysis following the same criteria as for the original procedure was also performed on the results of this alternative procedure.
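The looser criteria of the sensitivity analysis (a 2-day tolerance on both dates, then choosing the candidate with the fewest discrepant conditions) can be sketched as follows. The `Episode` record, counting discrepancies as the symmetric difference of coded condition sets, and treating a tie for the minimum as still ambiguous are all assumptions for illustration; the article does not specify how ties were handled.

```python
from datetime import date
from typing import NamedTuple, Optional


class Episode(NamedTuple):
    """Hypothetical episode with linkage variables and coded conditions."""
    hospital: str
    admission: date
    discharge: date
    age: int
    sex: str
    conditions: frozenset  # e.g. frozenset({"AMI", "Angioplasty"})


def candidates(reg: Episode, mbds: list[Episode], max_days: int = 2) -> list[Episode]:
    """MBDS episodes matching exactly on hospital, age, and sex, and within
    max_days on both the admission and the discharge date."""
    return [
        m for m in mbds
        if (m.hospital, m.age, m.sex) == (reg.hospital, reg.age, reg.sex)
        and abs((m.admission - reg.admission).days) <= max_days
        and abs((m.discharge - reg.discharge).days) <= max_days
    ]


def best_match(reg: Episode, mbds: list[Episode], max_days: int = 2) -> Optional[Episode]:
    """Pick the candidate with the fewest discrepant conditions, or None if
    there is no candidate or the minimum is tied (assumed still ambiguous)."""
    scored = sorted(
        (len(m.conditions ^ reg.conditions), i, m)  # index i breaks ties safely
        for i, m in enumerate(candidates(reg, mbds, max_days))
    )
    if not scored or (len(scored) > 1 and scored[0][0] == scored[1][0]):
        return None
    return scored[0][2]


reg = Episode("H01", date(2012, 1, 10), date(2012, 1, 15), 67, "M", frozenset({"AMI", "Angioplasty"}))
m1 = Episode("H01", date(2012, 1, 11), date(2012, 1, 15), 67, "M", frozenset({"AMI"}))
m2 = Episode("H01", date(2012, 1, 9), date(2012, 1, 16), 67, "M", frozenset({"AMI", "Angioplasty"}))
m3 = Episode("H01", date(2012, 1, 14), date(2012, 1, 15), 67, "M", frozenset({"AMI", "Angioplasty"}))
print(best_match(reg, [m1, m2, m3]) == m2)  # True: m3 is 4 days off, m1 has 1 discrepancy
```

This sketch does not prevent the same MBDS episode from being chosen for two registry records; a full implementation would also need to enforce one-to-one matching.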
RESULTS
Of the 2557 patients registered in DIOCLES, 28 (1.1%) were excluded: 17 because at least 1 of the variables used for the linkage was missing and 11 because they were admitted to a private hospital not included in the MBDS of the Spanish National Health System. Of the remaining 2529 patients, 1539 (60.85%) were unequivocally matched (Figure 1). These patients constituted the study population; their mean age was 66.95 ± 12.99 years, and 375 (24.4%) were women.
Linkage results for DIOCLES and the MBDS. Number of episodes excluded and matched in each phase of the linkage process. DIOCLES, Descripción de la Cardiopatía Isquémica en el Territorio Español (Description of Ischemic Heart Disease in the Spanish Territory); MBDS, Minimum Basic Data Set.
Matched DIOCLES records showed a higher proportion of AMI and angioplasties and a lower proportion of coronary bypass than nonmatched records, and no significant differences were found in age, the proportion of women, or the other conditions studied. Among the matched episodes, the prevalence of all conditions was significantly higher in DIOCLES than in the MBDS (Table 1).
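The matched vs unmatched comparisons in Table 1 are 2 × 2 chi-square tests. As a worked check for AMI (1094 of 1539 matched vs 652 of 990 unmatched records), the sketch below computes the Pearson chi-square with Yates continuity correction and its 1-df P value via the complementary error function. The article does not state whether a continuity correction was applied; for AMI, the corrected (P ≈ .0063) and uncorrected (P ≈ .0055) variants both round to the reported .006.

```python
import math


def chi2_yates_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Yates-corrected Pearson chi-square for [[a, b], [c, d]] and its P (1 df)."""
    n = a + b + c + d
    numerator = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: matched, unmatched DIOCLES records; columns: AMI, no AMI
chi2, p = chi2_yates_2x2(1094, 1539 - 1094, 652, 990 - 652)
print(f"chi2 = {chi2:.2f}, P = {p:.3f}")  # chi2 = 7.46, P = 0.006
```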
Overall, the agreement between the 2 data sources was almost perfect (κ = 0.863), with 8766 concordant observations (94.93%); by condition, it was moderate for worst Killip class (3 or 4), substantial for stroke, thrombolysis, and angioplasty, and almost perfect for AMI and coronary bypass. Considering all variables, the MBDS showed a sensitivity of 85.10%, a specificity of 98.31%, a positive predictive value of 94.55%, and a negative predictive value of 95.05% (Table 2); there was no dependence between observations in the same patient (ρ ≤ 0.04; variance inflation factor ≤ 1.06). The 95% confidence intervals of the crude overall estimates and of those adjusted by these methods (ρ and variance inflation factor) are shown in Table 1 of the supplementary material.
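The pooled "all 6 conditions" estimates treat each episode-condition pair as one observation (1539 × 6 = 9234), which is why dependence between observations in the same patient had to be checked. The following back-of-the-envelope check is derived from the published figures, not from the authors' code. It rebuilds the pooled 2 × 2 table from the prevalences in Table 1 and the pooled sensitivity in Table 2, and recovers the 8766 concordant observations and the other pooled estimates.

```python
# Matched episodes with each condition (Table 1): DIOCLES vs MBDS
diocles_pos = {"AMI": 1094, "Worse Killip": 141, "Stroke": 15,
               "Thrombolysis": 133, "Angioplasty": 953, "Coronary bypass": 27}
mbds_pos = {"AMI": 1038, "Worse Killip": 110, "Stroke": 12,
            "Thrombolysis": 106, "Angioplasty": 836, "Coronary bypass": 25}
EPISODES = 1539

n_obs = EPISODES * len(diocles_pos)  # 9234 episode-condition pairs
ref_pos = sum(diocles_pos.values())  # 2363 positives in DIOCLES
test_pos = sum(mbds_pos.values())    # 2127 positives in the MBDS
tp = round(0.8510 * ref_pos)         # 2011, implied by the pooled sensitivity
fn = ref_pos - tp                    # 352
fp = test_pos - tp                   # 116
tn = n_obs - tp - fn - fp            # 6755

print(tp + tn, f"{(tp + tn) / n_obs:.2%}")  # 8766 94.93%
print(f"PPV {tp / test_pos:.2%}, Sp {tn / (tn + fp):.2%}, NPV {tn / (tn + fn):.2%}")
# PPV 94.55%, Sp 98.31%, NPV 95.05%
```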
Comparison of the Diagnoses and Procedures Recorded in DIOCLES (Descripción de la Cardiopatía Isquémica en el Territorio Español [Description of Ischemic Heart Disease in the Spanish Territory]) vs the Minimum Basic Data Set
Diagnoses and procedures | κ (95%CI) | Sensitivity, % (95%CI) | Specificity, % (95%CI) | PPV, % (95%CI) | NPV, % (95%CI) |
---|---|---|---|---|---|
AMI | 0.814 (0.783-0.846) | 91.86 (90.20-93.53) | 92.58 (90.04-95.13) | 96.82 (95.71-97.94) | 82.24 (78.79-85.68) |
Worse Killip (3 or 4) | 0.45 (0.37-0.53) | 43.97 (35.42-52.52) | 96.57 (95.58-97.56) | 56.36 (46.64-66.09) | 94.47 (93.25-95.69) |
Stroke | 0.739 (0.552-0.925) | 66.67 (39.48-93.86) | 99.87 (99.65-100.00) | 83.33 (58.08-100.00) | 99.67 (99.35-99.99) |
Thrombolysis | 0.742 (0.678-0.806) | 68.42 (60.15-76.70) | 98.93 (98.36-99.51) | 85.85 (78.74-92.96) | 97.07 (96.16-97.98) |
Angioplasty | 0.8 (0.77-0.83) | 85.94 (83.68-88.20) | 97.10 (95.65-98.54) | 97.97 (96.95-98.98) | 80.94 (77.96-83.91) |
Coronary bypass | 0.922 (0.845-0.998) | 88.89 (75.18-100.00) | 99.93 (99.77-100.00) | 96.00 (86.32-100.00) | 99.80 (99.54-100.00) |
Total (all 6 conditions) | 0.863 (0.850-0.875) | 85.10 (83.64-86.56) | 98.31 (98.00-98.62) | 94.55 (93.56-95.53) | 95.05 (94.54-95.56) |
95%CI, 95% confidence interval; AMI, acute myocardial infarction; NPV, negative predictive value; PPV, positive predictive value; Worse Killip (3 or 4), severe heart failure with acute pulmonary edema or cardiogenic shock during admission.
According to their participation in DIOCLES, the cutoff points for hospital grouping were as follows: first quintile = 41 patients; second quintile = 64; third quintile = 80; and fourth quintile = 100. The area under the ROC curve was significantly different by quintile (P = .007); the highest corresponded to quintiles 5 and 3 and the lowest to quintiles 1 and 4 (Figure 2).
Comparison of the areas under the ROC (receiver operating characteristic) curve by hospital quintile (Q) according to the number of patients registered in DIOCLES (Descripción de la Cardiopatía Isquémica en el Territorio Español [Description of Ischemic Heart Disease in the Spanish Territory]).
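Assigning hospitals to volume quintiles from the reported cutoff points (41, 64, 80, and 100 patients) can be sketched with a binary search over the cutoffs. Treating each cutoff as the inclusive upper limit of its quintile (a hospital with exactly 41 patients falls in Q1) is an assumption; the text does not state how boundary values were assigned.

```python
from bisect import bisect_left

# Upper cutoffs of Q1-Q4 as reported; inclusive upper limits are an assumption.
QUINTILE_CUTOFFS = (41, 64, 80, 100)


def volume_quintile(n_patients: int) -> int:
    """Hospital quintile (1-5) for a given number of patients included in DIOCLES."""
    return bisect_left(QUINTILE_CUTOFFS, n_patients) + 1


print([volume_quintile(n) for n in (12, 41, 42, 80, 101)])  # [1, 1, 2, 3, 5]
```

Using `bisect_left` makes a value equal to a cutoff land in the lower quintile; switching to `bisect_right` would implement the opposite boundary convention.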
In the sensitivity analysis of the original linkage procedure (Table 3), 2022 unique matches (79.95%) were obtained, selected from 2352 candidate matches. Compared with the original linkage, interobserver agreement did not differ significantly for any condition except AMI and angioplasty; prevalence in DIOCLES did not differ for any condition, and prevalence in the MBDS differed only for AMI and angioplasty (Table 4).
Sensitivity Analysis. Comparison of the Diagnoses and Procedures Recorded in DIOCLES (Descripción de la Cardiopatía Isquémica en el Territorio Español [Description of Ischemic Heart Disease in the Spanish Territory]) vs the Minimum Basic Data Set
Diagnoses and procedures | κ (95%CI) | Sensitivity, % (95%CI) | Specificity, % (95%CI) | PPV, % (95%CI) | NPV, % (95%CI) |
---|---|---|---|---|---|
AMI | 0.740 (0.709-0.770) | 86.29 (84.46-88.11) | 93.17 (91.06-95.27) | 96.77 (95.75-97.78) | 74.14 (70.95-77.33) |
Worse Killip (3 or 4) | 0.45 (0.37-0.53) | 41.62 (34.25-48.99) | 97.01 (96.20-97.81) | 58.33 (49.94-67.12) | 94.29 (93.21-95.36) |
Stroke | 0.659 (0.473-0.846) | 60.00 (36.03-83.97) | 99.85 (99.66-100.00) | 80.00 (56.42-100.00) | 99.60 (99.30-99.90) |
Thrombolysis | 0.686 (0.624-0.747) | 60.45 (52.97-67.94) | 99.02 (98.55-99.50) | 85.60 (79.05-92.15) | 96.31 (95.44-97.18) |
Angioplasty | 0.753 (0.725-0.781) | 81.56 (79.34-83.77) | 97.01 (95.77-98.25) | 97.64 (96.66-98.62) | 77.57 (74.94-80.20) |
Coronary bypass | 0.871 (0.788-0.954) | 81.58 (67.94-95.22) | 99.90 (99.73-100.00) | 93.94 (84.28-100.00) | 99.65 (99.36-99.93) |
Total (all 6 conditions) | 0.826 (0.814-0.838) | 79.98 (78.55-81.41) | 98.42 (98.16-98.69) | 94.48 (93.58-95.38) | 93.57 (93.08-94.07) |
95%CI, 95% confidence interval; AMI, acute myocardial infarction; NPV, negative predictive value; PPV, positive predictive value; Worse Killip (3 or 4), severe heart failure with acute pulmonary edema or cardiogenic shock during admission.
Results of the Comparison of Concordances and Prevalences Between DIOCLES and the MBDS
Diagnoses and procedures | κ coefficient, P | Prevalence in DIOCLES, P | Prevalence in the MBDS, P |
---|---|---|---|
AMI | .009 | .649 | .004 |
Worse Killip (3 or 4) | .907 | .99 | .467 |
Stroke | .558 | .965 | .897 |
Thrombolysis | .218 | .907 | .397 |
Angioplasty | .023 | .336 | .02 |
Coronary bypass | .379 | .783 | .986 |
AMI, acute myocardial infarction; DIOCLES, Descripción de la Cardiopatía Isquémica en el Territorio Español (Description of Ischemic Heart Disease in the Spanish Territory); MBDS, Minimum Basic Data Set; Worse Killip (3 or 4), severe heart failure with acute pulmonary edema or cardiogenic shock during admission.
The results of the alternative linkage procedure and its sensitivity analysis are shown in Tables 2 and 3 of the supplementary material.
DISCUSSION
Our results indicate an almost perfect agreement between DIOCLES and the MBDS and a high ability of the latter to discriminate episodes with the same diagnoses and procedures as the corresponding DIOCLES record (area under the ROC curve > 0.9). However, significant differences were found when hospitals were grouped by quintiles according to their contribution to DIOCLES, indicating an inverse association between registry quality and patient volume.
Our concordance was greater than that reported by Ribera et al.36 for the diagnoses and procedures included in the ARCA study (evaluation of the risk of coronary surgery in Catalonia), which varied widely, from κ = 0.16 for cardiogenic shock to κ = 0.79 for the use of extracorporeal circulation;37 than that found by Cavero-Carbonell et al.38 for the identification of congenital anomalies (κ = 0.70); and than that of Hernández Medrano et al.39 for cerebrovascular disease (concordance rate, 81.87%). The last 2 studies used medical record review as the reference. The prevalence was higher in DIOCLES than in the MBDS for all the conditions analyzed, and κ < 0.7 was found only for the worst Killip classes, which include cardiogenic shock. This finding is similar to that of Lambert et al.40 in their evaluation of the accuracy of cardiovascular disease coding in a Canadian administrative database vs medical records, although their agreement (κ = 0.667) was higher than ours.
The use of administrative databases to investigate health outcomes offers important advantages, but their usefulness ultimately depends on the clinical validity and consistency of the data, which, as far as we know, have not previously been analyzed in relation to the study of ACS in Spain. The observed variability indicates that certain conditions (AMI, angioplasty, and coronary bypass) were coded with greater accuracy than others. Thus, there seems to be room for improvement in hospital coding and, consequently, in the quality of the MBDS, although the lack of concordance may also result from registration errors in DIOCLES.21
Limitations
When a diagnosis or procedure was recorded in DIOCLES, the MBDS also recorded it in more than 85% of cases (sensitivity), and when it was absent from DIOCLES, it was also absent from the MBDS in more than 98% of cases (specificity). These values provide a benchmark for the validity of the MBDS as a data source for the study of ACS in Spain, although the main limitation of this work is the proportion of episodes that could not be matched.
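Sensitivity and specificity are conditioned on the reference (DIOCLES), whereas the probability that a condition coded in the MBDS is a true positive is the positive predictive value, which also depends on prevalence. The sketch below applies Bayes' rule to the pooled estimates. The prevalence of 2363/9234 is derived from Table 1 (DIOCLES positives over 1539 × 6 episode-condition pairs), and the 5% prevalence line is a hypothetical scenario for comparison.

```python
def predictive_values(se: float, sp: float, prevalence: float) -> tuple[float, float]:
    """PPV and NPV from sensitivity, specificity, and reference prevalence (Bayes' rule)."""
    true_pos = se * prevalence
    false_pos = (1 - sp) * (1 - prevalence)
    true_neg = sp * (1 - prevalence)
    false_neg = (1 - se) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Pooled estimates from Table 2; prevalence of the 6 conditions in matched DIOCLES episodes
ppv, npv = predictive_values(0.8510, 0.9831, 2363 / 9234)
print(f"PPV {ppv:.1%}, NPV {npv:.1%}")  # PPV 94.5%, NPV 95.0%

# Hypothetical: same Se and Sp at 5% prevalence
print(f"{predictive_values(0.8510, 0.9831, 0.05)[0]:.1%}")  # 72.6%
```

The same sensitivity and specificity would yield a markedly lower PPV for rarer conditions, which is why the two pairs of measures should not be interchanged.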
Because the alternative linkage procedures did not reduce the proportion of failed matches (the proportion of unique matches ranged between 55% and 80%), the difficulty of identifying DIOCLES episodes in the MBDS is probably due to registration inaccuracies in either of the 2 sources. In our view, recording age instead of date of birth in DIOCLES is an important factor, but there may be others.
De-identification (dissociation) is a legal requirement for personal data protection that prevents the use of direct identifiers to link CRs and administrative databases. To overcome this restriction, indirect procedures have been developed, achieving unique matches in proportions ranging from 58.1% (Setoguchi et al.33) through 87.5% (Pasquali et al.31) and 87.9% (Austin et al.30) to 90.8% (Hammill et al.32). The reasons for match failures have been described;32 although most depend on the study context and do not apply to our situation, it is likely that DIOCLES sometimes recorded the date of the patient’s arrival at or release from the emergency or cardiology department as the date of admission or discharge, which might not coincide with the MBDS dates (which reflect when the patient is admitted to the hospital and occupies a hospital bed41). The reasons given by Sarkies et al.42 for the discrepancies between administrative data and observational registries in calculating the mean length of stay per hospitalization episode support this explanation, as does the increase in the percentage of matches to almost 80% when differences of up to 2 days were allowed in the admission and discharge dates.
In this case, the sensitivity analysis showed that increasing the number of cases by almost 32% did not significantly modify the concordance between DIOCLES and the MBDS, except for AMI and angioplasty. Even so, the overall concordance remained almost perfect, and the only differences in prevalence were those for AMI and angioplasty in the MBDS. Thus, selection bias due to match failures seems unlikely; even under the looser linkage criteria, sensitivity remained around 80% and specificity above 98%.
If sample completeness is considered important, the failure to identify all of the episodes recorded in DIOCLES, which could be due to match failures, would limit the use of the MBDS in the study of ACS-related health care; this issue should be resolved by reviewing the original clinical documentation.
The present work has other limitations. First, the episodes of a private hospital that participated in DIOCLES could not be included because they were not registered in the MBDS. Second, the results might depend on the selection of the variables studied; however, because the diagnoses and procedures considered include some of the main risk factors for adjusting in-hospital mortality due to AMI, there seems to be no reason to believe that a different selection would have altered our observations. A more important limitation is that we did not use the primary information on the patients included in DIOCLES, as recorded in their medical records; a future comparison of these records with the data sources used in this work is considered essential to verify the validity of the MBDS.
Irrespective of these considerations, the linkage of CRs with the MBDS discussed here points to complementarity rather than antagonism, insofar as CRs can systematically benefit from including previously validated administrative data (eg, patients’ date of birth instead of their age). With the rapid growth of information technologies, this approach will soon increase the availability of large repositories of clinical and administrative information for secondary uses in both biomedical research and health care management.
CONCLUSIONS
With almost perfect agreement, a modest probability of false positives, and a low rate of false negatives, the MBDS can be a useful tool for investigating ACS outcomes in Spain. Nonetheless, there seems to be room for improvement in hospital coding quality, in the agreement between the observational registry and administrative data on admission and discharge dates, and in the design of CRs. A comparison of DIOCLES and the MBDS with medical records could verify their validity.
FUNDING
The research for this article was funded through an unrestricted grant from the Fundación Interhospitalaria para Investigación Cardiovascular (Interhospital Foundation for Cardiovascular Research). The DIOCLES study was funded by an unrestricted grant from Daiichi-Sankyo-Lilly Laboratories.
CONFLICTS OF INTEREST
None declared.
- The use of administrative databases has been questioned because of biases that could compromise their use for outcomes research, although other studies have shown their validity.
- The study of ACS in Spain has been addressed using CRs and administrative databases, but the concordance between the 2 information sources has not yet been assessed.
- Although the available data are anonymized, robust methods exist to link CRs and administrative databases through indirect identifiers, allowing their comparison.
- The agreement between DIOCLES and the MBDS was almost perfect, and the latter showed a strong ability to discriminate episodes with the same diagnoses and procedures as DIOCLES.
- The differences in the discriminative power of the MBDS among hospital quintiles, grouped by their contribution to DIOCLES, indicate an inverse association between registry quality and patient volume.
- The MBDS can be a useful tool for investigating ACS outcomes in Spain. A comparison of DIOCLES and the MBDS with medical records could verify their validity.
The Spanish Society of Cardiology has supported the performance of this study. The directors of the DIOCLES project provided the registry data, after anonymization of patient information. The Spanish Ministry of Health, Social Services, and Equality, through the General Directorate of Public Health, Quality, and Innovation, transferred the MBDS database to the Spanish Society of Cardiology.