AJCC
American Journal of Critical Care. 2009;18: 58-64 doi:10.4037/ajcc2009757
Copyright © 2009 by the American Association of Critical-Care Nurses.

Increasing Reliability of APACHE II Scores in a Medical-Surgical Intensive Care Unit: A Quality Improvement Study

By Laura Donahoe, BHSc, MD, Ellen McDonald, RN, Michelle E. Kho, BHSc(PT), MSc, Margaret Maclennan, RN, Paul W. Stratford, MSc and Deborah J. Cook, MD, MSc, FRCPC. Laura Donahoe is a first-year general surgery resident at Dalhousie University in Halifax, Nova Scotia, Canada. Ellen McDonald is a registered nurse and critical care research coordinator at St Joseph’s Healthcare, an academic teaching hospital in Hamilton, Ontario, Canada. Michelle E. Kho is a registered physical therapist and a PhD candidate in the Clinical Health Sciences, Health Research Methodology Program, at McMaster University, Hamilton, Ontario, Canada. Margaret Maclennan is a registered nurse and project leader in Clinical Informatics at St Joseph’s Healthcare in Hamilton, Ontario, Canada. Paul W. Stratford is a professor of physiotherapy in the School of Rehabilitation Sciences and an associate member of the Department of Clinical Epidemiology and Biostatistics at McMaster University. Deborah J. Cook is a practicing intensivist, clinical trialist, and professor of medicine and clinical epidemiology and biostatistics at McMaster University.

Corresponding author: Michelle Kho, McMaster University, Program in Health Research Methodology, Department of Clinical Epidemiology and Biostatistics, 1200 Main Street West, MDCL 3200, Hamilton, ON, Canada, L8N 3Z5 (e-mail: khome@mcmaster.ca).


    Abstract
Background Given their clinical, research, and administrative purposes, scores on the Acute Physiology and Chronic Health Evaluation (APACHE) II should be reliable, whether calculated by health care personnel or a clinical information system.

Objective To determine reliability of APACHE II scores calculated by a clinical information system and by health care personnel before and after a multifaceted quality improvement intervention.

Methods APACHE II scores of 37 consecutive patients admitted to a closed, 15-bed, university-affiliated intensive care unit were collected by a research coordinator, a database clerk, and a clinical information system. After a quality improvement intervention focused on health care personnel and the clinical information system, the same methods were used to collect data on 32 consecutive patients. The research coordinator and the clerk did not know each other’s scores or the information system’s score. The data analyst did not know the source of the scores until analysis was complete.

Results APACHE II scores obtained by the clerk and the research coordinator were highly reliable (intraclass correlation coefficient, 0.88 before vs 0.80 after intervention; P = .25). No significant changes were detected after the intervention; however, compared with scores of the research coordinator, the overall reliability of APACHE II scores calculated by the clinical information system improved (intraclass correlation coefficient, 0.24 before intervention vs 0.91 after intervention, P < .001).

Conclusions After completion of a quality improvement intervention, health care personnel and a computerized clinical information system calculated sufficiently reliable APACHE II scores for clinical, research, and administrative purposes.


Electronic medical records are ubiquitous today, and many include patients’ severity-of-illness scores. In intensive care units (ICUs), the Acute Physiology and Chronic Health Evaluation (APACHE) II is one of the most widely used scoring systems to describe illness severity.1 The APACHE II score consists of 3 summed components: the acute physiology score (APS), age, and the chronic health index (CHI). The APS includes clinical and laboratory measures and the score on the Glasgow Coma Scale (GCS). Total scores range from 0 to 71; higher scores reflect more severe illness. APACHE II scores are widely used for clinical, research, and administrative purposes. Previous studies indicated that using diverse raters,2–7 with different forms of instruction and training,3–7 resulted in variable interrater reliability of APACHE II scores.8,9 We were unable to find any reports of analysis of the reliability of APACHE II scores calculated by a commercially available electronic medical record system.
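The summed structure of the score described above can be illustrated with a minimal sketch. It assumes that the points for each component have already been assigned by a rater or an information system; the class name, field names, and the non-negativity check are hypothetical and are not taken from CareVue or from the study. Only the 3-component sum and the 0 to 71 range come from the text.

```python
from dataclasses import dataclass

# Total score range as stated in the text.
APACHE_II_MIN = 0
APACHE_II_MAX = 71


@dataclass(frozen=True)
class ApacheIIComponents:
    """Hypothetical container for already-assigned component points."""
    aps: int  # acute physiology score (APS) points
    age: int  # age points
    chi: int  # chronic health index (CHI) points

    def total(self) -> int:
        """Sum the 3 components and check the result against the stated range."""
        parts = (self.aps, self.age, self.chi)
        if any(p < 0 for p in parts):
            # Assumption: component points are never negative.
            raise ValueError("component points cannot be negative")
        score = sum(parts)
        if not APACHE_II_MIN <= score <= APACHE_II_MAX:
            raise ValueError(f"total {score} is outside 0 to 71")
        return score


if __name__ == "__main__":
    # Illustrative values only, not study data.
    print(ApacheIIComponents(aps=15, age=3, chi=2).total())
```

A range check of this kind would flag an impossible total, but it cannot detect a component that is plausible yet wrong, which is the kind of discrepancy a reliability study is designed to find.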

In a previous study,9 we documented that baseline APACHE II scores collected by 2 research clerks and an ICU research coordinator had excellent reliability (intraclass correlation coefficient [ICC], 0.90). However, 2 APACHE II components, the CHI and the verbal component (GCS-V) of the GCS score, were less reliable (ICC, 0.65 and 0.40, respectively). Polderman et al7 improved the reliability of APACHE II scores from 0.71 to 0.85 through standardized data collection and specific training sessions. Using principles similar to those applied by Polderman et al,7 we sought to improve the less reliable components of the APACHE II score in our ICU.

This prospective before-and-after study had 3 objectives: (1) document the reliability of APACHE II scores recorded by a clinical information system, a database clerk, and a research coordinator, (2) implement a multifaceted, multidisciplinary quality improvement intervention to improve the reliability of APACHE II scores, and (3) reevaluate the reliability of APACHE II scores after the intervention.


    Materials and Methods
This study was conducted in a university-affiliated, 15-bed, medical-surgical ICU at St Joseph’s Healthcare in Hamilton, Ontario. In this setting, APACHE II scores are calculated automatically by the bedside clinical information system for clinical purposes and by other personnel for research and administrative purposes.



Baseline Data Collection (2 Months)
We previously reported the data collection methods for the baseline phase of the study.9 Briefly, we recorded APACHE II scores calculated for consecutive patients admitted to the ICU by a database clerk and a research coordinator. We excluded patients if their ICU stay was less than 24 hours. In addition, we collected APACHE II scores from our bedside clinical information system, CareVue Classic (CareVue, Philips, Andover, Massachusetts), which provides new baseline data for this report. CareVue is an electronic medical record system for critically ill patients that collects data on vital signs, ventilation settings, intravenous infusions, nursing and medical assessments, and laboratory values. Data are uploaded hourly unless otherwise specified by bedside nurses. APACHE II data elements were set to autocalculate daily.

Quality Improvement Intervention (5 Months)
The focus of the intervention was improving data collected by health care personnel and the clinical information system. We divided the intervention into 6 categories (Table 1), including reconfiguration of the CareVue system and 5 interventions10 specifically targeted to nursing staff, who are responsible for ensuring that APACHE II data are available in the clinical information system as part of the nurses’ routine assessment and charting. In the CareVue reconfiguration, we changed the visibility of the CHI questions on the APACHE II to the initial default screen, on the basis of input from our ICU working group, and reset the timing of the score to run 24 hours after the patient was admitted to the ICU. As a second part of the CareVue reconfiguration, we modified the calculation variables. Once we identified significant differences in reliability between the CareVue scores and the research coordinator’s scores, we systematically examined each of the APACHE II data points contributing to the calculation in CareVue. Through this process, we identified 7 specific data elements that required modifications in CareVue (Table 1).


Table 1 Overview of quality improvement interventions to improve reliability of scores on the Acute Physiology and Chronic Health Evaluation (APACHE) II

 


We implemented 5 different multimodal interventions aimed at nursing staff to improve documentation of the CHI and the GCS-V score: education, point-of-care electronic reminders, prompts from local opinion leaders, provision of audit and feedback, and policy dissemination.10 Our nurse educator and nurse informatician conducted in-service training sessions on how to document CHI components and GCS-V scores in intubated patients. Further, we posted information sheets and electronic resources at each computer workstation to reinforce the training sessions. We programmed point-of-care electronic reminders to be sent twice daily to reinforce CHI documentation. For each new admission, local nurse opinion leaders and charge nurse champions prompted bedside nurses to complete CHI documentation, and the nurse informatician provided individual audit and feedback to each bedside nurse. Finally, our ICU working group codified the documentation requirements for the APACHE II score as a formal policy through the hospital’s intranet.



After 5 months, we gradually decreased the frequency of all interventions until they ceased. The APACHE II autocalculation continued to provide point-of-care scores, per our written ICU policy. During the intervention, APACHE II scores were calculated as usual by CareVue and by the database clerk; however, the research coordinator collected APACHE II scores only for patients enrolled in clinical studies. For the purposes of this study, while the intervention was occurring, we did not analyze any APACHE II scores.

Reevaluation (3 Months)
The data collection methods during the reevaluation phase were the same as those used before the intervention. A different data clerk, who was blinded to the APACHE II score calculations and source, entered information from the 3 different raters into a database. The database clerk and research coordinator had no knowledge of each other’s scores or of the CareVue scores before and after the intervention. Because human performance can improve when people are aware that their behavior is being observed (the Hawthorne effect) or evaluated (the sentinel effect), both before and after the intervention, the bedside nurses were unaware of the conduct of the study. However, because the purpose of the intervention was to improve APACHE II documentation, we explicitly exposed the bedside nurses to the 6 components of the quality improvement intervention.

Patient care was at the discretion of the ICU team throughout the study. This study was approved by the St Joseph’s Health Care Research Ethics Board, which waived the need for informed consent because the study did not affect patient care.

Sample Size Calculation and Data Analysis
We calculated interrater reliability by using the ICC, and we calculated ICCs for the APACHE II (total score, APS, age, and CHI) and GCS score (total, verbal, motor, and eyes) components. For each phase, we calculated a sample size of 32 patients to test whether an obtained reliability of 0.90 exceeded a reliability of 0.80, given 3 raters, a 1-tailed α = .05, and a power of 80%.11 To ensure we had sufficient observations, we enrolled an additional 5 patients. Reliability was classified as follows: slight, 0.0–0.20; fair, 0.21–0.40; moderate, 0.41–0.60; substantial, 0.61–0.80; and almost perfect, 0.81–1.00.12
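The text does not state which form of the ICC was used, and the analyses were run in SPSS. As a minimal sketch only, the following assumes a two-way random-effects, absolute-agreement, single-rater ICC (often written ICC(2,1)), computed from the mean squares of a patients-by-raters table, followed by the classification bands listed above. The function names `icc_2_1` and `classify_reliability` are hypothetical, and the handling of values that fall between the listed bands is an assumption.

```python
from statistics import mean


def icc_2_1(scores):
    """ICC(2,1) for a table with one row per patient and one column per rater.

    Assumption: two-way random effects, absolute agreement, single rater.
    The source does not specify which ICC form the authors used.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 patients and 2 raters, no missing cells")

    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Sums of squares for patients (rows), raters (columns), and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-patients mean square
    msc = ss_cols / (k - 1)              # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    denom = msr + (k - 1) * mse + k * (msc - mse) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (msr - mse) / denom


# Upper bounds of the bands listed in the text. Assumption: a value between
# two listed bands (for example 0.205) is assigned to the higher band.
BANDS = [
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (1.00, "almost perfect"),
]


def classify_reliability(icc):
    """Map an ICC to the descriptive bands given in the text."""
    if icc < 0 or icc > 1:
        return "outside the listed range"
    for upper, label in BANDS:
        if icc <= upper:
            return label
    return "outside the listed range"


if __name__ == "__main__":
    # ICC values reported in the abstract for CareVue vs the research coordinator.
    for value in (0.24, 0.91):
        print(value, classify_reliability(value))
```

Because absolute agreement penalizes a systematic offset between raters, an information system that consistently omits one component would receive a low ICC under this form even if its scores tracked the human raters' scores closely.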

We compared ICCs between each pair of raters before and after the intervention.13 We explored differences in ICC from before to after the intervention by using the Bonferroni correction (for 10 comparisons, our critical P value was .005). All tests were 2 sided. We calculated 95% confidence intervals where appropriate.
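The Bonferroni arithmetic above can be shown as a short sketch. The overall significance level of .05 is an assumption inferred from the stated critical value (.005 for 10 comparisons); the function name is hypothetical, and the 2 P values are those reported in this article for the total APACHE II score.

```python
def bonferroni_critical_p(alpha, n_comparisons):
    """Per-comparison critical P value under the Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("need at least 1 comparison")
    return alpha / n_comparisons


# Assumption: overall alpha of .05, split across the 10 comparisons in the text.
critical_p = bonferroni_critical_p(0.05, 10)

# P values reported in the article for before-vs-after changes in ICC.
reported = {
    "clerk vs coordinator": 0.25,
    "CareVue vs clerk": 0.03,
}

for label, p in reported.items():
    verdict = "significant" if p < critical_p else "not significant"
    print(f"{label}: P = {p} is {verdict} at the corrected threshold")
```

Under this threshold, a P value of .03 does not count as significant, whereas the P < .001 reported for CareVue compared with the research coordinator does.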

We calculated descriptive statistics and used t tests and Wilcoxon rank-sum tests to compare continuous data and χ² tests to compare proportions. We used SPSS (version 14, SPSS Inc, Chicago, Illinois) for all analyses. The data analyst had no knowledge of the source of the scores until analyses were complete.


    Results
We enrolled 37 patients before the intervention and 32 patients after the intervention. We detected no significant differences in patients’ characteristics from before to after the intervention (Table 2). Both before and after the intervention, the reliability of the APACHE II scores generated by the database clerk and the research coordinator remained almost perfect, with no significant change over time. However, we did detect initial deficiencies in the reliability of the APACHE II scores generated by the CareVue system. Before the intervention, the reliability of the CareVue APACHE II scores was fair compared with the reliability of the scores generated by the database clerk (ICC [95% confidence interval], 0.29 [0, 0.61]) and the research coordinator (0.24 [0, 0.59]); however, the reliability between the scores from the database clerk and the research coordinator was classified as almost perfect12 at 0.88 (0.77, 0.94). After the intervention, the reliability of the CareVue APACHE II scores compared with the scores from the research coordinator was almost perfect and significantly improved at 0.91 (0.82, 0.95). Compared with the scores of the database clerk, the CareVue APACHE II scores also improved in reliability (P = .03; Table 3).


Table 2 Patients’ characteristics

 

Table 3 Interrater reliability before and after the intervention

 
When the database clerk and the research coordinator were compared, we did not detect any significant improvements in reliability of either the CHI or the GCS-V subscales of the APACHE II scores after our multifaceted interventions (Table 3). The reliability of the CHI before the intervention was 0.65 (0.42, 0.80), whereas the reliability after the intervention was 0.35 (0.01, 0.62). The reliability of the GCS-V score before the intervention was 0.44 (0.11, 0.67), which improved to 0.59 (0.31, 0.77), although this difference was not significant.

The remaining major subscales of the APACHE II score (age, APS, and GCS) had no significant changes in reliability between the database clerk and the research coordinator from before to after the intervention (Table 3). Age scores were almost perfect, and although the APS and GCS reliability scores were somewhat lower after the intervention, this difference was not significant. Compared with before the intervention, the Eye subcomponent of the GCS score had significantly worse reliability after the intervention at 0.51 (0.20, 0.73); however, the reliability was still moderate. Following data collection, we examined the distribution of the CHI and each of the GCS components across patients and found little variability in scores, a situation that might decrease the ability to detect change over time.


    Discussion
After a multifaceted, multidisciplinary intervention, we detected significant improvement in the reliability of APACHE II scores calculated by a clinical information system. We also found that total APACHE II scores obtained by a database clerk and a research coordinator are highly reliable and consistent over time. Likewise, all of the major components of the APACHE II score except the CHI were reasonably reliable and consistent over time. After our specific interventions targeted at improving the CHI component and GCS-V subscales, however, we did not detect significant changes.

Numerous strategies to change behavior have been suggested to improve the quality of health care.10 We selected interventions that were most likely to address the problems we observed, building on previous work on behavior change in our ICU as well as the published literature on practice improvement as summarized in several systematic reviews and adapted to our limited budget and setting. Our quality improvement intervention focused primarily on bedside nurses, because they directly influence patient care and are heavily involved in documenting patients’ illness. Key components of successful quality improvement projects are leaders and champions.14 In our study, leadership was provided by a nurse informatician, nurse manager, and nurse educator; champions were charge nurses who encouraged and modeled accurate and timely documentation of the CHI component of the APACHE II score, which required input from bedside nurses. Our multifaceted approach included educational meetings and materials, point-of-care electronic reminders, local opinion leaders, prompts, auditing, and feedback. The goals of the project were encoded in a formal unit policy that was posted and endorsed by the multidisciplinary CareVue quality team and ICU working group.

Our study has limitations. In any multifaceted intervention of this type, it is difficult to determine which component was responsible for the greatest change in behavior. We hypothesize that the reconfiguration of CareVue had the greatest impact, and of the other components, we think that the reminders, prompts from peer leaders, auditing, and feedback had the most important role in increasing the completion of the CHI questions. Certainly, changing the CareVue system calculations through automation was important. In this study we did not use a clinical decision support system, a powerful method of changing behavior,15 because we were not using an information system to support clinical decision making for patient care. Neither the CHI nor the GCS-V subcomponents of the APACHE II score were designed to discriminate among patients; thus, documenting significant improvements in the reliability of these subcomponents may not be possible because of the minimal variation across patients. Because we used a computerized clinical information system, our results are not applicable to paper-based bedside records of measures of illness severity, and the reliability of APACHE II scores would most likely be lower among newly hired personnel. Finally, although our results are generalizable to similar medical-surgical ICUs with a wide variety of admission diagnoses, they may not necessarily be generalizable to exclusively neurosurgery, cardiac surgery, or trauma ICUs.



Strengths of our project include the consistent team that participated in all 3 phases of the research program. We involved professionals from many disciplines, including staff from nursing informatics, management, physicians, and research personnel, thereby ensuring that we incorporated diverse suggestions from a broad range of perspectives in designing our intervention. We minimized selection bias by enrolling consecutive patients who met entry criteria before and after the intervention. We conducted this study prospectively, thus avoiding errors and incomplete records associated with retrospective chart review. No data were missing. We used blinded data abstraction, entry, and analysis. The implementation strategies we used ranged from simple to complex and were readily available, well accepted, and easily applied in the usual practice setting, thereby enhancing the feasibility of these interventions elsewhere for similar initiatives related to quality of care.

Today, many members of the health care team depend on computerized devices and systems that collect, transform, display, and analyze data for multiple purposes. As computerized charting systems are now integral to health care institutions, the functions that they perform must be reliable. We found that the clinical information system initially generated APACHE II scores that were insufficiently reliable. After a multifaceted intervention designed to promote accurate and complete charting by health care personnel on a reconfigured CareVue system, APACHE II scores generated by the clinical information system became sufficiently reliable for clinical, research, and administrative purposes, compared with values obtained by health care personnel. Thus, time that data clerks and research personnel would usually spend calculating APACHE II scores could be freed for other activities.


    Conclusion
After a multifaceted, multidisciplinary educational intervention, including reconfiguration of a clinical information system, we showed that personnel and computerized charting systems can calculate APACHE II scores with suitable reliability for multiple purposes.


    ACKNOWLEDGMENTS
 
We thank Ms Eley Wisniewski and Mr Bruce Preston for assistance with data extraction from the Hamilton Regional Critical Care Database and Mr Michael Fazio for data entry.

FINANCIAL DISCLOSURES
This study was supported by the Father Sean O’Sullivan Research Center. L. Donahoe was funded by a McMaster University Bachelor of Health Sciences Research Scholarship. M. Kho is funded by a Canadian Institutes of Health Research Fellowship Award through the Clinical Research Initiative. D. Cook is a research chair of the Canadian Institutes of Health Research.



    REFERENCES

  1. Knaus WA, Draper EA, Wagner DP, Zimmerman JE. APACHE II: a severity of disease classification system. Crit Care Med. 1985;13(10):818–829.
  2. Chen LM, Martin CM, Morrison TL, Sibbald WJ. Interobserver variability in data collection of the APACHE II score in teaching and community hospitals. Crit Care Med. 1999;27(9):1999–2004.
  3. Damiano AM, Bergner M, Draper EA, Knaus WA, Wagner DP. Reliability of a measure of severity of illness: Acute Physiology of Chronic Health Evaluation—II. J Clin Epidemiol. 1992;45(2):93–101.
  4. Goldhill DR, Sumner A. APACHE II, data accuracy and outcome prediction. Anaesthesia. 1998;53(10):937–943.
  5. Holt AW, Bury LK, Bersten AD, Skowronski GA, Vedig AE. Prospective evaluation of residents and nurses as severity score data collectors. Crit Care Med. 1992;20(12):1688–1691.
  6. Polderman KH, Christiaans HM, Wester JP, Spijkstra JJ, Girbes AR. Intra-observer variability in APACHE II scoring. Intensive Care Med. 2001;27(9):1550–1552.
  7. Polderman KH, Jorna EM, Girbes AR. Inter-observer variability in APACHE II scoring: effect of strict guidelines and training. Intensive Care Med. 2001;27(8):1365–1369.
  8. Bernard GR, Vincent JL, Laterre PF, et al. Efficacy and safety of recombinant human activated protein C for severe sepsis. N Engl J Med. 2001;344(10):699–709.
  9. Kho ME, McDonald E, Stratford PW, Cook DJ. Interrater reliability of APACHE II scores for medical-surgical intensive care patients: a prospective blinded study. Am J Crit Care. 2007;16(4):378–383.
  10. Cook DJ, Meade MO, Hand LE, McMullin JP. Toward understanding evidence uptake: semirecumbency for pneumonia prevention. Crit Care Med. 2002;30(7):1472–1477.
  11. Walter SD, Eliasziw M, Donner A. Sample size and optimal designs for reliability studies. Stat Med. 1998;17(1):101–110.
  12. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–174.
  13. Stratford PW, Spadoni GF. Sample size estimation for the comparison of competing measures’ reliability coefficients. Physiother Can. 2003;55:225–229.
  14. Curtis JR, Cook DJ, Wall RJ, et al. Intensive care unit quality improvement: a "how-to" guide for the interdisciplinary team. Crit Care Med. 2006;34(1):211–218.
  15. Garg AX, Adhikari NK, McDonald H, et al. Effects of computerized clinical decision support systems on practitioner performance and patient outcomes: a systematic review. JAMA. 2005;293(10):1223–1238.



