Evaluating inter-rater reliability of indicators to assess performance of medicines management in health facilities in Uganda
Abstract
Background: To build capacity in medicines management, the Uganda Ministry of Health introduced a nationwide
supervision, performance assessment and recognition strategy (SPARS) in 2012. Medicines management supervisors
(MMS) assess performance using 25 indicators to identify problems, focus supervision, and monitor improvement in
medicines stock and storage management, ordering and reporting, and prescribing and dispensing. Although the
indicators are well-recognized and used internationally, little was known about the reliability of these indicators. An
initial assessment of inter-rater reliability (IRR), which measures agreement among raters (i.e., MMS), showed poor IRR;
subsequently, we implemented efforts to improve IRR. The aim of this study was to assess IRR for SPARS indicators at
two subsequent time points to determine whether IRR increased following efforts to improve reproducibility.
Methods: IRR was assessed in 2011 and again in 2012 and 2013, after efforts to improve it. Efforts included targeted
training, providing detailed guidelines and job aids, and refining indicator definitions and response categories. In the
assessments, teams of three MMS measured 24 SPARS indicators in 26 facilities. We calculated IRR as a team agreement
score (i.e., the percentage of MMS teams in which all three MMS had the same score). Two-sample tests for proportions
were used to compare IRR scores for each indicator, domain, and overall for the initial assessment and the following
two assessments. We also compared the IRR scores for indicators classified as simple (binary) versus complex
(multi-component). Logistic regression was used to identify supervisor group characteristics associated with
domain-specific and overall IRR scores.
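
As an illustration of the team agreement score and the comparison of proportions described above, the following is a minimal sketch only, not the study's analysis code. The function names and the example scores are hypothetical, and the pooled two-sided z-test is an assumption, since the abstract does not state which variant of the two-sample test for proportions was used.

```python
from math import erf, sqrt


def team_agreement_score(team_scores):
    """Percent of teams in which every rater recorded the same score."""
    agreeing = sum(1 for scores in team_scores if len(set(scores)) == 1)
    return 100.0 * agreeing / len(team_scores)


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for a difference between two proportions.

    x1/n1 and x2/n2 are the agreeing-team counts and team totals in the
    two assessments. Returns (z, p_value). Pooling is an assumption here.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both samples are all-agree or all-disagree: no detectable difference.
        return 0.0, 1.0
    z = (p2 - p1) / se
    # Standard normal CDF expressed through the error function.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Hypothetical scores for one indicator: one tuple per team of three raters.
initial = [(1, 1, 1), (1, 0, 1), (0, 0, 0), (1, 1, 0)]
follow_up = [(1, 1, 1), (1, 1, 1), (0, 0, 0), (1, 1, 0)]

irr_initial = team_agreement_score(initial)
irr_follow_up = team_agreement_score(follow_up)
z, p_value = two_proportion_z_test(2, len(initial), 3, len(follow_up))
print(irr_initial, irr_follow_up, round(z, 3), round(p_value, 3))
```

With only four hypothetical teams per assessment the test has very little power; the example is meant to show the mechanics, not to reproduce any reported result.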
Results: Initially, only five (21%) indicators had acceptable reproducibility, defined as an IRR score ≥ 75%. At the initial
assessment, prescribing quality indicators had the lowest and stock management indicators had the highest IRR. By the
third IRR assessment, 12 (50%) indicators had acceptable reproducibility, and the overall IRR score improved from 57%
to 72%. The IRR of simple indicators was consistently higher than that of complex indicators in the three assessment
periods. We found no correlation between IRR scores and MMS experience or professional background.
Conclusions: Assessments of indicator reproducibility are needed to improve IRR. Using simple indicators is recommended.