Inter-rater reliability of the QUADAS-2 for assessing screening accuracy studies

Date and Location




Saturday 21 September 2013 - 10:30 - 12:00


Presenting author and contact person

Presenting author

Bing Guo

Contact person

Bing Guo
Abstract text
Background: Published empirical evidence is scarce on the inter-rater reliability of QUADAS-2, the latest version of the Quality Assessment of Diagnostic Accuracy Studies tool.

Objectives: To evaluate the inter-rater reliability of QUADAS-2 when used in a systematic review to assess the methodological quality of 34 screening accuracy studies of transcutaneous bilirubin meters used to screen for neonatal hyperbilirubinemia.

Methods: Review-specific guidance was developed for the 11 signaling questions; the possible answers were “yes”, “unclear”, or “no”. Two reviewers independently piloted the QUADAS-2 tool on four studies. One question in the reference standard domain was removed from the tool, leaving 10 questions. Two independent reviewers assessed 34 studies using the modified QUADAS-2 tool along with the refined guidance. Agreement between the two reviewers on each of the 10 questions was measured by the proportion of agreement (po), the kappa coefficient (κ), and the prevalence-adjusted bias-adjusted kappa (PABAK), which accounts for the effects of prevalence and bias, the two paradoxes associated with the kappa statistic.

Results: Across the 10 questions, po ranged from 41% to 97%. κ ranged from -0.03 to 0.57, indicating poor agreement for 2 questions, slight to fair agreement for 3 questions, and moderate agreement for 5 questions. PABAK ranged from -0.18 to 0.94, indicating poor agreement for 2 questions, fair agreement for 1 question, and moderate to almost perfect agreement for 7 questions. After adjusting for bias and prevalence, κ values increased for six questions. For one question, κ increased from 0 (poor agreement) to an adjusted value of 0.94 (almost perfect agreement), mainly because of a large prevalence effect (prevalence index of 0.97).

Conclusions: A low prevalence of certain items can substantially reduce κ values, which can be misleading. When measuring inter-rater reliability for accuracy studies assessed with QUADAS-2, PABAK should also be reported when a substantial discrepancy exists between po and κ.
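
The three agreement measures named in the Methods can be illustrated with a minimal Python sketch. It assumes the standard two-category definitions: Cohen's κ = (po − pe) / (1 − pe), PABAK = 2·po − 1, prevalence index = |a − d| / n, and bias index = |b − c| / n. The abstract does not state which formulas were used, although its reported ranges are consistent with PABAK = 2·po − 1 (2 × 0.41 − 1 = −0.18 and 2 × 0.97 − 1 = 0.94). The function name `agreement_stats` and the example cell counts are hypothetical; the abstract does not report the underlying tables. The counts below are simply one 34-study table that happens to reproduce the values quoted for the question with the large prevalence effect.

```python
from dataclasses import dataclass


@dataclass
class AgreementStats:
    po: float                # observed proportion of agreement
    kappa: float             # Cohen's kappa
    pabak: float             # prevalence-adjusted bias-adjusted kappa
    prevalence_index: float  # |a - d| / n
    bias_index: float        # |b - c| / n


def agreement_stats(a: int, b: int, c: int, d: int) -> AgreementStats:
    """Agreement statistics for a 2x2 table of two raters.

    a: both raters give the first answer; d: both give the second answer;
    b: rater 1 first, rater 2 second; c: rater 1 second, rater 2 first.
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("the table is empty")
    po = (a + d) / n
    # Chance-expected agreement, computed from each rater's marginal totals.
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    # Kappa is undefined when both raters use a single category throughout.
    kappa = float("nan") if pe == 1 else (po - pe) / (1 - pe)
    pabak = 2 * po - 1  # two-category form (an assumption, see lead-in)
    return AgreementStats(po, kappa, pabak, abs(a - d) / n, abs(b - c) / n)


# Hypothetical table: the raters agree on 33 of 34 studies, all in one category.
stats = agreement_stats(a=33, b=1, c=0, d=0)
print(f"po={stats.po:.2f} kappa={stats.kappa:.2f} "
      f"PABAK={stats.pabak:.2f} PI={stats.prevalence_index:.2f}")
# prints: po=0.97 kappa=0.00 PABAK=0.94 PI=0.97
```

In this illustrative table almost every rating falls into a single category, so the chance-expected agreement is as high as the observed agreement and κ collapses to 0 even though the raters disagree on only one study.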