Reliable data collection is important for assessing the accuracy of a systematic review or other type of literature review. This Academy article explains how inter-rater reliability helps ensure consistent and accurate assessments in systematic reviews and research.
Inter-rater reliability (IRR) is a measure of the consistency and agreement between two or more raters or observers in their assessments, judgments, or ratings of a particular phenomenon or behaviour. In other words, IRR refers to the degree to which different raters or observers produce similar or consistent results when evaluating the same thing.
IRR is used in various academic fields, including psychology, sociology, education, medicine, and others, to ensure the validity and reliability of the research or evaluation. In a systematic review, it can be used to assess the reliability of screening decisions between two reviewers.
IRR can be reported as the percentage agreement (the number of agreed ratings divided by the total number of ratings). It can also be measured using statistical methods such as Cohen's kappa coefficient, the intraclass correlation coefficient (ICC), or Fleiss' kappa, which take into account the number of raters, the number of categories or variables being rated, and the level of agreement among the raters.
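As a minimal sketch (not a Covidence feature), the snippet below shows how percentage agreement and Cohen's kappa could be computed in Python for two reviewers' screening decisions. The decision vectors are made-up example data.

```python
# Sketch: percentage agreement and Cohen's kappa for two reviewers,
# using hypothetical include/exclude decisions (1 = include, 0 = exclude).
from collections import Counter

reviewer_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
reviewer_b = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]

n = len(reviewer_a)

# Percentage agreement: matching decisions divided by total decisions.
observed_agreement = sum(a == b for a, b in zip(reviewer_a, reviewer_b)) / n

# Agreement expected by chance, from each reviewer's marginal rates.
counts_a = Counter(reviewer_a)
counts_b = Counter(reviewer_b)
expected_agreement = sum(
    (counts_a[label] / n) * (counts_b[label] / n)
    for label in set(reviewer_a) | set(reviewer_b)
)

# Cohen's kappa corrects the observed agreement for chance agreement.
kappa = (observed_agreement - expected_agreement) / (1 - expected_agreement)

print(f"Percentage agreement: {observed_agreement:.2f}")  # 0.80
print(f"Cohen's kappa:        {kappa:.2f}")               # 0.60
```

Note that the two reviewers agree on 8 of 10 studies (80%), but once chance agreement is removed the kappa value is 0.60, which is why kappa is usually preferred over raw percentage agreement.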
High inter-rater reliability indicates that the raters are consistent in their judgments, while low inter-rater reliability suggests that the raters have different interpretations or criteria for evaluating the same phenomenon. Kappa scores range from -1 to 1, where 0 represents agreement no better than chance and 1 represents 100% agreement between screeners.
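As a quick sanity check of those endpoints, and assuming scikit-learn is installed, its cohen_kappa_score function can be run on two small made-up decision vectors:

```python
from sklearn.metrics import cohen_kappa_score

# Perfect agreement between two screeners -> kappa = 1.
print(cohen_kappa_score([1, 0, 1, 0], [1, 0, 1, 0]))  # 1.0

# Agreement no better than chance -> kappa of about 0.
print(cohen_kappa_score([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.0
```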
Achieving high inter-rater reliability is crucial for ensuring the validity and generalisability of research findings or evaluation results.
Click here to find out How to export inter-rater reliability data from Covidence.