Reliable data collection is important for assessing the accuracy of a systematic review or other type of literature review. This Academy article explains how inter-rater reliability helps ensure consistent and accurate assessments in systematic reviews and research.
Inter-rater reliability (IRR) is a measure of the consistency and agreement between two or more raters or observers in their assessments, judgments, or ratings of a particular phenomenon or behaviour. In other words, IRR refers to the degree to which different raters or observers produce similar or consistent results when evaluating the same thing.
IRR is used in various academic fields, including psychology, sociology, education, medicine, and others, to ensure the validity and reliability of the research or evaluation. In a systematic review, it can be used to assess the reliability of screening decisions between two reviewers.
IRR can be reported as the percentage agreement (the number of agreed ratings divided by the total number of ratings). It can also be measured using statistical methods such as Cohen's kappa coefficient, the intraclass correlation coefficient (ICC), or Fleiss' kappa, which take into account the number of raters, the number of categories or variables being rated, and the level of agreement among the raters.
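As a minimal sketch (not a Covidence feature), the snippet below shows how percentage agreement and Cohen's kappa could be computed in Python for two reviewers' screening decisions. The decision vectors are made-up example data.

```python
# Sketch: percentage agreement and Cohen's kappa for two reviewers,
# using hypothetical include/exclude decisions (1 = include, 0 = exclude).
from collections import Counter

reviewer_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
reviewer_b = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]

n = len(reviewer_a)

# Percentage agreement: matching decisions divided by total decisions.
observed_agreement = sum(a == b for a, b in zip(reviewer_a, reviewer_b)) / n

# Agreement expected by chance, from each reviewer's marginal rates.
counts_a = Counter(reviewer_a)
counts_b = Counter(reviewer_b)
expected_agreement = sum(
    (counts_a[label] / n) * (counts_b[label] / n)
    for label in set(reviewer_a) | set(reviewer_b)
)

# Cohen's kappa corrects the observed agreement for chance agreement.
kappa = (observed_agreement - expected_agreement) / (1 - expected_agreement)

print(f"Percentage agreement: {observed_agreement:.2f}")  # 0.80
print(f"Cohen's kappa:        {kappa:.2f}")               # 0.60
```

Note that the two reviewers agree on 8 of 10 studies (80%), but once chance agreement is removed the kappa value is 0.60, which is why kappa is usually preferred over raw percentage agreement.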
High inter-rater reliability indicates that the raters are consistent in their judgments, while low inter-rater reliability suggests that the raters have different interpretations or criteria for evaluating the same phenomenon. Kappa scores range from -1 to 1, where 0 represents agreement no better than chance and 1 represents 100% agreement between screeners.
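As a quick sanity check of those endpoints, and assuming scikit-learn is installed, its cohen_kappa_score function can be run on two small made-up decision vectors:

```python
from sklearn.metrics import cohen_kappa_score

# Perfect agreement between two screeners -> kappa = 1.
print(cohen_kappa_score([1, 0, 1, 0], [1, 0, 1, 0]))  # 1.0

# Agreement no better than chance -> kappa of about 0.
print(cohen_kappa_score([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.0
```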
Achieving high inter-rater reliability is crucial for ensuring the validity and generalisability of research findings or evaluation results.
Click here to find out How to export inter-rater reliability data from Covidence.