
Fleiss' Kappa
Fleiss' Kappa is a statistical measure used to evaluate how consistently multiple raters classify or assess items, beyond what would be expected by chance. It quantifies the level of agreement among raters on categorical decisions, with a value ranging from -1 to 1. A value of 1 indicates perfect agreement, 0 suggests agreement is no better than random chance, and negative values indicate less agreement than expected by chance. Fleiss' Kappa is especially useful when more than two raters are involved, providing a standardized way to assess reliability in subjective evaluations or classifications across different observers.
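As a rough illustration of how the statistic is computed, the sketch below implements the standard formula kappa = (P_bar - P_e) / (1 - P_e), where P_bar is the mean observed agreement across items and P_e is the agreement expected by chance. It assumes a hypothetical items-by-categories count matrix in which every item is rated by the same number of raters; the function name and example data are illustrative, not taken from any particular library.

```python
import numpy as np

def fleiss_kappa(counts):
    """Compute Fleiss' kappa from an (items x categories) matrix of counts.

    counts[i, j] = number of raters who assigned item i to category j.
    Assumes every row sums to the same number of raters n.
    """
    counts = np.asarray(counts, dtype=float)
    N = counts.shape[0]                # number of items
    n = counts[0].sum()                # raters per item (assumed constant)

    # Proportion of all assignments falling in each category
    p_j = counts.sum(axis=0) / (N * n)
    # Per-item agreement: fraction of agreeing rater pairs for each item
    P_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))
    P_bar = P_i.mean()                 # mean observed agreement
    P_e = np.square(p_j).sum()         # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical example: 4 items, 3 raters, 3 categories
ratings = [
    [3, 0, 0],   # all raters agree
    [0, 3, 0],   # all raters agree
    [1, 1, 1],   # complete disagreement
    [2, 1, 0],   # partial agreement
]
print(round(fleiss_kappa(ratings), 3))  # about 0.268: slight agreement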