Most measures of agreement are chance-corrected. They differ along three dimensions: their definition of chance agreement, their choice of disagreement function, and how they handle multiple raters. Chance agreement is usually defined in a pairwise manner, following either Cohen’s kappa or Fleiss’s kappa. The disagreement function is usually a nominal, quadratic, or absolute value function. How to handle multiple raters is contentious, however, with the main contenders being Fleiss’s kappa, Conger’s kappa, and Hubert’s kappa, the variant of Fleiss’s kappa where agreement occurs only if every rater agrees. More generally, multi-rater agreement coefficients can be defined in a g-wise way, where the disagreement weighting function uses g raters instead of two. This paper makes two main contributions. (a) We propose using Fréchet variances to handle the case of multiple raters. The Fréchet variances are intuitive disagreement measures that turn out to generalize the nominal, quadratic, and absolute value functions to more than two raters. (b) We derive the limit theory of g-wise weighted agreement coefficients, with chance agreement of the Cohen type or Fleiss type, for the case where every item is rated by the same number of raters. After evaluating three confidence interval constructions, we recommend confidence intervals based on the arcsine or Fisher transform.
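As a hedged illustration of the central idea (a sketch, not code from the paper): the Fréchet variance of g ratings under a metric d is the minimum, over candidate centers c, of the mean squared distance d(x_i, c)². With the nominal 0/1 metric this recovers one minus the modal relative frequency, and with the Euclidean metric it recovers the (divide-by-g) sample variance, matching the abstract’s claim that Fréchet variances generalize the nominal and quadratic disagreement functions. The function and metric names below are my own, not the paper’s.

```python
import numpy as np

def frechet_variance(ratings, dist):
    """Fréchet variance of a set of ratings under metric `dist`:
    the minimum over candidate centers c of the mean squared
    distance d(x_i, c)**2.  Candidates are restricted to the observed
    ratings, which is exact for nominal data and an upper bound
    in general (illustrative simplification)."""
    ratings = list(ratings)
    return min(
        np.mean([dist(x, c) ** 2 for x in ratings]) for c in ratings
    )

nominal = lambda x, y: float(x != y)  # 0/1 metric; d**2 = d
euclid = lambda x, y: abs(x - y)      # squaring gives quadratic weights

# Nominal metric: 1 - (relative frequency of the modal category)
print(frechet_variance([1, 1, 2], nominal))        # 1/3
# Euclidean metric: the mean 1.0 is itself a data point here, so this
# equals the divide-by-g sample variance
print(frechet_variance([0.0, 1.0, 2.0], euclid))   # 2/3
```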

Moss, Jonas (2023)

Measuring Agreement Using Guessing Models and Knowledge Coefficients

Several measures of agreement, such as the Perreault–Leigh coefficient, the AC1, and the recent coefficient of van Oest, are based on explicit models of how judges make their ratings. To handle such measures of agreement under a common umbrella, we propose a class of models called guessing models, which contains most models of how judges make their ratings. Every guessing model has an associated measure of agreement we call the knowledge coefficient. Under certain assumptions on the guessing models, the knowledge coefficient equals the multi-rater Cohen’s kappa, Fleiss’s kappa, the Brennan–Prediger coefficient, or other less-established measures of agreement. We provide several sample estimators of the knowledge coefficient, valid under varying assumptions, together with their asymptotic distributions. After a sensitivity analysis and a simulation study of confidence intervals, we find that the Brennan–Prediger coefficient typically outperforms the others, with much better coverage under unfavorable circumstances.
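For context on the coefficient the paper recommends, here is a minimal sketch of the standard Brennan–Prediger coefficient (its textbook definition, not the paper’s estimators): chance agreement is fixed at 1/q for q categories, so the coefficient is (p_a − 1/q)/(1 − 1/q), with p_a the observed proportion of agreeing rater pairs.

```python
from itertools import combinations

def brennan_prediger(ratings, n_categories):
    """Brennan-Prediger coefficient with pairwise agreement:
    (p_a - 1/q) / (1 - 1/q), where p_a is the observed proportion
    of agreeing rater pairs and 1/q is the chance agreement of
    uniform guessing over q categories.  `ratings` is a list of
    per-item lists of ratings."""
    pairs = agree = 0
    for item in ratings:
        for a, b in combinations(item, 2):
            pairs += 1
            agree += (a == b)
    p_a = agree / pairs
    chance = 1 / n_categories
    return (p_a - chance) / (1 - chance)

# Two items, three raters, two categories: item 1 has full agreement
# (3 of 3 pairs agree), item 2 a 2-1 split (1 of 3 pairs agree),
# so p_a = 4/6 and the coefficient is (2/3 - 1/2) / (1/2) = 1/3.
print(brennan_prediger([[1, 1, 1], [1, 1, 2]], n_categories=2))
```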

Moss, Jonas & Grønneberg, Steffen (2023)

Partial Identification of Latent Correlations with Ordinal Data

The polychoric correlation is a popular measure of association for ordinal data. It estimates a latent correlation, i.e., the correlation of a latent vector. This vector is assumed to be bivariate normal, an assumption that cannot always be justified. When bivariate normality does not hold, the polychoric correlation will not necessarily approximate the true latent correlation, even when the observed variables have many categories. We calculate the sets of possible values of the latent correlation when latent bivariate normality is not necessarily true, but at least the latent marginals are known. The resulting sets are called partial identification sets, and are shown to shrink to the true latent correlation as the number of categories increases. Moreover, we investigate partial identification under the additional assumption that the latent copula is symmetric, and calculate the partial identification set when one variable is ordinal and the other is continuous. We show that little can be said about latent correlations unless we have impractically many categories or we know a great deal about the distribution of the latent vector. An open-source R package is available for applying our results.

Moss, Jonas (2022)

Infinite diameter confidence sets in Hedges’ publication bias model

Meta-analysis, the statistical analysis of results from separate studies, is a fundamental building block of science. But the assumptions of classical meta-analysis models are not satisfied whenever publication bias is present, which causes inconsistent parameter estimates. Hedges’ selection function model takes publication bias into account, but estimation and inference with this model are difficult for some datasets. Using a generalized Gleser–Hwang theorem, we show there is no confidence set of guaranteed finite diameter for the parameters of Hedges’ selection model. This result provides a partial explanation for why inference with Hedges’ selection model is fraught with difficulties.

The tetrachoric correlation is a popular measure of association for binary data and estimates the correlation of an underlying normal latent vector. However, when the underlying vector is not normal, the tetrachoric correlation will differ from the underlying correlation. Since underlying normality is often assumed on pragmatic rather than substantive grounds, the estimated tetrachoric correlation may therefore be quite different from the true underlying correlation that is modeled in structural equation modeling. This motivates studying the range of latent correlations that are compatible with given binary data, when the distribution of the latent vector is partly or completely unknown. We show that nothing can be said about the latent correlations unless we know more than what can be derived from the data. We identify an interval containing all latent correlations compatible with observed data when the marginals of the latent variables are known. Also, we quantify how partial knowledge of the dependence structure of the latent variables affects the range of compatible latent correlations. Implications for tests of underlying normality are briefly discussed.

Moss, Jonas (2019)

univariateML: An R package for maximum likelihood estimation of univariate densities
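univariateML is an R package; as a rough Python analogue of its core workflow (an illustrative sketch under my own naming, not the package’s API), one can fit several candidate parametric families by maximum likelihood and rank them by AIC:

```python
import numpy as np
from scipy import stats

def fit_best_density(x, families=("norm", "gumbel_r", "logistic")):
    """Fit each candidate parametric family to the data by maximum
    likelihood (scipy's `.fit`) and rank the fits by AIC, lowest
    first.  Family names are scipy.stats distribution names."""
    results = []
    for name in families:
        dist = getattr(stats, name)
        params = dist.fit(x)                       # ML estimates
        loglik = np.sum(dist.logpdf(x, *params))
        aic = 2 * len(params) - 2 * loglik
        results.append((name, aic, params))
    return sorted(results, key=lambda r: r[1])

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=500)
best_name, best_aic, best_params = fit_best_density(x)[0]
print(best_name)  # the normal family is expected to win on normal data
```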