In the social sciences, measurement scales often consist of ordinal items and are commonly analyzed using factor analysis. Either data are treated as continuous, or a discretization framework is imposed in order to take the ordinal scale properly into account. Correlational analysis is central in both approaches, and we review recent theory on correlations obtained from ordinal data. To ensure appropriate estimation, the item distributions prior to discretization should be (approximately) known, or the thresholds should be known to be equally spaced. We refer to such knowledge as substantive because it may not be extracted from the data, but must be rooted in expert knowledge about the data-generating process. An illustrative case is presented where absence of substantive knowledge of the item distributions inevitably leads the analyst to conclude that a truly two-dimensional case is perfectly one-dimensional. Additional studies probe the extent to which violation of the standard assumption of underlying normality leads to bias in correlations and factor models. As a remedy, we propose an adjusted polychoric estimator for ordinal factor analysis that takes substantive knowledge into account. Also, we demonstrate how to use the adjusted estimator in sensitivity analysis when the continuous item distributions are known only approximately.

In factor analysis and structural equation modeling, non-normal data simulation is traditionally performed by specifying univariate skewness and kurtosis together with the target covariance matrix. However, this leaves little control over the univariate distributions and the multivariate copula of the simulated vector. In this paper we explain how a more flexible simulation method called vine-to-anything (VITA) may be obtained from copula-based techniques, as implemented in a new R package, covsim. VITA is based on the concept of a regular vine, where bivariate copulas are coupled together into a full multivariate copula. We illustrate how to simulate continuous and ordinal data for covariance modeling, and how to use the new package discnorm to test for underlying normality in ordinal data. An introduction to copula and vine simulation is provided in the appendix.

Foldnes, Njål & Grønneberg, Steffen (2021)

Non-normal Data Simulation using Piecewise Linear Transforms

We present PLSIM, a new method for generating nonnormal data with a pre-specified covariance matrix that is based on coordinate-wise piecewise linear transformations of standard normal variables. In our presentation, the piecewise linear transforms are chosen to match pre-specified skewness and kurtosis values for each marginal distribution. We demonstrate the flexibility of the new method, and an implementation using R software is provided.
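
The idea of a coordinate-wise piecewise linear transform can be illustrated with a minimal sketch. It is not the PLSIM algorithm: the two-segment map, its single knot at zero and the slopes 0.5 and 1.5 are assumptions chosen for illustration, and no matching of pre-specified skewness, kurtosis or covariance is attempted. The sketch only shows that such a transform of a standard normal variable yields a skewed marginal.

```python
import random


def piecewise_linear(z, slope_neg, slope_pos):
    """Two-segment piecewise linear map with one knot at zero (illustrative choice)."""
    return slope_neg * z if z < 0 else slope_pos * z


def sample_skewness(xs):
    """Moment-based sample skewness m3 / m2^(3/2)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


rng = random.Random(1)
z = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# A steeper slope on the positive side stretches the right tail.
y = [piecewise_linear(v, 0.5, 1.5) for v in z]

skew_z = sample_skewness(z)  # close to zero for normal draws
skew_y = sample_skewness(y)  # clearly positive after the transform
```

Adding more knots gives more freedom in shaping the marginal, which is presumably what makes it possible to target both skewness and kurtosis.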

Foldnes, Njål & Grønneberg, Steffen (2021)

The sensitivity of structural equation modeling with ordinal data to underlying non-normality and observed distributional forms

Structural equation modeling (SEM) of ordinal data is often performed using normal theory maximum likelihood estimation based on the Pearson correlation (cont-ML) or using least squares principles based on the polychoric correlation matrix (cat-LS). While cont-ML ignores the categorical nature of the data, cat-LS assumes underlying multivariate normality. Theoretical results are provided on the validity of treating ordinal data as continuous when the number of categories increases, leading to an adjustment to cont-ML (cont-ML-adj). Previous simulation studies have concluded that cat-LS outperforms cont-ML, and that it is quite robust to violations of underlying normality. However, this conclusion was based on a data simulation methodology equivalent to discretizing exactly normal data. The present study employs a new simulation method for ordinal data to reinvestigate whether ordinal SEM is robust to underlying non-normality. In contrast to previous studies, we include a large set of ordinal distributions, and our results indicate that ordinal SEM estimation and inference is highly sensitive to the interaction between underlying non-normality and the ordinal observed distributions. Our results show that cont-ML-adj consistently outperforms cont-ML, and that cat-LS is less biased than cont-ML-adj. The sensitivity of cat-LS to violation of underlying normality calls for a test of underlying normality. A bootstrap test is found to reliably detect underlying non-normality.
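
Why treating ordinal data as continuous can matter is easy to see in a small simulation. The sketch below is an illustration only and implements neither cont-ML, cont-ML-adj nor cat-LS. The latent correlation of 0.7, the thresholds and the sample size are assumed values. It discretizes an exactly bivariate normal vector and compares the Pearson correlation of the latent variables with that of the ordinal codes.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def discretize(value, thresholds):
    """Ordinal category = number of thresholds lying below the value."""
    return sum(value > t for t in thresholds)


rng = random.Random(7)
rho = 0.7  # assumed latent correlation
x_latent, y_latent = [], []
for _ in range(20000):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x_latent.append(z1)
    y_latent.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)

thresholds = [-1.0, 0.0, 1.0]  # four categories per item (assumed)
x_ord = [discretize(v, thresholds) for v in x_latent]
y_ord = [discretize(v, thresholds) for v in y_latent]

r_latent = pearson(x_latent, y_latent)
r_ordinal = pearson(x_ord, y_ord)  # attenuated relative to r_latent
```

With few categories the ordinal Pearson correlation falls below the latent one; using a finer threshold grid shrinks the gap, in line with the result on an increasing number of categories.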

Grønneberg, Steffen; Moss, Jonas & Foldnes, Njål (2020)

Partial identification of latent correlations with binary data

The tetrachoric correlation is a popular measure of association for binary data and estimates the correlation of an underlying normal latent vector. However, when the underlying vector is not normal, the tetrachoric correlation will be different from the underlying correlation. Since assuming underlying normality is often done on pragmatic and not substantial grounds, the estimated tetrachoric correlation may therefore be quite different from the true underlying correlation that is modeled in structural equation modeling. This motivates studying the range of latent correlations that are compatible with given binary data, when the distribution of the latent vector is partly or completely unknown. We show that nothing can be said about the latent correlations unless we know more than what can be derived from the data. We identify an interval constituting all latent correlations compatible with observed data when the marginals of the latent variables are known. Also, we quantify how partial knowledge of the dependence structure of the latent variables affects the range of compatible latent correlations. Implications for tests of underlying normality are briefly discussed.

Sucarrat, Genaro & Grønneberg, Steffen (2020)

Risk Estimation with a Time-Varying Probability of Zero Returns

The probability of an observed financial return being equal to zero is not necessarily zero, or constant. In ordinary models of financial return, however, e.g. ARCH, SV, GAS and continuous-time models, the zero-probability is zero, constant or both, thus frequently resulting in biased risk estimates (volatility, Value-at-Risk, Expected Shortfall, etc.). We propose a new class of models that allows for a time-varying zero-probability that can either be stationary or non-stationary. The new class is the natural generalisation of ordinary models of financial return, so ordinary models are nested and obtained as special cases. The main properties (e.g. volatility, skewness, kurtosis, Value-at-Risk, Expected Shortfall) of the new model class are derived as functions of the assumed volatility and zero-probability specifications, and estimation methods are proposed and illustrated. In a comprehensive study of the stocks at the New York Stock Exchange (NYSE) we find extensive evidence of time-varying zero-probabilities in daily returns, and an out-of-sample experiment shows that corrected risk estimates can provide significantly better forecasts in a large number of instances.
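
A toy simulation makes the notion of a time-varying zero-probability concrete. It is a sketch under assumptions, not the proposed model class. The constant volatility, the linear decline of the zero-probability from 0.30 to 0.05 and the helper name zero_probability are all hypothetical, and none of the derived risk corrections are reproduced.

```python
import random


def zero_probability(t, n_obs, start=0.30, end=0.05):
    """Hypothetical zero-probability path: linear decline over the sample."""
    return start + (end - start) * t / (n_obs - 1)


def zero_share(returns):
    """Fraction of returns that are exactly zero."""
    return sum(r == 0.0 for r in returns) / len(returns)


rng = random.Random(5)
n_obs = 4000
sigma = 0.01  # assumed constant volatility

# Ordinary continuous model: an exact zero has probability zero.
continuous_returns = [sigma * rng.gauss(0.0, 1.0) for _ in range(n_obs)]

# Observed returns: each one is set to zero with probability p0_t.
observed_returns = [
    0.0 if rng.random() < zero_probability(t, n_obs) else r
    for t, r in enumerate(continuous_returns)
]

share_first_half = zero_share(observed_returns[: n_obs // 2])
share_second_half = zero_share(observed_returns[n_obs // 2:])
```

The share of zeros differs markedly between the two halves of the sample, which a model with zero or constant zero-probability cannot reproduce.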

Foldnes, Njål & Grønneberg, Steffen (2019)

Pernicious Polychorics: The Impact and Detection of Underlying Non-normality

Ordinal data in social science statistics are often modeled as discretizations of a multivariate normal vector. In contrast to the continuous case, where SEM estimation is also consistent under non-normality, violation of underlying normality in ordinal SEM may lead to inconsistent estimation. In this article, we illustrate how underlying non-normality induces bias in polychoric estimates and their standard errors. This bias is strongly affected by how we discretize. It is therefore important to consider tests of underlying multivariate normality. In this study we propose a parametric bootstrap test for this purpose. Its performance relative to the test of Maydeu-Olivares is evaluated in a Monte Carlo study. At realistic sample sizes, the bootstrap exhibited substantively better Type I error control and power than the Maydeu-Olivares test in ordinal data with ten dimensions or higher. R code for the bootstrap test is provided.

Foldnes, Njål & Grønneberg, Steffen (2019)

On Identification and Non-normal Simulation in Ordinal Covariance and Item Response Models

A standard approach for handling ordinal data in covariance analysis such as structural equation modeling is to assume that the data were produced by discretizing a multivariate normal vector. Recently, concern has been raised that this approach may be less robust to violation of the normality assumption than previously reported. We propose a new perspective for studying the robustness toward distributional misspecification in ordinal models using a class of non-normal ordinal covariance models. We show how to simulate data from such models, and our simulation results indicate that standard methodology is sensitive to violation of normality. This emphasizes the importance of testing distributional assumptions in empirical studies. We include simulation results on the performance of such tests.

The assessment of model fit has received widespread interest from researchers in the structural equation modeling literature for many years. Various model fit test statistics have been suggested for conducting this assessment. Selecting an appropriate test statistic in order to evaluate model fit, however, can be difficult as the selection depends on the distributional characteristics of the sampled data, the magnitude of the sample size, and/or the proposed model features. The purpose of this paper is to present a selection procedure that can be used to algorithmically identify the best test statistic and simplify the whole assessment process. The procedure is illustrated using empirical data along with an easy-to-use computerized implementation.

Grønneberg, Steffen & Foldnes, Njål (2019)

A problem with discretizing Vale-Maurelli in simulation studies

Previous influential simulation studies investigate the effect of underlying non-normality in ordinal data using the Vale–Maurelli (VM) simulation method. We show that discretized data stemming from the VM method with a prescribed target covariance matrix are usually numerically equal to data stemming from discretizing a multivariate normal vector. This normal vector has, however, a different covariance matrix than the target. It follows that these simulation studies have in fact studied data stemming from normal data with a possibly misspecified covariance structure. This observation affects the interpretation of previous simulation studies.
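
One way to see how such an equality can arise is to consider a single coordinate, assuming the non-normal variable is a strictly increasing transform of a normal variable. Discretizing the transformed variable at given thresholds is then the same as discretizing the normal variable at the inverse-transformed thresholds. The sketch below checks this numerically. The cubic and its coefficients are hypothetical and are not fitted Fleishman coefficients, and the thresholds are assumed values.

```python
import random


def cubic(z):
    """Hypothetical strictly increasing cubic (derivative 0.9 + 0.15 z^2 > 0)."""
    return 0.9 * z + 0.05 * z ** 3


def inverse_cubic(y, lo=-50.0, hi=50.0):
    """Invert the increasing cubic by bisection on [lo, hi]."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if cubic(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def discretize(value, thresholds):
    """Ordinal category = number of thresholds lying below the value."""
    return sum(value > t for t in thresholds)


rng = random.Random(3)
z = [rng.gauss(0.0, 1.0) for _ in range(10000)]

thresholds = [-1.0, 0.2, 1.5]  # applied to the transformed variable
normal_thresholds = [inverse_cubic(t) for t in thresholds]  # applied to z

ordinal_from_transformed = [discretize(cubic(v), thresholds) for v in z]
ordinal_from_normal = [discretize(v, normal_thresholds) for v in z]

identical = ordinal_from_transformed == ordinal_from_normal
```

The two ordinal samples coincide observation by observation, so after discretization the marginal non-normality leaves no trace; only the thresholds differ.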

We introduce and evaluate a new class of approximations to common test statistics in structural equation modeling. Such test statistics asymptotically follow the distribution of a weighted sum of i.i.d. chi-square variates, where the weights are eigenvalues of a certain matrix. The proposed eigenvalue block averaging (EBA) method involves creating blocks of these eigenvalues and replacing them within each block with the block average. The Satorra–Bentler scaling procedure is a special case of this framework, using one single block. The proposed procedure applies also to difference testing among nested models. We investigate the EBA procedure both theoretically in the asymptotic case, and with simulation studies for the finite-sample case, under both maximum likelihood and diagonally weighted least squares estimation. Comparison is made with three established approximations: Satorra–Bentler, the scaled and shifted, and the scaled F tests.
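
The block-averaging step itself can be sketched in a few lines. This is a minimal illustration, assuming that the eigenvalues are sorted and split into contiguous blocks of (nearly) equal size; how blocks are actually formed, and how p-values are then computed from the resulting weighted chi-square sum, is not covered. The function name block_average and the example eigenvalues are hypothetical.

```python
def block_average(eigenvalues, n_blocks):
    """Replace sorted eigenvalues by the mean of their contiguous block."""
    values = sorted(eigenvalues)
    n = len(values)
    averaged = []
    for b in range(n_blocks):
        start = b * n // n_blocks
        end = (b + 1) * n // n_blocks
        block = values[start:end]
        if block:  # skip empty blocks when n_blocks > n
            mean = sum(block) / len(block)
            averaged.extend([mean] * len(block))
    return averaged


eigs = [0.8, 1.0, 1.3, 2.1, 2.9, 4.0]  # made-up weights

one_block = block_average(eigs, 1)    # every weight equals the overall mean
two_blocks = block_average(eigs, 2)   # low and high halves averaged separately
full_blocks = block_average(eigs, 6)  # one eigenvalue per block: unchanged
```

With a single block all weights collapse to the mean eigenvalue, which corresponds to the single-block special case mentioned above; with as many blocks as eigenvalues nothing is averaged.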

Foldnes, Njål & Grønneberg, Steffen (2017)

The asymptotic covariance matrix and its use in simulation studies

We propose a new and flexible simulation method for non-normal data with user-specified marginal distributions, covariance matrix and certain bivariate dependencies. The VITA (VIne To Anything) method is based on regular vines and generalizes the NORTA (NORmal To Anything) method. Fundamental theoretical properties of the VITA method are deduced. Two illustrations demonstrate the flexibility and usefulness of VITA in the context of structural equation models. R code for the implementation is provided.
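
For orientation, the NORTA idea that VITA generalizes can be sketched as follows: draw correlated normal variables, map them to uniforms with the normal distribution function, and apply the quantile function of the desired marginal. The sketch is an assumption-laden illustration and not the VITA method. It uses a normal copula rather than a regular vine, exponential marginals chosen arbitrarily, and a normal-scale correlation of 0.6 that is not calibrated to any target covariance.

```python
import math
import random
from statistics import NormalDist

std_normal = NormalDist()


def exponential_quantile(u, rate=1.0):
    """Inverse distribution function of the exponential distribution."""
    return -math.log(1.0 - u) / rate


rng = random.Random(11)
rho_z = 0.6  # correlation on the normal scale (assumed, not calibrated)

pairs = []
for _ in range(10000):
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho_z * z1 + math.sqrt(1.0 - rho_z ** 2) * rng.gauss(0.0, 1.0)
    # Normal to uniform, then uniform to the target marginal.
    u1, u2 = std_normal.cdf(z1), std_normal.cdf(z2)
    pairs.append((exponential_quantile(u1), exponential_quantile(u2)))

mean_x1 = sum(p[0] for p in pairs) / len(pairs)
mean_x2 = sum(p[1] for p in pairs) / len(pairs)
```

Both coordinates have the requested exponential marginal, but their correlation differs from 0.6; hitting a user-specified covariance requires calibrating the dependence parameters, which is the part a full implementation has to solve.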