An Advanced Study of Methods for Categorical and Continuous Data with Many Zeros


McTernan, Melissa L. (2017). An Advanced Study of Methods for Categorical and Continuous Data with Many Zeros.


This dissertation focuses on analytic approaches for non-normally distributed response data in psychological research. Non-normal data are ubiquitous in psychological and behavioral studies and may take many forms. Of particular interest to this program of research is how to handle various forms of zero-inflated data statistically. Study 1 compares methods for the analysis of longitudinal ordinal outcomes that are asymmetrically distributed across categories, with a large proportion of responses falling in the zero category. The methods are applied to an empirical dataset. The compared models treat the data as continuous, as categorical, or as a two-part problem in which the zero/non-zero part is categorical and the non-zero values are treated as continuous. The findings suggest that ignoring the categorical nature of the data is generally not preferred but ultimately support a two-part model. Although a two-part model handles the zero-inflation problem, it does not adequately address the asymmetric nature of the positive values. Thus, Study 2 focuses on positive and positively skewed continuous data for a single variable to understand how such a distribution may be best represented. The data in this study are simulated from a gamma distribution. Two competing intercept-only models are fitted: one assumes a normally distributed response and the other assumes a gamma-distributed response. Results from Study 2 suggest that with small sample sizes and severely skewed gamma-distributed data, Type I error rates in tests of the intercept can be inflated because the standard errors of the estimate are underestimated. Finally, Study 3 extends Study 2 to a longitudinal context in which a random-effects model is used to address the repeated measures.
Previous research has shown that linear mixed-effects models that assume a normal response can yield biased estimates of the variance components if the response is not normally distributed. Study 3 explores the extent of the bias in the standard errors of the fixed effects and in the variance of the random intercept across different sample sizes, with regard to both the number of repeated measures (level 1) and the number of subjects (level 2). Thus, the overarching purpose of this series of studies is to inform researchers about how to handle zero-inflated data that are also asymmetric in the positive values.
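The Type I error inflation described for Study 2 can be illustrated with a minimal simulation sketch. It covers only the normal-theory side: an intercept-only model with a normal response, whose test of the intercept is the one-sample t-test. The shape and scale values, the sample size of 10, the number of replicates, and the function name `rejection_rate` are assumptions chosen for illustration; they are not the dissertation's design, and the gamma-response model is not fitted here.

```python
import math
import random
import statistics

# Two-sided 5% critical value of Student's t with 9 degrees of freedom.
# It matches n = 10 only; a different n needs a different critical value.
T_CRIT_DF9 = 2.262


def rejection_rate(draw, true_mean, n=10, reps=5000, seed=1):
    """Share of replicates in which the normal-theory test rejects a true null.

    draw      -- hypothetical callable taking a random.Random and returning one value
    true_mean -- population mean, used as the (true) null value of the intercept
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        mean = statistics.fmean(sample)               # intercept estimate
        se = statistics.stdev(sample) / math.sqrt(n)  # normal-theory standard error
        if abs(mean - true_mean) / se > T_CRIT_DF9:
            rejections += 1
    return rejections / reps


# Assumed illustrative values: shape 0.5 gives a severely right-skewed gamma.
shape, scale = 0.5, 1.0
gamma_rate = rejection_rate(
    lambda rng: rng.gammavariate(shape, scale), true_mean=shape * scale
)
# Reference case: normally distributed data, where the test should hold its level.
normal_rate = rejection_rate(lambda rng: rng.gauss(0.0, 1.0), true_mean=0.0)

print(f"gamma data:  empirical Type I error = {gamma_rate:.3f}")
print(f"normal data: empirical Type I error = {normal_rate:.3f}")
```

Because the null hypothesis is true in every replicate, any rejection rate above the nominal 0.05 reflects inflation. Comparing the gamma case against the normal reference case separates the effect of skewness from ordinary simulation noise.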



Pure sciences; Psychology; Gamma; Generalized linear mixed model; Mixed-effects models; Two-part models; Zero-inflated; Statistics; Quantitative psychology; 0463: Statistics; 0632: Quantitative psychology



McTernan, Melissa L.

Series Author(s): Blozis, Shelley A.

University of California, Davis

City of Publication: Ann Arbor