Category Archives: statistical modelling

My new paper on multiple imputation

Multiple imputation for bounded variables

Missing data are a common issue in statistical analyses. Multiple imputation is a technique that has been applied in countless research studies and has a strong theoretical basis. Most of the statistical literature on multiple imputation has focused on unbounded continuous variables, with mostly ad hoc remedies for variables with bounded support. These remedies can be unsatisfactory because they may produce misleading inferences. In this paper, we propose a flexible quantile-based imputation model suitable for distributions defined over singly or doubly bounded intervals. Imputed values are guaranteed to fall within the proper support by applying a family of transformations with a singly or doubly bounded range. Simulation studies demonstrate that our method can deal with skewness, bimodality, and heteroscedasticity and has better properties than competing approaches, such as log-normal imputation and predictive mean matching. We demonstrate the application of the proposed imputation procedure by analysing data on mathematical development scores in children from the Millennium Cohort Study, UK. We also show a specific advantage of our methods using a small psychiatric dataset. Our methods are relevant in a number of fields, including education and psychology.
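To make the role of a bounded-range transformation concrete, here is a minimal Python sketch under stated assumptions. It uses the logit as the doubly bounded transformation and a plain normal model on the transformed scale. Both choices are ours for illustration: they are not the paper's quantile-based model or its family of transformations. The sketch only shows that any draw made on the real line lands inside (lower, upper) once it is back-transformed. The function names and the scores are hypothetical.

```python
import math
import random
import statistics


def to_real(y, lower, upper):
    """Logit map from the open interval (lower, upper) to the real line."""
    p = (y - lower) / (upper - lower)
    return math.log(p / (1.0 - p))


def to_bounded(z, lower, upper):
    """Inverse logit: map any real z back into (lower, upper)."""
    return lower + (upper - lower) / (1.0 + math.exp(-z))


def impute_bounded(observed, n_missing, lower, upper, seed=0):
    """Draw n_missing imputations that respect the bounds (lower, upper).

    Simplified stand-in (hypothetical): a normal model on the logit scale,
    NOT the quantile-based imputation model of the paper. Observed values
    must lie strictly inside the bounds for the logit to be finite.
    """
    rng = random.Random(seed)
    z = [to_real(y, lower, upper) for y in observed]
    mu, sd = statistics.mean(z), statistics.stdev(z)
    return [to_bounded(rng.gauss(mu, sd), lower, upper) for _ in range(n_missing)]


# Made-up scores on a 0 to 100 scale, for illustration only.
scores = [12.0, 35.5, 48.0, 51.5, 60.0, 77.0, 88.5, 95.0, 99.0]
draws = impute_bounded(scores, n_missing=5, lower=0.0, upper=100.0)
print(all(0.0 < d < 100.0 for d in draws))  # True
```

A singly bounded variable (for example, positive only) could be handled the same way with a log transformation in place of the logit.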



My new paper on additive quantile mixed models

Additive quantile mixed models

Additive models are flexible regression tools that handle linear as well as nonlinear terms. The latter are typically modelled via smoothing splines. Additive mixed models extend additive models to include random terms when the data are sampled according to cluster designs (e.g., longitudinal). These models find applications in the study of phenomena such as growth, certain disease mechanisms and energy consumption in humans, when repeated measurements are available. In this paper, we propose a novel additive mixed model for quantile regression. Our methods are motivated by an application to physical activity based on a dataset with more than half a million accelerometer measurements in children of the UK Millennium Cohort Study. In a simulation study, we assess the proposed methods against existing alternatives.
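Quantile regression, which this paper extends to additive mixed models, replaces squared error with the check (or pinball) loss. The Python below is a minimal sketch of the basic fact this rests on, not the additive mixed model of the paper: minimising the summed check loss over a constant returns a sample quantile. The function names and the small data set are hypothetical.

```python
def check_loss(u, tau):
    """Quantile-regression check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_loss(c, data, tau):
    """Summed check loss of the residuals y - c."""
    return sum(check_loss(y - c, tau) for y in data)


def quantile_by_check_loss(data, tau):
    """Return the observation that minimises the summed check loss.

    The objective is convex and piecewise linear in c with kinks at the
    data points, so a minimiser is always found among the observations.
    """
    return min(data, key=lambda c: total_loss(c, data, tau))


data = [2, 7, 1, 9, 4, 6, 3, 8, 5]
print(quantile_by_check_loss(data, 0.25))  # 3
print(quantile_by_check_loss(data, 0.5))   # 5
```

In a regression, the constant c is replaced by a predictor (here, one that would include smooth terms and random effects), but the loss being minimised is the same.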


My new book chapter on PCA with missing data

Principal component analysis in the presence of missing data

The aim of this chapter is to provide an overview of recent developments in principal component analysis (PCA) methods when the data are incomplete. Missing data bring uncertainty into the analysis and their treatment requires statistical approaches that are tailored to cope with specific missing data processes (i.e., ignorable and nonignorable mechanisms). Since the publication of the classic textbook by Jolliffe, which includes a short, same-titled section on the missing data problem in PCA, there have been a few methodological contributions that hinge upon a probabilistic approach to PCA. In this chapter, we unify methods for ignorable and nonignorable missing data in a general likelihood framework. We also provide real data examples to illustrate the application of these methods using the R language and environment for statistical computing and graphics.
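For readers who want to experiment, here is a deliberately simple sketch of PCA-based imputation, written in Python rather than R. It is not one of the likelihood-based methods the chapter covers. It is iterative SVD imputation, a non-probabilistic technique that fills the missing cells with a low-rank PCA reconstruction and repeats until the filled values stop changing, and it implicitly treats the missingness as ignorable. The function name and the toy matrix are hypothetical.

```python
import numpy as np


def iterative_pca_impute(X, rank, max_iter=5000, tol=1e-10):
    """Fill NaNs in X with a rank-`rank` PCA reconstruction, iterated.

    Simplified, non-probabilistic stand-in (assumes ignorable missingness);
    not the likelihood-based methods discussed in the chapter.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start by filling each missing cell with its column's observed mean.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(max_iter):
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        # Rank-k reconstruction of the centred data, plus the column means.
        recon = mean + (U[:, :rank] * s[:rank]) @ Vt[:rank]
        change = np.max(np.abs(recon[missing] - filled[missing]))
        filled[missing] = recon[missing]  # observed cells are never altered
        if change < tol:
            break
    return filled


# Toy example: an exactly rank-1 10 x 5 matrix with three cells removed.
a = np.arange(1.0, 11.0)
b = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
truth = np.outer(a, b)
X = truth.copy()
X[0, 0] = X[3, 2] = X[7, 4] = np.nan
completed = iterative_pca_impute(X, rank=1)
print(np.round(completed[[0, 3, 7], [0, 2, 4]], 3))
```

In this noiseless toy case, the completed cells should approach the removed values 1, 12 and 40. With real data, the choice of rank and the missing data mechanism matter, which is where the probabilistic approaches come in.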


My new paper on nonlinear quantile mixed models

Nonlinear quantile mixed models

In regression applications, the presence of nonlinearity and correlation among observations poses computational challenges not only in traditional settings such as least squares regression, but also (and especially) when the objective function is non-smooth, as in the case of quantile regression. In this paper, we develop methods for the modeling and estimation of nonlinear conditional quantile functions when data are clustered within two-level nested designs. This work extends the linear quantile mixed models of Geraci and Bottai (2014, Statistics and Computing). We develop a novel algorithm that blends a smoothing algorithm for quantile regression with a second-order Laplacian approximation for nonlinear mixed models. To assess the proposed methods, we present a simulation study and two applications, one in pharmacokinetics and one related to growth curve modeling in agriculture.
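The non-smoothness in question comes from the kink of the check function at zero. One generic remedy is to replace the kink with a small quadratic piece. The Python sketch below shows a Huber-type smoothing of this kind; it is our illustrative choice and not necessarily the smoothing used in the paper's algorithm. The function names and the grid are hypothetical. The smoothed loss is continuously differentiable and its maximum gap from the exact check function shrinks linearly with the smoothing width eps.

```python
def check_loss(u, tau):
    """Exact check function rho_tau(u) = u * (tau - 1{u < 0}); kinked at u = 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def smoothed_check_loss(u, tau, eps):
    """Huber-type smoothing of rho_tau with smoothing width eps > 0.

    Quadratic on the interval ((tau - 1) * eps, tau * eps) and linear outside
    it, so it is continuously differentiable; it differs from rho_tau by at
    most max(tau, 1 - tau) ** 2 * eps / 2.
    """
    if u <= (tau - 1.0) * eps:
        return u * (tau - 1.0) - 0.5 * (tau - 1.0) ** 2 * eps
    if u >= tau * eps:
        return u * tau - 0.5 * tau ** 2 * eps
    return u * u / (2.0 * eps)


tau = 0.25
grid = [i / 100.0 for i in range(-300, 301)]  # u from -3 to 3
gaps = {}
for eps in (1.0, 0.1, 0.01):
    gaps[eps] = max(abs(check_loss(u, tau) - smoothed_check_loss(u, tau, eps)) for u in grid)
    print(f"eps={eps}: max gap {gaps[eps]:.6f}")
```

A gradient-based optimiser can work with the smoothed loss for a given eps, and eps can then be decreased so that the solution approaches that of the exact quantile objective.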


My new paper on mixed-effects models using the normal and the Laplace distributions

Mixed-effects models using the normal and the Laplace distributions: A 2×2 convolution scheme for applied research

In statistical applications, the normal and the Laplace distributions are often contrasted: the former as a standard tool of analysis, the latter as its robust counterpart. I discuss the convolutions of these two popular distributions and their applications in research. I consider four models within a simple 2×2 scheme that is of practical interest in the analysis of clustered (e.g., longitudinal) data. In my view, these models, some of which are less familiar than others to most applied researchers, constitute a ‘family’ of sensible alternatives when modelling issues arise. In three examples, I revisit data published recently in the epidemiological and clinical literature as well as a classic biological dataset.
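A 2×2 scheme of this kind lends itself to a quick simulation. The sketch below assumes, as our reading rather than something stated above, that the two factors are the distribution of the random effects and that of the within-cluster errors, each either normal or Laplace, in a random-intercept model y_ij = beta0 + u_i + e_ij. Both components are scaled to the same standard deviation, so the four combinations differ only in shape. The function names and parameter values are hypothetical.

```python
import numpy as np


def draw(rng, dist, sd, size):
    """Zero-mean noise with standard deviation `sd`, either 'normal' or 'laplace'."""
    if dist == "normal":
        return rng.normal(0.0, sd, size)
    if dist == "laplace":
        # Laplace(0, b) has variance 2 * b**2, so b = sd / sqrt(2) matches the sd.
        return rng.laplace(0.0, sd / np.sqrt(2.0), size)
    raise ValueError(f"unknown distribution: {dist}")


def simulate_2x2(n_clusters=2000, n_per_cluster=5, beta0=10.0, sd_u=1.0, sd_e=1.0, seed=1):
    """Simulate y_ij = beta0 + u_i + e_ij under the four normal/Laplace combinations."""
    rng = np.random.default_rng(seed)
    out = {}
    for re_dist in ("normal", "laplace"):
        for err_dist in ("normal", "laplace"):
            u = draw(rng, re_dist, sd_u, n_clusters)  # cluster-level random intercepts
            e = draw(rng, err_dist, sd_e, (n_clusters, n_per_cluster))  # within-cluster errors
            out[(re_dist, err_dist)] = beta0 + u[:, None] + e
    return out


sims = simulate_2x2()
kurtosis = {}
for (re_dist, err_dist), y in sims.items():
    z = y.ravel() - y.mean()
    kurtosis[(re_dist, err_dist)] = np.mean(z**4) / np.mean(z**2) ** 2 - 3.0
    print(f"u ~ {re_dist:7s} e ~ {err_dist:7s} "
          f"var={z.var():.2f} excess kurtosis={kurtosis[(re_dist, err_dist)]:.2f}")
```

All four marginal variances should be close to sd_u**2 + sd_e**2 = 2. The tails differ: because the Laplace distribution has excess kurtosis 3, the theoretical excess kurtosis of a single observation here is (3 * [u is Laplace] + 3 * [e is Laplace]) / 4, i.e. 0, 0.75, 0.75 and 1.5, which the sample values should roughly track.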

My new paper on the decomposition of the indirect effect in mediation analysis using quantiles

A novel quantile-based decomposition of the indirect effect in mediation analysis with an application to infant mortality in the US population by M Geraci and A Mattei

In mediation analysis, the effect of an exposure (or treatment) on an outcome variable is decomposed into two components: a direct effect, which pertains to an immediate influence of the exposure on the outcome, and an indirect effect, which the exposure exerts on the outcome through a third variable called the mediator. Our motivating example concerns the relationship between maternal smoking (the exposure, X), birthweight (the mediator, M), and infant mortality (the outcome, Y), which has attracted the interest of epidemiologists and statisticians for many years. We introduce new causal estimands, named u-specific direct and indirect effects, which describe the direct and indirect effects of the exposure on the outcome at a specific quantile u of the mediator, 0 < u < 1. Under sequential ignorability, we derive an interesting and novel decomposition of u-specific indirect effects. The components of this decomposition have a straightforward interpretation and can provide new insights into the complexity of the mechanisms underlying the indirect effect. We illustrate the proposed methods using data on infant mortality in the US population. We provide analytical evidence that supports the hypothesis that the risk of sudden infant death syndrome is not predicted by changes in the birthweight distribution.

Identifying profiles of physical activity behaviours in the presence of non-ignorable missing data

Physical activity and inactivity are two independent dimensions over which children aggregate into distinct behavioural profiles. Read my new article ‘Probabilistic principal component analysis to identify profiles of physical activity behaviours in the presence of non-ignorable missing data’ in the Journal of the Royal Statistical Society: Series C at