Category Archives: robust statistics

My new paper on multiple imputation

Multiple imputation for bounded variables

Missing data are a common issue in statistical analyses. Multiple imputation is a technique that has been applied in countless research studies and has a strong theoretical basis. Most of the statistical literature on multiple imputation has focused on unbounded continuous variables, with mostly ad hoc remedies for variables with bounded support. These approaches can be unsatisfactory when applied to bounded variables as they can produce misleading inferences. In this paper, we propose a flexible quantile-based imputation model suitable for distributions defined over singly or doubly bounded intervals. Proper support of the imputed values is ensured by applying a family of transformations with singly or doubly bounded range. Simulation studies demonstrate that our method is able to deal with skewness, bimodality, and heteroscedasticity and has superior properties as compared to competing approaches, such as log-normal imputation and predictive mean matching. We demonstrate the application of the proposed imputation procedure by analysing data on mathematical development scores in children from the Millennium Cohort Study, UK. We also show a specific advantage of our methods using a small psychiatric dataset. Our methods are relevant in a number of fields, including education and psychology.
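
The role of the transformation can be made concrete with a minimal Python sketch of quantile-based imputation for a variable bounded in (a, b). It is not the method from the paper, and it rests on several simplifying assumptions. It fixes a single member of the transformation family, a doubly bounded logit. It fits an ordinary linear quantile regression on the transformed scale with `scipy.optimize.linprog`. It also ignores parameter uncertainty, which a proper multiple-imputation procedure would have to propagate. The helper names (`to_real`, `from_real`, `quantile_fit`, `impute_bounded`) and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def to_real(y, a, b):
    """Doubly bounded logit: maps values in (a, b) onto the real line."""
    p = (y - a) / (b - a)
    return np.log(p / (1.0 - p))


def from_real(z, a, b):
    """Inverse of to_real: any real z is mapped back inside (a, b)."""
    return a + (b - a) / (1.0 + np.exp(-z))


def quantile_fit(X, y, tau):
    """Linear quantile regression at level tau, solved as a linear program:
    minimize sum(tau * u + (1 - tau) * v) subject to X @ beta + u - v = y."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1.0 - tau)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]


def impute_bounded(y, X, a, b, rng):
    """Hypothetical helper: fill NaNs in y by drawing a quantile level u for
    each missing case, fitting the u-th conditional quantile on the
    transformed scale and back-transforming the prediction."""
    y = np.asarray(y, dtype=float)
    miss = np.isnan(y)
    z_obs = to_real(y[~miss], a, b)
    out = y.copy()
    for i in np.flatnonzero(miss):
        u = rng.uniform(0.01, 0.99)  # trimmed U(0, 1) to avoid unstable extreme fits
        beta = quantile_fit(X[~miss], z_obs, u)
        out[i] = from_real(X[i] @ beta, a, b)
    return out


# Toy data (made up): scores bounded in (0, 100) with one covariate x
rng = np.random.default_rng(0)
n = 80
x = rng.uniform(0.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])
y = from_real(-1.0 + 3.0 * x + rng.logistic(size=n), 0.0, 100.0)

y_missing = y.copy()
y_missing[rng.choice(n, size=15, replace=False)] = np.nan
y_imputed = impute_bounded(y_missing, X, 0.0, 100.0, rng)
```

Because the back-transform maps the whole real line into (a, b), every imputed value respects the support however far the linear predictor extrapolates. Quantiles are equivariant to monotone transformations, so the u-th quantile on the transformed scale back-transforms to the u-th quantile on the original scale. Calling `impute_bounded` several times yields several completed datasets. Without parameter draws, however, they would not be proper multiple imputations.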



My new paper on additive quantile mixed models

Additive quantile mixed models

Additive models are flexible regression tools that handle linear as well as nonlinear terms. The latter are typically modelled via smoothing splines. Additive mixed models extend additive models to include random terms when the data are sampled according to cluster designs (e.g., longitudinal). These models find applications in the study of phenomena like growth, certain disease mechanisms and energy consumption in humans, when repeated measurements are available. In this paper, we propose a novel additive mixed model for quantile regression. Our methods are motivated by an application to physical activity based on a dataset with more than half a million accelerometer measurements in children of the UK Millennium Cohort Study. In a simulation study, we assess the proposed methods against existing alternatives.
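
The quantile-regression side of an additive model can be sketched with a truncated-linear spline basis for the smooth term and a linear term for a second covariate. The coefficients are fitted by minimizing the check loss as a linear program. This illustration makes simplifying assumptions: there are no random effects and no smoothing penalty, so it is neither the model nor the estimation algorithm proposed in the paper. The knots, the simulated data and the helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def hinge_basis(x, knots):
    """Truncated-linear spline basis: columns x and (x - k)_+ for each knot."""
    return np.column_stack([x] + [np.maximum(x - k, 0.0) for k in knots])


def quantile_lp(X, y, tau):
    """Linear quantile regression at level tau, solved as a linear program."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1.0 - tau)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]


def additive_design(z, x, knots):
    """Intercept + linear term in z + piecewise-linear smooth term f(x)."""
    return np.column_stack([np.ones_like(z), z, hinge_basis(x, knots)])


# Simulated data (made up): linear effect of z, nonlinear effect of x,
# heavy-tailed noise
rng = np.random.default_rng(2)
n = 150
z = rng.normal(size=n)
x = rng.uniform(0.0, 10.0, size=n)
y = 1.0 + 0.5 * z + np.sin(x) + 0.3 * rng.standard_t(df=3, size=n)

knots = np.linspace(1.0, 9.0, 7)
X = additive_design(z, x, knots)
tau = 0.75
beta = quantile_lp(X, y, tau)
resid = y - X @ beta
```

With an intercept in the design, roughly a fraction tau of the residuals fall below the fitted quantile surface, which serves as a quick sanity check on the fit.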


My new paper on nonlinear quantile mixed models

Nonlinear quantile mixed models

In regression applications, the presence of nonlinearity and correlation among observations poses computational challenges not only in traditional settings such as least squares regression, but also (and especially) when the objective function is non-smooth, as in the case of quantile regression. In this paper, we develop methods for the modeling and estimation of nonlinear conditional quantile functions when data are clustered within two-level nested designs. This work represents an extension of the linear quantile mixed models of Geraci and Bottai (2014, Statistics and Computing). We develop a novel algorithm that blends a smoothing algorithm for quantile regression with a second-order Laplacian approximation for nonlinear mixed models. To assess the proposed methods, we present a simulation study and two applications, one in pharmacokinetics and one related to growth curve modeling in agriculture.
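
The sketch below illustrates what smoothing the quantile objective means. It replaces the non-differentiable check function with a Huber-type approximation that is quadratic in a window of width omega around zero. A hypothetical exponential-decay curve is then fitted by minimizing the smoothed loss for a decreasing sequence of omega values. The smoothing form, the decay model and the use of Nelder-Mead through `scipy.optimize.minimize` are assumptions made for this illustration. There are no random effects and no Laplacian approximation here, so this is not the algorithm from the paper.

```python
import numpy as np
from scipy.optimize import minimize


def check_loss(r, tau):
    """Quantile (check) loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    r = np.asarray(r, dtype=float)
    return r * (tau - (r < 0))


def smooth_check_loss(r, tau, omega):
    """Huber-type smoothing of the check loss: quadratic on
    [(tau - 1) * omega, tau * omega], shifted linear pieces outside."""
    r = np.asarray(r, dtype=float)
    lo, hi = (tau - 1.0) * omega, tau * omega
    lower = r * (tau - 1.0) - 0.5 * (tau - 1.0) ** 2 * omega
    upper = r * tau - 0.5 * tau ** 2 * omega
    middle = r ** 2 / (2.0 * omega)
    return np.where(r < lo, lower, np.where(r > hi, upper, middle))


def decay(theta, t):
    """Hypothetical nonlinear model: theta[0] * exp(-theta[1] * t)."""
    return theta[0] * np.exp(-theta[1] * t)


def fit_nonlinear_quantile(t, y, tau, theta_start, omegas=(1.0, 0.1, 0.01)):
    """Minimize the smoothed loss for a decreasing sequence of omega values,
    warm-starting each stage from the previous solution."""
    theta = np.asarray(theta_start, dtype=float)
    for omega in omegas:
        def objective(th, omega=omega):
            return smooth_check_loss(y - decay(th, t), tau, omega).sum()
        theta = minimize(objective, theta, method="Nelder-Mead",
                         options={"maxiter": 2000}).x
    return theta


# Simulated data (made up): decay curve plus Gaussian noise
rng = np.random.default_rng(1)
t = np.linspace(0.5, 10.0, 200)
y = decay([10.0, 0.3], t) + rng.normal(0.0, 0.5, size=t.size)
theta_hat = fit_nonlinear_quantile(t, y, tau=0.5, theta_start=[8.0, 0.2])
```

The smoothed loss never exceeds the check loss and differs from it by at most max(tau^2, (1 - tau)^2) * omega / 2. Shrinking omega therefore brings the smooth problem arbitrarily close to the original one while keeping a differentiable objective at every stage.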


My new paper on the decomposition of the indirect effect in mediation analysis using quantiles

A novel quantile-based decomposition of the indirect effect in mediation analysis with an application to infant mortality in the US population by M Geraci and A Mattei

In mediation analysis, the effect of an exposure (or treatment) on an outcome variable is decomposed into two components: a direct effect, which pertains to an immediate influence of the exposure on the outcome, and an indirect effect, which the exposure exerts on the outcome through a third variable called the mediator. Our motivating example concerns the relationship between maternal smoking (the exposure, X), birthweight (the mediator, M), and infant mortality (the outcome, Y), which has attracted the interest of epidemiologists and statisticians for many years. We introduce new causal estimands, named u-specific direct and indirect effects, which describe the direct and indirect effects of the exposure on the outcome at a specific quantile u of the mediator, 0 < u < 1. Under sequential ignorability we derive a novel decomposition of u-specific indirect effects. The components of this decomposition have a straightforward interpretation and can provide new insights into the complexity of the mechanisms underlying the indirect effect. We illustrate the proposed methods using data on infant mortality in the US population. We provide analytical evidence that supports the hypothesis that the risk of sudden infant death syndrome is not predicted by changes in the birthweight distribution.
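
As a toy illustration of effects defined at a mediator quantile, the sketch below assumes a normal birthweight distribution shifted by smoking and a logistic model for mortality risk, with made-up parameter values. It compares risks with the mediator held at its u-th quantile under each exposure level. The direct part changes the exposure at the unexposed quantile, and the indirect part moves the mediator from its unexposed to its exposed u-th quantile. These definitions and all functions are assumptions for the sketch and do not reproduce the estimands or the decomposition in the paper.

```python
import numpy as np
from scipy.stats import norm


def mediator_quantile(u, x, mean0=3400.0, shift=-200.0, sd=500.0):
    """Hypothetical mediator model: birthweight (g) ~ Normal(mean0 + shift * x, sd)."""
    return norm.ppf(u, loc=mean0 + shift * x, scale=sd)


def risk(x, m, a=-6.0, b=0.3, c=-1.5):
    """Hypothetical outcome model: logistic mortality risk given exposure x
    and birthweight m (grams, centred at 3000 g, effect per kg)."""
    eta = a + b * x + c * (m - 3000.0) / 1000.0
    return 1.0 / (1.0 + np.exp(-eta))


def u_specific_effects(u):
    """Direct, indirect and total risk differences with the mediator held at
    its u-th quantile (definitions assumed for this sketch)."""
    q0 = mediator_quantile(u, 0)  # u-th birthweight quantile if unexposed
    q1 = mediator_quantile(u, 1)  # u-th birthweight quantile if exposed
    direct = risk(1, q0) - risk(0, q0)
    indirect = risk(1, q1) - risk(1, q0)
    total = risk(1, q1) - risk(0, q0)
    return direct, indirect, total


for u in (0.1, 0.5, 0.9):
    d, i, tot = u_specific_effects(u)
    print(f"u={u:.1f}  direct={d:.5f}  indirect={i:.5f}  total={tot:.5f}")
```

By construction the two parts add up to the total difference at each u. The paper goes further and decomposes the u-specific indirect effect itself.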

Session on Robust Methods – Call for papers, European Survey Research Association, July 2015, Reykjavik, Iceland

ESRA 2015, Reykjavik: Call for Papers – Closing date 15 January 2015

The 6th Conference of the European Survey Research Association (ESRA) will take place 13th-17th July 2015 in Reykjavik, Iceland.


Paper proposals are invited for the session on “Robust Methods in Survey Design and Analysis with Applications”.

The violation of the assumptions that underlie parametric statistical methods is potentially a serious issue when drawing inferences about a population. The resulting bias in the estimates may lead to incorrect conclusions. Typical problems include, but are not limited to, the presence of outliers, untenable normality assumptions, and model misspecification.

This session aims at showcasing recent developments in robust methods for survey design and survey data analysis with emphasis on applications. Submissions on topics such as semi- and non-parametric modelling, estimation of distribution functions and quantiles, variance estimation and methods for missing data are particularly welcome. The presentations will illustrate the application of robust methods to studies in the life, social and natural sciences. Examples of the use of related statistical software are also encouraged.

Session organizer: Marco Geraci