Report number: RA-MOW-2011-002
Title: Missing data treatment
Subtitle: Overview of possible solutions
Authors: Brenda Wilmots, Yongjun Shen, Elke Hermans, Da Ruan
Published by: Policy Research Centre for Mobility and Public Works, track Traffic Safety 2007-2011
Number of pages: 33
Date: 01/03/2011
Document language: English
Partner(s): Universiteit Hasselt
Work package: Other (Risk assessment)
Summary

Real-world data sets are almost always affected by missing data arising from various sources of uncertainty. Because most classical analyses require a complete data matrix, missing values severely restrict what researchers can do. To address this pervasive problem, a number of alternative methods have been developed over the last five decades.

The simplest and most common strategy for handling missingness is to delete every case that contains any missing value (listwise deletion, also called complete-case analysis) and to carry out the analysis on the data that remain. Although it is easy to implement and is the default in the major statistical packages, this approach has serious drawbacks: it discards useful information, and it can produce seriously biased results if the data are not missing completely at random (MCAR).
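
To make the MCAR condition concrete, here is a minimal Python sketch (not taken from the report) of listwise deletion on simulated data. The data-generating model, the missingness probabilities and the helper names `listwise_delete` and `simulate` are hypothetical choices for illustration only. When every value of y has the same chance of being missing (MCAR), the complete cases still estimate the mean of y well; when y is missing more often for large x, deletion mostly discards high values and the complete-case mean is biased downward.

```python
import random
import statistics


def listwise_delete(rows):
    """Keep only the cases in which every variable is observed (None marks a missing value)."""
    return [row for row in rows if all(value is not None for value in row)]


def simulate(n, p_missing, seed):
    """Hypothetical data: y = 2x + noise, so the population mean of y is 0.

    p_missing(x) gives the probability that y is missing for a case with covariate x.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = 2 * x + rng.gauss(0, 1)
        rows.append((x, None if rng.random() < p_missing(x) else y))
    return rows


# MCAR: every y has the same 50% chance of being missing, regardless of x.
mcar = listwise_delete(simulate(10_000, lambda x: 0.5, seed=1))
# Not MCAR: y is missing far more often when x (and hence y) tends to be large.
mar = listwise_delete(simulate(10_000, lambda x: 0.8 if x > 0 else 0.1, seed=1))

mcar_mean = statistics.fmean(y for _, y in mcar)
mar_mean = statistics.fmean(y for _, y in mar)
print(f"MCAR:     kept {len(mcar)} cases, complete-case mean of y = {mcar_mean:+.2f}")
print(f"not MCAR: kept {len(mar)} cases, complete-case mean of y = {mar_mean:+.2f}")
```

Roughly half of the cases are discarded in both runs, but only the second run is biased: with these assumed probabilities its complete-case mean lands around -1 instead of the population value of 0, while the MCAR run stays close to 0.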

Interest later shifted to data imputation, the process by which missing values in a data set are replaced by appropriately computed estimates, yielding a complete data set. Unconditional mean imputation, regression imputation, the indicator method and similar techniques all belong to this strategy, known as traditional single imputation. However, even when the missing values can be imputed in this way, each imputed value is then treated as if it had been observed, so the uncertainty due to the missing data is not accounted for and standard errors tend to be too small. From the late 1970s on, substantial progress has been made in developing statistical procedures for missing data, and the two most important approaches, maximum likelihood estimation and multiple imputation, have become available and are being included as options in mainstream software programs.
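
The contrast between single and multiple imputation can be sketched in Python with the standard library only. This is an illustrative sketch under assumed settings, not the report's procedure: the data are simulated with y missing more often when x is large, the helper `fit_line` is hypothetical, the indicator method is not shown, and the multiple imputation step is a simplified bootstrap-based variant (refit the regression on a bootstrap sample of the observed cases, then add random residual noise) rather than a full Bayesian implementation. Mean imputation keeps the biased observed-case mean and shrinks the variance; deterministic regression imputation corrects the mean here but still understates variability; multiple imputation pools M completed data sets with Rubin's rules so that the between-imputation variance enters the standard error.

```python
import random
import statistics


def fit_line(pairs):
    """Least-squares fit of y = a + b*x on (x, y) pairs; returns (a, b, residual standard deviation)."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    resid_sd = statistics.stdev([y - (a + b * x) for x, y in pairs])
    return a, b, resid_sd


# Hypothetical data: y = 1 + 2x + noise (population mean of y is 1).
# y is missing with probability 0.6 when x > 0 and 0.2 otherwise, so missingness depends on x (not MCAR).
rng = random.Random(7)
data = []
for _ in range(2_000):
    x = rng.gauss(0, 1)
    y = 1 + 2 * x + rng.gauss(0, 1)
    data.append((x, None if rng.random() < (0.6 if x > 0 else 0.2) else y))
observed = [(x, y) for x, y in data if y is not None]

# Single imputation 1: unconditional mean imputation (every gap gets the observed mean of y).
obs_mean = statistics.fmean(y for _, y in observed)
mean_imputed = [obs_mean if y is None else y for _, y in data]

# Single imputation 2: deterministic regression imputation (every gap gets its predicted value a + b*x).
a, b, _ = fit_line(observed)
reg_imputed = [a + b * x if y is None else y for x, y in data]

# Multiple imputation (simplified): M completed data sets; each refits the regression on a bootstrap
# sample of the observed cases (parameter uncertainty) and adds random residual noise to each prediction.
M = 20
estimates, within = [], []
for _ in range(M):
    boot = [rng.choice(observed) for _ in observed]
    a_m, b_m, sd_m = fit_line(boot)
    completed = [a_m + b_m * x + rng.gauss(0, sd_m) if y is None else y for x, y in data]
    estimates.append(statistics.fmean(completed))
    within.append(statistics.variance(completed) / len(completed))  # squared standard error of the mean

# Rubin's rules for pooling the M estimates of the mean of y.
q_bar = statistics.fmean(estimates)         # pooled point estimate
w = statistics.fmean(within)                # average within-imputation variance
between = statistics.variance(estimates)    # between-imputation variance
total_var = w + (1 + 1 / M) * between       # total variance of q_bar

for label, values in [("mean imputation", mean_imputed), ("regression imputation", reg_imputed)]:
    print(f"{label:22s} mean = {statistics.fmean(values):.2f}, variance = {statistics.variance(values):.2f}")
print(f"multiple imputation    mean = {q_bar:.2f}, standard error = {total_var ** 0.5:.3f}")
```

The term `(1 + 1 / M) * between` in `total_var` is exactly what a single completed data set leaves out: it measures how much the estimate changes from one plausible set of imputed values to another, which is the missing data uncertainty the paragraph above refers to.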

More recently, advances in computer science and technology have brought artificial intelligence and machine learning techniques, such as decision trees, neural networks, fuzzy logic systems and rough sets, into the area of missing data treatment, pushing missing data research forward to a new stage.

In this report, we outline the key ideas of all these approaches, discuss their main strengths and limitations, review the software programs currently available, and provide guidance on how to select among these approaches in practice.

Mission

The Policy Research Centre for Traffic Safety carries out policy-relevant scientific research under the authority of the Flemish Government. The Centre is the result of cooperation between Hasselt University, KU Leuven and VITO, the Flemish Institute for Technological Research.
