SaberEs
Print version ISSN 1852-4418
On-line version ISSN 1852-4222
Abstract
ROSATI, Germán. Development of an imputation model for income variables with lost values using ensamble learning methods: Application to the permanent household survey (EPH). SaberEs [online]. 2017, vol.9, n.1, pp.91-111. ISSN 1852-4418.
This paper presents advances in the development of an imputation model for missing values and non-response in income variables from household surveys. It sets out the general methodological proposal and reports the results of some tests. Two imputation methods are evaluated: 1) hot deck (widely used in major surveys such as the Encuesta Permanente de Hogares and the Encuesta Anual de Hogares of the City of Buenos Aires) and 2) an ensemble of LASSO regression models. The ensemble is generated using the bagging algorithm. The first and second parts of the document review the main missing-data generation mechanisms and their implications for the use of imputation methods. The third section reviews several imputation methods, emphasizing their assumptions, advantages and limitations. The fourth part analyzes the theoretical and methodological foundations of LASSO and ensemble learning. Finally, the fifth section presents some results of applying this method to Encuesta Permanente de Hogares data.
Keywords: Regularization; LASSO; Non-response.
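The two methods compared in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a random hot deck within imputation cells, a LASSO fitted by plain coordinate descent, and bagging by averaging predictions over bootstrap resamples of the respondents. All function names and parameters (`hot_deck_impute`, `bagged_lasso_impute`, `alpha`, `n_models`, `seed`) are hypothetical, and missing incomes are represented as `None`.

```python
import random

def hot_deck_impute(y, cells, seed=0):
    """Random hot deck: replace each missing value with the value of a
    randomly drawn respondent (donor) from the same imputation cell.
    Assumes every cell with a missing value has at least one donor."""
    rng = random.Random(seed)
    donors = {}
    for value, cell in zip(y, cells):
        if value is not None:
            donors.setdefault(cell, []).append(value)
    return [rng.choice(donors[c]) if v is None else v
            for v, c in zip(y, cells)]

def soft_threshold(z, t):
    """Shrink z towards zero by t (the LASSO proximal step)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def fit_lasso(X, y, alpha, n_iter=100):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + alpha*||b||_1.
    Data are centered so the intercept is recovered after the loop."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residuals when all coefficients are 0
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in Xc]
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue  # constant column in this sample: leave at 0
            # covariance of feature j with the partial residual
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, alpha) / norm
            resid = [r - c * (new - beta[j]) for c, r in zip(col, resid)]
            beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta

def bagged_lasso_impute(X, y, alpha=0.1, n_models=25, seed=0):
    """Bagging: fit one LASSO per bootstrap resample of the respondents
    and impute each missing value with the average prediction."""
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(y) if v is not None]
    missing = [i for i, v in enumerate(y) if v is None]
    totals = {i: 0.0 for i in missing}
    for _ in range(n_models):
        sample = [rng.choice(observed) for _ in observed]
        b0, beta = fit_lasso([X[i] for i in sample],
                             [y[i] for i in sample], alpha)
        for i in missing:
            totals[i] += b0 + sum(b * x for b, x in zip(beta, X[i]))
    return [totals[i] / n_models if v is None else v
            for i, v in enumerate(y)]
```

In this sketch the hot deck only ever reproduces incomes already observed in the same cell, whereas the bagged LASSO predicts from covariates and averages across resamples, which is the kind of contrast the abstract's comparison turns on.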