Online version ISSN 1668-7027
AGUERRI, María Ester et al. Measuring differential item functioning in the item response theory. Interdisciplinaria [online]. 2007, vol.24, n.1, pp. 95-110. ISSN 1668-7027.
In the study of differential item functioning (DIF), measuring its size is of great relevance. An easily interpreted measure is the Mantel-Haenszel log odds ratio (MH-LOR): its sign shows which group the item favors, and its value is zero when the item does not show DIF. This research also considers a measure of DIF named LOR-IRT, so called because it is linked to the log odds ratio and is formulated on the basis of the item parameters within the item response theory (IRT) framework. In order to study how closely LOR-IRT resembles MH-LOR depending on the number of parameters of the fitted model, DIF was analyzed with real data as well as with simulated data containing no DIF. The real data consist of a 20-item verbal reasoning test taken by 349 senior high school students and 865 sophomore students from the School of Psychology at the University of Buenos Aires. The simulated data comprise answers to a 20-item test generated under the three-parameter logistic model for two samples of 1,000 participants drawn from a standard normal population. The parameters of the 20 items stem from the combination of four discrimination levels (0.4, 0.8, 1.2 and 1.6) and five difficulty levels (-2, -1, 0, 1 and 2). In order to replicate the conditions of the DIF analysis of the real data, the guessing parameter was set at 0.25 for all items. The chosen design was therefore a 4 × 5 design with 100 replications. After analyzing the DIF of the verbal reasoning items with the real data, we concluded that the LOR-IRT obtained by fitting the one-parameter logistic model (the Rasch model) led to results similar to those of MH-LOR.
This conclusion rests on three facts: the decisions about the presence of DIF coincide in 94.44% of cases, the sum of the squared differences is lower, and the correlation is higher than when the two- or three-parameter model is fitted. The similarity between the corresponding standard errors is also remarkable: the sum of the squared differences is almost zero, and the correlation is markedly higher than that obtained with the two- or three-parameter logistic model. Since each item of the verbal reasoning test offers four alternatives of which only one is correct, the items can be modeled with the three-parameter logistic model, with a non-null guessing parameter. Nevertheless, the LOR-IRT results are similar to those of MH-LOR in both magnitude and standard error when the one-parameter logistic model is fitted. These results held in the simulation study: fitting the one-parameter logistic model led to LOR-IRT values that are, on average, similar to those of MH-LOR, and again yielded the lower sum of the squared differences and the higher correlation. As with the real data, the similarity between the corresponding standard errors is remarkable: the sum of the squared differences is almost zero and the regression line is close to the identity line when the Rasch model is fitted. Future research will not only study the similarity between LOR-IRT and MH-LOR under other designs in terms of test length, group sample size and presence of impact, but also assess their performance in correctly identifying items that show DIF.
Keywords: Differential item functioning; MH-LOR; Item response theory; Three-parameter logistic model; The Rasch model.
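
The simulation design described in the abstract (20 items from four discrimination levels crossed with five difficulty levels, guessing fixed at 0.25, two samples of 1,000 from a standard normal population, no DIF) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the random seed, the logistic metric without a scaling constant, the use of the total score as the matching variable, and the sign convention (positive values favor the reference group) are all assumptions, and only one replication is drawn rather than 100. LOR-IRT is not computed here because the abstract does not give its formula.

```python
import numpy as np

def simulate_3pl(theta, a, b, c, rng):
    """Draw 0/1 responses under the three-parameter logistic model:
    P(correct) = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    p = c + (1.0 - c) / (1.0 + np.exp(-a * (theta[:, None] - b)))
    return (rng.random(p.shape) < p).astype(int)

def mh_log_odds_ratio(item, total, group):
    """Mantel-Haenszel log odds ratio for one item, stratified by total score.
    group: 0 = reference, 1 = focal (labels are an assumption of this sketch)."""
    num = 0.0
    den = 0.0
    for s in np.unique(total):
        m = total == s
        n = m.sum()
        ref_right = np.sum((group[m] == 0) & (item[m] == 1))
        ref_wrong = np.sum((group[m] == 0) & (item[m] == 0))
        foc_right = np.sum((group[m] == 1) & (item[m] == 1))
        foc_wrong = np.sum((group[m] == 1) & (item[m] == 0))
        num += ref_right * foc_wrong / n
        den += ref_wrong * foc_right / n
    return np.log(num / den)

rng = np.random.default_rng(2007)  # arbitrary seed, chosen for reproducibility

# 4 x 5 grid of item parameters taken from the abstract
a = np.repeat([0.4, 0.8, 1.2, 1.6], 5)   # discrimination
b = np.tile([-2.0, -1.0, 0.0, 1.0, 2.0], 4)  # difficulty
c = 0.25                                  # guessing, same for all items

# Two groups of 1,000 from the same N(0, 1) population, identical item
# parameters in both groups, so no item has DIF by construction
group = np.repeat([0, 1], 1000)
theta = rng.standard_normal(group.size)
responses = simulate_3pl(theta, a, b, c, rng)

total = responses.sum(axis=1)
mh_lor = np.array(
    [mh_log_odds_ratio(responses[:, j], total, group) for j in range(a.size)]
)
print(np.round(mh_lor, 3))
```

Because the data are generated without DIF, the printed MH-LOR values should scatter around zero; repeating the draw with different seeds would approximate the replications mentioned in the abstract.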