Journal Article

perccalc: an R package for estimating percentiles from categorical variables

Citation

Cimentada, J. (2019). perccalc: an R package for estimating percentiles from categorical variables. Journal of Open Source Software, 4(44), 1-3. doi:10.21105/joss.01796.


Cite as: https://hdl.handle.net/21.11116/0000-0006-B0E4-1
Abstract
Social science research makes extensive use of categorical variables. Most variables in model definitions are therefore a combination of categorical and ordered categorical variables, some of which are proxies for continuous variables such as income or years of education. The importance of this phenomenon is best exemplified by the surge in techniques tailored specifically to this type of analysis in social science research (Agresti, 2007, 2010). Categorical data are especially important in educational research, where there is a maturing literature on calculating inequality gaps. For example, respondents are often asked for their income in brackets rather than as an exact amount; researchers would prefer the exact amount, but income brackets are a partial solution that reduces non-response and addresses privacy concerns. This solution captures respondents’ income information, but only in a limited fashion, because traditional statistics such as differences between percentiles cannot be estimated directly from the income brackets. One example is calculating the gap in cognitive abilities between the top (e.g., the 90th percentile) and bottom (e.g., the 10th percentile) groups of the income distribution. This paper introduces the perccalc package, a direct implementation of the theoretical work of Reardon (2011), which makes it possible to estimate the difference between two percentiles from an ordered categorical variable. More concretely, given an ordered categorical variable and a continuous variable, the method estimates differences in the continuous variable between percentiles of the ordered categorical variable. This provides a relevant strategy for contrasting ordered categorical variables, which usually have alternative continuous measures, with the percentiles of those continuous measures.
Moreover, this opens an avenue for calculating percentile distributions and percentile differences for ordered categorical variables that do not necessarily have an alternative continuous measure, such as job occupation classifications; one relevant example is the classification from Erikson, Goldthorpe, & Portocarero (1979).
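
The general idea described in the abstract can be sketched in a few lines. The following is a minimal illustration in Python, not the perccalc API (perccalc itself is an R package). It assumes one common reading of the approach: each ordered category is placed at the midpoint of its percentile range, a polynomial is fitted to the category means of the continuous variable, and the fitted curve is evaluated at the two percentiles of interest. The function names, the cubic degree, the weighting by category counts, and the sample data are all assumptions made for this sketch.

```python
def category_midpoints(counts):
    """Midpoint percentile rank (on a 0-1 scale) of each ordered category."""
    total = sum(counts)
    midpoints, cumulative = [], 0
    for count in counts:
        midpoints.append((cumulative + count / 2) / total)
        cumulative += count
    return midpoints


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial(xs, ys, weights, degree):
    """Weighted least-squares polynomial fit; coefficients are lowest power first."""
    size = degree + 1
    normal = [[sum(w * x ** (i + j) for x, w in zip(xs, weights))
               for j in range(size)] for i in range(size)]
    target = [sum(w * y * x ** i for x, y, w in zip(xs, ys, weights))
              for i in range(size)]
    return solve(normal, target)


def evaluate(coefficients, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coefficients))


def percentile_difference(counts, means, upper=90, lower=10, degree=3):
    """Estimated gap in the continuous variable between two percentiles.

    counts: respondents per ordered category (lowest category first)
    means:  mean of the continuous variable within each category
    Needs at least degree + 1 categories.
    """
    midpoints = category_midpoints(counts)
    coefficients = fit_polynomial(midpoints, means, counts, degree)
    return evaluate(coefficients, upper / 100) - evaluate(coefficients, lower / 100)


# Made-up data: four income brackets and the mean test score in each.
# The means are constructed to rise linearly with the bracket midpoints,
# so the estimated 90/10 gap should come out very close to 80.
bracket_counts = [20, 30, 30, 20]
bracket_means = [410.0, 435.0, 465.0, 490.0]
print(percentile_difference(bracket_counts, bracket_means))
```

With only four categories and a cubic, the curve passes through every category mean, so the weights have no effect here; they matter once there are more categories than coefficients. The sketch also returns only a point estimate and says nothing about its uncertainty.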