Keywords:
ORBIS, big data, data quality, research ethics
Abstract:
Since around 2010, commercial providers have sold large off-the-shelf firm datasets covering a wide variety of variables for millions of firms. Such datasets are the basis for high-impact scientific work and have also been used frequently in economics, finance, political economy, economic sociology, geography, and other disciplines. One example is the ORBIS firm data, claiming to cover more than 400 million firms globally in 2022. After examining and using the dataset in a four-year research project, I make the case that using the data for research poses significant problems, depending on the question to be answered as well as on the capacity and skills available to make the data usable. Some of these problems are already known from the literature, others are not. The most pressing point, however, is that the magnitude of these problems is still understated. I gather solutions and applications, or references to them, that overcome a few examples of the identified problems. Overall, however, I argue that the complexity of the data, existing errors and inconsistencies, missing relevant information, the interrelation of these problems, and the insufficient documentation and support provided with the data allow for confidence in the validity of results only under very specific conditions. In most cases, these conditions will apply only to a much smaller sample than the claimed hundreds of millions of firms, which undermines the main reason to work with the data in the first place. Evaluating whether the conditions for valid inferences are met requires extensive knowledge of the data. ORBIS in its current state should be used for research only if sufficient time, resources, and skills are available to fully understand the data and to overcome its problems, where possible, convincingly and demonstrably.
Furthermore, these issues have serious implications for the feasibility of reviewing and evaluating research that uses the ORBIS data, and therefore for reproducibility, unless researchers make significant additional effort to document the data preparation process and the decisions it involves. This article offers reviewers and potential authors some insights to better evaluate the use and preparation of ORBIS data. This additional effort required of researchers, together with the price of access to the data in the first place, further speaks against its scientific use.