  Generating Useful Test Data for Complex Linked Employer-Employee Datasets

Dorner, M., Drechsler, J., & Jacobebbinghaus, P. (2012). Generating Useful Test Data for Complex Linked Employer-Employee Datasets. In J. Domingo-Ferrer, & I. Tinnirello (Eds.), Privacy in Statistical Databases (pp. 165-178). Berlin; Heidelberg: Springer.

Basic

Genre: Contribution to Collected Edition

Creators

 Creators:
Dorner, Matthias (1), Author
Drechsler, Jörg (1), Author
Jacobebbinghaus, Peter (1), Author
Affiliations:
(1) External Organizations, ou_persistent22

Content

Free keywords: linked employer-employee data – test data – dummy data – swapping – noise addition
 Abstract: When data access for external researchers is difficult or time-consuming, it can be beneficial if test datasets that mimic the structure of the original data are disseminated in advance. With these test data, researchers can develop their analysis code or decide whether the data are suitable for their planned research before they go through the lengthy process of getting access at the research data center. The aim of these data is not to provide any meaningful results. Instead, it is important to maintain the structure of the data as closely as possible, including skip patterns, logical constraints between the variables, and longitudinal relationships, so that any code developed using these test data will also run on the original data without further modifications. Achieving this goal can be challenging for complex datasets such as linked employer-employee datasets (LEED), where the links between the establishments and the employees also need to be maintained. Using the LEED of the Institute for Employment Research, we illustrate how useful test data can be developed for such complex datasets. Our approach mainly relies on traditional statistical disclosure control (SDC) techniques such as data swapping and noise addition for data protection. Since statistical inferences need not be preserved, high swapping rates can be applied to sufficiently protect the data. At the same time, it is straightforward to maintain the structure of the data by adding some constraints on the swapping procedure.
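
The swapping and noise-addition idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the variables (`person_id`, `estab_id`, `employed`, `wage`), the swapping rate, and the noise level are all hypothetical and chosen only for illustration. The sketch swaps a value only between records that share a constraint variable, leaves the employer-employee link untouched, and skips missing values when adding noise so that skip patterns survive.

```python
import random
from collections import defaultdict

def swap_within_groups(records, swap_var, group_vars, rate, rng):
    """Shuffle swap_var among randomly selected records that share the
    same values of group_vars (the constraint on the swapping)."""
    out = [dict(r) for r in records]          # copy; identifiers are never touched
    groups = defaultdict(list)
    for i, r in enumerate(out):
        groups[tuple(r[v] for v in group_vars)].append(i)
    for members in groups.values():
        chosen = [i for i in members if rng.random() < rate]
        values = [out[i][swap_var] for i in chosen]
        rng.shuffle(values)
        for i, v in zip(chosen, values):
            out[i][swap_var] = v
    return out

def add_noise(records, var, rel_sd, rng):
    """Multiply var by (1 + Gaussian noise); missing values stay missing."""
    out = [dict(r) for r in records]
    for r in out:
        if r[var] is not None:                # preserve the skip pattern
            noisy = r[var] * (1.0 + rng.gauss(0.0, rel_sd))
            r[var] = max(0.0, round(noisy, 2))  # assumed constraint: wage >= 0
    return out

# Hypothetical toy LEED: wage is skipped (None) when employed == 0.
original = [
    {"person_id": 1, "estab_id": "A", "employed": 1, "wage": 95.0},
    {"person_id": 2, "estab_id": "A", "employed": 1, "wage": 120.0},
    {"person_id": 3, "estab_id": "B", "employed": 0, "wage": None},
    {"person_id": 4, "estab_id": "B", "employed": 1, "wage": 80.0},
    {"person_id": 5, "estab_id": "B", "employed": 0, "wage": None},
]

rng = random.Random(0)
swapped = swap_within_groups(original, "wage", ["employed"], rate=0.9, rng=rng)
test_data = add_noise(swapped, "wage", rel_sd=0.1, rng=rng)
```

Because wages are only exchanged among records with the same `employed` value, a missing wage can never land on an employed person, and every `person_id` keeps its `estab_id`. Under these assumptions, code written against `test_data` sees the same structure as the original records.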

Details

Language(s): eng - English
 Dates: 2012
 Publication Status: Issued
 Identifiers: DOI: 10.1007/978-3-642-33627-0_13

Source 1

Title: Privacy in Statistical Databases
Source Genre: Collected Edition
 Creator(s):
Domingo-Ferrer, Josep (1), Editor
Tinnirello, Ilenia (1), Editor
Affiliations:
(1) External Organizations, ou_persistent22
Publ. Info: Berlin ; Heidelberg : Springer
Start / End Page: 165 - 178
Identifier: ISBN: 978-3-642-33626-3

Source 2

Title: Lecture Notes in Computer Science
Source Genre: Series
Volume / Issue: 7556