ETL: From the German Health Data Lab data formats to the OMOP Common Data Model
2025

Transforming German Health Data for Research

Sample size: 3,432,000 | Publication | 10 minutes | Evidence: moderate

Author Information

Author(s): Melissa Finster, Maxim Moinat, Elham Taghizadeh

Primary Institution: Fraunhofer Institute for Digital Medicine MEVIS

Hypothesis

How can the German Health Data Lab's claims data be standardized into a Common Data Model for better research access?

Conclusion

The ETL process successfully standardizes health data, improving usability for research and facilitating cross-border studies in Europe.

Supporting Evidence

  • Field coverage of 92.7% was achieved for Format 1.
  • Data Quality Dashboard showed 100.0% conformance for Format 1.
  • Mapping coverage for the Condition domain was low at 18.3% due to invalid codes.
  • Format 3 achieved a field coverage of 86.2%.
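
The percentages above can be read as simple ratios. The following is a minimal sketch of how such figures could be computed, assuming that field coverage means the share of source fields with a target in the Common Data Model and that mapping coverage means the share of records mapped to a non-zero concept. Both definitions and the function names are assumptions made for illustration; the publication may define the metrics differently.

```python
def field_coverage(source_fields, mapped_fields):
    """Percentage of source fields that have a target field in the CDM (assumed definition)."""
    source = set(source_fields)
    if not source:
        return 0.0
    return 100.0 * len(source & set(mapped_fields)) / len(source)


def mapping_coverage(concept_ids):
    """Percentage of records whose concept ID is non-zero.

    In OMOP, a concept ID of 0 marks a source code with no standard mapping,
    for example because the source code is invalid.
    """
    ids = list(concept_ids)
    if not ids:
        return 0.0
    return 100.0 * sum(1 for c in ids if c != 0) / len(ids)


# Hypothetical inputs, not taken from the study.
print(field_coverage(["id", "sex", "birth_year", "region"], ["id", "sex", "birth_year"]))
print(mapping_coverage([0, 0, 0, 1001]))
```

Under these assumptions, a low mapping coverage alongside a high field coverage would mean that most fields reach the target model but many of the code values they carry cannot be mapped to standard concepts.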

Takeaway

This study shows how to convert German health claims data into a common format so that researchers can use it more easily and collaborate across borders.

Methodology

An Extract, Transform, and Load (ETL) pipeline was developed to convert health data from two formats into the OMOP Common Data Model.
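
To make the transform step concrete, here is a minimal sketch of how one claims record could be turned into rows for the OMOP person and condition_occurrence tables. The source record layout (`ClaimRecord` and its fields), the sex coding, and the code lookup table are hypothetical and do not reflect the actual Health Data Lab formats or the authors' pipeline. The target column names and the gender concept IDs follow the OMOP Common Data Model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ClaimRecord:
    """Hypothetical source record; the real source field names differ."""
    insured_id: int
    sex: str              # assumed coding: "M" or "F"
    birth_year: int
    icd10_code: str
    diagnosis_date: date


# Standard OMOP gender concepts: 8507 = MALE, 8532 = FEMALE.
GENDER_CONCEPTS = {"M": 8507, "F": 8532}

# Toy lookup with placeholder concept IDs, not real vocabulary content.
ICD10_TO_CONCEPT = {"E11.9": 1001, "I10": 1002}


def transform(records):
    """Build person and condition_occurrence rows from source records."""
    persons = {}
    conditions = []
    for row_id, rec in enumerate(records, start=1):
        # One person row per insured individual, created on first sight.
        persons.setdefault(rec.insured_id, {
            "person_id": rec.insured_id,
            "gender_concept_id": GENDER_CONCEPTS.get(rec.sex, 0),
            "year_of_birth": rec.birth_year,
        })
        conditions.append({
            "condition_occurrence_id": row_id,
            "person_id": rec.insured_id,
            # Unmapped or invalid codes get concept ID 0, per OMOP convention.
            "condition_concept_id": ICD10_TO_CONCEPT.get(rec.icd10_code, 0),
            "condition_start_date": rec.diagnosis_date,
            # The original code is kept so no source information is lost here.
            "condition_source_value": rec.icd10_code,
        })
    return list(persons.values()), conditions


records = [
    ClaimRecord(1, "F", 1980, "I10", date(2020, 3, 1)),
    ClaimRecord(1, "F", 1980, "XXX", date(2020, 5, 9)),  # invalid code
]
persons, conditions = transform(records)
```

In this sketch the invalid code `XXX` still produces a condition row, but with concept ID 0, which is the kind of record that lowers mapping coverage in a domain.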

Potential Biases

The use of fictional data may introduce biases that do not exist in real datasets.

Limitations

The study used mock data, which may not accurately reflect real-world scenarios, and some information was lost during the transformation process.

Participant Demographics

The study involved health data from approximately 3.4 million insured individuals in Germany.

Digital Object Identifier (DOI)

10.1371/journal.pone.0311511
