Harmonisation of variables names prior to conducting statistical analyses with multiple datasets: an automated approach
2011

Automated Approach to Harmonise Variable Names in Datasets

Sample size: 241; Type: publication; Reading time: 10 minutes; Evidence: high

Author Information

Author(s): Xavier Bosch-Capblanch

Primary Institution: Swiss Tropical and Public Health Institute

Hypothesis

How can inconsistencies in variable names, labels, values, and value labels across datasets be resolved automatically to produce fully harmonised datasets?

Conclusion

Efficient and tested automated algorithms should be used to support the harmonisation process needed to analyse multiple datasets.

Supporting Evidence

  • The algorithm achieved 100% sensitivity and specificity after a second iteration.
  • The automated approach identified a DTP3 variable that was missing in other surveys.
  • The program can process one variable in three to five seconds.
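Sensitivity and specificity here compare the variables the algorithm flags against a reference classification, such as a manual review. The minimal Python sketch below shows that arithmetic. The function name and the variable identifiers are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(predicted, reference, universe):
    """Score automated variable matches against a reference classification.

    predicted: set of variables the algorithm flagged as the target variable
    reference: set of variables the reference (e.g. manual) review flagged
    universe:  set of all variables that were examined
    Returns (sensitivity, specificity); NaN when a denominator is zero.
    """
    tp = len(predicted & reference)             # correctly flagged
    fn = len(reference - predicted)             # missed by the algorithm
    fp = len(predicted - reference)             # flagged in error
    tn = len(universe - predicted - reference)  # correctly left unflagged
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical variable identifiers, for illustration only.
    examined = {"a_h7", "a_h3", "b_s45c", "b_v012", "c_q10"}
    manual = {"a_h7", "b_s45c"}
    print(sensitivity_specificity({"a_h7", "b_s45c"}, manual, examined))  # (1.0, 1.0)
    print(sensitivity_specificity({"a_h7", "b_v012"}, manual, examined))
```

The first call illustrates the 100% sensitivity and specificity case. The second shows how one missed variable and one wrongly flagged variable lower both measures.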

Takeaway

This study shows a way to automatically harmonise variable names and labels in data from different surveys so that the datasets can be compared and analysed together.

Methodology

The study used automated algorithms to search for and harmonise variable names across multiple datasets.
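The search-and-rename step can be illustrated with a minimal Python sketch. It assumes that each dataset is represented as a dictionary of variable names to variable labels, and that a variable matches when its label contains every user-defined key term. The function names and survey contents are hypothetical, and the study's own programs may work differently. Datasets with no match or more than one match are set aside for review rather than renamed automatically.

```python
def find_candidates(labels, key_terms):
    """Return names of variables whose label contains every key term.

    labels:    {variable name: variable label} for one dataset
    key_terms: user-defined terms; matching is case-insensitive substring search
    """
    terms = [t.lower() for t in key_terms]
    return [name for name, label in labels.items()
            if all(t in label.lower() for t in terms)]


def harmonise_names(datasets, target, key_terms):
    """Propose renames so the matching variable in each dataset gets `target`.

    datasets: {dataset id: {variable name: variable label}}
    Returns (renames, review): renames holds datasets with exactly one match;
    review holds datasets with zero or several matches, for manual checking.
    """
    renames, review = {}, {}
    for ds_id, labels in datasets.items():
        hits = find_candidates(labels, key_terms)
        if len(hits) == 1:
            renames[ds_id] = {hits[0]: target}
        else:
            review[ds_id] = hits
    return renames, review


def apply_renames(labels, mapping):
    """Return a copy of one dataset's {name: label} dict with names replaced."""
    return {mapping.get(name, name): label for name, label in labels.items()}


if __name__ == "__main__":
    # Hypothetical surveys; names and labels are invented for illustration.
    surveys = {
        "survey_a": {"h3": "DPT 1", "h7": "DPT 3", "v012": "Age of respondent"},
        "survey_b": {"s45a": "dpt1 vaccination", "s45c": "dpt3 vaccination"},
        "survey_c": {"q10": "Measles vaccination"},
    }
    renames, review = harmonise_names(surveys, "dtp3", ["dpt", "3"])
    print(renames)  # {'survey_a': {'h7': 'dtp3'}, 'survey_b': {'s45c': 'dtp3'}}
    print(review)   # {'survey_c': []}
    print(apply_renames(surveys["survey_a"], renames["survey_a"]))
```

Substring matching is crude. Two labels such as "DPT 3 (card)" and "DPT 3 (recall)" in the same survey would both match, so that survey would be routed to review instead of being renamed.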

Limitations

The algorithm relies on user-defined key terms and may miss variables if those terms are not properly defined.
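This limitation can be made concrete with a short Python sketch. Each key term is assumed here to be a group of alternative spellings, and the check lists datasets in which no label satisfies every group, so that the user can revise the terms before rerunning the search. Allowing alternatives is an illustrative mitigation, not a feature reported in the study, and all names and labels are hypothetical.

```python
def label_matches(label, term_groups):
    """True if the label contains at least one alternative from every group."""
    text = label.lower()
    return all(any(alt.lower() in text for alt in group) for group in term_groups)


def unmatched_datasets(datasets, term_groups):
    """Return sorted ids of datasets where no variable label matches the terms."""
    return sorted(
        ds_id
        for ds_id, labels in datasets.items()
        if not any(label_matches(lbl, term_groups) for lbl in labels.values())
    )


if __name__ == "__main__":
    # Hypothetical surveys using two spellings of the same vaccine.
    surveys = {
        "survey_a": {"h7": "DPT 3 received"},
        "survey_b": {"s45c": "DTP3 vaccination"},
    }
    narrow = [["dpt"], ["3"]]
    broader = [["dpt", "dtp"], ["3"]]
    print(unmatched_datasets(surveys, narrow))   # ['survey_b']
    print(unmatched_datasets(surveys, broader))  # []
```

With the narrow terms, the survey that spells the vaccine "DTP" is silently missed. Listing such datasets makes the gap visible, although deciding whether to widen the terms remains the analyst's judgement.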

Digital Object Identifier (DOI)

10.1186/1472-6947-11-33
