Detecting Role Errors in the Gene Hierarchy of the NCI Thesaurus
2008

Sample size: 1,786 concepts
Evidence: high

Author Information

Author(s): Min Hua, Cohen Barry, Halper Michael, Oren Marc, Perl Yehoshua

Primary Institution: Fox Chase Cancer Center

Hypothesis

The probability of a given concept having a role error is higher in small p-areas than in large p-areas.

Conclusion

About 75% of the concepts in the Gene hierarchy exhibit role errors.

Supporting Evidence

  • Errors in gene terminologies are common due to rapid growth in genomic knowledge.
  • The Gene hierarchy of the NCIT comprises 1,786 concepts.
  • About 75% of concepts in the Gene hierarchy exhibit role errors.
  • Small p-areas have a much higher error rate (83%) than large p-areas (15%).
  • Auditing methodology was adapted specifically for the Gene hierarchy's structure.
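
The comparison between small and large p-areas can be sketched as a per-group error rate. The following is a minimal illustration, not the authors' tooling: the function name `error_rate_by_size`, the `small_max` threshold, and the toy data are all hypothetical, and the sketch assumes each audited concept has already been tagged with the p-area it belongs to and with whether a role error was found.

```python
from collections import defaultdict


def error_rate_by_size(concepts, small_max=3):
    """Compare role-error rates in small versus large p-areas.

    concepts:  iterable of (p_area_id, has_error) pairs, one per concept.
    small_max: hypothetical cutoff; a p-area with at most this many
               concepts counts as "small" (the study's own cutoff is
               not given in this summary).
    Returns a dict mapping "small" and "large" to the fraction of
    concepts with an error, or None if a group is empty.
    """
    concepts = list(concepts)  # allow two passes over the input

    # First pass: count how many concepts each p-area contains.
    sizes = defaultdict(int)
    for area, _ in concepts:
        sizes[area] += 1

    # Second pass: tally concepts and errors per size group.
    totals = {"small": 0, "large": 0}
    errors = {"small": 0, "large": 0}
    for area, has_error in concepts:
        group = "small" if sizes[area] <= small_max else "large"
        totals[group] += 1
        errors[group] += int(has_error)

    return {g: (errors[g] / totals[g] if totals[g] else None) for g in totals}


# Toy data (invented for illustration, not taken from the study):
# p-areas "A" (2 concepts) and "B" (1 concept) are small, "C" (5) is large.
toy = [
    ("A", True), ("A", True), ("B", False),
    ("C", False), ("C", False), ("C", True), ("C", False), ("C", False),
]
rates = error_rate_by_size(toy, small_max=3)
print(rates)
```

With real audit results in place of the toy list, the two rates would correspond to the kind of figures reported above for small and large p-areas.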

Takeaway

This study looked at how to find mistakes in the gene information used in cancer research, and it found that about 75% of the gene concepts examined had potential errors in their roles.

Methodology

The authors applied a multiphase auditing methodology, adapted to the structure of the NCIT Gene hierarchy, that focuses on detecting role errors.

Potential Biases

The reliability of publications reporting gene involvement may vary, affecting the accuracy of role assignments.

Limitations

Errors reported should be considered potential until confirmed by NCIT curators.
