Detecting Role Errors in the Gene Hierarchy of the NCI Thesaurus
Author Information
Author(s): Hua Min, Barry Cohen, Michael Halper, Marc Oren, Yehoshua Perl
Primary Institution: Fox Chase Cancer Center
Hypothesis
The probability of a given concept having a role error is higher in small p-areas than in large p-areas.
Conclusion
About 75% of the concepts in the Gene hierarchy exhibit role errors.
Supporting Evidence
- Errors in gene terminologies are common due to rapid growth in genomic knowledge.
- The Gene hierarchy of the NCIT comprises 1,786 concepts.
- About 75% of concepts in the Gene hierarchy exhibit role errors.
- Small p-areas have a higher error percentage (83%) than large p-areas (15%).
- Auditing methodology was adapted specifically for the Gene hierarchy's structure.
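The small versus large p-area comparison above can be illustrated with a minimal sketch. It assumes each audited concept is recorded as a pair of its p-area's size and whether a role error was found. The function name `error_rate_by_group`, the `small_max` threshold, and the toy records are all hypothetical. They are not taken from the study, which may define "small" differently.

```python
from collections import defaultdict


def error_rate_by_group(records, small_max=3):
    """Return the fraction of concepts with a role error per p-area size group.

    records   -- iterable of (p_area_size, has_role_error) pairs, one per concept
    small_max -- hypothetical cutoff: p-areas with at most this many concepts
                 count as "small" (an assumption, not the study's definition)
    """
    audited = defaultdict(int)    # concepts seen per group
    erroneous = defaultdict(int)  # concepts with a role error per group
    for p_area_size, has_role_error in records:
        group = "small" if p_area_size <= small_max else "large"
        audited[group] += 1
        if has_role_error:
            erroneous[group] += 1
    # A group with no audited concepts gets None instead of a rate.
    return {
        group: (erroneous[group] / audited[group] if audited[group] else None)
        for group in ("small", "large")
    }


# Toy records for illustration only; these are not data from the study.
toy_records = [
    (1, True), (2, True), (2, False), (3, True),
    (40, False), (40, False), (40, True), (55, False),
]
rates = error_rate_by_group(toy_records)
```

With real audit results in place of the toy records, comparing `rates["small"]` against `rates["large"]` would give the kind of contrast reported above.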
Takeaway
This study looked at how to find mistakes in gene information used in cancer research, and it found that many genes are missing important details.
Methodology
The study applies a multiphase auditing methodology that focuses on detecting role errors in the Gene hierarchy.
Potential Biases
The reliability of publications reporting gene involvement may vary, affecting the accuracy of role assignments.
Limitations
The reported errors should be considered potential errors until NCIT curators confirm them.