Improving Genome Annotation with Negative Rule Mining
Author Information
Author(s): Irena I. Artamonova, Goar Frishman, Dmitrij Frishman
Primary Institution: Institute for Bioinformatics, GSF – National Research Center for Environment and Health
Hypothesis
Can negative association rule mining help identify erroneous protein annotations?
Conclusion
Negative rule mining is effective in flagging potentially erroneous protein annotations for further inspection.
Supporting Evidence
- The study identified 9591 negative rules from the PEDANT annotation dataset.
- 96% of the manually verified exceptions to negative rules contained at least one annotation error.
- Negative rule mining flagged 0.6% of annotation features as suspicious; these flagged features were enriched in errors.
Takeaway
This study shows that negative association rules, which state that the presence of one annotation feature implies the absence of another, can help find mistakes in protein annotations and make them easier to fix.
Methodology
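The study applied negative association rule mining to protein annotations from the PEDANT genome database. Exceptions to the mined rules, meaning proteins whose annotations violate a rule, were flagged as candidate errors for further inspection.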
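The idea can be sketched in a few lines. This is a minimal illustration, not the study's implementation: the function names, the toy feature labels "A" and "B", and the selection criteria (minimum support, minimum confidence of "x implies not y", and a minimum expected co-occurrence under independence) are all assumptions chosen for clarity. Each protein is treated as a set of annotation features, and any protein carrying both sides of a negative rule is reported as an exception.

```python
from collections import Counter
from itertools import combinations


def mine_negative_rules(annotations, min_support=3, min_conf=0.9, min_expected=1.0):
    """Return hypothetical negative rules (x, y), read as 'x implies not y'.

    annotations: dict mapping protein id -> set of feature labels.
    Assumed criteria (illustrative, not the study's exact measures):
      - x and y each occur in at least `min_support` proteins,
      - confidence of 'x -> not y', i.e. 1 - n(x, y) / n(x), is >= `min_conf`,
      - independence predicts at least `min_expected` co-occurrences,
        so the pair's rarity is surprising rather than a small-sample effect.
    """
    n = len(annotations)
    single = Counter()  # proteins carrying each feature
    pair = Counter()    # proteins carrying each unordered feature pair
    for feats in annotations.values():
        single.update(feats)
        pair.update(combinations(sorted(feats), 2))

    frequent = sorted(f for f, c in single.items() if c >= min_support)
    rules = []
    for x in frequent:
        for y in frequent:
            if x == y:
                continue
            both = pair[tuple(sorted((x, y)))]
            conf = 1 - both / single[x]
            expected = single[x] * single[y] / n
            if conf >= min_conf and expected >= min_expected:
                rules.append((x, y))
    return rules


def find_exceptions(annotations, rules):
    """Map each protein that violates a rule (carries both x and y) to its violated rules."""
    flagged = {}
    for pid, feats in annotations.items():
        hits = [(x, y) for x, y in rules if x in feats and y in feats]
        if hits:
            flagged[pid] = hits
    return flagged


# Invented toy data: features A and B almost never co-occur, except in p10.
toy = {f"p{i}": {"A"} for i in range(10)}
toy.update({f"p{i}": {"B"} for i in range(11, 21)})
toy["p10"] = {"A", "B"}

rules = mine_negative_rules(toy)
print(rules)                        # [('A', 'B'), ('B', 'A')]
print(find_exceptions(toy, rules))  # {'p10': [('A', 'B'), ('B', 'A')]}
```

Rules are kept in both directions because the confidence of "x implies not y" and "y implies not x" generally differ when x and y have different frequencies; in this symmetric toy example they happen to coincide.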
Potential Biases
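The approach may not distinguish annotation errors from biologically motivated exceptions, which are genuine and not errors; flagged cases therefore still require manual inspection.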
Limitations
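The method primarily identifies over-annotations (features that should not be present) and may not detect under-annotations (missing features), because a missing feature cannot violate a rule of the form "X implies not Y".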
Digital Object Identifier (DOI)