Applying negative rule mining to improve genome annotation
2007

Improving Genome Annotation with Negative Rule Mining

Sample size: 55063

Evidence: high

Author Information

Author(s): Irena I. Artamonova, Goar Frishman, Dmitrij Frishman

Primary Institution: Institute for Bioinformatics, GSF – National Research Center for Environment and Health

Hypothesis

Can negative association rule mining help identify erroneous protein annotations?

Conclusion

Negative rule mining is effective in flagging potentially erroneous protein annotations for further inspection.

Supporting Evidence

  • The study identified 9591 negative rules from the PEDANT annotation dataset.
  • 96% of manually verified exceptions from negative rules contained at least one annotation error.
  • Negative rule mining flagged 0.6% of features as suspicious, and these flagged features were enriched in errors.
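To make "flagging" concrete: an exception to a negative rule X → ¬Y is a protein that carries both X and Y. The sketch below assumes each protein is represented as a set of annotation labels and that rules are given as (x, y) pairs. The function names, the toy proteins, and the way "suspicious features" are counted (each feature involved in a violated rule on a given protein) are illustrative assumptions, not the study's implementation.

```python
def flag_exceptions(proteins, rules):
    """Return {protein_id: [(x, y), ...]} for every negative rule x -> not y
    that the protein violates, i.e. it carries both x and y."""
    flagged = {}
    for pid, features in proteins.items():
        hits = [(x, y) for x, y in rules if x in features and y in features]
        if hits:
            flagged[pid] = hits
    return flagged


def suspicious_feature_fraction(proteins, flagged):
    """Fraction of all (protein, feature) annotations involved in a violated rule."""
    total = sum(len(features) for features in proteins.values())
    suspicious = {
        (pid, feat)
        for pid, hits in flagged.items()
        for x, y in hits
        for feat in (x, y)
    }
    return len(suspicious) / total if total else 0.0


# Hypothetical toy annotations: p6 combines two features that otherwise never co-occur.
example = {
    "p1": {"kinase", "ATP-binding"},
    "p2": {"kinase", "ATP-binding"},
    "p3": {"kinase", "ATP-binding"},
    "p4": {"ribosomal", "RNA-binding"},
    "p5": {"ribosomal", "RNA-binding"},
    "p6": {"ribosomal", "RNA-binding", "kinase"},
}

flagged = flag_exceptions(example, [("kinase", "ribosomal")])
fraction = suspicious_feature_fraction(example, flagged)
```

Only the flagged proteins would then go to manual inspection, which is why a small flagged fraction keeps curation effort manageable.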

Takeaway

This study shows that negative association rules, which capture annotation features that rarely occur together, can help find mistakes in how proteins are labeled, making those mistakes easier to fix.

Methodology

The study used negative association rule mining on a dataset of protein annotations from the PEDANT genome database.
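The mining step can be sketched as follows for single-feature rules X → ¬Y. This is a minimal illustration, assuming each protein is a set of annotation labels; the support and confidence thresholds, the independence-based filter (keep a rule only if X and Y co-occur less often than expected by chance), and all names here are assumptions for illustration, not the parameters or procedure reported in the study.

```python
from collections import Counter
from itertools import combinations


def mine_negative_rules(proteins, min_support=2, min_confidence=0.75):
    """Mine single-feature negative rules x -> not y.

    proteins: dict mapping a protein ID to the set of its annotation features.
    Returns a list of (x, y, confidence) tuples.
    """
    n = len(proteins)
    item_count = Counter()  # number of proteins carrying each feature
    pair_count = Counter()  # number of proteins carrying both features of a pair
    for features in proteins.values():
        item_count.update(features)
        for pair in combinations(sorted(features), 2):
            pair_count[pair] += 1

    frequent = sorted(f for f, c in item_count.items() if c >= min_support)
    rules = []
    for x in frequent:
        for y in frequent:
            if x == y:
                continue
            both = pair_count[tuple(sorted((x, y)))]
            # Confidence of x -> not y: share of x-proteins that lack y.
            confidence = (item_count[x] - both) / item_count[x]
            # Co-occurrence count expected if x and y were independent.
            expected = item_count[x] * item_count[y] / n
            if confidence >= min_confidence and both < expected:
                rules.append((x, y, confidence))
    return rules


# Hypothetical toy annotations.
example = {
    "p1": {"kinase", "ATP-binding"},
    "p2": {"kinase", "ATP-binding"},
    "p3": {"kinase", "ATP-binding"},
    "p4": {"ribosomal", "RNA-binding"},
    "p5": {"ribosomal", "RNA-binding"},
    "p6": {"ribosomal", "RNA-binding", "kinase"},
}

rules = mine_negative_rules(example)
```

Rules with confidence below 1.0 are the interesting ones for curation: the proteins responsible for the missing confidence (here p6 for "kinase → not ribosomal") are the exceptions that get flagged for review.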

Potential Biases

The approach may flag biologically motivated exceptions that are not actually errors, so flagged annotations still require manual inspection.

Limitations

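Because an exception to a negative rule is a protein carrying two features that should not co-occur, the method primarily identifies over-annotations and may not detect under-annotations.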

Digital Object Identifier (DOI)

10.1186/1471-2105-8-261
