The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes
2008

Sample size: over 20,000 annotated sentences
Evidence: high

Author Information

Author(s): Vincze Veronika, Szarvas György, Farkas Richárd, Móra György, Csirik János

Primary Institution: University of Szeged

Hypothesis

The study addresses the lack of publicly available standard corpora for evaluating automatic detection and scope resolution of negation and uncertainty in biomedical texts.

Conclusion

The BioScope corpus is a valuable resource for training, testing, and comparing biomedical Natural Language Processing systems, as well as for linguistic analysis of scientific and clinical texts.

Supporting Evidence

  • The corpus consists of over 20,000 annotated sentences.
  • More than 10% of the sentences contain linguistic annotations suggesting negation or uncertainty.
  • The corpus is freely available for academic purposes.
  • Negation and uncertainty cues are detected more accurately in clinical documents than in scientific texts.

Takeaway

The BioScope corpus helps researchers identify where biomedical texts express uncertainty or negation, making these texts easier to analyze and process automatically.

Methodology

The corpus was annotated by two independent linguists following specific guidelines, with a focus on identifying negation and speculation in biomedical texts.
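To make the idea of a cue and its scope concrete, the following is a minimal sketch of how one annotated sentence might be read programmatically. The sample sentence, the element names (`sentence`, `xcope`, `cue`), and the attribute names are assumptions made for illustration, and `extract_annotations` is a hypothetical helper; check the distributed corpus files for the actual schema before relying on any of them.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated sentence. Element and attribute names are assumed
# for illustration and may differ from the released corpus files.
SAMPLE = (
    '<sentence id="S1">These results '
    '<xcope id="X1"><cue type="speculation" ref="X1">suggest</cue>'
    ' that the protein is '
    '<xcope id="X2"><cue type="negation" ref="X2">not</cue> expressed</xcope>'
    '</xcope>.</sentence>'
)


def extract_annotations(sentence_xml):
    """Return a (cue_type, cue_text, scope_text) tuple for each scope."""
    root = ET.fromstring(sentence_xml)
    results = []
    # iter() walks nested scopes in document order, outermost first.
    for scope in root.iter("xcope"):
        # The scope text includes the cue and any nested scopes.
        scope_text = "".join(scope.itertext())
        # findall() only matches direct children, so a nested scope's cue
        # is not attributed to the enclosing scope.
        for cue in scope.findall("cue"):
            results.append((cue.get("type"), cue.text, scope_text))
    return results


annotations = extract_annotations(SAMPLE)
for cue_type, cue_text, scope_text in annotations:
    print(f"{cue_type}: cue={cue_text!r} scope={scope_text!r}")
```

In this sketch the speculation cue "suggest" governs the rest of the clause, while the negation cue "not" governs only "not expressed", so a single sentence can carry nested scopes of different types.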

Limitations

The study notes that hedging is harder to detect in scientific texts than in clinical documents, which may reduce the accuracy of automated systems on scientific text.

Digital Object Identifier (DOI)

10.1186/1471-2105-9-S11-S9
