Improving Biomedical Text Retrieval with PageRank
Author Information
Author(s): Jimmy Lin
Primary Institution: National Center for Biotechnology Information, National Library of Medicine
Hypothesis
Can related article networks be exploited for text retrieval in the same manner as hyperlink graphs on the Web?
Conclusion
The link structure of content-similarity networks can be exploited to improve the effectiveness of information retrieval systems.
Supporting Evidence
- Incorporating PageRank scores yields significant improvements in ranked-retrieval metrics.
- The study confirms that related document networks can enhance retrieval effectiveness.
- Statistical tests showed significant improvements over baseline retrieval scores.
Takeaway
This study shows that links between similar articles can improve search results in much the same way that hyperlinks between web pages do.
Methodology
Experiments were conducted using the TREC 2005 genomics track test collection, combining PageRank and HITS scores with standard retrieval engine scores.
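The combination step described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the linear interpolation, the min-max normalization, the damping factor, and the names `pagerank`, `min_max`, `rerank`, and `weight` are all assumptions introduced here. HITS scores could be substituted for PageRank in the same way.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {doc_id: [related_doc_id, ...]}.

    Every related_doc_id is assumed to also appear as a key of `graph`.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in graph[v]:
                new_rank[w] += damping * rank[v] / len(graph[v])
        rank = new_rank
    return rank


def min_max(scores):
    """Rescale scores to [0, 1] so the two score types are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def rerank(retrieval_scores, graph, weight=0.2):
    """Rerank documents by interpolating retrieval and PageRank scores.

    `weight` is a hypothetical tuning parameter: 0 keeps the baseline
    ranking, 1 ranks by PageRank alone.
    """
    base = min_max(retrieval_scores)
    pr = min_max(pagerank(graph))
    combined = {
        d: (1.0 - weight) * base[d] + weight * pr.get(d, 0.0)
        for d in retrieval_scores
    }
    return sorted(combined, key=combined.get, reverse=True)
```

In this sketch, `graph` would hold the related-article links among the documents returned by the retrieval engine, and `retrieval_scores` would hold the engine's own scores for those documents. The interpolation weight would need to be tuned on held-out topics.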
Limitations
The study did not perform second-order expansion of related documents, which may have limited the density of the resulting networks.
Statistical Information
P-Value
0.01453
Statistical Significance
p<0.05
Digital Object Identifier (DOI)