Term Identification Methods for Consumer Health Vocabulary Development
2007

Identifying Terms for Consumer Health Vocabulary Development

Sample size: 1893 | publication | 10 minutes | Evidence: high

Author Information

Author(s): Qing T Zeng, Tony Tse, Guy Divita, Alla Keselman, Jon Crowell, Allen C Browne, Sergey Goryachev, Long Ngo

Primary Institution: Brigham and Women's Hospital, Harvard Medical School

Hypothesis

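The study explored several term identification methods for developing a consumer health vocabulary: collaborative human review and two automated methods, the C-value formula and logistic regression.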

Conclusion

The collaborative human review and logistic regression methods were effective for identifying terms for consumer health vocabulary development.

Supporting Evidence

  • The study identified 753 consumer terms.
  • The logistic regression model had an area under the ROC curve of 95.5%, indicating that its scores ranked valid terms above non-terms very reliably.
  • A total of 1893 distinct n-grams received master votes during the review process.
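The area under the ROC curve cited above can be read as the probability that a randomly chosen valid term receives a higher model score than a randomly chosen non-term. The following is a minimal sketch of that calculation, not the study's evaluation code. The function name `roc_auc` and the labels and scores are invented for illustration and are not study data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a valid term, 0 for a non-term (toy convention).
    scores: model scores, higher meaning "more likely a term".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Invented toy data: six candidate strings with made-up scores.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"AUC on toy data: {toy_auc:.3f}")
```

The pairwise loop is quadratic in the number of candidates, which is acceptable for a few thousand strings. A rank-based formulation gives the same value more efficiently on larger sets.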

Takeaway

The study found ways to identify health terms that regular people use, which can help make health information easier to understand.

Methodology

The study involved collaborative human review of candidate strings and testing of two automated methods: the C-value formula and logistic regression.
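To make the C-value method more concrete, here is a minimal sketch of the formula as it is commonly stated in the term-extraction literature: a candidate's frequency is reduced by the average frequency of the longer candidates that contain it, then weighted by the log of its length in words. This summary does not describe the study's exact variant or preprocessing, so the whitespace tokenization, the 2-to-3-word candidate range, the helper names, and the sample queries below are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def is_nested(short, long):
    """True if `short` occurs as a contiguous run inside `long`."""
    k = len(short)
    return any(long[i:i + k] == short for i in range(len(long) - k + 1))


def c_values(queries, min_n=2, max_n=3):
    """Score every n-gram (min_n..max_n words) with the C-value formula."""
    # Count each candidate n-gram across all queries (assumed tokenization).
    freq = Counter()
    for query in queries:
        tokens = query.lower().split()
        for n in range(min_n, max_n + 1):
            freq.update(ngrams(tokens, n))

    scores = {}
    for cand, f_cand in freq.items():
        # Frequencies of longer candidates that contain this one.
        containing = [f for other, f in freq.items()
                      if len(other) > len(cand) and is_nested(cand, other)]
        if containing:
            # Nested candidate: subtract the mean frequency of its containers.
            adjusted = f_cand - sum(containing) / len(containing)
        else:
            adjusted = f_cand
        scores[cand] = math.log2(len(cand)) * adjusted
    return scores


# Invented sample queries, not taken from the study's query logs.
sample_queries = [
    "heart attack symptoms",
    "heart attack",
    "heart attack symptoms",
    "high blood pressure",
]
sample_scores = c_values(sample_queries)
for cand, score in sorted(sample_scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(cand), round(score, 3))
```

With the original `log2` length weight, one-word candidates would always score zero, which is why this sketch starts at two-word strings. Handling single words would require a modified weight, which is a separate design decision.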

Potential Biases

The review was conducted only by researchers, not lay consumers, which may introduce bias.

Limitations

The study relied on query logs that contained few complete sentences, leading to potential errors in analysis.

Participant Demographics

The study involved researchers from various fields including health informatics and linguistics.

Statistical Information

P-Value

p<0.05

Area Under the ROC Curve

95.5% (logistic regression model)

Statistical Significance

p<0.05

Digital Object Identifier (DOI)

10.2196/jmir.9.1.e4
