Identifying Terms for Consumer Health Vocabulary Development
Author Information
Author(s): Qing T Zeng, Tony Tse, Guy Divita, Alla Keselman, Jon Crowell, Allen C Browne, Sergey Goryachev, Long Ngo
Primary Institution: Brigham and Women's Hospital, Harvard Medical School
Hypothesis
We explored several term identification methods for developing a consumer health vocabulary.
Conclusion
The collaborative human review and logistic regression methods were effective for identifying terms for consumer health vocabulary development.
Supporting Evidence
- The study identified 753 consumer terms.
- The logistic regression model had an area under the ROC curve of 95.5%, indicating high effectiveness.
- A total of 1893 distinct n-grams received master votes during the review process.
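As a rough illustration of what the area under the ROC curve measures, the sketch below computes it as the fraction of (term, non-term) pairs that a model ranks in the correct order. It is not the study's evaluation code. The function name `roc_auc` and the example labels and scores are hypothetical, and it assumes a label of 1 marks a candidate string that reviewers accepted as a term.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 (assumed: 1 = string accepted as a term).
    scores: model scores, higher meaning "more likely a term".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four candidate strings (not data from the study).
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.5, 0.1]
example_auc = roc_auc(example_labels, example_scores)
```

A value of 1.0 means every accepted term outscores every rejected string, while 0.5 corresponds to random ranking.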
Takeaway
The study found ways to identify health terms that regular people use, which can help make health information easier to understand.
Methodology
The study combined collaborative human review of candidate strings with tests of two automated methods: the C-value formula and logistic regression.
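For readers unfamiliar with C-value, the following is a minimal sketch of the commonly cited form of the formula, which weights a candidate's frequency by its length and discounts occurrences that are nested inside longer candidates. The study may have used a variant, so treat this as an illustration only. The helper names `is_nested` and `c_value` and the example frequencies are hypothetical.

```python
import math


def is_nested(short, long):
    """True if the word tuple `short` occurs contiguously inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def c_value(freq):
    """Score candidate n-grams with the standard C-value formula.

    freq: dict mapping a tuple of words to its corpus frequency.
    Single-word candidates score 0 because log2(1) == 0; the standard
    formula is intended for multi-word candidates.
    """
    scores = {}
    for a, f_a in freq.items():
        # Longer candidates that contain `a` as a contiguous sub-sequence.
        nesting = [b for b in freq if len(b) > len(a) and is_nested(a, b)]
        weight = math.log2(len(a))
        if not nesting:
            scores[a] = weight * f_a
        else:
            # Subtract the mean frequency of the longer candidates.
            mean_nested = sum(freq[b] for b in nesting) / len(nesting)
            scores[a] = weight * (f_a - mean_nested)
    return scores


# Hypothetical frequencies, not taken from the study's query logs.
example_freq = {
    ("heart", "attack"): 10,
    ("massive", "heart", "attack"): 4,
    ("blood", "pressure"): 6,
}
example_scores = c_value(example_freq)
```

In this toy input, "heart attack" is discounted for the four occurrences in which it appears inside "massive heart attack", while "blood pressure" keeps its full frequency because no longer candidate contains it.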
Potential Biases
The review was conducted only by researchers, not lay consumers, which may introduce bias.
Limitations
The study relied on query logs that contained few complete sentences, leading to potential errors in analysis.
Participant Demographics
The study involved researchers from various fields including health informatics and linguistics.
Statistical Information
P-Value
p<0.05
Area Under the ROC Curve
95.5% (logistic regression model)
Statistical Significance
p<0.05