ngLOC: an n-gram-based Bayesian method for estimating the subcellular proteomes of eukaryotes
2007

ngLOC: A Method for Predicting Protein Localization

Sample size: 28056; Type: publication; Reading time: 10 minutes; Evidence: high

Author Information

Author(s): Brian R. King, Chittibabu Guda

Primary Institution: State University of New York at Albany

Hypothesis

Can an n-gram-based Bayesian method accurately predict the localization of protein sequences across multiple organelles?

Conclusion

The ngLOC method demonstrates high accuracy in predicting protein localization, achieving 89% accuracy for single-localized sequences and 82% for multi-localized sequences.

Supporting Evidence

  • The ngLOC method achieved an accuracy of 89% for single-localized sequences.
  • The method was able to predict multi-localized sequences with an accuracy of 82%.
  • A tenfold cross-validation was used to validate the performance of the ngLOC method.
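Tenfold cross-validation splits the data into ten parts, trains on nine, tests on the held-out part, and repeats so that every sequence is tested exactly once. The following is a minimal sketch of that procedure, not the authors' code. The function names, the random seed, and the trivial majority-class classifier used for the demonstration are all assumptions made for illustration.

```python
import random
from collections import Counter

def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle the indices 0..n_items-1 and deal them into k disjoint folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(items, labels, train_fn, predict_fn, k=10, seed=0):
    """Return overall accuracy when each fold is held out once for testing."""
    folds = k_fold_indices(len(items), k, seed)
    correct = 0
    for held_out in folds:
        held = set(held_out)
        # Train only on the items that are not in the held-out fold.
        train_x = [items[i] for i in range(len(items)) if i not in held]
        train_y = [labels[i] for i in range(len(items)) if i not in held]
        model = train_fn(train_x, train_y)
        correct += sum(predict_fn(model, items[i]) == labels[i] for i in held_out)
    return correct / len(items)

# Hypothetical stand-in classifier: always predicts the most common training label.
def train_majority(train_x, train_y):
    return Counter(train_y).most_common(1)[0][0]

def predict_majority(model, item):
    return model

# Toy data (not from the study): 20 items labelled "a", 5 labelled "b".
toy_items = list(range(25))
toy_labels = ["a"] * 20 + ["b"] * 5
accuracy = cross_validate(toy_items, toy_labels, train_majority, predict_majority)
print(accuracy)  # 0.8
```

Any classifier can be swapped in by passing different `train_fn` and `predict_fn` callables; the splitting logic stays the same.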

Takeaway

This study created a computer program that helps scientists figure out where proteins are located in a cell, which is important for understanding how cells work.

Methodology

The ngLOC method uses n-gram peptides from protein sequences and applies a Bayesian classification approach to predict subcellular localization.
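The general idea can be sketched as a naive Bayes classifier over overlapping n-grams. This is an illustrative sketch only, assuming a multinomial model with add-one smoothing; the actual ngLOC probability model and its parameters may differ. The class name `NGramBayes`, the choice `n=2`, and the toy sequences and labels are hypothetical and are not taken from the study.

```python
import math
from collections import Counter, defaultdict

def ngrams(seq, n):
    """Return all overlapping n-grams (peptides of length n) in a sequence."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

class NGramBayes:
    """Toy multinomial naive Bayes over n-grams (assumed model, not ngLOC itself)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n                          # n-gram length
        self.alpha = alpha                  # additive smoothing constant
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.class_totals = Counter()       # label -> number of training sequences
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, sequences, labels):
        for seq, label in zip(sequences, labels):
            grams = ngrams(seq, self.n)
            self.counts[label].update(grams)
            self.class_totals[label] += 1
            self.vocab.update(grams)
        return self

    def log_scores(self, seq):
        """Log prior plus summed log likelihood of each n-gram, per label."""
        n_seqs = sum(self.class_totals.values())
        v = len(self.vocab)
        scores = {}
        for label, n_label in self.class_totals.items():
            total = sum(self.counts[label].values())
            score = math.log(n_label / n_seqs)
            for g in ngrams(seq, self.n):
                score += math.log(
                    (self.counts[label][g] + self.alpha) / (total + self.alpha * v)
                )
            scores[label] = score
        return scores

    def predict(self, seq):
        scores = self.log_scores(seq)
        return max(scores, key=scores.get)

# Made-up training sequences and labels, for demonstration only.
train_seqs = ["MKKLLAAKK", "KKRKAAKKR", "MLLVVLLGG", "LLVVGGLLV"]
train_labels = ["nucleus", "nucleus", "membrane", "membrane"]
model = NGramBayes(n=2).fit(train_seqs, train_labels)
print(model.predict("KKRKK"))   # nucleus
print(model.predict("LLVVLL"))  # membrane
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing constant keeps n-grams unseen in a class from forcing that class's probability to zero.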

Potential Biases

The method relies on existing annotated datasets, which may not represent all protein types equally.

Limitations

The method may not accurately predict localizations for proteins with low representation in the training dataset.

Participant Demographics

The study analyzed protein sequences from eight eukaryotic organisms including yeast, nematode, fruitfly, mosquito, zebrafish, chicken, mouse, and human.

Statistical Information

P-Value

p<0.05

Statistical Significance

p<0.05

Digital Object Identifier (DOI)

10.1186/gb-2007-8-5-r68
