MotifCluster: A Tool for Clustering and Visualizing Protein Sequences
Author Information
Author(s): Micah Hamady, Jeremy Widmann, Shelley D. Copley, Rob Knight
Primary Institution: University of Colorado, Boulder, CO, USA
Hypothesis
MotifCluster aims to improve the identification of evolutionary relationships between distantly related protein families by clustering sequences based on shared motifs.
Conclusion
MotifCluster effectively clusters protein sequences based on shared motifs, demonstrating high accuracy with low false positive rates.
Supporting Evidence
- MotifCluster assigned families to the correct superfamilies with a 0.17% false positive rate.
- The tool allows users to visualize motifs on protein structures, aiding in functional analysis.
- Clustering based on motifs provides better insights into evolutionary relationships than traditional methods.
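As a reading aid for the false positive rate quoted above, the sketch below uses the standard definition: false positives divided by all actual negatives. Whether the original study computed its 0.17% figure in exactly this way is an assumption. The function name and the counts in the usage line are hypothetical and are not data from the paper.

```python
def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Return FP / (FP + TN), the fraction of actual negatives called positive."""
    negatives = false_positives + true_negatives
    if negatives == 0:
        raise ValueError("need at least one actual negative to compute a rate")
    return false_positives / negatives


# Illustrative counts only (NOT taken from the paper):
# 2 wrong superfamily assignments out of 1000 possible wrong ones.
example_rate = false_positive_rate(2, 998)
print(f"{example_rate:.2%}")
```

A rate expressed this way is a proportion of incorrect assignments, which is a different quantity from a p-value.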
Takeaway
MotifCluster helps scientists group similar proteins by looking at tiny parts they share, making it easier to understand how they are related.
Methodology
MotifCluster uses various distance metrics to cluster sequences based on user-supplied motifs and visualizes these motifs on protein structures.
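The general idea of clustering by shared motifs can be illustrated with a minimal sketch. It is not MotifCluster's implementation: the Jaccard distance, the exact-substring motif matching, the single-linkage grouping, and every function name and toy sequence below are assumptions chosen for illustration. The summary above says only that the tool supports "various distance metrics".

```python
from itertools import combinations


def find_motif_hits(sequences, motifs):
    """Map each sequence name to the set of motifs found in it (exact substring match)."""
    return {name: {m for m in motifs if m in seq} for name, seq in sequences.items()}


def jaccard_distance(a, b):
    """1 - |intersection| / |union|; two empty sets are treated as maximally distant."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def single_linkage_clusters(hits, threshold):
    """Group sequences connected by pairwise distances <= threshold (union-find)."""
    parent = {name: name for name in hits}

    def find(x):
        # Follow parent links to the cluster root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(hits, 2):
        if jaccard_distance(hits[a], hits[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in hits:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy, made-up sequences and user-supplied motifs (hypothetical, for illustration only).
sequences = {
    "seqA": "MKCGPCLLAVWDG",
    "seqB": "TTCGPCAAWDGKK",
    "seqC": "GGHELLHQQDPNA",
    "seqD": "AHELLHTTTDPNV",
}
motifs = ["CGPC", "WDG", "HELLH", "DPN"]

hits = find_motif_hits(sequences, motifs)
print(single_linkage_clusters(hits, threshold=0.5))
```

In this toy input, seqA and seqB share one pair of motifs and seqC and seqD share another, so the sketch separates them into two groups. A real analysis would typically match motifs with profiles or position weight matrices rather than exact substrings.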
Potential Biases
Motif-finding algorithms may be biased by the presence of closely related sequences in the input set.
Limitations
Results depend on the order of the sequences in the input set, so the same sequences supplied in a different order can produce different clusters.
Participant Demographics
Not applicable: the study analyzed a diverse set of protein sequences from various families and superfamilies, not human participants.
Statistical Information
False Positive Rate
0.17%
Statistical Significance
p<0.05
Digital Object Identifier (DOI)