Title: A study of statistical methods for function prediction of protein motifs
Speaker: Tao Tao, Department of Computer Science, UIUC
Place: Siebel Center 3405
Time: 11am-12noon

Abstract: Automatic discovery of new protein motifs (i.e., amino acid patterns) is one of the major challenges in bioinformatics. Several algorithms have been proposed that can extract statistically significant motif patterns from any set of protein sequences. With these methods, one can generate a large set of candidate motifs that may be biologically meaningful. In this paper, we study several statistical methods for automatically predicting the functions of these candidate motifs, including a popularity method, a mutual information method, and statistical translation models. These methods capture, from different perspectives, the correlations between the matched motifs of a protein and its assigned Gene Ontology (GO) terms, which characterize the function of the protein. We evaluate these methods using the known motifs in the InterPro database. Each method is used to rank candidate GO terms for each motif, and we then use mean reciprocal rank (MRR) to evaluate performance. The results show that, in general, all of these methods perform well, suggesting that they can all be useful for predicting an unknown motif's function. Among the methods tested, a statistical translation model with a popularity prior performs best.

Bio: Tao Tao is a Ph.D. candidate in the Computer Science Department at the University of Illinois at Urbana-Champaign. He has been working on information retrieval models and text/data mining with applications to bioinformatics.
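
Note on the evaluation metric: the abstract evaluates each method by mean reciprocal rank (MRR) over ranked candidate terms. The sketch below illustrates one common way to compute MRR, and is not taken from the paper. It assumes that the reciprocal rank of a motif is 1 divided by the position of the first correct term in its ranking, and 0 if no correct term appears. The function name, the motif names, and the term labels are hypothetical placeholders.

```python
def mean_reciprocal_rank(rankings, relevant):
    """Compute MRR over a set of motifs.

    rankings: dict mapping motif -> list of candidate terms, best first
    relevant: dict mapping motif -> set of terms considered correct
    Assumption: only the first correct term in each ranking counts.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for motif, ranked_terms in rankings.items():
        correct = relevant.get(motif, set())
        for position, term in enumerate(ranked_terms, start=1):
            if term in correct:
                total += 1.0 / position  # reciprocal rank of first hit
                break
        # A motif with no correct term in its ranking contributes 0.
    return total / len(rankings)


# Hypothetical toy data: two motifs with placeholder term labels.
rankings = {
    "motif_A": ["term_1", "term_2", "term_3"],
    "motif_B": ["term_3", "term_1"],
}
relevant = {
    "motif_A": {"term_2"},  # first correct term at rank 2 -> 1/2
    "motif_B": {"term_3"},  # first correct term at rank 1 -> 1/1
}
print(mean_reciprocal_rank(rankings, relevant))  # (0.5 + 1.0) / 2 = 0.75
```

An MRR close to 1 means the correct term is usually ranked first. An MRR of 0.5 corresponds to the first correct term appearing at rank 2 on average, in the reciprocal sense.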