Dr. Xinghua Lu is an Assistant Professor in the Department of Biostatistics, Bioinformatics and Epidemiology at the Medical University of South Carolina. He was trained in pharmacology and moved into bioinformatics after completing NLM-sponsored postdoctoral training in biomedical informatics. His research focuses on applying latent variable models to the simulation of biological signaling systems and to text mining.

Abstract: Knowledge about proteins is a cornerstone of modern biomedicine. In this talk, I will discuss applying statistical approaches to identify latent semantic topics in a corpus of MEDLINE titles and abstracts that describe the biological aspects of proteins. A Bayesian model selection approach was used to determine the optimal number of topics for representing the corpus. The identified latent topics were semantically coherent, and the majority of them reflected biological concepts. Furthermore, the latent semantic topics were mapped to the controlled vocabulary of the Gene Ontology.