Title: Automated Natural Language Processing for Data Integration and Curation in the Wnt Signaling Pathway

Streaming video: [Windows Media format]

Abstract: We have developed a natural language processing (NLP) system that identifies references to biological interaction networks in free text and automatically assembles a protein association and interaction map. The pipeline is fully automated: it derives key Wnt-pathway-associated proteins and biological names from the literature itself, using a chi-squared analysis of noun phrases that are overrepresented in the Wnt literature relative to the general signal transduction literature. From the terms overrepresented in the Wnt literature relative to MeSH-annotated signal transduction papers, we formed a base named-entity dictionary. We then appended to it protein names derived from full parses and a gold-standard set of proteins annotated in a Wnt-signaling review collection. This dictionary served as the name list for an exhaustive assertion-extraction step over the corpus. That step yielded annotations involving key Wnt-related molecules that are described in the literature but are missing from, or differ from, those in the canonical diagram. Our results suggest that software combining several NLP techniques for information extraction could serve as a valuable first-pass tool to assist human annotation and maintenance of signaling-pathway models.

Bio: Carlos Santos is a Ph.D. student in the University of Michigan's Bioinformatics Program. He works in the Natural Language Processing group under the supervision of Dr. David States, in collaboration with the National Center for Integrative Biomedical Informatics. His current research applies natural language processing techniques to integrate facts and data from the biomedical literature with existing bioinformatics databases, with the goal of better understanding complex disease processes.
He completed his master's degree in Bioinformatics at the University of Michigan in winter 2003 and earned an undergraduate degree in Computer Science from Washington University in St. Louis in 2000. At Washington University, he collaborated with Dr. David States' lab at the Institute for Biomedical Computing.
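As a rough illustration of the chi-squared term-selection step described in the abstract, the following Python sketch scores terms by how overrepresented they are in a target corpus (for example, Wnt papers) relative to a background corpus (for example, MeSH-annotated signal transduction papers). It is not the authors' implementation. It assumes noun phrases have already been extracted and counted as document frequencies, and it uses a Pearson chi-squared test on a 2x2 contingency table without continuity correction. The function names, the toy counts, and the 3.84 cutoff (p < 0.05 at one degree of freedom) are hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared statistic (no continuity correction) for the table
    [[a, b], [c, d]], where rows are corpora and columns are
    'documents containing the term' / 'documents not containing it'."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def overrepresented_terms(
    target_df: Dict[str, int],
    target_docs: int,
    background_df: Dict[str, int],
    background_docs: int,
    threshold: float = 3.84,  # hypothetical cutoff: p < 0.05, 1 degree of freedom
) -> List[Tuple[str, float]]:
    """Return (term, chi2) pairs for terms overrepresented in the target corpus,
    sorted by descending score. Inputs are assumed document frequencies."""
    results = []
    for term, a in target_df.items():
        c = background_df.get(term, 0)
        b = target_docs - a       # target documents without the term
        d = background_docs - c   # background documents without the term
        score = chi_squared_2x2(a, b, c, d)
        # Chi-squared is two-sided, so also require a higher rate in the target.
        if a / target_docs > c / background_docs and score >= threshold:
            results.append((term, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


if __name__ == "__main__":
    # Invented toy counts, not data from the study.
    wnt = {"beta-catenin": 40, "frizzled": 25, "protein kinase": 30}
    background = {"beta-catenin": 20, "frizzled": 5, "protein kinase": 300}
    for term, score in overrepresented_terms(wnt, 100, background, 1000):
        print(term, round(score, 1))
    # prints:
    # beta-catenin 254.6
    # frizzled 205.7
```

In this toy run, "protein kinase" appears in 30% of documents in both corpora, so it scores zero and is dropped; such generic terms would stay out of the base dictionary, while pathway-specific names rise to the top.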