NMPDR User Guide

Intro to NMPDR


What data are in NMPDR?

Complete genomes are the primary data. In most of them, each chromosome is one contiguous length of DNA sequence, or one "contig." Some genomes that are fragmented into several contigs are considered essentially complete. Genome data include DNA sequences, protein sequences, and the associated annotations. Annotation covers both the accurate determination of gene boundaries and the assignment of a functional name to each encoded protein. NMPDR curators use bioinformatics tools to correct errors in the start or stop codons of genes, and to change incorrect or ambiguous names in the annotations of protein-encoding genes, which are called "pegs" in the NMPDR. A peg is equivalent to a CDS (coding sequence).
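As an illustration of what checking gene boundaries can involve, the following Python sketch flags a few common problems in a candidate peg sequence. It is not the curators' tool, which is not described here. The function name is hypothetical, and the start codon set (ATG, GTG, TTG) is an assumption based on the codons most commonly used by bacteria.

```python
# Assumption: the usual bacterial start codons and the three standard stop codons.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def boundary_problems(cds):
    """Return a list of boundary problems found in a coding sequence string."""
    seq = cds.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")
    if seq[:3] not in START_CODONS:
        problems.append("no recognized start codon")
    if seq[-3:] not in STOP_CODONS:
        problems.append("no stop codon at the end")
    # Scan every codon except the last one for a premature stop.
    internal = [seq[i:i + 3] for i in range(0, len(seq) - 3, 3)]
    if any(codon in STOP_CODONS for codon in internal):
        problems.append("internal stop codon")
    return problems


# Invented toy sequences, far shorter than any real gene.
print(boundary_problems("ATGAAATAA"))
print(boundary_problems("ATGTAAAAATAA"))
```

An empty list means the sequence passed these simple checks; it does not mean the gene call is correct.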

Populated subsystems are a data type unique to the NMPDR and its underlying annotation environment, the SEED. Subsystems are sets of functional roles grouped according to any biologically useful organizing principle. A subsystem may describe a metabolic pathway, but subsystems are not limited to pathways. For example, there are subsystems that include the ribosomal proteins, or cell division proteins, or pathogen-specific virulence factors. A subsystem may comprise very few or very many proteins that are related in some functional or structural way. Each protein included in a subsystem plays a “functional role,” which may be enzymatic, signaling, regulatory, structural, or of another kind. A subsystem may exist in all genomes or be present in only a few closely related genomes. A populated subsystem is a two-dimensional integration of biological functions with genome sequences. It is presented as a spreadsheet with columns of functional roles, rows of genomes, and cells populated by the genes responsible for each function.
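To make the spreadsheet layout concrete, here is a minimal Python sketch of a populated subsystem as a table keyed by genome (row) and functional role (column), with each cell holding the pegs that fill that role. The role names, genome names, and peg identifiers are invented for illustration and are not taken from NMPDR or the SEED.

```python
def populate_subsystem(roles, assignments):
    """Build a genome-by-role table from (genome, role, peg) assignments.

    Assignments whose role is not part of the subsystem are ignored.
    """
    table = {}
    for genome, role, peg in assignments:
        if role not in roles:
            continue
        row = table.setdefault(genome, {r: [] for r in roles})
        row[role].append(peg)
    return table


# Hypothetical subsystem with three functional roles.
roles = ["Role A", "Role B", "Role C"]
assignments = [
    ("Genome 1", "Role A", "peg.10"),
    ("Genome 1", "Role B", "peg.11"),
    ("Genome 2", "Role A", "peg.7"),
    ("Genome 2", "Role A", "peg.8"),   # two genes can share one role
    ("Genome 2", "Role D", "peg.99"),  # not in this subsystem, skipped
]
table = populate_subsystem(roles, assignments)

# Print one row per genome and one column per role, like the spreadsheet.
for genome, row in table.items():
    cells = [", ".join(row[role]) or "-" for role in roles]
    print(genome, *cells, sep="\t")
```

An empty cell (shown as "-") means no gene in that genome has been assigned to that functional role.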

Functional clusters are another data type unique to NMPDR and the SEED. The functions of two neighboring genes are more likely to be related when the genes are clustered together in the same way in a large number of organisms distributed over a wide phylogenetic space. This conservation is expressed as a functional clustering score, with a higher score indicating more widely conserved clustering. The score is approximately equal to the number of different species (not strains) in which the two genes are co-localized. Functional clustering provides insight into the specific roles played by proteins that may initially be assigned an ambiguous functional role, like "transporter." Functional clusters are presented graphically and by scores for every peg in the database. If the peg of interest to you does not share conserved proximity with other genes, you can still discover whether orthologs of your protein are clustered in other genomes. The CL button displays a table of orthologs that might be clustered with other proteins. Clicking on the functional clustering score also provides a list of orthologs paired in other genomes.
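The counting rule behind the score can be sketched in a few lines of Python. This is an approximation of the idea only, not the NMPDR or SEED implementation. The record layout, the names, and especially the `max_gap` distance used to decide that two genes are co-localized are assumptions made for this example.

```python
from typing import NamedTuple


class PairObservation(NamedTuple):
    """Positions of orthologs of gene A and gene B in one genome (hypothetical layout)."""
    species: str
    strain: str
    contig_a: str
    start_a: int
    contig_b: str
    start_b: int


def is_colocalized(obs, max_gap=5000):
    # Assumption: "co-localized" means same contig and starts within max_gap bases.
    return obs.contig_a == obs.contig_b and abs(obs.start_a - obs.start_b) <= max_gap


def clustering_score(observations, max_gap=5000):
    """Count distinct species (not strains) in which the pair is co-localized."""
    return len({obs.species for obs in observations if is_colocalized(obs, max_gap)})


# Invented observations for one pair of genes.
observations = [
    PairObservation("Species X", "strain 1", "c1", 1000, "c1", 2500),
    PairObservation("Species X", "strain 2", "c1", 1200, "c1", 2700),  # same species, counted once
    PairObservation("Species Y", "strain 1", "c1", 500, "c1", 90000),  # too far apart
    PairObservation("Species Z", "strain 1", "c2", 100, "c2", 900),
    PairObservation("Species W", "strain 1", "c1", 100, "c2", 200),    # different contigs
]
score = clustering_score(observations)
print(score)
```

Collecting species names in a set is what keeps several strains of one species from inflating the score.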

BLAST hits are pre-computed in a reciprocal analysis for all proteins in the database. The results are presented in a table of bidirectional best hits (BBHs). A comprehensive list of orthologs is thus provided for every protein in every genome. You may select orthologs from this list and generate a ClustalW alignment with one click.
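The bidirectional best hit idea can be illustrated with a short Python sketch: two proteins form a BBH pair when each is the other's top-scoring hit in the opposite genome. This is a generic illustration, not the NMPDR pipeline. The protein identifiers and scores are invented, and ranking hits by a single numeric score is an assumption.

```python
def best_hits(hits):
    """Map each query to its top-scoring subject.

    hits is an iterable of (query, subject, score); on a tie the first hit seen wins.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Invented hits between proteins of genome A and genome B.
a_vs_b = [("A1", "B1", 250.0), ("A1", "B2", 90.0), ("A2", "B2", 180.0), ("A3", "B1", 120.0)]
b_vs_a = [("B1", "A1", 245.0), ("B1", "A3", 118.0), ("B2", "A2", 175.0), ("B3", "A3", 60.0)]

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Here A3's best hit is B1, but B1's best hit is A1, so A3 has no bidirectional best hit in genome B.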