|The output might take several seconds, please be patient. |
The Prokaryotic Operon DataBase (ProOpDB) constitutes one of the most precise and complete repository of operon predictions in our days. Using our novel and highly accurate operon algorithm, we have predicted the operon structures of more than 1,200 prokaryotic genomes. ProOpDB offers diverse alternatives by which a set of operon predictions can be retrieved including: i) organism name, ii) metabolic pathways, as defined by the KEGG database, iii) gene orthology, as defined by the COG database, iv) conserved protein motifs, as defined by the Pfam database, v) reference gene, vi) reference operon, among others. In order to limit the operon output to non-redundant organisms, ProOpDB offers an efficient protocol to select the more representative organisms based on a precompiled phylogenetic distances matrix. In addition, the ProOpDB operon predictions are used directly as the input data of our Gene Context Tool (GeConT) to visualize their genomic context and retrieve the sequence of their corresponding 5’ regulatory regions, as well as the nucleotide or amino acid sequences of their genes. We kindly ask you to cite ProOpDB properly.
Taboada B., Ciria R., Martinez-Guerrer C.E., Merino E., 2012, ProOpDB: Prokaryotic Operon DataBase, Nucleic Acids Research, 40(D1), D627–D631
The prediction algorithm
The algorithm is a multilayer perceptron neural network (MLP) classifier, that used as input the intergenic distances of contiguous genes and the functional relationship scores of the STRING database between the different groups of orthologous proteins, as defined in the COG database. Nevertheless, the operon prediction of our method is not restricted to only those genes with a COG assignation, since we successfully defined new groups of orthologous genes and obtained, by extrapolation, a set of equivalent STRING-like scores based on conserved gene pairs on different genomes. Since the STRING functional relationships scores are determined in an un-bias manner and efficiently integrates a large amount of information coming from different sources and kind of evidences, the prediction made by our MLP are considerably less influenced by the bias imposed in the training procedure using one specific organism. Please read the paper cited below for detail.
Taboada B., Verde C., Merino E., 2010, High accuracy operon prediction method based on STRING database scores, Nucleic Acids Research, 38(12), e130.
This work was supported by grants PAPIIT IN201714 and CONACyT 154817