1.0 Input Options
Gene Context Tool III manages two types of input:
  • Text. The user's textual input, which is parsed, converted to SQL (Structured Query Language) and executed against our database (points 1.1 to 1.3 of this help page).
  • Configuration parameters. The parameters that affect the final chart, such as the number of neighbors and clustering, among other options.
    If the user does not configure a parameter, its default value is used (points 1.4 to 1.6 of this help page).
1.1 Text Search

The application accepts one or more common database identifiers, protein names or text descriptions.
Identifiers must be separated by spaces, tabs, commas or newlines. By default the application queries each separated ID, word or term independently (like a logical OR), so you can enter a list of IDs, common names, etc.
The supported entries are listed below, with examples:
  • COG - Clusters of Orthologous Groups of proteins, in the form COGXXXX, where each X is a digit:
    • COG0147 (Anthranilate/para-aminobenzoate synthases component I)
    • COG0013 (Alanyl-tRNA synthetase)
    • COG0448 (ADP-glucose pyrophosphorylase)
  • KEGG Entry
  • KEGG metabolic pathway. Each pathway map is identified by a 5-digit number:
    • 00400 (Phenylalanine, tyrosine and tryptophan biosynthesis)
    • 00790 (Folate biosynthesis)
    • 00230 (Purine metabolism)
  • Pfam - Protein family names:
    • Pantoate_transf (Ketopantoate hydroxymethyltransferase)
    • Anth_synt_I_N (Anthranilate synthase component I, N terminal region)
    • Ribonuc_red_lgN (Ribonucleotide reductase, all-alpha domain)
  • NCBI GI number:
    • 284049067 (camphor resistance protein CrcB)
    • 222528606 (cation diffusion facilitator family transporter)
    • 15964556 (hypothetical protein)
  • Common protein names:
    • trpE (anthranilate synthase component I)
    • recA (recombinase A)
    • panB (3-methyl-2-oxobutanoate hydroxymethyltransferase)
  • SwissProt [http://www.uniprot.org/uniprot/]
  • Riboswitch Like Element (RLE). Database of known and predicted riboswitches. RLEs are identified by a 4-digit number preceded by RLE:
    • RLE0001
    • RLE0088
    • RLE0105
  • Rfam - Collection of RNA families. Each Rfam family is identified by the letters RF followed by a 5-digit number:
    • RF00557 (Ribosomal protein L10 leader)
    • RF00230 (T-box leader)
    • RF00513
  • GCT3 ID. The primary key for each gene in the GCT3 database is the KEGG organism identifier followed by a hyphen and the KEGG gene ID:
    • rip-RIEPE_A0001
    • eco-b4427
    • sme-SMa0011
  • Description. Protein description or keyword. Double or single quotes may be used to group words:
    • Cobalamin
    • "Phenylalanine metabolism"
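A minimal sketch of how the splitting and identifier detection described in this section could work is shown below. It is not GCT3's implementation: `tokenize`, `classify` and every regular expression are hypothetical, inferred only from the identifier formats listed above. An all-digit string is ambiguous between a KEGG pathway and an NCBI GI number, so the sketch simply assumes that exactly five digits denotes a pathway.

```python
import re

# Hypothetical patterns inferred from the formats listed above;
# GCT3's actual regular expressions are not documented here.
PATTERNS = [
    ("cog", re.compile(r"^COG\d{4}$", re.IGNORECASE)),
    ("rfam", re.compile(r"^RF\d{5}$", re.IGNORECASE)),
    ("rle", re.compile(r"^RLE\d{4}$", re.IGNORECASE)),
    ("kegg_pathway", re.compile(r"^\d{5}$")),       # assumption: exactly 5 digits
    ("gi", re.compile(r"^\d+$")),                   # any other all-digit string
    ("gct3_id", re.compile(r"^[a-z]{3,4}-\S+$")),   # KEGG org code, hyphen, gene ID
]

# A double-quoted phrase, a single-quoted phrase, or a run of characters
# that are neither whitespace nor commas.
TOKEN_RE = re.compile(r'"([^"]*)"' r"|'([^']*)'" r"|([^\s,]+)")


def tokenize(text):
    """Split on spaces, tabs, commas and newlines, keeping quoted phrases whole."""
    tokens = []
    for double_quoted, single_quoted, bare in TOKEN_RE.findall(text):
        token = double_quoted or single_quoted or bare
        if token:
            tokens.append(token)
    return tokens


def classify(token):
    """Return the first source whose pattern matches, else a free-text search."""
    for source, pattern in PATTERNS:
        if pattern.match(token):
            return source
    return "name_or_description"
```

Each token would then be queried independently and the hits combined as a logical OR. Names such as trpE or Pantoate_transf cannot be told apart by pattern alone, which is consistent with the need for explicit prefixes in boolean mode (1.3).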
1.2 Orthologous Search
GCT3 can search for the orthologous genes of a single protein. To perform this search, enter exactly one GCT3 ID or NCBI GI number (see 1.1 Text Search), check the “search for orthologous” checkbox beside the submit button, choose the desired parameters and submit the form.

If the ID is valid, the phylogeny and organism tables are displayed (see 3.0 Organism Selector). Only the selected organisms are displayed in the result; this rule also applies to the organism of the reference gene.
The orthologous genes are precomputed and annotated in our database, based on bidirectional best hits.
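Bidirectional best hits (BBH) can be illustrated with the sketch below: a gene a in genome A and a gene b in genome B are treated as orthologs when b is a's highest-scoring hit in B and a is b's highest-scoring hit in A. The score dictionaries stand in for precomputed alignment scores (for example, BLAST bit scores); their format, the function names and the tie handling (first best hit wins) are assumptions, not GCT3's pipeline.

```python
def best_hits(scores):
    """scores: dict mapping (query, subject) -> score, higher is better.
    Returns {query: best-scoring subject}; on ties the first seen is kept."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where each gene is the other's best hit."""
    forward = best_hits(a_vs_b)   # best hit in genome B for each gene of A
    reverse = best_hits(b_vs_a)   # best hit in genome A for each gene of B
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

For instance, if a3's best hit is b1 but b1's best hit back is a1, only (a1, b1) is reported and a3 has no ortholog in genome B.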
1.3 Boolean Expression

You can query our database using the logical operators AND, OR and NOT, parentheses to override the boolean precedence rules, and single or double quotes to join separate words into one term. To use this option, you MUST select the boolean expression radio button.
Any of the identifiers described in 1.1 (COG, Pfam, Rfam, etc.) can be entered; the application automatically identifies which database each term belongs to by applying regular expressions. However, because of their syntax, the following terms must be prefixed with their database:
  • Pfam family:
  • Common protein names:
    gene:pabA
  • KEGG identifier:
    kegg: RIEPE_A0001
This kind of input retrieves more precise results. For example, COG0147 is composed of two families of paralogous genes, trpE and pabA: trpE is involved in tryptophan biosynthesis, whilst pabA is involved in folate biosynthesis. To retrieve only the COG0147 genes involved in tryptophan biosynthesis, either of the following queries can be used:
COG0147 not gene:pabA
COG0147 not 00790
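One way to evaluate such expressions is to map every term to the set of matching gene IDs and combine the sets with union (OR), intersection (AND) and difference (NOT). The recursive-descent sketch below assumes a grammar in which `A not B` means A minus B and a leading `not` complements against all indexed genes; the grammar, the precedence (AND and NOT bind tighter than OR) and the `index` mapping are all hypothetical, since this help page does not specify them.

```python
import re

# Parentheses, double- or single-quoted phrases, or bare words.
TOKEN_RE = re.compile(r"\(|\)" r'|"[^"]*"' r"|'[^']*'" r"|[^\s()]+")


class BooleanQuery:
    """Evaluate AND/OR/NOT queries over sets of gene IDs.

    Assumed grammar (a sketch, not confirmed by GCT3):
        expr   := term ("or" term)*
        term   := factor (("and" | "not") factor)*   # "A not B" means A minus B
        factor := "not" factor | "(" expr ")" | TERM
    """

    def __init__(self, index):
        self.index = index  # hypothetical mapping: term -> set of gene IDs
        self.universe = set().union(*index.values())

    def evaluate(self, query):
        self.tokens = TOKEN_RE.findall(query)
        self.pos = 0
        result = self._expr()
        if self.pos != len(self.tokens):
            raise ValueError(f"unexpected token {self.tokens[self.pos]!r}")
        return result

    def _peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos].lower()
        return None

    def _next(self):
        self.pos += 1
        return self.tokens[self.pos - 1]

    def _expr(self):
        result = self._term()
        while self._peek() == "or":
            self._next()
            result = result | self._term()
        return result

    def _term(self):
        result = self._factor()
        while self._peek() in ("and", "not"):
            operator = self._next().lower()
            right = self._factor()
            result = result & right if operator == "and" else result - right
        return result

    def _factor(self):
        token = self._peek()
        if token is None:
            raise ValueError("unexpected end of query")
        if token == "not":  # unary NOT: complement within all indexed genes
            self._next()
            return self.universe - self._factor()
        if token == "(":
            self._next()
            result = self._expr()
            if self._peek() != ")":
                raise ValueError("missing closing parenthesis")
            self._next()
            return result
        if token in ("and", "or", ")"):
            raise ValueError(f"unexpected token {token!r}")
        # Strip surrounding quotes so "two words" looks up the phrase itself.
        return set(self.index.get(self._next().strip("\"'"), set()))
```

With this reading, `COG0147 not gene:pabA` returns the COG0147 set minus the pabA set, and `COG0147 not 00790` removes the genes annotated to the folate biosynthesis pathway.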
1.4 Display Clustering
These radio buttons let you display either a specific number of neighboring genes or only the genes predicted to belong to the same operon. With the operon option, the genes are aligned by the first gene of the predicted transcription unit.
This option is useful for retrieving regulatory regions with the 5' sequence download options (see 4.4).
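Because the operon view aligns on the first gene of each predicted transcription unit, its 5' region is where promoter and leader signals would be expected. The sketch below shows how such an upstream region is usually extracted, taking the reverse complement for genes on the minus strand. The 1-based coordinate convention, the function names and the default length of 300 nt are hypothetical; this help page does not state how long GCT3's 5' sequences are.

```python
# Complement table for DNA (N stays N).
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(genome, start, end, strand, length=300):
    """Return up to `length` nt immediately 5' of a gene.

    start/end: 1-based inclusive coordinates on the forward strand (start <= end).
    strand: "+" or "-". The region is clipped at the ends of the genome string.
    """
    if strand == "+":
        # 5' of a forward gene lies before its start coordinate.
        return genome[max(0, start - 1 - length):start - 1]
    # 5' of a reverse gene lies after its end coordinate, read on the other strand.
    return reverse_complement(genome[end:end + length])
```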

With the “Neighborhood” option, you can select the number of neighbors upstream and downstream; in this case, the target genes matching the user's criteria are vertically aligned with each other to make the chart easier to interpret.
Please be aware that at most 3500 genes can be drawn. With the neighborhood option, if the number of target genes is large, the number of target genes multiplied by the number of neighbors may exceed 3500, and the chart won’t be displayed. To avoid this, do not select a large number of organisms, especially when the search is based on a KEGG metabolic pathway.
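The 3500-gene limit can be checked before submitting with a rough estimate: each target gene is drawn together with its upstream and downstream neighbors. The sketch assumes that overlapping neighborhoods are not merged, so the result is an upper bound; the function names are hypothetical.

```python
MAX_DRAWN_GENES = 3500  # drawing limit stated in this help page


def estimated_genes(targets, upstream, downstream):
    """Upper bound on drawn genes: every target plus its neighbor window."""
    return targets * (1 + upstream + downstream)


def chart_fits(targets, upstream, downstream):
    """True if the estimate stays within the drawing limit."""
    return estimated_genes(targets, upstream, downstream) <= MAX_DRAWN_GENES


def max_targets(upstream, downstream):
    """Largest number of target genes that fits for a given window."""
    return MAX_DRAWN_GENES // (1 + upstream + downstream)
```

With 5 neighbors on each side every target accounts for 11 genes, so 300 targets (3300 genes) fit, 350 targets (3850 genes) do not, and `max_targets(5, 5)` is 318.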
1.5 Display Category
To highlight specific characteristics of the genes, this option lets you switch between three domain color codes:
  • Orthology relationships, as defined in the COG database
  • Conserved protein motifs, as defined by Pfam database
  • Metabolic pathways, as defined by the Kyoto Encyclopedia of Genes and Genomes (KEGG) database.

All domains belonging to the same COG (or Pfam family or KEGG pathway, depending on the selected option) are displayed in the same color.
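This help page does not say how colors are chosen, only that every domain of the same category shares one color and that genes without a category are light gray (4.0). A minimal way to get that property is to derive the color from a hash of the category identifier, as sketched below; the hashing scheme, the lightness and saturation values and the gray hex code are assumptions.

```python
import colorsys
import hashlib

LIGHT_GRAY = "#d3d3d3"  # assumed hex value for genes without a category


def category_color(category):
    """Map a COG, Pfam or KEGG identifier to a stable hex color."""
    if not category:
        return LIGHT_GRAY
    digest = hashlib.md5(category.encode("utf-8")).digest()
    hue = int.from_bytes(digest[:2], "big") / 0xFFFF  # 0.0 to 1.0
    # Fixed lightness and saturation keep every category color clearly non-gray.
    red, green, blue = colorsys.hls_to_rgb(hue, 0.55, 0.65)
    return "#{:02x}{:02x}{:02x}".format(
        round(red * 255), round(green * 255), round(blue * 255)
    )
```

Hashing keeps a category's color identical across searches and redraws, at the cost of occasional near-identical colors for different categories; a palette assigned by frequency would avoid clashes but would change between searches.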
1.6 Graphic Options
There are three main graphical options:
  • Gene Style. The user can choose between two different gene styles:
    • Arrow
    • Point
  • Highlight target. To identify the target genes matching the user's criteria, the gene outline can be enabled or disabled:
    • Highlight ON
    • Highlight OFF
  • Size. Genes and intergenic regions can be drawn at three different scales:
    Size     Gene (nt per pixel)   Intergenic region (nt per pixel)
    Small    20                    6
    Normal   10                    5
    Large    8                     3
Note that for each size option the gene scale and the intergenic scale are different; this keeps the chart lean and easier to interpret. Also be aware that genes longer than 3000 nt are shrunk to 3000 nt (300 px at Normal size) and outlined with a dashed line.
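The scale table and the 3000 nt cap translate into pixel widths as in the sketch below. Whether GCT3 rounds fractional widths is not stated, so the sketch returns exact values; `SCALES`, `gene_width` and `intergenic_width` are hypothetical names.

```python
# Nucleotides per pixel as (gene, intergenic region), from the table above.
SCALES = {
    "small": (20, 6),
    "normal": (10, 5),
    "large": (8, 3),
}
MAX_GENE_NT = 3000  # longer genes are shrunk to this length


def gene_width(length_nt, size="normal"):
    """Return (width_px, dashed_outline) for a gene of the given length."""
    nt_per_px, _ = SCALES[size]
    dashed = length_nt > MAX_GENE_NT  # shrunk genes get a dashed outline
    return min(length_nt, MAX_GENE_NT) / nt_per_px, dashed


def intergenic_width(length_nt, size="normal"):
    """Width in pixels of an intergenic region at the given size."""
    _, nt_per_px = SCALES[size]
    return length_nt / nt_per_px
```

For example, a 4500 nt gene at Normal size is drawn 300 px wide with a dashed outline, the same width as a 3000 nt gene.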
2.0 Search by Description
When the entered criteria do not match any of the identifiers described in 1.1, the application searches the functional descriptions of COGs, KEGG pathways, Pfam and Rfam; single or double quotes can be used to find entries containing the quoted words in that exact order.
If the query returns at least one hit, two tables are displayed: one summarizing the elements found and a second one listing all of them.

Input: “cobalamin r”

As the table above shows, RLEs are also considered even though they do not have a textual description: during the assignment of RLEs to genes, the most significant COGs and KEGG pathways are computed and compared with the Rfam database. Therefore, if the textual description matches a COG related to a particular RLE, both the COG and the RLE family are included in the result (more about RLEs).
Next, select at least one of the functional groups and submit the form to display the phylogeny and organism tables (see 3.0 Organism Selector).
If more than one group is selected, the application will search for genes that match all those criteria.
3.0 Organism selector
Check the box for the desired phylogeny selection.
The number of organisms in each lineage is shown to the right of the selector.
The corresponding organisms are automatically checked in the Individual Organisms window (right table).

You can also limit the total number of selected organisms. To reach the desired number, the most closely related organisms in your selection are discarded.
Note that this limit will affect the overall selection, including individual organisms.
It does not matter if one selection contains another (e.g. Prokaryotes and its Archaea branch); the final list never contains duplicate organisms.
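This help page does not describe how "phylogenetically closest" is measured. The sketch below assumes each organism has a taxonomy lineage (domain first) and treats two organisms as closer the more leading ranks they share; it then repeatedly discards one member of the closest remaining pair until the limit is reached. Building the working list from dictionary keys also guarantees that no organism appears twice. The function names and the rule for which member of a pair is dropped are hypothetical.

```python
from itertools import combinations


def shared_depth(lineage_a, lineage_b):
    """Number of leading taxonomy ranks two lineages have in common."""
    depth = 0
    for rank_a, rank_b in zip(lineage_a, lineage_b):
        if rank_a != rank_b:
            break
        depth += 1
    return depth


def limit_organisms(lineages, limit):
    """lineages: dict organism -> tuple of taxa from domain downward.
    Discards one member of the most closely related pair until `limit` remain."""
    if limit <= 0:
        return []
    kept = list(lineages)  # dict keys are unique, so there are no duplicates
    while len(kept) > limit:
        closest = max(
            combinations(kept, 2),
            key=lambda pair: shared_depth(lineages[pair[0]], lineages[pair[1]]),
        )
        kept.remove(closest[1])  # assumption: drop the later organism of the pair
    return kept
```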
To search for a phylogenetic branch or a specific organism, type in the text field above each table and press Enter or click the magnifying-glass button. To display all elements again after a search, clear the text field and press Enter.
4.0 Output
There are two main screen outputs. The first is the neighborhood view, built around a central column of target genes (highlighted in yellow in the example below); the number of genes shown to the left and right of this column can be defined (see 1.4 Display Clustering).

The second is the operon view, in which the genes are left-aligned on the first gene of the predicted transcription unit. A small portion of the 5’ intergenic region of this first gene is also drawn to include any possible regulatory signal (as shown for the last operon in the image below).

The genes are colored according to the current display category option (COG, KEGG or Pfam; see 1.5). Genes without a known or predicted category are filled with light gray.
The names of the organisms are displayed on the left of the screen; their colors are assigned according to their phylogeny.
For more information about a gene or an organism, hover the mouse over it to display a tooltip. Genes can also be clicked to open a page with a complete summary of the gene information, including its sequences (5', 3', nucleotide and amino acid).
4.1 Symbols
GCT3 includes some regulatory signals in the chart, and some of them can be clicked for more information about the element. The following table lists the symbols used by the application.

Signal                 Clickable   Color
Rfam                   Yes         Dynamic
Terminator             No          Fixed
Attenuator             No          Fixed
RLE                    Yes         Dynamic
DNA static curvature   No          Fixed
4.2 Chart Interaction
GCT3 supports two main kinds of chart interaction:
a) Removing or rearranging genome contexts/operons. Use the three-button control to the left of each organism box: the red button removes the genome context from the chart, the up arrow moves it up and the down arrow moves it down.

To display all removed elements again, use the "Show all" button in the upper left corner of the screen.
To hide these controls, click the "Hide controls" button, located above the “Show all” button in the upper left corner of the screen.
b) Updating some of the initial input parameters without returning to the main screen and repeating the search. The updatable parameters are:
  • Display Category (1.5)
  • Gene Style (1.6)
  • Highlight target (1.6)

To perform the update, select the desired parameters and press the refresh button. The information panel (see 4.3) is then updated and all elements are redrawn, so any rearranged or hidden elements are restored to the initially displayed order.
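The controls described in this section behave like a small piece of view state: an ordering, a set of hidden contexts, and the original search order that the refresh button restores. The class below is a hypothetical model of that behavior, not GCT3's front-end code; in particular, it assumes that the arrows skip over hidden contexts and that "Show all" restores hidden contexts without undoing moves.

```python
class ChartLayout:
    """Hypothetical model of the per-organism chart controls."""

    def __init__(self, contexts):
        self.initial = list(contexts)  # order produced by the search
        self.order = list(contexts)
        self.hidden = set()

    def visible(self):
        return [c for c in self.order if c not in self.hidden]

    def remove(self, context):
        """Red button: hide a genome context."""
        self.hidden.add(context)

    def move(self, context, step):
        """Up arrow: step=-1, down arrow: step=+1 (among visible contexts)."""
        shown = self.visible()
        i = shown.index(context)
        j = i + step
        if 0 <= j < len(shown):
            a, b = self.order.index(shown[i]), self.order.index(shown[j])
            self.order[a], self.order[b] = self.order[b], self.order[a]

    def show_all(self):
        """"Show all" button: unhide everything, keep the current order."""
        self.hidden.clear()

    def refresh(self):
        """Refresh button: everything is redrawn in the initial order."""
        self.order = list(self.initial)
        self.hidden.clear()
```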
4.3 Information Panel
An internal panel on the right of the window shows information about the current display category.

The panel contains a table with three columns.
  • The color assigned to the category.
  • The category name.
  • The description of the category.

The table is sorted in descending order of each category's frequency among all the drawn genes.
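The ordering of the information panel can be sketched by counting, for each category, how many drawn genes carry it. The input format (one list of category identifiers per drawn gene), counting each category at most once per gene, the alphabetical tie-break and the name `panel_rows` are assumptions.

```python
from collections import Counter


def panel_rows(gene_categories, descriptions):
    """gene_categories: one list of category IDs per drawn gene (empty if none).
    Returns (category, description, gene_count) rows, most frequent first."""
    counts = Counter(cat for cats in gene_categories for cat in set(cats))
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(cat, descriptions.get(cat, ""), n) for cat, n in ordered]
```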
To hide this panel, click the “Hide info” button above the table; click it a second time to display it again. If you find this information useful, you can also widen or narrow the table with the arrow buttons: the left arrow increases its width and the right arrow decreases it.
To see more information, click a category name; it opens an external page about that element.
4.4 Download Sequence
As in previous GCT versions, this version also lets you download sequences.

When you perform this action, the sequences of all currently displayed genes are downloaded in a single FASTA file, according to the selected download option and the current display clustering (sequences of removed genome contexts or operons are not downloaded). To do this, select the desired sequence option from the combo box and press the get button.
  • Neighborhood (applies only to target genes)
    • Nucleotide sequences
    • Amino acid sequences
    • 5’ sequence
    • 3’ sequence
  • Operons
    • Nucleotide sequence (first gene of each predicted operon)
    • Nucleotide sequence (target genes)
    • Amino acid sequence (first gene of each predicted operon)
    • Amino acid sequence (target genes)
    • 5’ sequence (first gene of each predicted operon)
    • 3’ sequence (last gene of each predicted operon)
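The downloaded file is plain FASTA: one `>` header line per sequence followed by the sequence itself. The sketch below writes visible genes only, mirroring the rule that removed genome contexts are excluded. The record layout, the header format, the 60-character line width and the name `to_fasta` are assumptions; GCT3's actual headers are not documented here.

```python
def to_fasta(records, hidden=frozenset(), width=60):
    """records: iterable of (gene_id, context_id, description, sequence).
    Skips genes whose genome context is in `hidden`; wraps sequences at `width`."""
    lines = []
    for gene_id, context_id, description, sequence in records:
        if context_id in hidden:
            continue  # removed from the chart, so not downloaded
        lines.append(f">{gene_id} {description}".rstrip())
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n" if lines else ""
```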
4.5 Download Image
As a new feature, GCT3 lets you create and download an image of the resulting chart, optionally including some extra information.

To do this, click the “get image” button. If you select the org details checkbox, a panel with the organism and phylogeny details is included at the end of the image. If you select the category details checkbox, a panel with the current display category (COG, KEGG or Pfam; see 1.5), its color code and description is included at the end of the image; the number of category elements to include can be specified in the text field beside the checkbox. If both the category and org details checkboxes are selected, the organism details are placed at the end of the image and the category details on the left, showing only the color code and name; the description is omitted to optimize image space.