RNA Loop Modeling

Please cite:
Schudoma et al.,
Nucl. Acids Res. 38: 970-980.
DOI 10.1093/nar/gkp1010.

RLooM - Help

How to search the RLooM database?

Query details for sequence-based queries

Query details for pdb-id-based queries

Viewing a loop cluster/loop

Loop modeling

RLML - The RLooM Modeling Language

How to cite RLooM?

How to search the RLooM database?

The RLooM database can be searched or browsed using the query form at http://rloom.mpimp-golm.mpg.de/cgi-bin/index.py. To browse the database, click one of the four links Hairpins, Single Strand Segments, Internal Loops, or Multiloops. This returns a list of loop sets of the chosen type, grouped by loop length. Multiloops are first grouped by number of stems, and selecting one of these groups displays the length-ordered list. Selecting a set yields the cluster sets available for it, and selecting a cluster set lists the clusters it contains, each of which can be viewed in more detail.

Searching the database requires either a nucleotide sequence (A, C, G, T, U, I plus ambiguity codes), optionally with a tolerated number of mismatches, or a Protein Data Bank id with an optional chain id. A sequence-based query searches the database for all loops with the given sequence. Plain sequences are matched exactly, and sequences containing ambiguity codes are matched with a regular-expression search. A pdb-id-based query returns all known loops within the respective PDB structure.


Query details for sequence-based queries

Queries should be entered in upper case (exception: hairpin exclusion, see below).
To search for internal loops/multiloops, separate the individual segments with '-'.
To limit the search to hairpins, enclose your query in '*'. To exclude hairpins, type your query in lower case.
Mismatch searches are restricted to sequences containing only A, C, G, U, T, I, - and *. Using ambiguity codes sets the mismatch parameter to 0.
The mismatch parameter ranges over [0, length(query)]. Negative input is set to 0. Numbers greater than the query length are set to length(query). For multi-segmented loops, the query length is the sum of the lengths of the individual segments.
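The query rules above can be illustrated with a small client-side sketch. It is not RLooM's server code. The function names `parse_query` and `loop_matches` are hypothetical. The ambiguity-code table uses the standard IUPAC nucleotide codes, and it is an assumption that RLooM uses exactly this mapping. Mismatches are counted as a plain per-position (Hamming) comparison between equally long segments.

```python
import re

# Standard IUPAC nucleotide ambiguity codes (assumption: RLooM's mapping may differ).
# In these classes T and U are treated as equivalent.
IUPAC = {
    "R": "[AG]", "Y": "[CTU]", "S": "[CG]", "W": "[ATU]", "K": "[GTU]",
    "M": "[AC]", "B": "[CGTU]", "D": "[AGTU]", "H": "[ACTU]", "V": "[ACG]",
    "N": "[ACGTUI]",
}
PLAIN = set("ACGUTI-*")  # characters that still allow mismatch searches


def parse_query(raw: str, k: int):
    """Hypothetical parser applying the documented rules for sequence queries."""
    exclude_hairpins = raw.islower()              # lower case: exclude hairpins
    query = raw.upper()
    hairpin_only = len(query) > 2 and query[0] == query[-1] == "*"
    if hairpin_only:
        query = query[1:-1]                       # '*...*': hairpins only
    segments = query.split("-")                   # '-' separates loop segments
    if any(ch not in PLAIN for ch in query):
        k = 0                                     # ambiguity codes force k = 0
    length = sum(len(s) for s in segments)
    k = max(0, min(k, length))                    # clamp k to [0, length(query)]
    return segments, k, hairpin_only, exclude_hairpins


def loop_matches(segments, k, loop_segments):
    """Regex match for ambiguous queries, otherwise count per-position mismatches."""
    if len(segments) != len(loop_segments) or any(
        len(q) != len(s) for q, s in zip(segments, loop_segments)
    ):
        return False
    if any(ch in IUPAC for q in segments for ch in q):
        return all(
            re.fullmatch("".join(IUPAC.get(c, re.escape(c)) for c in q), s)
            for q, s in zip(segments, loop_segments)
        )
    # Plain comparison: T and U are not unified in this branch.
    mismatches = sum(a != b for q, s in zip(segments, loop_segments) for a, b in zip(q, s))
    return mismatches <= k
```

For example, `parse_query("GAAA-CC", 10)` splits the query into two segments and clamps `k` to 6. `parse_query("GNRA", 2)` resets `k` to 0 because it contains ambiguity codes.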


Query details for pdb-id-based queries

A valid pdb-id consists of a digit followed by three alphanumeric characters and is case-insensitive. Optionally, a case-sensitive chain id preceded by ':' can be appended to the pdb-id. This limits the search to loops found in the specified chain.
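The pdb-id rules can be checked before submitting with a sketch like the following. The function name `parse_pdb_query` is hypothetical. Restricting the chain id to a single alphanumeric character is an assumption based on the legacy PDB format, because the text does not state its length.

```python
import re
from typing import Optional, Tuple

# A digit, three alphanumerics, then optionally ':' and a chain id.
# Assumption: the chain id is a single alphanumeric character.
PDB_QUERY = re.compile(r"([0-9][A-Za-z0-9]{3})(?::([A-Za-z0-9]))?")


def parse_pdb_query(text: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (pdb_id, chain_id) or None if the query is malformed."""
    m = PDB_QUERY.fullmatch(text.strip())
    if m is None:
        return None
    # The id is case-insensitive, so normalize it.
    # The chain id is case-sensitive, so keep it unchanged.
    return m.group(1).upper(), m.group(2)
```

With this sketch, `1ffk:0` and `1FFK:0` refer to the same structure. `1ffk:a` and `1ffk:A` remain different chain queries.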


Viewing a loop cluster/loop

For each loop, the RLooM database stores information obtained directly from its source PDB file or computed with external tools. This section describes the individual records on the result page for a loop cluster or a single loop.

For loop clusters, the title bar shows the loop length, loop type and cluster id, as well as the cutoff used to form the cluster. For individual loops, it shows the loop length, loop type and loop id. For loop clusters, the Representative Structure section describes the structure chosen to represent the cluster. This section contains the same fields as the page of an individual loop:

Source: [PDB-id:chain] provides a hyperlink to the entry in the Protein Data Bank that corresponds to the structure containing the loop (henceforth referred to as source pdb).
Source: Information corresponds to the TITLE record of the source pdb.
Source: Compound corresponds to the MOLECULE-part of the COMPND record of the source pdb.
Source: Resolution corresponds to the resolution given in the source pdb.
Position: Provides the positions of the anchor-bases within the source pdb.
Primary Structure: Provides the base-sequence for the loop. Individual segments are flanked by '_', representing the anchor-bases.

The next four fields are obtained by scanning the loop structure with the program MC-Annotate[1].

Sugar Puckers: The default sugar pucker for ribonucleotides is C3'-endo. This section lists all positions with a different or unknown sugar pucker as computed by MC-Annotate (mainly C2'-endo).
Glycosidic-bond configurations: This section lists all nucleotide positions whose glycosidic bond between ribose and base has a configuration other than the default (anti).
Tertiary Structure: Stacked Bases: This table shows which pairs of bases are stacked, annotated with the direction of stacking (up, down, left, right; left corresponds to inwards and right to outwards).
Tertiary Structure: Base-pairs: This table lists contacts between residues in the loop. The notation follows MC-Annotate. W, H and S denote base pairs formed via the Watson-Crick, Hoogsteen and Sugar edges, respectively. O2'/C... denotes contacts to the ribose O2' oxygen and the ...-carbon of the sugar-phosphate backbone.

The following sections visualize the loop structure. The 3D Structure is displayed with the Jmol Java applet, and the Structure Graph is computed by a tool developed in our group.

The Cluster Members (cluster only) section shows all members of the cluster, sorted and grouped by their base-sequence. For individual loops, the Structural Clusters selection allows browsing the different clusters the loop belongs to in different cluster sets.

References:
[1] Gendron P, Lemieux S, Major F. Quantitative analysis of nucleic acid three-dimensional structures. J Mol Biol. 2001 May 18;308(5):919-36.


Loop modeling

In addition to traditional database searches, RLooM allows submitting a 3D RNA structure (e.g., from homology modeling), which is then scanned for unpaired regions (i.e., loops). RLooM then proposes the loop structures that fit best geometrically, either to replace an existing loop or to add a new one, for instance at the end of a helix.

The user uploads a PDB file containing the coordinates of an RNA structure. The file must comply with current PDB format standards, especially the fixed column boundaries of ATOM/HETATM records, which must not be overstepped. To help us improve the quality and usability of RLooM, please send us your PDB file if RLooM cannot handle it properly.
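To see why column boundaries matter, consider how a fixed-column reader sees an ATOM/HETATM record. The sketch below reads the fields by column position according to the PDB format v3.3 layout. It is not RLooM's own parser, which is not documented here, and the function name `parse_atom_record` is hypothetical. The residue number, insertion code and chain id it extracts are the fields that make up an RLML anchor id. A record whose coordinates drift out of columns 31-54 usually makes one of the slices unparseable, although this check cannot catch every misalignment.

```python
def parse_atom_record(line: str) -> dict:
    """Read one ATOM/HETATM record by fixed columns (PDB format v3.3)."""
    record = line[0:6].strip()                    # columns 1-6: record name
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not an ATOM/HETATM record: {record!r}")
    if len(line.rstrip("\n")) > 80:
        raise ValueError("record is longer than 80 columns")
    try:
        # Columns 31-38, 39-46, 47-54: x, y, z as 8-character fields.
        x, y, z = (float(line[i:i + 8]) for i in (30, 38, 46))
    except ValueError as exc:
        raise ValueError("coordinates overstep columns 31-54") from exc
    return {
        "atom_name": line[12:16].strip(),         # columns 13-16
        "res_name": line[17:20].strip(),          # columns 18-20
        "chain_id": line[21],                     # column 22
        "res_seq": int(line[22:26]),              # columns 23-26
        "i_code": line[26].strip(),               # column 27 (insertion code)
        "xyz": (x, y, z),
    }
```

Inserting a single extra space before the coordinate fields is enough to shift the y value across its column boundary, and the function then raises `ValueError`.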

If the user already knows where the new loop should be inserted, they can directly submit an XML-like modeling query (see the RLML description below). Otherwise, RLooM can scan the submitted structure for suitable anchor locations and return them as a list. Clicking an item in this list inserts a script fragment describing the location of the unpaired region into the query text field. The user then has to specify a sequence for the new loop and can optionally set additional parameters within the script.

Three parameters can be adjusted directly on the form. The first is the cluster set to be used. The second is the maximum distance between the anchors of a loop and the target structure for which the inserted loop still gives a valid model. The third is the threshold distance below which the new loop is considered to clash with the target molecule.
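The two distance parameters can be illustrated with a sketch. The text does not say exactly how RLooM measures either distance, so the following assumptions apply. The candidate loop is taken to be already superimposed onto the target anchors. The anchor criterion is taken to be a per-anchor distance, not, for example, an RMSD. The target atom list is assumed to exclude the anchors and any region being replaced, since those atoms would otherwise always count as clashes. The function names are hypothetical, and units are assumed to be Ångström.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def anchors_fit(loop_anchors: Sequence[Point], target_anchors: Sequence[Point],
                max_anchor_dist: float) -> bool:
    """True if every loop anchor lies within max_anchor_dist of its target anchor."""
    if len(loop_anchors) != len(target_anchors):
        raise ValueError("anchor lists differ in length")
    return all(math.dist(a, b) <= max_anchor_dist
               for a, b in zip(loop_anchors, target_anchors))


def count_clashes(loop_atoms: Sequence[Point], target_atoms: Sequence[Point],
                  clash_threshold: float) -> int:
    """Number of loop atoms closer than clash_threshold to any target atom."""
    return sum(1 for a in loop_atoms
               if any(math.dist(a, b) < clash_threshold for b in target_atoms))
```

A candidate would then be kept if `anchors_fit(...)` is true and `count_clashes(...)` is zero. The all-pairs loop is quadratic, which is acceptable for a sketch. A real tool would more likely use a spatial index.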


RLML - The RLooM Modeling Language

Loops are modeled in RLooM with RLML, a simple XML-like script language. Three parameters can be adjusted. The first is the template data set to be used. The second is the maximum distance between the anchors of a loop and the target structure for which the inserted loop still gives a valid model. The third is the threshold distance below which the new loop is considered to clash with the target molecule.

Each command is enclosed in tags that specify the loop type of the query:

<x>...</x>, with x = hairpin|segment|internal|multiloop

Each command has a number of anchors (hairpins/segments:2, internal loops:4, multiloop:6+):

<anchor>ANCHOR ID</anchor>, with ANCHOR ID = RI:C, R=resSeq, I=iCode, C=chainID

The anchor tag has an optional parameter id, which can be used to specify the order of the anchors. By default, <anchor> tags are processed in order of appearance.

Finally, each command requires a query:

<query>SEQUENCE</query>, with SEQUENCE being a nucleotide sequence (wildcards are allowed).

The <query> tag has three optional parameters: k, force, and mcsearch. The parameter k specifies the tolerated number of mismatches. The parameter force specifies whether suitable candidate loops whose sequence differs from the query are artificially mutated to match the query sequence. If mcsearch is set to true, a valid MC-Search script can be submitted instead of the query sequence (see the example pattern below; for details, see e.g. http://major.iric.ca or study the output of MC-Annotate). By default, k is set to 0, force to false, and mcsearch to true.

The optional <remodel> tag specifies a non-wildcard nucleotide sequence into which loop candidates should be mutated (<remodel>SEQUENCE</remodel>).
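A small generator helps show how the tags above combine into one command. This is a hypothetical helper, `build_rlml`, not part of RLooM. It makes several assumptions. Tag parameters are written as XML-style attributes (`k="0"`). The `<remodel>` tag sits inside the command. The anchor ids follow the RI:C form, e.g. `12:A` or `12B:A`. The helper also writes `mcsearch="false"` explicitly, so that a plain sequence is not interpreted as an MC-Search script regardless of the server default. The anchor counts are taken from the text.

```python
import re
from typing import Optional, Sequence

# Required anchors per loop type (hairpins/segments: 2, internal: 4, multiloop: 6+).
MIN_ANCHORS = {"hairpin": 2, "segment": 2, "internal": 4, "multiloop": 6}
EXACT = {"hairpin", "segment", "internal"}
# Assumed anchor-id syntax RI:C: resSeq, optional iCode, ':' and chainID.
ANCHOR_RE = re.compile(r"-?\d+[A-Za-z]?:[A-Za-z0-9]")


def build_rlml(loop_type: str, anchors: Sequence[str], query: str, k: int = 0,
               force: bool = False, mcsearch: bool = False,
               remodel: Optional[str] = None) -> str:
    """Hypothetical builder for a single RLML command."""
    if loop_type not in MIN_ANCHORS:
        raise ValueError(f"unknown loop type: {loop_type}")
    n, need = len(anchors), MIN_ANCHORS[loop_type]
    if n < need or (loop_type in EXACT and n != need):
        raise ValueError(f"{loop_type} needs {need} anchors, got {n}")
    for a in anchors:
        if not ANCHOR_RE.fullmatch(a):
            raise ValueError(f"malformed anchor id: {a}")
    if remodel is not None and not re.fullmatch(r"[ACGU]+", remodel):
        raise ValueError("remodel needs a non-wildcard sequence")
    lines = [f"<{loop_type}>"]
    lines += [f"<anchor>{a}</anchor>" for a in anchors]  # processed in this order
    lines.append(f'<query k="{k}" force="{str(force).lower()}" '
                 f'mcsearch="{str(mcsearch).lower()}">{query}</query>')
    if remodel is not None:
        lines.append(f"<remodel>{remodel}</remodel>")
    lines.append(f"</{loop_type}>")
    return "\n".join(lines)
```

For example, `build_rlml("hairpin", ["12:A", "17:A"], "GNRA")` produces a hairpin command with two anchors on chain A and a wildcard query.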

The following sample pattern searches for GNRA tetraloop hairpins with two arbitrary flanking bases, any base pair between the first and last loop bases, and two intraloop base stacks:

sequence (A0 NGNRAN) relation ( A2 A3 { stack } A3 A4 { stack } A1 A4 { pairing } )
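The structure of this pattern can be made explicit with a small parser. The sequence block numbers the residues A0 to A5 (N G N R A N). The relation block states that A2/A3 and A3/A4 are stacked and that A1 (G) pairs with A4 (A). The hypothetical `parse_pattern` below handles only the subset of MC-Search syntax used in this example. The full MC-Search grammar is richer, so this is a reading aid, not a validator.

```python
import re

PATTERN = ("sequence (A0 NGNRAN) relation ( A2 A3 { stack } "
           "A3 A4 { stack } A1 A4 { pairing } )")


def parse_pattern(pattern: str):
    """Split the example pattern into residue labels and pairwise relations."""
    seq = re.search(r"sequence\s*\(\s*([A-Za-z])(-?\d+)\s+([A-Z]+)\s*\)", pattern)
    prefix, start, letters = seq.group(1), int(seq.group(2)), seq.group(3)
    # Label consecutive residues A0, A1, ... with their sequence letters.
    residues = {f"{prefix}{start + i}": base for i, base in enumerate(letters)}
    block = re.search(r"relation\s*\((.*)\)", pattern, re.S).group(1)
    relations = [(a, b, props.split())
                 for a, b, props in re.findall(r"(\w+)\s+(\w+)\s*\{([^}]*)\}", block)]
    return residues, relations


if __name__ == "__main__":
    residues, relations = parse_pattern(PATTERN)
    for a, b, props in relations:
        print(f"{a}({residues[a]}) {b}({residues[b]}): {', '.join(props)}")
```

Running it lists the two stacks and the G·A pair between the first and last loop bases.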

How to cite RLooM?

Please cite our 2010 Nucleic Acids Research article:
Schudoma C, May P, Nikiforova V, Walther D. Sequence-structure relationships in RNA loops: establishing the basis for loop homology modeling. Nucleic Acids Research, 2010, Vol. 38(3), 970-980.