2022-11-30 : 15400 3D structures containing nucleic acids | RNAEQ v3.259all
nakb.org

View examples of each type of basepair observed or predicted to occur in RNA 3D structures from this page. Basepairs are organized in geometric families using the Leontis-Westhof classification. Select basepair families with the dropdown menu to view base combinations that form in each family. The interactive heat map shows geometric similarities of basepairs in each family and can be used to select instances to superpose in the 3D viewer window.

More Information About the RNA Basepair Catalog

Exemplar instances are presented for each base combination that forms an RNA basepair in each geometric family, according to the Leontis-Westhof system [1,2]. Also provided are visualizations of the basepairs, counts of observed instances of each combination in structures at a resolution of 4 Angstroms or better in release 1.59 (April 26, 2014) of the Representative Sets of RNA 3D Structures (RNAEQ), and quantitative measures of geometric similarity between different base combinations in the same geometric family. In each of the twelve basepair families, there are potentially 4 x 4 = 16 base combinations, i.e. AA, AC, AG, AU, CA, CC, etc. Not all families have all combinations because of the arrangement of H-bonding donors and acceptors on each base.
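
The count of 4 x 4 = 16 combinations can be reproduced with a short Python sketch. The variable names are illustrative and not part of the catalog. The second list gives the 10 unordered pairs that remain when, for example, GC is not listed separately from CG, as in some of the summary tables further down.

```python
from itertools import combinations_with_replacement, product

BASES = "ACGU"

# All 16 ordered base combinations: AA, AC, AG, AU, CA, CC, ...
ordered = ["".join(pair) for pair in product(BASES, repeat=2)]

# The 10 unordered pairs, with bases in alphabetical order (CG but not GC).
unordered = ["".join(pair) for pair in combinations_with_replacement(BASES, 2)]

print(len(ordered), ordered)
print(len(unordered), unordered)
```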

Display areas

There are four main display areas which interact with one another to facilitate viewing:

  1. 4x4 grid of base combinations. This grid displays static images, with ideal hydrogen bonds, of each known base combination in each geometric family. Clicking one of the 16 cells shows the selected base combination in the 3D coordinate viewer; control-clicking adds or removes the clicked base combination. In cases such as cWW AA, where there are both Aa and aA combinations, clicking the AA cell displays both. Selected cells are marked with a circle in the cell.
  2. 3D coordinate viewer. This viewer displays one or more base combinations, as selected by the user. The “Show Numbers” button applies to the 3D coordinate viewer. The “Clear Selected” button clears the 3D display as well as the other views. Right-clicking brings up a menu of JSmol options. Clicking and dragging in the window rotates the molecule(s); rolling the mouse wheel zooms in or out on many platforms; hovering over an atom pops up the name of the atom and the number of the nucleotide. Many more controls are described in the JSmol documentation.
  3. IsoDiscrepancy heat map. The IsoDiscrepancy is a numerical measure of geometric similarity, known as isostericity, between two base combinations in the same basepairing family [3]. Low numerical values, less than 2.2, indicate geometrically similar basepairs and are colored in shades of blue; these are considered to be isosteric. Higher values between 2.2 and 3.5, colored in shades of yellow, are nearly isosteric, while cells with larger values indicate non-isosteric base combinations and are colored orange or red. Base combinations are listed in an order that puts geometrically similar base combinations near each other. Clicking (and control-clicking) on the diagonal cells selects which base combinations appear in the 3D coordinate viewer. Clicking off the diagonal superimposes the two base combinations (one for the row, one for the column) in the 3D coordinate viewer. The base combinations being shown are indicated with green dots in the diagonal cells.
  4. Base Combinations / Exemplars Table. For each base combination, the table lists the isostericity value, the count in the 4 Angstrom representative set, the PDB ID and resolution of the chosen exemplar, the model, chain, nucleotide numbers, and crystallographic symmetry operators, and the isostericity group as defined by [2]. The isostericity grouping indicates which sets of basepairs are isosteric with each other. Clicking on a row displays that combination in the 3D view and selects it in the 4x4 grid.
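
The heat map thresholds described above can be sketched as a small Python function. The function name and the returned labels are hypothetical, not part of the catalog. The text does not say which class the boundary values 2.2 and 3.5 belong to, so treating both as "nearly isosteric" is an assumption.

```python
def classify_isodiscrepancy(value: float) -> str:
    """Map an IsoDiscrepancy value to the class used for heat map coloring."""
    if value < 0:
        raise ValueError("IsoDiscrepancy is expected to be non-negative")
    if value < 2.2:
        return "isosteric"          # shades of blue
    if value <= 3.5:                # assumption: both boundaries fall here
        return "nearly isosteric"   # shades of yellow
    return "non-isosteric"          # orange or red


for v in (0.0, 1.8, 2.9, 4.7):
    print(v, classify_isodiscrepancy(v))
```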

The uses of basepair isostericity

Basepair isostericity was introduced to explain observed sequence variability in homologous positions between different biological sources such as different organisms.  Most familiar is the sequence variability in cWW basepairs in RNA double helices, where a CG base combination in one organism may correspond to an AU base combination in another.  

Isostericity explains this observation by noting that the CG cWW basepair and the AU cWW basepair have nearly identical connections to the RNA backbone, as measured by the relative locations of their glycosidic bonds (between the N1/N9 atom of the base and the C1’ atom of the backbone).  Similarly, in the tHS basepair family, the AA and AG base combinations are isosteric (though less so than the canonical cWW base combinations). In [3] it was documented that corresponding positions of homologous molecules tend to have conserved 3D structure and, furthermore, make RNA basepairs from the same basepairing family.

Uppercase and lowercase letters

Some basepairs like AA cWW are not actually symmetric, even though the bases are the same and the interacting edges are the same.  We use upper and lowercase letters to distinguish between the two geometries.  One can see them separately by clicking on the diagonal entries on the heat map.  Note, for example, that cWW Cc and cWW uU are isosteric while cWW Cc and cWW Uu are not.  Note also that cWW uU is more nearly isosteric with cWW UG than with GU, so the difference in geometry might be seen in substitution patterns.

Basepair frequencies by base combination and by geometric family

Basepair frequencies calculated from the representative set provide estimates of the relative occurrences of base combinations and basepair families for use in bioinformatics and RNA structure modeling. In the following tables, cells with frequency values of 20% or more are shaded blue, and cells with values between 10% and 20% are shaded grey.

  1. Relative frequencies by geometric basepair family, i.e., percent of all basepairs that are cWW, tWW, cWH, tWH, …, tSS for all twelve basepair families, normalized by total number of basepairs (12 x 1 table).

    Family  Notation  Count  Percentage
    1       cWW       15181  74.21
    2       tWW       239    1.17
    3       cWH       284    1.39
    4       tWH       826    4.04
    5       cWS       326    1.59
    6       tWS       286    1.40
    7       cHH       9      0.04
    8       tHH       199    0.97
    9       cHS       273    1.33
    10      tHS       1100   5.38
    11      cSS       953    4.66
    12      tSS       781    3.82
    Total             20457  100.00

  2. Relative frequencies by bases in the pair, i.e., percent of basepairs that are AA, AC, AG, … UU for all base combinations, normalized by total number of basepairs (4 x 4 table). In this summary, each pair of bases is listed only once with bases in alphabetical order, so for example, CG is listed but GC is not.

    1st base↓/2nd base→  A     C     G     U
    A                    3.31  4.64  9.17  25.77
    C                          0.29  46.3  0.45
    G                                1.45  7.24
    U                                      1.37

  3. Relative frequencies of each base combination by geometric family, normalized by total number of basepairs in that family (12 x 16 table). In symmetric families cWW, tWW, cHH, and tHH, each pair of bases is listed only once, so for example CG is listed but GC is not (cell is shaded black). Percentages in each row sum to 100. Basepair combinations with 0% frequency are indicated by empty cells.

    Family Notation AA AC AG AU CA CC CG CU GA GC GG GU UA UC UG UU Total
    1 cWW 0.11 0.44 1.26 27.67 0.04 61.37 0.16 7.54 1.42 100.0
    2 tWW 28.87 5.86 29.29 2.93 23.85 1.26 4.18 1.67 2.09 100.0
    3 cWH 1.41 1.41 3.17 2.11 43.31 40.49 2.82 5.28 100.0
    4 tWH 11.38 0.73 12.35 0.97 0.97 5.81 0.97 64.77 0.12 1.94 100.0
    5 cWS 23.62 26.69 0.31 12.27 6.44 2.45 1.84 5.52 0.92 1.84 1.53 3.37 7.36 0.31 3.68 1.84 100.0
    6 tWS 2.80 1.05 48.95 1.05 0.70 3.15 6.64 2.80 22.38 5.59 1.05 3.15 0.7 100.0
    7 cHH 44.44 11.11 44.44 100.0
    8 tHH 80.40 3.02 2.51 4.02 2.01 3.02 5.03 100.0
    9 cHS 14.29 2.20 2.20 2.20 9.16 1.83 0.73 6.96 1.10 2.2 0.37 0.73 49.45 6.59 100.0
    10 tHS 8.36 2.82 74.73 2.09 1.00 1.18 0.73 3.27 3.09 2.73 100.0
    11 cSS 7.87 28.86 12.49 0.52 17.52 2.52 0.52 9.86 0.63 0.63 1.78 12.59 0.42 3.57 0.21 100.0
    12 tSS 6.02 17.03 59.80 9.09 0.51 0.77 6.27 0.51 100.0

  4. Relative frequencies of each pair of bases by geometric family, normalized by total number of basepairs having that pair of bases (12 x 10 table). In this summary, each pair of bases is listed only once, so for example CG is listed but GC is not. Percentages in each column sum to 100. Basepair combinations with 0% frequency are indicated by empty cells.

    Family Notation AA AC AG AU CC CG CU GG GU UU
    1 cWW 2.51 10.70 9.26 93.84 10.00 95.62 66.12 88.77 77.14
    2 tWW 10.18 3.08 0.48 1.59 11.67 0.69 2.20 3.37 0.59 1.79
    3 cWH 0.59 6.16 0.32 6.67 1.35 4.13 41.41 0.98 5.36
    4 tWH 13.86 1.17 2.62 0.34 13.33 0.57 4.41 16.16 1.57 5.71
    5 cWS 11.36 13.93 0.29 0.98 13.33 0.11 6.61 1.68 1.11 2.14
    6 tWS 1.18 1.76 6.79 0.11 15.00 0.20 0.55 4.31 0.71
    7 cHH 0.39 0.05 1.35
    8 tHH 23.60 0.88 0.73 0.17 0.14 1.65 3.37
    9 cHS 5.75 1.61 0.58 0.51 8.33 0.08 10.19 2.02 1.17 6.43
    10 tHS 13.57 6.45 41.61 0.49 21.67 0.37 2.20 12.12
    11 cSS 11.06 40.32 6.06 0.15 0.31 1.93 2.02 1.24 0.71
    12 tSS 6.93 19.50 25.02 1.51 0.50 16.50 0.26
    Total 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
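
The percentages in the first table above follow directly from the family counts. The Python sketch below recomputes them as a consistency check; the counts are copied from that table, while the variable names are illustrative only.

```python
# Counts per geometric family, copied from the family frequency table.
FAMILY_COUNTS = {
    "cWW": 15181, "tWW": 239, "cWH": 284, "tWH": 826,
    "cWS": 326, "tWS": 286, "cHH": 9, "tHH": 199,
    "cHS": 273, "tHS": 1100, "cSS": 953, "tSS": 781,
}

total = sum(FAMILY_COUNTS.values())

# Percent of all basepairs that belong to each family, to two decimals.
percentages = {
    family: round(100 * count / total, 2)
    for family, count in FAMILY_COUNTS.items()
}

print("Total basepairs:", total)
for family, pct in percentages.items():
    print(f"{family}: {pct:.2f}")
```

The same normalization, applied within one row or one column of counts, would give the per-family and per-pair tables.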

Leontis-Westhof basepair families and annotations

No.  Bond Orientation  Interacting Edges            Strand Orientation  Abbreviated notation
1    Cis               Watson-Crick/Watson-Crick    Anti-Parallel       cWW
2    Trans             Watson-Crick/Watson-Crick    Parallel            tWW
3    Cis               Watson-Crick/Hoogsteen       Parallel            cWH
4    Trans             Watson-Crick/Hoogsteen       Anti-Parallel       tWH
5    Cis               Watson-Crick/Sugar           Anti-Parallel       cWS
6    Trans             Watson-Crick/Sugar           Parallel            tWS
7    Cis               Hoogsteen/Hoogsteen          Anti-Parallel       cHH
8    Trans             Hoogsteen/Hoogsteen          Parallel            tHH
9    Cis               Hoogsteen/Sugar              Parallel            cHS
10   Trans             Hoogsteen/Sugar              Anti-Parallel       tHS
11   Cis               Sugar/Sugar                  Anti-Parallel       cSS
12   Trans             Sugar/Sugar                  Parallel            tSS
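
The twelve abbreviations in the table above are built from a bond orientation letter (c or t) followed by two edge letters (W, H, S). The Python sketch below generates them in table order and expands each one into its table row. The function names are hypothetical. The strand orientation rule in the code is a pattern read off the table (cis gives anti-parallel and trans gives parallel, reversed when exactly one Hoogsteen edge is involved); it is not a rule stated in the text.

```python
EDGE_NAMES = {"W": "Watson-Crick", "H": "Hoogsteen", "S": "Sugar"}
EDGE_PAIRS = ["WW", "WH", "WS", "HH", "HS", "SS"]

# cWW, tWW, cWH, tWH, ... in the same order as the table.
FAMILIES = [bond + edges for edges in EDGE_PAIRS for bond in "ct"]


def strand_orientation(family: str) -> str:
    """Strand orientation, using a pattern inferred from the table."""
    bond, edges = family[0], family[1:]
    parallel = bond == "t"
    if edges.count("H") == 1:  # exactly one Hoogsteen edge flips the result
        parallel = not parallel
    return "Parallel" if parallel else "Anti-Parallel"


def describe(family: str) -> tuple[str, str, str]:
    """Expand an abbreviation such as 'tHS' into the table's columns."""
    bond = "Cis" if family[0] == "c" else "Trans"
    edges = "/".join(EDGE_NAMES[e] for e in family[1:])
    return bond, edges, strand_orientation(family)


for number, family in enumerate(FAMILIES, start=1):
    print(number, *describe(family), family)
```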

Figure A: Base edges and Base-pair geometric isomerism. (Upper left) An adenosine showing the three base edges that are available for hydrogen-bonding interactions: Watson-Crick, Hoogsteen and Sugar edge. (Lower left) Representation of RNA base as a triangle. The position of the ribose is indicated with a circle in the corner defined by the Hoogsteen and Sugar edge. (Right) Cis and Trans base-pairing geometries, illustrated for two bases interacting with Watson-Crick edges [1].

Figure B: Basepair geometric families and their annotation. Twelve geometric basepair families resulting from all combinations of edge-to-edge interactions of two bases with cis or trans orientation of the glycosidic bonds. Circles represent Watson-Crick edges, squares Hoogsteen edges, and triangles Sugar edges. Basepair symbols are composed by combining edge symbols, with solid symbols indicating cis basepairs and open symbols trans basepairs [2].

References

  1. Leontis, N.B. and E. Westhof, Geometric nomenclature and classification of RNA base pairs. RNA, 2001. 7(4): p. 499-512.
  2. Leontis, N.B., J. Stombaugh, and E. Westhof, The non-Watson-Crick base pairs and their associated isostericity matrices. Nucleic Acids Research, 2002. 30(16): p. 3497-531.
  3. Stombaugh, J., C.L. Zirbel, E. Westhof, and N.B. Leontis, Frequency and isostericity of RNA base pairs. Nucleic Acids Research, 2009. 37(7): p. 2294-312.