Genetic Code in 3D

John Denker

1 Basics of the Code

The genetic code uses an alphabet of only 4 letters, namely C, A, G, and T. These represent the four bases that make up DNA. The letters are used to form 3-letter words, called codons. There are therefore 4³ = 64 different codons, of which 61 code for amino acids; the remaining 3 are Stop codons. Since there are only 20 amino acids, there are obviously lots of synonyms. The three-letter words very naturally lend themselves to a three-dimensional arrangement:

You can rotate the image by dragging it with the mouse. Double-click to restore the original orientation.
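The counts in the first paragraph can be checked with a minimal Python sketch. It assumes the standard genetic code, written in the compact one-letter form used by NCBI translation table 1, with `*` standing for Stop; the variable names are illustrative only.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code as one-letter amino-acid codes ("*" = Stop),
# with the first, second and third base each running through T, C, A, G.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODE = dict(zip(CODONS, AAS))

n_codons = len(CODE)                                   # 4**3
n_sense = sum(1 for aa in CODE.values() if aa != "*")  # codons for amino acids
n_amino_acids = len(set(CODE.values()) - {"*"})
synonyms = Counter(aa for aa in CODE.values() if aa != "*")

print(n_codons, n_sense, n_amino_acids)  # 64 61 20
# Amino acids with six synonymous codons:
print(sorted(aa for aa, n in synonyms.items() if n == 6))  # ['L', 'R', 'S']
```

In one-letter notation, L, R, and S are Leu, Arg, and Ser.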

The three dimensions have the following meaning (in the original orientation):

up/down = first base. For example, the top layer is G**
left/right = second base. For example, the left edge is *G*
front/back = third base. For example, the front edge is **G

where * is a wildcard, denoting any of the 4 possible bases. In an expression such as G**, the two wildcards are independent, so G** refers to 16 different codons.
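The wildcard notation and its link to the three axes can be sketched as follows. This is a minimal illustration; the helper name `expand` is hypothetical, and the order in which codons come out is simply the order of the string `BASES`, not the order of the tiles in the diagram.

```python
from itertools import product

BASES = "CAGT"  # the four letters, in the order the text lists them

def expand(pattern: str) -> list[str]:
    """Expand a pattern such as 'G**' into the concrete codons it covers.

    Each '*' independently stands for any of the four bases.
    Position 0 is the up/down axis, 1 is left/right, 2 is front/back.
    """
    choices = [BASES if ch == "*" else ch for ch in pattern]
    return ["".join(combo) for combo in product(*choices)]

top_layer = expand("G**")   # fixes the first base: one horizontal layer
left_edge = expand("*G*")   # fixes the second base
front_edge = expand("**G")  # fixes the third base

print(len(top_layer), len(left_edge), len(front_edge))  # 16 16 16
print(expand("GG*"))  # ['GGC', 'GGA', 'GGG', 'GGT'], a row of 4 tiles
print(set(top_layer) & set(left_edge) & set(front_edge))  # {'GGG'}
```

Fixing one base picks out a 16-tile slice, fixing two picks out a row of 4, and fixing all three picks out a single tile.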

2 Synonyms and Proximity in 3D

Some people have a hard time visualizing things in three dimensions, but for those who can manage it, this is a far more faithful representation than the circular “rosetta stone” diagrams or other two-dimensional representations. For starters, you can see there are lots of places where it doesn’t matter if the third base changes. That is the front/back direction in the diagram (in its original orientation).

GG* = make Gly 4 ways
GC* = make Ala 4 ways
GT* = make Val 4 ways
AC* = make Thr 4 ways
CG* = make Arg 4 ways (of the 6)
CC* = make Pro 4 ways
CT* = make Leu 4 ways (of the 6)
TC* = make Ser 4 ways (of the 6)
GA* = make an acidic amino acid (Asp or Glu) 4 ways (the only 4 acidic codons)
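The list of 4-way boxes can be recovered mechanically by checking every XY* box in the code. This sketch assumes the standard genetic code in one-letter form (`*` = Stop); `fourfold_boxes` is a hypothetical helper name.

```python
BASES = "TCAG"
# Standard genetic code as one-letter amino-acid codes ("*" = Stop), TCAG order.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AAS))

def fourfold_boxes(code):
    """Return {prefix: amino acid} for every XY* box whose 4 codons agree,
    i.e. where the third base makes no difference at all."""
    boxes = {}
    for a in BASES:
        for b in BASES:
            aas = {code[a + b + c] for c in BASES}
            if len(aas) == 1:
                boxes[a + b] = aas.pop()
    return boxes

boxes = fourfold_boxes(CODE)
print(boxes)
# {'TC': 'S', 'CT': 'L', 'CC': 'P', 'CG': 'R', 'AC': 'T', 'GT': 'V', 'GC': 'A', 'GG': 'G'}

# GA* is not a single amino acid, but every codon in it is acidic
# (D = Asp, E = Glu), and no other codon is.
acidic = [c for c, aa in CODE.items() if aa in {"D", "E"}]
print(acidic)  # ['GAT', 'GAC', 'GAA', 'GAG']
```

The eight single-amino-acid boxes match the first eight lines of the list, and the last line confirms that GA* holds all the acidic codons.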

There are additional places where it doesn’t matter if the third base changes from one purine to another, or if it changes from one pyrimidine to another. This is easy to see, even on the rosetta-stone diagrams. Let r denote either purine (G or A), and y denote either pyrimidine (C or T):

GAr = make Glu 2 ways
GAy = make Asp 2 ways
AAr = make Lys 2 ways
AAy = make Asn 2 ways
... plus many others, including all of the aforementioned 4-member and 6-member groups.
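The purine/pyrimidine pairs can be enumerated the same way. This sketch assumes the standard genetic code in one-letter form and counts the Stop pair TAr as a group; `twofold_groups` and `PAIRS` are hypothetical names.

```python
BASES = "TCAG"
# Standard genetic code as one-letter amino-acid codes ("*" = Stop), TCAG order.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AAS))

PAIRS = {"r": "AG", "y": "CT"}  # r = purine, y = pyrimidine

def twofold_groups(code):
    """Return {'XYr' or 'XYy': amino acid} for every third-position
    purine or pyrimidine pair whose two codons agree."""
    groups = {}
    for a in BASES:
        for b in BASES:
            for label, pair in PAIRS.items():
                aas = {code[a + b + c] for c in pair}
                if len(aas) == 1:
                    groups[a + b + label] = aas.pop()
    return groups

groups = twofold_groups(CODE)
print(groups["GAr"], groups["GAy"], groups["AAr"], groups["AAy"])  # E D K N
print(len(groups))  # 30 of the 32 possible pairs

all_pairs = {a + b + label for a in BASES for b in BASES for label in PAIRS}
print(sorted(all_pairs - set(groups)))  # ['ATr', 'TGr']
```

In the standard code only two such pairs are split: ATA/ATG (Ile vs Met) and TGA/TGG (Stop vs Trp).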

The second base is essentially always important: more important than the first, and much more important than the third. The only exceptions, i.e. the only places where a group of codons crosses a second-base boundary, are the Stop codons (which are very exceptional, because they don’t code for any amino acid) and serine, which is exceptional because going from one serine group to the other (TC* to AGy) requires changing the first base as well as the second.
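One crude way to put a number on how important each position is: for each codon position, count the groups (the 20 amino acids plus Stop) whose codons use more than one base there. This is just one illustrative measure, assuming the standard genetic code in one-letter form; `crossers` is a hypothetical helper.

```python
from collections import defaultdict

BASES = "TCAG"
# Standard genetic code as one-letter amino-acid codes ("*" = Stop), TCAG order.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AAS))

def crossers(code, position):
    """Groups (amino acids, plus '*' for Stop) whose codons use more than
    one base at the given position (0 = first, 1 = second, 2 = third)."""
    seen = defaultdict(set)
    for codon, aa in code.items():
        seen[aa].add(codon[position])
    return {aa: "".join(sorted(b)) for aa, b in seen.items() if len(b) > 1}

print(crossers(CODE, 1))  # {'S': 'CG', '*': 'AG'}
# How many of the 21 groups span more than one base at each position:
print([len(crossers(CODE, p)) for p in range(3)])  # [3, 2, 19]
```

Only Ser and Stop cross a second-base boundary; Leu, Arg, and Ser cross a first-base boundary; and nearly every group crosses a third-base boundary.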

All three Stop codons are contiguous. You can get from TAA to TAG by flipping the third base, or from TAA to TGA by flipping the second base. This is not particularly obvious on the rosetta-stone diagrams.
All six Leu codons are contiguous. Flip the third base to get most of them, or flip the first base to get the other two.
The same holds for the six Arg codons: flip the third base to get most of them, or flip the first base to get the other two.
Glu and Gly are adjacent: flip the second base.
Glu and Gln are adjacent: flip the first base.
Asp and Asn are adjacent: flip the first base.
Ile is adjacent to Leu: flip the first base.
In contrast, the six Ser codons are not contiguous, and no amount of fiddling will make them so. The two groups, TC* and AGy, differ both plane-wise (first base) and column-wise (second base), so no single point mutation connects them.
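The contiguity claims above amount to graph connectivity, where two codons are neighbors if they differ by one point mutation. This sketch assumes the standard genetic code in one-letter form; `mutants`, `components`, and `flip_positions` are hypothetical helpers.

```python
BASES = "TCAG"
# Standard genetic code as one-letter amino-acid codes ("*" = Stop), TCAG order.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AAS))

def mutants(codon):
    """Yield (position, codon) for each of the 9 single-base changes."""
    for i in range(3):
        for b in BASES:
            if b != codon[i]:
                yield i, codon[:i] + b + codon[i + 1:]

def components(code, aa):
    """Split the codons for `aa` into groups connected by point mutations."""
    unvisited = {c for c, a in code.items() if a == aa}
    comps = []
    while unvisited:
        stack = [unvisited.pop()]
        comp = set(stack)
        while stack:  # depth-first search within this amino acid's codons
            for _, m in mutants(stack.pop()):
                if m in unvisited:
                    unvisited.remove(m)
                    comp.add(m)
                    stack.append(m)
        comps.append(sorted(comp))
    return sorted(comps)

def flip_positions(code, aa1, aa2):
    """Positions (0, 1, 2) where one point mutation turns aa1 into aa2."""
    return {i for c, a in code.items() if a == aa1
            for i, m in mutants(c) if code[m] == aa2}

for aa in "*LRS":
    print(aa, len(components(CODE, aa)))  # prints * 1, L 1, R 1, S 2
print(components(CODE, "S"))
# [['AGC', 'AGT'], ['TCA', 'TCC', 'TCG', 'TCT']]

# One-letter codes: E=Glu, G=Gly, Q=Gln, D=Asp, N=Asn, I=Ile, L=Leu
print(flip_positions(CODE, "E", "G"))  # {1}  (second base)
print(flip_positions(CODE, "E", "Q"))  # {0}  (first base)
print(flip_positions(CODE, "D", "N"))  # {0}
print(flip_positions(CODE, "I", "L"))  # {0}
```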

3 Solvation Properties

The triangular flag in the upper-right corner of each tile has the following meaning:

Red: hydrophilic, positive charge (basic)
Blue: hydrophilic, negative charge (acidic)
Magenta: hydrophilic, polar but uncharged (dipole not monopole)
Black slash: hydrophilic, nonpolar
No flag: hydrophobic

4 Why This Matters

The synonyms and the proximity relationships are important because of point mutations. The mutation rate is itself under genetic control, because the proofreading machinery is itself genetically encoded. It is smallish but definitely not zero. As the saying goes:

Mutants R Us

For example, there are four ways of making threonine. A mutation that changes one of those into another is a completely silent mutation, and therefore subject to genetic drift. In the long term, absent any other pressure, you would expect each codon to appear about one fourth of the time.
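To see where the figure of one fourth comes from, here is a deterministic toy model. It tracks only mutation among the four Thr codons and assumes each one turns into each of the others at the same rate; it leaves out the random sampling that real drift involves. The rate `mu`, the starting state, and the number of generations are arbitrary illustrative choices, not measured values.

```python
THR = ["ACT", "ACC", "ACA", "ACG"]

def generation(p, mu):
    """One generation of mutation among the 4 Thr codons.

    Toy assumption: each codon turns into each of the other three with
    probability mu, so it stays unchanged with probability 1 - 3*mu.
    """
    total = sum(p)
    return [pi * (1 - 3 * mu) + (total - pi) * mu for pi in p]

p = [0.0, 0.0, 0.0, 1.0]  # start with every copy being ACG
mu = 1e-3                 # illustrative rate only
for _ in range(5000):
    p = generation(p, mu)

print({c: round(x, 6) for c, x in zip(THR, p)})
# {'ACT': 0.25, 'ACC': 0.25, 'ACA': 0.25, 'ACG': 0.25}
```

Each generation multiplies every codon's distance from 1/4 by (1 - 4·mu), so with a realistic, much smaller mutation rate the approach to equal frequencies is correspondingly slower.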

It turns out that two of the threonine codons (ACA and ACG) are adjacent to lysine codons (AAA and AAG), while the other two are not. Similarly, one of them (ACG) is adjacent to methionine (ATG), while the other three are not, as you can see in the diagram.
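The neighbor relationships for threonine can be checked by listing what each Thr codon can become in one point mutation. This sketch assumes the standard genetic code in one-letter form (K = Lys, M = Met); `one_step` is a hypothetical helper.

```python
BASES = "TCAG"
# Standard genetic code as one-letter amino-acid codes ("*" = Stop), TCAG order.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AAS))

def one_step(codon):
    """Amino acids (one-letter) reachable from `codon` by one point mutation."""
    return {
        CODE[codon[:i] + b + codon[i + 1:]]
        for i in range(3)
        for b in BASES
        if b != codon[i]
    }

thr = [c for c, aa in CODE.items() if aa == "T"]
print(thr)                                     # ['ACT', 'ACC', 'ACA', 'ACG']
print([c for c in thr if "K" in one_step(c)])  # ['ACA', 'ACG']  (Lys)
print([c for c in thr if "M" in one_step(c)])  # ['ACG']         (Met)
```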

Now suppose that due to some new evolutionary pressure it becomes important to evolve from Thr to Met at some point in the molecule. You would expect that the 1/4 of the population carrying ACG could make that change with just one additional point mutation (to ATG), while the rest would need two. A single mutation is enormously more feasible than anything requiring two mutations in series, especially if the intermediate step would be a deleterious or lethal mutation.
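The arithmetic in this paragraph can be sketched as follows, assuming the standard genetic code in one-letter form and, as in the text, equal 1/4 frequencies for the four Thr codons. The helpers `hamming` and `intermediates` are hypothetical names.

```python
from itertools import permutations

BASES = "TCAG"
# Standard genetic code as one-letter amino-acid codes ("*" = Stop), TCAG order.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AAS))

def hamming(a, b):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))

THR = [c for c, aa in CODE.items() if aa == "T"]
MET = [c for c, aa in CODE.items() if aa == "M"]  # just ['ATG']

steps = {c: min(hamming(c, m) for m in MET) for c in THR}
print(steps)  # {'ACT': 2, 'ACC': 2, 'ACA': 2, 'ACG': 1}

# Assumption from the text: drift has left each Thr codon at frequency 1/4.
freq = {c: 0.25 for c in THR}
print(sum(freq[c] for c in THR if steps[c] == 1))  # 0.25

def intermediates(start, target):
    """Amino acids passed through on the shortest point-mutation paths."""
    diffs = [i for i in range(3) if start[i] != target[i]]
    seen = set()
    for order in permutations(diffs):
        codon = start
        for i in order[:-1]:  # every step except the last is an intermediate
            codon = codon[:i] + target[i] + codon[i + 1:]
            seen.add(CODE[codon])
    return seen

print(sorted(intermediates("ACT", "ATG")))  # ['I', 'T']
```

So starting from ACT, the two-step route passes either through Ile (ATT) or through a silent change to ACG first.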

This discussion ignores all sorts of details about diploidy, crossover mutations, et cetera, but it’s not wrong as far as it goes. It’s part of the story.

5 Rotations

For more about which mouse motions correspond to which rotations, see reference 1.

6 References

1. John Denker, “Visualizing Rotations in 3D”, www.av8n.com/physics/rotviz.htm
2. International Agency for Research on Cancer, “Database of Amino Acid Properties”, http://p53.iarc.fr/AAProperties.aspx
3. Kyoto Encyclopedia of Genes and Genomes, “Nucleotide Codes, Amino Acid Codes, and Genetic Codes”, http://www.genome.jp/kegg/catalog/codes1.html
4. Society for Biomedical Diabetes Research, “Genetic Code and Amino Acid Translation”, http://www.soc-bdr.org/rds/authors/unit_tables_conversions_and_genetic_dictionaries/genetic_code_tables/
5. Smita Rastogi and U. N. Dwivedi, “Biomolecules (Introduction, Structure and Functions)”, http://nsdl.niscair.res.in/jspui/bitstream/123456789/763/1/NucleicAcids.pdf
