Control of gene expression


While the genotype represents the genetic information contained in the totality of someone's DNA, the phenotype represents the products of the DNA and genes that actually get expressed (many genes are not expressed) to make an organism appear and function the way it does. These products are, in the most obvious and tangible sense, proteins. Proteins are at the heart of biological organisation in both structure and function, and span a huge variety of types, from strong collagen in cartilage and protective keratin, to efficient digestive enzymes and oxygen-carrying haemoglobin.

The pathway between genes and proteins involves two key events, and a few smaller steps. The key events are transcription and translation. DNA itself does not take part directly in building proteins. It acts as a permanent master copy of the genetic information, so the cell works from temporary copies of individual genes instead, and it is the making and use of these copies that gets regulated. Factors affecting these processes operate both inside cells and outside of them.

Therefore, a complementary copy of the relevant gene is made in the form of messenger RNA (mRNA). This is transcription. The more complex process of using this code to build proteins is translation. It assembles proteins step by step, using protein building blocks - amino acids. Every 3 consecutive mRNA bases code for 1 amino acid in a protein. Such a group of 3 bases is known as a codon.


Proteins are at the heart of living organisms. Their functions are very varied, from the hair on your head, to the haemoglobin in your red blood cells (which carries oxygen around the body), to the claws of a lion, to insulin (blood glucose regulation). All these highly varied proteins are made of their building blocks - amino acids. This is what the generalised structure of an amino acid looks like (make sure you can draw this):

If you're wondering what this actually is, read on. The clues are in the name (as they usually are).

AMINO - the H2N on the left hand side is an amino group
ACID - the COOH on the right hand side is a carboxylic acid group (simply an acid)

The hydrogen (H) on the bottom is there all the time (just like the amino group and the acid group), while the R group is the variable which determines what particular amino acid this will be. For example, if the R group was a hydrogen, the amino acid would be glycine.

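The next diagram shows condensation, and the subsequent formation of a bond between two amino acids (any two). A molecule of water is released, and the bond formed is a peptide bond. The resulting molecule is called a dipeptide. As more amino acids are added in the same way, the chain becomes a polypeptide.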


The theme of protein structure versus function comes up again and again in exams, throughout A level biology. The core idea must be learnt, and this is it:

Proteins have a primary, secondary, tertiary and (some only) quaternary structure. The tertiary structure of proteins is their 3D shape which is highly folded and has a unique structure. This structure gives proteins their specific function. For example, if insulin was misfolded, it would cease to function properly. Of course though, the origin of misfolding is likely to be in the primary structure, due to a mutation.

For example, if the gene responsible for coding the amino acid sequence for insulin was mutated, then the insulin's primary structure (which is the string of amino acids) would be different, leading to a different secondary structure, tertiary structure, and ultimately, a lack of proper function.
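To make the mutation example concrete, here is a minimal Python sketch. The two mRNA sequences are made up for illustration (they are not real insulin sequences), and the function name `primary_structure` is just a label chosen here. The small codon table is an excerpt of the standard one.

```python
# Excerpt of the standard codon table; only the codons used below are included.
CODON_TABLE = {"AUG": "Met", "UGU": "Cys", "UGG": "Trp", "CUU": "Leu"}


def primary_structure(mrna):
    """Return the amino acid sequence coded for by an mRNA, read codon by codon."""
    return [CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna), 3)]


normal_mrna = "AUGUGUCUU"   # hypothetical sequence, not real insulin mRNA
mutated_mrna = "AUGUGGCUU"  # one base changed: the 6th base is G instead of U

print(primary_structure(normal_mrna))   # ['Met', 'Cys', 'Leu']
print(primary_structure(mutated_mrna))  # ['Met', 'Trp', 'Leu']
```

A single changed base swaps cysteine for tryptophan in the primary structure. In a real protein, losing a cysteine could also mean losing a disulfide bond, which is one way a small change can go on to affect the tertiary structure.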

NB: The tertiary structure of proteins determines their proper function.

Proteins have many types of bonds in addition to peptide bonds, operating at their different levels of complexity. One of these is ionic bonding, which in proteins is the attraction between a positively charged R group (e.g. one carrying NH3+) and a negatively charged R group (e.g. one carrying COO-).

Ionic bonds are weaker than peptide bonds, but stronger than hydrogen bonds. Hydrogen bonds are weak attractions between the partial negative charge on an oxygen (or nitrogen) atom and the partial positive charge on a nearby hydrogen atom.

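Hydrogen bonds are the weakest of these, while disulfide bonds (also called disulfide bridges or linkages) are the strongest. They are covalent bonds between the sulphur atoms in the R groups of two cysteine amino acids. Covalent bonds involve a sharing of electrons, rather than the attraction between opposite charges seen in ionic bonds.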

These bonds are key to the maintenance and formation of a protein's specific three-dimensional structure. In turn, the structure determines function.

Transcription and translation

Having already covered the basics of DNA, let's turn our attention to the principles which govern what actually happens to DNA and how this results in life being the way it is!

The Dogma

DNA is a large molecule made up of variable bases (adenine, thymine, cytosine, guanine). The precise sequence of these bases determines the sequence of a second molecule, mRNA (messenger RNA), which is built by "reading" the template DNA. In turn, the sequence of mRNA bases determines which amino acids are chosen to assemble the protein that the original DNA codes for. This happens once the mRNA reaches a ribosome, where tRNA (transfer RNA) molecules bring in the amino acids.


mRNA stands for messenger ribonucleic acid. DNA is deoxyribonucleic acid, and the difference in name comes from the sugar in the backbone: ribose in RNA, deoxyribose in DNA. A more important difference is that mRNA is single-stranded, unlike double-stranded DNA. Additionally, instead of the base thymine, mRNA uses uracil. So while adenine pairs up with thymine in DNA, it pairs up with uracil in RNA. Knowing that, the mRNA derived from this DNA (looking at the top strand) would be as follows:

DNA:  ATGGGTACAAATGC (top strand)
      TACCCATGTTTACG (bottom strand)

mRNA: AUGGGUACAAAUGC (single strand)

As you can see, both the top DNA strand and the mRNA are complementary to the bottom DNA strand (in reality either top or bottom may be read, but for simplicity we only look at the top strand whenever it's given - we assume that is the gene of interest). Therefore the top strand may be called the coding strand (or sense strand) while the bottom is the template strand (or anti-sense strand). It's called the template because it's the strand that RNA polymerase actually reads, pairing each base with its complement to build up the mRNA. The result? The mRNA has the same sequence as the coding strand of DNA, except that T is replaced by U!
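The relationship between the two DNA strands and the mRNA can be checked with a few lines of Python. This is a minimal sketch using the example sequences from this page. The function names are made up here, and it assumes the strands are written lined up base-for-base as shown above, so no reversing is needed.

```python
# Pairing rules for building mRNA from the DNA template strand.
DNA_TO_MRNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe_from_template(template):
    """Pair every base on the template strand with its complementary RNA base."""
    return "".join(DNA_TO_MRNA[base] for base in template)


def transcribe_from_coding(coding):
    """Shortcut: the mRNA matches the coding strand with T replaced by U."""
    return coding.replace("T", "U")


coding_strand = "ATGGGTACAAATGC"    # top strand
template_strand = "TACCCATGTTTACG"  # bottom strand

mrna = transcribe_from_template(template_strand)
print(mrna)  # AUGGGUACAAAUGC
print(mrna == transcribe_from_coding(coding_strand))  # True
```

Both routes give the same mRNA, which is why you can write the answer straight from the coding strand in an exam.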

How is mRNA read? An amino acid is coded for by 3 bases in a row. These are called triplets (or codons, in mRNA). AUG codes for methionine (Met), and when it sits at a certain position within the overall code it also signals where translation of a new polypeptide starts. Therefore it's known as a start codon.

The 3 Secrets of mRNA/DNA

There are 3 key properties of the genetic code which determine how it is read.

1. The genetic code is universal. That's right, the 4 bases are the same in all living things - humans, apples, worms, swans, oak trees, etc.! Moreover, the amino acids coded for by these bases are also completely the same, so AUG codes for the amino acid methionine in all living organisms.

2. The genetic code is non-overlapping, so if you have an mRNA AUGCGA it would be read "AUG", "CGA" and not "AUG", "UGC". The amino acids obtained would be methionine and arginine (Arg).

Tables and diagrams showing you which codes correspond to which amino acids are widely available and you won't be expected to memorise them.

In addition to the start codon AUG (methionine), there are three stop codons: UAA, UAG and UGA. These do not code for an amino acid. Instead, they signal where translation into the amino acid sequence stops.

3. The genetic code is degenerate. That might sound slightly offensive, but bear with it! Look above, what do the triplet codes UGU and UGC code for (start reading from the inside out by picking each letter)? They both code for cysteine (Cys). How about CUU, CUA, CUC and CUG? They all code for leucine (Leu). This property of different triplet codes coding for the same amino acid is why the genetic code is termed degenerate.
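The non-overlapping and degenerate properties can be tried out in a short Python sketch. The codon table here is only a small excerpt of the standard one (just the codons mentioned on this page), and the helper name `split_codons` is made up for illustration.

```python
from collections import defaultdict

# Small excerpt of the standard codon table (only codons named in the text).
CODON_TABLE = {
    "AUG": "Met", "CGA": "Arg",
    "UGU": "Cys", "UGC": "Cys",
    "CUU": "Leu", "CUA": "Leu", "CUC": "Leu", "CUG": "Leu",
}


def split_codons(mrna):
    """Read the mRNA in non-overlapping triplets, starting from the first base."""
    usable = len(mrna) - len(mrna) % 3  # ignore any incomplete triplet at the end
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


codons = split_codons("AUGCGA")
amino_acids = [CODON_TABLE[codon] for codon in codons]

# Degeneracy: group the codons by the amino acid they code for.
synonyms = defaultdict(list)
for codon, amino_acid in CODON_TABLE.items():
    synonyms[amino_acid].append(codon)

print(codons)                   # ['AUG', 'CGA']
print(amino_acids)              # ['Met', 'Arg']
print(sorted(synonyms["Leu"]))  # ['CUA', 'CUC', 'CUG', 'CUU']
```

Notice that the loop steps forward 3 bases at a time, so "UGC" is never read from the middle of AUGCGA. The grouping at the end shows four different codons all landing on leucine.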

Additionally, some DNA (and in many organisms most of the DNA) does not actually code for amino acids at all. Some repeats many times over, some has regulatory functions, and some has yet to be cracked in terms of its role in the overall function of the organism.

tRNA (Transfer RNA)

We know DNA is double-stranded and uses A, G, C and T bases, while mRNA is single-stranded and uses U instead of T. What about tRNA? Well, tRNA is a different kettle of fish altogether.

It's clover-shaped and uses the same bases as mRNA. It is single-stranded, and where one part of the strand meets another there are hydrogen bonds between complementary bases, just like in DNA (except that in DNA there are 2 separate strands bonded, rather than 2 parts of the same strand).

At the top of tRNA as seen above there is an amino acid binding site (P is seen as attached), while at the bottom there is an anticodon - in this case it's GAA. The anticodon is complementary to an mRNA codon (triplet code - in this case it would have to be CUU).
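Working out an anticodon from a codon (or the other way round) is just base pairing. Here is a minimal Python sketch. The function name is made up, and it assumes the anticodon is written lined up base-for-base against the codon, as in the example above, rather than following the 5' to 3' convention you may meet elsewhere.

```python
# RNA base-pairing rules: A pairs with U, C pairs with G.
RNA_PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}


def complementary_triplet(triplet):
    """Return the triplet that pairs base-for-base with the one given."""
    return "".join(RNA_PAIR[base] for base in triplet)


print(complementary_triplet("GAA"))  # CUU (the codon matching anticodon GAA)
print(complementary_triplet("AUG"))  # UAC (the anticodon matching the start codon)
```

Because pairing works in both directions, the same function turns a codon into its anticodon and an anticodon back into its codon.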

Proteins are made up of amino acids linked by peptide bonds, therefore a protein may be referred to as a polypeptide (of course, some proteins such as haemoglobin have extra bits to them). All are encoded for by the information stored in DNA. Let's see how exactly this happens.

Transcription: DNA to mRNA

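In a process called transcription, mRNA is formed based on DNA. The bases on the template strand of DNA are read and paired with complementary RNA bases to make a new molecule, mRNA, which is synthesised by the enzyme RNA polymerase. The mRNA therefore carries the same sequence as the coding strand, with U in place of T.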


As you can see, the DNA double helix unwinds, RNA polymerase binds to the DNA and moves along the template strand, and free RNA nucleotides (carrying the bases A, U, C, G) are paired with the exposed template bases and joined together to build an mRNA strand.

Splicing: pre-mRNA to mRNA

In eukaryotes, genes contain non-coding sequences which must be removed before mRNA is used to produce proteins. These are called introns as opposed to exons which are coding sequences. Splicing therefore is the process of excising (cutting out) introns to be left with mRNA containing purely coding sequence.

This process can result in several different mRNA products from the same DNA sequence. If different combinations of exons are kept in the final mRNA, the mRNA will code for different amino acid sequences. This is termed alternative splicing.

Since these two possible mRNA products code for different amino acids represented by the different colours (red-yellow-blue versus red-green-blue), the resulting protein after translation of mRNA could function differently. If the protein is an enzyme, the change may affect its ability to catalyse its reaction, or its efficiency. Equally, in another scenario the change might make no difference at all.
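The red-yellow-blue versus red-green-blue idea can be mimicked in Python. This is only a sketch: the exon and intron sequences are invented for illustration, the colour names follow the diagram described above, and `splice` is a made-up helper rather than anything from a real bioinformatics library.

```python
# Hypothetical pre-mRNA, listed in order as (name, kind, sequence).
PRE_MRNA = [
    ("red", "exon", "AUGUGU"),
    ("intron 1", "intron", "GUAAGU"),
    ("yellow", "exon", "CUUCGA"),
    ("intron 2", "intron", "GUAAGU"),
    ("green", "exon", "UGCCUG"),
    ("intron 3", "intron", "GUAAGU"),
    ("blue", "exon", "CUAUGA"),
]


def splice(pre_mrna, exons_to_keep):
    """Cut out every intron and join the chosen exons in their original order."""
    return "".join(
        sequence
        for name, kind, sequence in pre_mrna
        if kind == "exon" and name in exons_to_keep
    )


variant_1 = splice(PRE_MRNA, {"red", "yellow", "blue"})
variant_2 = splice(PRE_MRNA, {"red", "green", "blue"})

print(variant_1)  # AUGUGUCUUCGACUAUGA
print(variant_2)  # AUGUGUUGCCUGCUAUGA
```

Both variants come from the same stretch of pre-mRNA and share their first and last exons, yet the middle codons differ, so the polypeptides made from them would differ too.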

Translation: mRNA to polypeptide

The resulting mRNA finally leaves the nucleus, where the above business had been taking place, and arrives in the cytoplasm where the final step takes place. More specifically, it takes place on ribosomes. Each mRNA codon is matched against the complementary anticodon on a tRNA molecule, which carries the respective amino acid. This amino acid is joined to the next one by a peptide bond and so forth, until a stop codon is reached and a polypeptide has been made.
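Here is a minimal Python sketch of that matching process. It is a toy model with several assumptions: the tRNA "pool" only contains the four tRNAs needed for this example, anticodons are written lined up base-for-base against their codons, reading starts at the first base, and the mRNA is the example from earlier on this page trimmed to whole codons with a UGA stop codon added on the end for illustration.

```python
RNA_PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Toy pool of tRNAs: anticodon -> amino acid carried.
TRNA_POOL = {"UAC": "Met", "CCA": "Gly", "UGU": "Thr", "UUA": "Asn"}


def translate(mrna):
    """Build a polypeptide codon by codon until a stop codon is reached."""
    polypeptide = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break  # stop codons have no matching tRNA, so the chain ends here
        anticodon = "".join(RNA_PAIR[base] for base in codon)
        polypeptide.append(TRNA_POOL[anticodon])
    return polypeptide


mrna = "AUGGGUACAAAUUGA"  # AUG GGU ACA AAU + UGA (stop added for illustration)
print(translate(mrna))  # ['Met', 'Gly', 'Thr', 'Asn']
```

The mRNA never touches an amino acid directly. The only link between a codon and its amino acid is the tRNA whose anticodon fits that codon.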

Polypeptides often undergo further modification and combination to form the fully-fledged protein itself. For example, haemoglobin is made of 4 polypeptide chains. Some common post-translational modifications of polypeptides include phosphorylation and glycosylation, the respective addition of phosphate groups and sugars to proteins.

Control of gene expression

In eukaryotes, epigenetics refers to the heritable changes in gene function that do not involve any change to the DNA sequence. This underpins an embryo's ability to differentiate its cells into specialised lineages for different organs and tissues in the adult: skin tissue, muscle tissue, nervous tissue, etc.

Transcription can be inhibited by specific means. A common way is increased DNA methylation. Methyl (CH3) groups attached to the DNA act as tags that stop the transcription machinery from binding, which prevents transcription that might've occurred otherwise.

Another chemical modification that can induce epigenetic effects and control gene expression is histone deacetylation. Histones are the proteins that DNA is wound around to form chromatin, and they help to compress it. When histones are acetylated, the chromatin is relaxed and the DNA can be accessed by transcription machinery. Deacetylation results in the DNA being wound more tightly around the histones, so the genetic material is no longer accessible.

Knowledge of epigenetics can help in addressing various illnesses including cancer. Controlling gene expression through epigenetic tags is much easier than having to change the DNA sequence itself. Drugs can act as signals for specific genes to be activated or deactivated. In the case of cancer, it has been shown that cancer cells can switch off tumour suppressor genes, which would normally keep cell division in check. They also show additional epigenetic anomalies such as histone modifications and deregulation of proteins that bind DNA.

RNA interference (RNAi)

A major component in the regulation of transcription and translation is RNA interference, notably via microRNA (miRNA) and small interfering RNA (siRNA).

miRNA is a short RNA whose sequence is complementary to a portion of transcribed mRNA. Once bound to a protein complex, it attaches to that section of the target mRNA, thus blocking translation as well as speeding up the eventual breakdown of the mRNA strand.

As for siRNA, it does what it says, it interferes and it's small! What does it interfere with? It interferes with translation by binding to mRNA and causing it to be cleaved. This prevents the mRNA from being translated in the cytoplasm via tRNA and ribosomes to produce a polypeptide. Therefore the gene that the mRNA was transcribed from is not expressed.

siRNA is a short, double-stranded fragment of RNA. One of its strands guides a protein complex called RISC (RNA-induced silencing complex) to the complementary mRNA, which RISC then cleaves. miRNA and siRNA share the same machinery after they're synthesised: both are processed by the enzyme Dicer, and both act through the RISC protein complex.
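The "find the complementary mRNA, then cut it" idea can be sketched in Python. This is a heavily simplified toy model: the guide strand is only 6 bases long (real siRNA strands are roughly 21 bases), it is written lined up base-for-base against its target, the cut is simply placed in the middle of the matched region, and the function names are made up for illustration.

```python
RNA_PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}


def complement(rna):
    """Return the base-for-base complementary RNA sequence."""
    return "".join(RNA_PAIR[base] for base in rna)


def silence(mrna, guide_strand):
    """Cleave the mRNA where the guide strand pairs with it, if anywhere."""
    target = complement(guide_strand)
    position = mrna.find(target)
    if position == -1:
        return [mrna]  # no complementary site, so the mRNA is left intact
    cut = position + len(target) // 2  # simplification: cut mid-way through the site
    return [mrna[:cut], mrna[cut:]]


mrna = "AUGGGUACAAAUGC"
print(silence(mrna, "CAUGUU"))  # ['AUGGGUA', 'CAAAUGC']
print(silence(mrna, "GGGGGG"))  # ['AUGGGUACAAAUGC']
```

Only an mRNA containing the complementary sequence gets cut, which is why siRNA silences one specific gene and leaves the others alone.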
