The proteome and protein structure

The complete set of genes in a cell is known as the genome, while the full range of proteins a cell can make is its proteome. The proteome can be many times larger than the corresponding genome for two reasons. First, a single gene can give rise to several different mRNAs (alternative splicing of the primary transcript). Second, post-translational modifications can alter the final protein in different ways starting from the same polypeptide.

So in humans, roughly 20,000 protein-coding genes are thought to be enough to produce more than a million different proteins.
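To see how alternative splicing lets one gene produce several different mRNAs, here is a minimal Python sketch. It assumes a hypothetical gene with five exons, where exons 1 and 5 are always kept and exons 2, 3 and 4 are optional exons that can each be included or skipped. Real splicing is far more tightly controlled than this, so treat the numbers as illustrative only.

```python
from itertools import product

# Hypothetical gene with exons E1..E5.
# Assumption: E1 and E5 are always kept; E2, E3 and E4 can each be kept or skipped.
CONSTITUTIVE_START = ["E1"]
OPTIONAL = ["E2", "E3", "E4"]
CONSTITUTIVE_END = ["E5"]


def splice_variants():
    """Return every mature mRNA (as a list of exons) this toy model allows."""
    variants = []
    # Each optional exon has two choices: keep it (True) or skip it (False).
    for choices in product([True, False], repeat=len(OPTIONAL)):
        middle = [exon for exon, keep in zip(OPTIONAL, choices) if keep]
        variants.append(CONSTITUTIVE_START + middle + CONSTITUTIVE_END)
    return variants


if __name__ == "__main__":
    for variant in splice_variants():
        print("-".join(variant))
    print(len(splice_variants()), "variants from one gene")
```

With three optional exons there are 2 x 2 x 2 = 8 possible mRNAs, each of which could be translated into a different polypeptide, before any post-translational modification is even counted.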

Proteins are at the heart of living organisms. Their functions are very varied, from the hair on your head, to the haemoglobin in your red blood cells (which carries oxygen around the body), to the claws of a lion, to insulin (blood glucose regulation). All of these very different proteins are polymers made from the same kind of building block: monomers called amino acids. This is what the generalised structure of an amino acid looks like (make sure you can draw this):

If you're wondering what this actually is, read on. The clues are in the name (as they usually are).

AMINO - the H2N on the left hand side is an amino group
ACID - the COOH on the right hand side is a carboxylic acid group (simply an acid)

The hydrogen (H) on the bottom, attached to the central carbon atom, is always there (just like the amino group and the acid group), while the R group is the variable part that determines which particular amino acid this will be. For example, if the R group was a hydrogen, the amino acid would be glycine.

The R group (functional group) on the amino acid also determines the characteristics of that amino acid, including whether it is basic or acidic, polar or hydrophobic. Amino acids can be charged (positively or negatively) or uncharged, polar or non-polar. In asparagine, for example, it is the amide group (CONH2) in its R group that makes it polar.

The next diagram shows condensation: two amino acids (any two) join together, releasing a molecule of water, and the bond formed between them is a peptide bond. Two joined amino acids make a dipeptide; a longer chain of amino acids joined by peptide bonds is called a polypeptide.
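Each condensation reaction removes one molecule of water, so joining n amino acids forms n - 1 peptide bonds and releases n - 1 water molecules. The Python sketch below uses that rule to estimate the molar mass of a short peptide. Only a handful of amino acids are included, using approximate molar masses of the free amino acids in g/mol; you would need to extend the table for other sequences.

```python
# Approximate molar masses (g/mol) of free amino acids, keyed by one-letter code.
AMINO_ACID_MASS = {
    "G": 75.07,   # glycine
    "A": 89.09,   # alanine
    "S": 105.09,  # serine
    "P": 115.13,  # proline
    "C": 121.16,  # cysteine
}
WATER_MASS = 18.02  # g/mol, lost in each condensation reaction


def peptide_bonds(sequence: str) -> int:
    """n amino acids in a chain are joined by n - 1 peptide bonds."""
    return max(len(sequence) - 1, 0)


def peptide_mass(sequence: str) -> float:
    """Sum of the free amino acid masses minus one water per peptide bond."""
    total = sum(AMINO_ACID_MASS[aa] for aa in sequence)
    return total - WATER_MASS * peptide_bonds(sequence)


if __name__ == "__main__":
    # Glycine + alanine -> the dipeptide Gly-Ala + one water molecule
    print(peptide_bonds("GA"), round(peptide_mass("GA"), 2))  # prints: 1 146.14
```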

This video is an excellent tool for understanding the processes by which these amino acids end up in highly structured, complex proteins with varied and important functions within organisms:

The relationship between protein structure and function comes up again and again in exams throughout Higher and Advanced Higher Biology. The core idea must be learnt, and this is it:

Proteins have a primary, secondary, tertiary and (some only) quaternary structure. 

Protein primary structure is simply the sequence of amino acids in the polypeptide. Secondary structure is the local folding of the chain, held in place by hydrogen bonds along the polypeptide backbone. This can be an alpha helix, a parallel or anti-parallel beta sheet, or a turn.

The tertiary structure of a protein is its final, highly folded 3D shape, which is unique to that protein. This shape gives the protein its specific function. For example, if insulin were misfolded, it would no longer work properly. The cause of misfolding often lies in the primary structure, as the result of a mutation.

For example, if the gene coding for the amino acid sequence of insulin was mutated, then insulin's primary structure (its string of amino acids) could be different. That may change its secondary and tertiary structure and, ultimately, stop it functioning properly.
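The chain of cause and effect described above starts with the DNA sequence, which sets the primary structure. This Python sketch translates a short, hypothetical coding sequence using only a small subset of the genetic code (just the codons the example needs). It then shows how a single base substitution changes one amino acid in the primary structure. It is a toy example, not the real insulin gene.

```python
# A small subset of the standard genetic code (DNA coding-strand codons).
CODON_TABLE = {
    "ATG": "M",   # methionine (start)
    "CCT": "P",   # proline
    "GAG": "E",   # glutamate
    "GTG": "V",   # valine
    "AAG": "K",   # lysine
    "TAA": None,  # stop
}


def translate(dna: str) -> str:
    """Translate codon by codon until a stop codon; returns one-letter codes."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]  # KeyError if codon not in this subset
        if amino_acid is None:  # a stop codon ends the polypeptide
            break
        protein.append(amino_acid)
    return "".join(protein)


def point_mutation(dna: str, position: int, new_base: str) -> str:
    """Substitute one base (0-based position) to simulate a point mutation."""
    return dna[:position] + new_base + dna[position + 1:]


if __name__ == "__main__":
    normal = "ATGCCTGAGAAGTAA"               # hypothetical gene
    mutant = point_mutation(normal, 7, "T")  # GAG -> GTG
    print(translate(normal), "->", translate(mutant))  # MPEK -> MPVK
```

Here a charged, hydrophilic glutamate is replaced by a hydrophobic valine. That is exactly the kind of change in R group that can alter how the chain folds.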

NB: The tertiary structure of proteins determines their proper function.

Proteins contain many types of bond in addition to peptide bonds, operating at their different levels of structure. One of these is ionic bonding: an electrostatic attraction between a positively charged R group (e.g. -NH3+) and a negatively charged R group (e.g. -COO-).

Ionic bonds are weaker than peptide bonds, but stronger than hydrogen bonds. A hydrogen bond is a weak, easily broken attraction between two atoms. On one side is a hydrogen atom carrying a partial positive charge (because it is bonded to an oxygen or nitrogen atom). On the other is a nearby oxygen or nitrogen atom carrying a partial negative charge.

Hydrogen bonds are the weakest of these interactions, while disulfide linkages (also called disulfide bonds or bridges) are the strongest. They are covalent bonds between the sulphur atoms in the R groups of two cysteine residues. Covalent bonds involve sharing electrons, rather than the transfer of electrons that produces the ions in ionic bonds.

These bonds are key to the maintenance and formation of a protein's specific three-dimensional structure. In turn, the structure determines function.

Collagen and Haemoglobin

Based on their structure, proteins fall into two classes: fibrous and globular.

Fibrous proteins usually lack a complex folded tertiary structure. Instead, they form long parallel chains of polypeptides, often cross-linked at intervals. The result is a strong overall arrangement that supports tissues. Examples include keratin in hair and nails, and collagen. They are mostly insoluble in water.

Globular proteins, on the other hand, have a tertiary and sometimes a quaternary structure. They are roughly spherical (hence the name globular) and are normally soluble in water. They carry out roles in metabolism and transport: haemoglobin, for example, carries oxygen in the blood.


Collagen provides support in skin, bones, teeth, tendons and more.

It is very strong and relies on hydrogen bonding to keep its three polypeptide chains together.

The polypeptide chains are rich in three particular amino acids: glycine, proline and hydroxyproline.


Haemoglobin, found in red blood cells, carries up to four oxygen molecules. It loads oxygen from the air we breathe in the lungs and unloads it in oxygen-poor tissues as needed.

The extent to which it is loaded with oxygen affects its colour: oxygenated haemoglobin is bright red, while deoxygenated haemoglobin is a darker, purplish red.

Haemoglobin contains four haem groups, each with an iron ion at its centre that can bind one oxygen molecule. Haem is an example of a prosthetic group: a non-protein component that binds tightly to a protein and is needed for its function.

Haemoglobin's quaternary structure is made of four polypeptide chains (two alpha chains and two beta chains), and each chain holds one haem group. Interactions between amino acid R groups, particularly in hydrophobic regions, determine how the chains fold and fit together, and hence the protein's conformation. The other types of bonding described above also contribute.

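Haemoglobin subunits show cooperativity in their function. When the affinity for oxygen of one subunit changes, so does the affinity of the other subunits. For example, once one subunit binds oxygen, the remaining subunits bind oxygen more readily. As a result, haemoglobin loads oxygen efficiently in the lungs and unloads it efficiently in the tissues.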
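Cooperativity is what gives haemoglobin its S-shaped (sigmoid) oxygen dissociation curve. One standard way to sketch this is the Hill equation, Y = pO2^n / (P50^n + pO2^n). Here Y is the fraction of binding sites occupied, P50 is the partial pressure of oxygen at half saturation and n is the Hill coefficient (n = 1 means no cooperativity). The Python below compares a non-cooperative binder with a cooperative one. The values P50 = 26 mmHg and n = 2.8 are assumed, approximate figures for human haemoglobin, and the lung and tissue pressures are illustrative.

```python
def saturation(po2: float, p50: float = 26.0, n: float = 2.8) -> float:
    """Hill equation: fraction of O2-binding sites occupied at partial pressure po2 (mmHg)."""
    return po2**n / (p50**n + po2**n)


if __name__ == "__main__":
    LUNGS, ACTIVE_TISSUE = 100.0, 20.0  # illustrative partial pressures of O2 (mmHg)
    for label, n in [("non-cooperative (n = 1)", 1.0), ("cooperative (n = 2.8)", 2.8)]:
        loaded = saturation(LUNGS, n=n)
        unloaded = saturation(ACTIVE_TISSUE, n=n)
        print(f"{label}: {loaded:.2f} in lungs, {unloaded:.2f} in tissue, "
              f"releases {loaded - unloaded:.2f}")
```

Both curves are half saturated at 26 mmHg. The cooperative curve, however, loads more oxygen in the lungs and gives up more of it in oxygen-poor tissue, which is the practical payoff of cooperativity.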

Van der Waals interactions (weak attractions between atoms that are very close together) also help to stabilise protein structure. They act alongside the hydrogen bonds, ionic bonds and disulfide bridges described earlier.

Temperature and pH affect the bonds between the R groups of amino acids. This matters most for enzymes, the proteins that catalyse reactions in the cell: outside their optimal temperature and pH, enzymes lose function.


Increasing the temperature increases the rate of an enzyme-catalysed reaction, because enzyme and substrate molecules move faster and collide more often. This only holds up to a certain point. At high temperatures the enzyme molecule vibrates so much that the weak bonds holding it together break, changing the structure of the enzyme, including its active site. This process is denaturation. For many enzymes it happens at around 50 to 60 degrees Celsius, although the exact temperature differs between enzymes.
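The rise-then-fall of enzyme activity with temperature comes from two effects acting together. The reaction rate rises with temperature, while the fraction of enzyme that is still correctly folded falls sharply near the denaturation point. The Python sketch below simply multiplies the two. It is a toy model with assumed, hypothetical parameters. The rate is taken to double for every 10 degree rise (a temperature coefficient, Q10, of 2), and half of the enzyme is taken to be denatured at 55 degrees Celsius. Real enzymes vary.

```python
import math

Q10 = 2.0          # assumed: rate doubles for every 10 deg C rise
T_REF = 25.0       # reference temperature (deg C) where the relative rate = 1
T_DENATURE = 55.0  # assumed: half the enzyme is denatured at this temperature
STEEPNESS = 2.0    # assumed: how sharply denaturation sets in (deg C)


def fraction_folded(temp_c: float) -> float:
    """Logistic curve: close to 1 well below T_DENATURE, close to 0 well above it."""
    return 1.0 / (1.0 + math.exp((temp_c - T_DENATURE) / STEEPNESS))


def relative_activity(temp_c: float) -> float:
    """Faster collisions (Q10 rule) multiplied by the fraction of enzyme still folded."""
    return Q10 ** ((temp_c - T_REF) / 10.0) * fraction_folded(temp_c)


if __name__ == "__main__":
    temps = range(0, 81)
    optimum = max(temps, key=relative_activity)
    print("optimum in this toy model:", optimum, "deg C")
    for t in (20, 40, optimum, 60, 70):
        print(t, round(relative_activity(t), 3))
```

With these made-up parameters the activity peaks a few degrees below the denaturation point and then collapses, giving the familiar lopsided curve.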

Denatured enzymes don't work. Look at this graph to understand the relationship between enzyme activity and temperature:


Binding of the substrate to the enzyme depends on a close match between shape and charge. pH is a measure of the concentration of hydrogen ions (H+) in a solution: the more H+, the lower the pH. Changing the H+ concentration changes the charges on amino acid R groups, which disrupts ionic and hydrogen bonds and the charge of the active site. That is why a really high or really low pH can stop an enzyme from working. All enzymes have a specific optimal pH at which they work best, and this differs between enzymes. Many enzymes work best near neutral pH, around 7. (The pH scale runs from 0 to 14: 7 is neutral, lower values are acidic and higher values are basic.) Pepsin in stomach acid, by contrast, works best at a very acidic pH of around 2.

Since enzymes can be so sensitive to pH, buffers are used whenever working with them. Buffers are solutions containing chemicals that resist sudden shifts in pH. The solution is first set to the desired pH, for example by adding hydrochloric acid to lower the pH or sodium hydroxide to raise it. After that, its pH should stay fairly stable. Adding other chemicals to the buffer solution, such as an enzyme of interest and its substrate (e.g. lactase and lactose), shouldn't disturb its pH. This ensures that the enzyme can function optimally.
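Why a buffer resists pH change can be shown with the Henderson-Hasselbalch equation, pH = pKa + log10([A-]/[HA]), where HA is a weak acid and A- is its conjugate base. The Python sketch below makes several assumptions. The buffer is a phosphate buffer (pKa of about 7.2) containing 0.1 mol of each form in 1 litre, and any added hydrochloric acid reacts completely with A-. For comparison, the same acid added to 1 litre of pure water gives pH = -log10[H+].

```python
import math

PKA_PHOSPHATE = 7.2  # approximate pKa of H2PO4- / HPO4(2-)


def buffer_ph(acid_mol: float, base_mol: float, pka: float = PKA_PHOSPHATE) -> float:
    """Henderson-Hasselbalch: pH = pKa + log10([A-]/[HA]); same volume, so moles suffice."""
    return pka + math.log10(base_mol / acid_mol)


def add_strong_acid(acid_mol: float, base_mol: float, h_added: float):
    """Each added H+ converts one A- into HA. Returns the new (HA, A-) amounts."""
    if h_added >= base_mol:
        raise ValueError("buffer capacity exceeded in this simple model")
    return acid_mol + h_added, base_mol - h_added


def unbuffered_ph(h_added_mol: float, volume_l: float = 1.0) -> float:
    """pH of pure water after adding a strong acid (ignores the tiny H+ from water itself)."""
    return -math.log10(h_added_mol / volume_l)


if __name__ == "__main__":
    ha, a = 0.10, 0.10  # mol of HA and A- in 1 L
    print("buffer before:", round(buffer_ph(ha, a), 2))
    ha, a = add_strong_acid(ha, a, 0.01)  # add 0.01 mol HCl
    print("buffer after 0.01 mol HCl:", round(buffer_ph(ha, a), 2))
    print("water after 0.01 mol HCl:", round(unbuffered_ph(0.01), 2))
```

Under these assumptions the buffer's pH falls by only about 0.09 units, from 7.2 to about 7.11, while the same amount of acid in pure water drops the pH from 7 to 2.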

Protein placement in membranes and cells

The hydrophobic or hydrophilic properties of a protein's R groups help determine where it is located in the cell. As seen previously, amino acid residues interact with each other in different ways depending on their R groups. The residues on the surface of a protein are the ones that can contact other molecules and interact with the chemical environment.

As you might expect, the hydrophilic R groups are the ones on the protein surface, because they interact with the surrounding water in the cell. Hydrophobic R groups tend to cluster towards the centre of the protein, away from the water, which contributes to the globular shape of some proteins.

This arrangement applies to soluble proteins. Membrane proteins, which are insoluble, can have the opposite arrangement: the parts of their surface embedded in the lipid membrane carry hydrophobic R groups.

The plasma membrane

We can now explore the structure of plasma membranes, specifically in the context of the fluid-mosaic model. Phospholipids have a hydrophilic (water loving) head and hydrophobic (water repelling) tails. In water they arrange themselves with their tails facing inwards, away from the water, and their heads facing outwards. This forms a phospholipid bilayer (double layer), which is the basis of the plasma membrane.

The name of fluid-mosaic model comes from:

Fluid = the phospholipids and proteins in the membrane can move around, so their arrangement is always changing
Mosaic = the proteins are scattered through the phospholipid bilayer like the pieces of a mosaic.

It's pretty, isn't it? The proteins are crucial to cell communication as well as to the selective permeability of the membrane. The carbohydrate side chains of glycoproteins (proteins with sugars attached) act as receptors. Lipid-soluble substances such as vitamins A, D and K, as well as oxygen and carbon dioxide, can pass freely through the membrane. Cholesterol can be part of the membrane, where it restricts the movement of other components.

The main properties of molecules that determine how they may be transported across a membrane are solubility, size and charge.

Large molecules, charged molecules and, naturally, lipid-repelling (water-attracting) molecules can't diffuse freely across the phospholipid bilayer. Conversely, small molecules, uncharged (non-ionised) molecules and lipophilic (hydrophobic) molecules can cross the membrane barrier.
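The rules of thumb above can be written as a simple decision function. This Python sketch is a deliberate simplification. It treats "small", "charged" and "lipid-soluble" as yes/no properties, whereas real membranes show a gradient of permeability. The rule it encodes is that charged particles are blocked and that otherwise small or lipid-soluble molecules get through. The example molecules are illustrative choices: oxygen, carbon dioxide and vitamin D come from the text, while sodium ions and glucose are added as familiar cases that need transport proteins.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    name: str
    small: bool
    charged: bool
    lipid_soluble: bool


def crosses_bilayer_unaided(m: Molecule) -> bool:
    """Simplified rule: charged particles are blocked; otherwise small OR lipid-soluble molecules pass."""
    if m.charged:
        return False
    return m.small or m.lipid_soluble


EXAMPLES = [
    Molecule("oxygen (O2)", small=True, charged=False, lipid_soluble=True),
    Molecule("carbon dioxide (CO2)", small=True, charged=False, lipid_soluble=True),
    Molecule("vitamin D", small=False, charged=False, lipid_soluble=True),
    Molecule("sodium ion (Na+)", small=True, charged=True, lipid_soluble=False),
    Molecule("glucose", small=False, charged=False, lipid_soluble=False),
]

if __name__ == "__main__":
    for m in EXAMPLES:
        verdict = "diffuses through" if crosses_bilayer_unaided(m) else "needs a transport protein"
        print(f"{m.name}: {verdict}")
```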

Integral proteins stay within the plasma membrane because regions of hydrophobic R groups on their surface interact strongly with the hydrophobic tails of the phospholipids. Many of these proteins span the whole width of the membrane. These transmembrane proteins include transporters, channels and receptors.

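Peripheral proteins, on the other hand, are not embedded across the membrane but sit on its inner or outer surface. They have few or no hydrophobic R groups interacting with the phospholipid tails, and are held mainly by ionic and hydrogen bonds with the phospholipid heads or with integral proteins.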

Protein interactions and regulation

While some R groups on a protein are involved in holding its structure together, other R groups are free to interact with other molecules. Any molecule that binds to a protein is called a ligand.

Given the huge variety of both ligands and proteins, binding is highly specific and is determined by the shape and chemistry of the molecules involved. Binding can only occur if the ligand and the binding site on the protein have complementary shapes as well as compatible chemical properties, such as charge.

DNA binds to many different proteins. In eukaryotes it is packaged around histones. In addition, specific sequences of the double-stranded DNA are recognised by proteins (transcription factors) that bind to them to stimulate or inhibit transcription.

Histones are positively charged, so they bind readily to DNA, which has an overall negative charge because of its phosphate backbone.

Many proteins bind to ligands in order to facilitate chemical reactions. These proteins are enzymes.

Enzymes are proteins which catalyse (speed up) metabolic reactions. Like all other catalysts (e.g. in chemistry), enzymes achieve this by lowering the activation energy (energy needed for a reaction to occur) of a reaction, by forming an enzyme-substrate complex.

This can be described by the lock and key, and induced fit models of enzyme action. The lock and key model is based on complementary shapes between the enzyme and substrate: the substrate fits into the enzyme's active site.

The induced fit model: (the enzyme changes shape to "hug" the substrate)

The enzyme's active site is not an exact match for the substrate. Instead, it changes shape slightly as a substrate of close enough shape binds, forming an enzyme-substrate complex and catalysing the reaction. Here is a video of an enzyme catalysing a reaction that joins two molecules into one. This is different from the scenario in the diagrams above, where one molecule is broken down into two.
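To get a feel for how much lowering the activation energy matters, the Arrhenius equation, k = A exp(-Ea/RT), can be used. Suppose the enzyme changes only Ea and leaves the pre-exponential factor A unchanged. The speed-up factor is then exp((Ea_uncatalysed - Ea_catalysed)/RT). The activation energies in this Python sketch (75 and 50 kJ/mol) are hypothetical round numbers, and 310 K is roughly body temperature.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def rate_increase(ea_uncatalysed_kj: float, ea_catalysed_kj: float, temp_k: float = 310.0) -> float:
    """Ratio k_catalysed / k_uncatalysed from the Arrhenius equation, assuming the same A."""
    delta_j = (ea_uncatalysed_kj - ea_catalysed_kj) * 1000.0  # kJ -> J
    return math.exp(delta_j / (R * temp_k))


if __name__ == "__main__":
    factor = rate_increase(75.0, 50.0)
    print(f"lowering Ea by 25 kJ/mol speeds the reaction up about {factor:,.0f} times")
```

With these assumed numbers the factor comes out at roughly 16,000. Because the relationship is exponential, even a modest drop in activation energy produces a very large increase in rate.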

Enzymes are proteins, so they have a delicate tertiary structure that allows them to function. A high temperature or an extreme pH (too high or too low) would alter this tertiary structure. Some inhibitors bind to the active site, preventing substrates from doing so. As a result, no enzyme-substrate complexes can form.


Inhibitors are molecules that interfere with the binding of the substrate to the active site of an enzyme, slowing down or stopping the reaction. They may be reversible or irreversible. Reversible inhibitors can be competitive or non-competitive.

Competitive inhibitors have a similar 3D shape to the substrate, hence they can bind to the active site of the enzyme, preventing the substrate from doing so. It's easy to picture:

The competitive inhibitors compete (as you'd expect) with the substrate for the active sites of the enzymes. If more substrate is added, then the inhibitors' effect will be diminished. This is what the graph looks like (make sure you can recall this):

Non-competitive inhibitors, on the other hand, bind to the enzyme at a site away from the active site. All good? No, because this changes the enzyme's shape, including the shape of the active site, so the substrate can no longer bind to it. Unlike with competitive inhibitors, increasing the substrate concentration cannot overcome the inhibition. Here is a comparison diagram (learn this):

If you're a video sort of person, here is a nice one:
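The difference between the two kinds of reversible inhibitor shows up clearly in the Michaelis-Menten rate equation, v = Vmax[S]/(Km + [S]). A competitive inhibitor effectively raises Km by a factor of (1 + [I]/Ki), where Ki is the inhibitor's dissociation constant. A (pure) non-competitive inhibitor instead lowers Vmax by the same factor. The Python sketch below uses hypothetical values (Vmax = 1, Km = 1 and [I] = Ki) in arbitrary units.

```python
def rate_no_inhibitor(s: float, vmax: float = 1.0, km: float = 1.0) -> float:
    """Michaelis-Menten rate at substrate concentration s."""
    return vmax * s / (km + s)


def rate_competitive(s: float, i: float, ki: float, vmax: float = 1.0, km: float = 1.0) -> float:
    """Competitive inhibitor: apparent Km is raised by (1 + i/ki); Vmax unchanged."""
    return vmax * s / (km * (1 + i / ki) + s)


def rate_non_competitive(s: float, i: float, ki: float, vmax: float = 1.0, km: float = 1.0) -> float:
    """Pure non-competitive inhibitor: apparent Vmax is lowered by (1 + i/ki); Km unchanged."""
    return (vmax / (1 + i / ki)) * s / (km + s)


if __name__ == "__main__":
    print("[S]     none   competitive   non-competitive")
    for s in (0.5, 1, 5, 50, 1000):
        print(f"{s:<7} {rate_no_inhibitor(s):.3f}  {rate_competitive(s, 1, 1):.3f}"
              f"         {rate_non_competitive(s, 1, 1):.3f}")
```

As [S] grows, the competitive column climbs back towards the uninhibited rate because the substrate outcompetes the inhibitor. The non-competitive column, by contrast, levels off at half the maximum rate no matter how much substrate is added.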

Phosphorylation of proteins is a main regulatory event, and also a key post-translational modification. The addition of phosphate groups to certain R groups is carried out by kinase enzymes, while the removal of phosphate groups (dephosphorylation) is catalysed by phosphatases.

Protein activity can be regulated by phosphorylation, with ATP acting as the phosphate donor. ATP can also change the conformation of a protein directly, as shown by the action of myosin in muscle contraction.

When myosin binds ATP (adenosine triphosphate) and hydrolyses it to ADP and inorganic phosphate (Pi), its conformation changes. This drives the cross-bridge movement that underlies muscle contraction. More generally, ATP is the cell's energy currency. It releases energy when its third phosphate group is removed, leaving behind ADP (adenosine diphosphate). The removed phosphate group can itself be used to phosphorylate a protein.
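Phosphorylation works like a reversible switch. The Python sketch below is a hypothetical toy model; the class and function names are invented for illustration. A kinase transfers the terminal phosphate of ATP onto a serine, threonine or tyrosine R group (the residues protein kinases typically target), leaving ADP. A phosphatase then removes the phosphate again, releasing inorganic phosphate. For simplicity the protein is assumed to be active when phosphorylated. In real proteins, phosphorylation can switch activity either on or off.

```python
from dataclasses import dataclass, field

PHOSPHORYLATABLE = {"S", "T", "Y"}  # serine, threonine, tyrosine


@dataclass
class Protein:
    sequence: str
    phosphorylated: set = field(default_factory=set)  # 0-based residue positions

    @property
    def active(self) -> bool:
        # Assumption for this toy model: any phosphate group switches the protein on.
        return bool(self.phosphorylated)


def kinase(protein: Protein, position: int, atp: int) -> int:
    """Transfer a phosphate from ATP to the R group at position; returns the ATP left."""
    if protein.sequence[position] not in PHOSPHORYLATABLE:
        raise ValueError("this R group cannot be phosphorylated")
    if atp < 1:
        raise ValueError("no ATP available as a phosphate donor")
    protein.phosphorylated.add(position)
    return atp - 1  # one ATP has become ADP


def phosphatase(protein: Protein, position: int) -> None:
    """Remove the phosphate group (released as inorganic phosphate, Pi)."""
    protein.phosphorylated.discard(position)


if __name__ == "__main__":
    p = Protein("MKSAG")        # hypothetical sequence; serine at position 2
    atp = kinase(p, 2, atp=10)
    print(p.active, atp)        # True 9
    phosphatase(p, 2)
    print(p.active)             # False
```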

Some ATPases are also phosphorylated using ATP. Since breaking ATP down into ADP + Pi is their actual function, phosphorylating themselves through the same reaction is handy.

Ok byeeeeeee
