
Genome sequencing

As previously touched on, the genome is the entire genetic material carried by an individual or species, and it varies between individuals and between species. The database of sequenced genomes from different species is growing and includes the human genome (sequenced by the Human Genome Project). For example, the human genome, by chromosome, is viewable here:

Simple genomes

Simple genomes, such as those of viruses, make it relatively straightforward to assign a protein to each gene in the genome and so build a database of those proteins. The complete set of proteins encoded by a genome is known as its proteome.

The information gleaned from a virus proteome, for example, can inform vaccination targets by selecting appropriate antigens such as elements of the viral capsid.

Other exciting synthetic biology applications can also be explored, such as glowing beer, or producing specific compounds useful in medicine or manufacturing in organisms that do not naturally make them, either to boost production or to create new products.

Complex genomes

Analysing and storing information about more complex genomes is hindered by non-coding DNA and regulatory genes, which together take up the vast majority of this type of genome. The sequences that actually code for protein products are therefore in the minority.

The proteomes corresponding to complex genomes, human included, are therefore difficult to build. Sequencing methods themselves have undergone, and continue to undergo, a rapid evolution towards faster, more efficient, automated techniques that can yield tremendous amounts of data.


For example, Sanger sequencing was for a long time the main method of sequencing DNA and has given rise to many variations. The basic concept follows these steps:

1. Mix copies of the target DNA to be sequenced with DNA polymerase, normal nucleotides and a small proportion of labelled (originally radioactive, later fluorescent) chain-terminating nucleotides (with A, T, G or C bases)

2. Whenever one of these chain-terminating nucleotides is added to a growing strand it prevents further DNA lengthening, resulting in a mixture of strands of different lengths, each complementary to part of the template DNA, e.g. AATGGC creates TTACCG, TACCG, ACCG, CCG, CG and G

3. Run the DNA mixture on a gel to separate the different strands by size

4. Infer the sequence from the results: the label identifies the final base (A, T, C or G) of each strand, and the order of strand sizes gives the order of the bases (smaller strands run further down the gel while larger strands stay towards the top, where they were loaded)
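The steps above can be sketched in a few lines of Python. This is a toy model under simplifying assumptions, not a description of a real laboratory protocol: it ignores the primer, treats every possible stopping point as occurring exactly once, and writes each new strand aligned base for base against the template, as in the AATGGC example. The function names are hypothetical.

```python
# Toy model of chain-termination sequencing (illustrative only).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(template):
    """Return the complement, written aligned base for base with the template."""
    return "".join(COMPLEMENT[base] for base in template)


def terminated_fragments(template):
    """Return the strands produced when synthesis stops at each position.

    Assumption: synthesis starts opposite the right-hand end of the template
    as written, so all fragments share that end and differ only in length.
    """
    full = complementary_strand(template)
    return [full[i:] for i in range(len(full))]


def read_gel(fragments):
    """Read the new strand in the order it was built.

    Fragments are sorted shortest first (the shortest runs furthest down the
    gel). The labelled terminating base of each fragment is the last base
    added, which is the left-most base as written here.
    """
    by_size = sorted(fragments, key=len)
    return "".join(fragment[0] for fragment in by_size)


fragments = terminated_fragments("AATGGC")
print(fragments)  # ['TTACCG', 'TACCG', 'ACCG', 'CCG', 'CG', 'G']

new_strand = read_gel(fragments)
print(new_strand)  # GCCATT

# Reversing the read and taking its complement recovers the template.
print(complementary_strand(new_strand[::-1]))  # AATGGC
```

The read comes out in the order the new strand was built, so it has to be reversed and complemented to recover the template as originally written.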

The sequence obtained can then be converted into the amino acid sequence of the protein it encodes, if the sequence belongs to a gene. Amino acid sequences can then be compared, for example to check whether a particular variant of a protein's amino acid sequence is associated with a blood disorder.
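As a rough illustration of converting a DNA sequence into amino acids and comparing two versions of it, here is a minimal Python sketch. It uses the standard genetic code and reads the coding strand from the first base in a single reading frame. The example sequences are an assumption of this sketch rather than something taken from the text above: they are the opening codons of the human beta-globin gene, and the variant carries the GAG to GTG change associated with sickle cell disease. The function names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding-strand DNA sequence into one-letter amino acids.

    Reads codon by codon from the first base and stops at a stop codon ('*').
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def differences(protein_a, protein_b):
    """List (position, amino acid in a, amino acid in b) where the two differ."""
    return [
        (pos + 1, a, b)
        for pos, (a, b) in enumerate(zip(protein_a, protein_b))
        if a != b
    ]


# Illustrative sequences (assumed for this sketch): start of beta-globin.
normal = translate("ATGGTGCATCTGACTCCTGAGGAGAAG")
variant = translate("ATGGTGCATCTGACTCCTGTGGAGAAG")

print(normal)   # MVHLTPEEK
print(variant)  # MVHLTPVEK
print(differences(normal, variant))  # [(7, 'E', 'V')]
```

A single base change in the DNA shows up as a single amino acid substitution, which is the kind of difference that can then be checked for association with a condition.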

The speed and cost of DNA sequencing have been compared to the trend in transistor technology known as Moore's Law, the observation that the number of transistors on a chip doubles roughly every two years, with performance rising and cost per transistor falling accordingly. So far, DNA sequencing has improved at least as quickly, and the equivalent trend is known as the Carlson Curve.
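The arithmetic behind a "cost halves every so often" trend can be written as a short Python sketch. The starting cost and halving time below are hypothetical numbers chosen only to make the calculation easy to follow. They are not real sequencing prices.

```python
def projected_cost(initial_cost, years, halving_time_years):
    """Cost after `years` if the cost halves once every `halving_time_years`."""
    return initial_cost * 0.5 ** (years / halving_time_years)


# Hypothetical figures: a cost of 1000 units that halves every 2 years.
for year in (0, 2, 4, 6):
    print(year, projected_cost(1000.0, year, 2))
```

With these assumed inputs the cost falls to a quarter after four years and to an eighth after six, which is why an exponential trend of this kind changes what is affordable so quickly.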

Fighting disease with non-human genomes

Having access to the genomes of other species can shed light on how various metabolic pathways and proteins work, and on how these compare between different organisms. For example, the genomes of the malaria-causing parasite Plasmodium falciparum and of its vector, the mosquito Anopheles gambiae, have been sequenced. These data can help develop better ways of controlling malaria.

Genetic information on the parasite can help edit its genome to produce attenuated versions for vaccination. Alternatively, the mosquito which carries the parasite to humans could be modified so that it's no longer capable of transmitting the malaria agent to humans.

Pests of crops of interest to humans can also be better tackled through information from sequenced genomes.

Research species are often sequenced first, such as the ubiquitous "lab rats", flies (Drosophila melanogaster), worms (Caenorhabditis elegans) and frogs (Xenopus laevis).


As briefly touched upon in the introduction to this chapter, genomics (the study of genomes) is emerging as a key scientific field in terms of addressing disease and learning more about health. Within healthcare, genomics has the potential, and has already begun, to support risk prediction, prevention, diagnosis, treatment (in terms of drug choice and dosage) and prognosis.

Genomic medicine started in the areas of oncology, pharmacology, rare and undiagnosed diseases, and infectious disease.

Risk prediction relies on studying associations between certain diseases and specific gene variants that are found more often in the patient population. Sometimes, especially for rare diseases that tend to have a single genetic root, it's possible to know the mechanism by which the mutation causes the disease. At other times the mechanism is not understood, and all we can work with is the knowledge that the association stands, for reasons as yet unknown. The association gives a patient a percentage increase in their lifetime likelihood of developing a certain disease.
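To make the idea of a percentage increase in risk concrete, here is a minimal Python sketch that compares how often a disease occurs in carriers and non-carriers of a gene variant. The counts are hypothetical and chosen only for easy arithmetic. They do not come from any real study, and the function names are made up for this example.

```python
def risk(affected, total):
    """Proportion of a group that develops the disease."""
    return affected / total


def percent_increase_in_risk(carrier_risk, non_carrier_risk):
    """How much higher the carriers' risk is, as a percentage of the non-carriers' risk."""
    relative_risk = carrier_risk / non_carrier_risk
    return (relative_risk - 1) * 100


# Hypothetical cohort: 50 of 200 carriers and 125 of 1000 non-carriers affected.
carrier_risk = risk(50, 200)         # 0.25
non_carrier_risk = risk(125, 1000)   # 0.125

print(carrier_risk / non_carrier_risk)                            # 2.0
print(percent_increase_in_risk(carrier_risk, non_carrier_risk))  # 100.0
```

In this made-up cohort carriers are twice as likely to develop the disease, a 100% increase in risk. The calculation says nothing about why the association exists.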

One example is the BRCA1 and BRCA2 genes, whose protein products are involved in DNA repair in cells and which act as tumour suppressor genes. Different variants of these genes have been linked to a 20-60% increased risk of breast and ovarian cancer.

Prevention can then take the form of closer monitoring, simply through awareness of the increased risk, or, in some cases, preventative interventions such as taking certain drugs or having elective surgery. In pharmacology, knowing that a patient is at increased risk of side effects from certain drugs can lead them to avoid those drugs or take an alternative. This ties in with treatment: a patient can opt for a drug they will personally respond better to, or for a better tailored dose. For example, a patient who metabolises a drug quickly may have to take it more frequently, as their body breaks it down faster.

Prognosis is about knowing the likely outcome of a condition. This can connect back to the drugs taken and the response to them, or refer to how a disease might develop. For example, for some diseases there are multiple gene variants with different outcomes, whether in terms of the likelihood of getting the disease or in terms of its severity and progression.
