- The Human Genome Project was a collaborative effort in which scientists set out to sequence the human genome and identify all of its genes
- This was a hugely ambitious project given the state of technologies at the time
- Sanger sequencing, which relies on chain-terminating nucleotides, was the key technology used to sequence the human genome
- Next generation sequencing technologies are paving the way for huge advancements in the area of human genetics
The Human Genome Project was an international research program involving over 1000 scientists. It was a publicly funded project, planned in the late 1980s and formally launched in 1990, that aimed to map and understand all the genes in the human genome. This was to be done by determining the order (also called the sequence) of all 3.2 billion nucleotides in the genome and characterizing the features of the DNA, specifically by working out which sequences are protein-coding genes. The initial aim was to finish the project within 15 years. A first draft covering about 90% of the sequence was published in the journal Nature in 2001, and the full sequence was published in 2004.
At the time, this was a hugely ambitious project. It was extremely costly (around $3 billion!), and the technology of the day was slow, meaning that sequencing the first genome took about 13 years. There was also a lot of controversy surrounding the project, with critics questioning whether its benefits would justify its cost. However, the success of the Human Genome Project is clear today. The information it provided and the innovation it sparked have greatly enhanced the way scientists work.
The Human Genome Project determined that there are just over 20,000 human protein-coding genes. Interestingly, this is far fewer than the original estimate of 100,000 protein-coding genes, which was based on the number of genes and the size of the genome in bacteria and worms. This difference suggests how complex the regulation of these genes has to be in order to produce such an advanced organism.
How did they do it? DNA sequencing
DNA sequencing is the process of determining the sequence of nucleotides (A, T, C, and G) in a piece of DNA. For a large genome, this involves cutting the DNA into small fragments, sequencing each fragment, and then aligning the overlapping regions, which allows the sequence of whole chromosomes and even whole genomes to be determined. During the Human Genome Project this was complex and time-consuming, but the technology has improved so much that it is no longer such a daunting task.
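The idea of joining fragments by their overlapping regions can be illustrated with a small sketch. This is a toy greedy approach under simplifying assumptions (error-free reads, all from the same strand, no repeated regions), not the assembly method actually used by the Human Genome Project. The function names `overlap` and `greedy_assemble`, the `min_len` parameter, and the example fragments are made up for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> str:
    """Repeatedly merge the two fragments that share the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing left overlaps; stop merging
        # Join the pair, writing the shared region only once.
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return max(frags, key=len)


# Hypothetical short reads taken from one stretch of DNA.
fragments = ["ATGGCGT", "GCGTACC", "TACCTGA"]
assembled = greedy_assemble(fragments)
print(assembled)  # ATGGCGTACCTGA
```

Real genomes contain long repeated regions and reads contain errors, which is a large part of why assembling whole chromosomes is much harder than this sketch suggests.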
In 1977, a scientist named Frederick Sanger developed a technique for sequencing DNA, for which he received his (second!) Nobel Prize in Chemistry.
Sanger sequencing requires a template DNA to be sequenced, DNA polymerase, the four DNA nucleotides, a primer, and ‘dideoxy’ or ‘chain-terminating’ forms of all four nucleotides (ddATP, ddGTP, ddTTP and ddCTP). Each of these dideoxy nucleotides is labelled with a different colour of dye.
The modified dideoxy nucleotides lack an OH group on the 3’ carbon of the sugar ring. Once a dideoxy nucleotide has been added to the chain, there is no hydroxyl group available to form a bond with the next nucleotide, so no further nucleotides can be added. The synthesised DNA chain therefore ends with the dideoxy nucleotide, which is marked with a particular colour of dye. The dideoxy nucleotides are added at a much lower concentration than the regular nucleotides, so that termination is a rare, chance event and chains of many different lengths are produced.
The primer binds to the template DNA, giving DNA polymerase a starting point from which to make new DNA. DNA polymerase continually adds nucleotides to the growing chain until, by chance, it adds a dideoxy nucleotide. The chain can no longer continue to grow, so the DNA strand ends there, with a labelled nucleotide at the final position.
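To see why the dideoxy nucleotides need to be rare, it can help to put rough numbers on this chance event. The sketch below assumes an idealised model in which, at every position, the polymerase picks a dideoxy nucleotide with the same probability `p` (roughly the fraction of dideoxy nucleotides in the mix). Real polymerases do not behave this simply, and the function name and the example values of `p` are illustrative assumptions only.

```python
def prob_terminates_at(n: int, p: float) -> float:
    """Probability that a chain ends exactly at position n after the primer.

    The chain must pick a regular nucleotide n - 1 times (probability 1 - p each)
    and then a dideoxy nucleotide once (probability p).
    """
    return (1 - p) ** (n - 1) * p


# Compare a dideoxy-rich mix with a dideoxy-poor mix (illustrative values).
for p in (0.5, 0.01):
    short = prob_terminates_at(1, p)
    long_ = prob_terminates_at(500, p)
    print(f"p={p}: ends at position 1 with prob {short:.3g}, at position 500 with prob {long_:.3g}")
```

With a high `p`, almost every chain stops within the first few positions, so fragments ending far from the primer essentially never appear. With a low `p`, fragments of every length are produced in usable amounts, which is what the later read-out step relies on.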
These fragments are run through a special gel matrix in a process called gel electrophoresis, which separates the fragments by size. Short fragments move quickly through the gel, while long fragments move more slowly. Once the fragments have run through the gel, they are exposed to a laser, which makes the dye fluoresce so that the colour of the last nucleotide in each chain can be detected.
The smallest fragment (which will end just one nucleotide after the primer sequence) passes through the gel first, followed by the next-shortest piece (ending two nucleotides after the primer), and so on.
From the colours emitted by the dye on each chain-terminating nucleotide, read in order from the shortest fragment to the longest, the sequence of the original piece of DNA can be built up one nucleotide at a time.
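The read-out logic described above can be sketched in a few lines. This is a simplified illustration that assumes exactly one terminated fragment exists for every position and that each colour is detected without error. The dye colours assigned to each base and the function names are arbitrary choices for the example, not the colours or software used by any real instrument.

```python
import random

# Hypothetical dye colour for each dideoxy nucleotide (arbitrary assignment).
DYE = {"A": "green", "T": "red", "C": "blue", "G": "yellow"}
BASE_FOR_DYE = {colour: base for base, colour in DYE.items()}


def terminated_fragments(new_strand: str) -> list[tuple[int, str]]:
    """One fragment per position: (length after the primer, dye colour of its last base)."""
    return [(i + 1, DYE[base]) for i, base in enumerate(new_strand)]


def read_sequence(fragments: list[tuple[int, str]]) -> str:
    """Order fragments from shortest to longest, as the gel does, and read the colours."""
    ordered = sorted(fragments, key=lambda fragment: fragment[0])
    return "".join(BASE_FOR_DYE[colour] for _, colour in ordered)


new_strand = "GATTACA"  # the strand synthesised by DNA polymerase
fragments = terminated_fragments(new_strand)
random.shuffle(fragments)  # in the tube, the fragments are all mixed together
print(read_sequence(fragments))  # GATTACA
```

Note that the sequence read this way is that of the newly synthesised strand, which is complementary to the template, so the template sequence follows directly from it.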
Sanger sequencing is laborious and not well suited to larger projects. Since the Human Genome Project, huge leaps in technology have been made; these advances are collectively described as next generation sequencing methods. There are many different types, but in general they can be run on many different samples at once and are miniaturised, with the reactions carried out on a tiny chip. As a result, a huge amount of sequencing data can be generated quickly and at a much lower cost.
The Human Genome Project sparked a new era of molecular medicine, and huge advances in sequencing technologies. We are currently in an era where whole genomes can be readily sequenced, which is likely to have huge impacts on the way medicines are created and diseases are treated.
The Human Genome Project was not the final piece of the puzzle; in fact, it was only the beginning of the era of ‘genomics’. The underlying sequence of the DNA is only one part of the picture: interpreting the sequence and characterizing what the sequences are doing remains an important research avenue.
References and further reading
http://www.onlinebiologynotes.com/sangers-method-gene-sequencing/ (Image sanger sequencing)