Protein Sequencing Explained

Proteins are complex biomolecules found in all living organisms. They perform a wide variety of functions, and this functional diversity is matched by an equally wide range of structures.

This diversity led scientists to study protein structures, and one of the major steps in this process is protein sequencing.

What is Protein Sequencing?

Proteins consist of chains of amino acids. (Source)

To understand what protein sequencing is, we need to know more about what proteins are.

Proteins consist of building blocks called amino acids.

These amino acids are strung together in a specific order, or sequence, forming what is called the primary structure of the protein. The chain then folds into higher levels of structure: secondary, tertiary, and beyond.

The first step to understanding a protein’s structure is therefore to determine its primary structure, i.e. its sequence of amino acids. This process is called protein sequencing.

How do Amino Acids Form a Protein?

Amino acids are joined together into long chains called polypeptides. This is like a string of colorful beads, with each color representing a different amino acid.

Every amino acid in the middle of the chain is attached to one neighbor on either side.

The amino acids at the two ends, however, are attached to only one neighbor each, because the chain does not form a loop. In scientific terminology, these are the N-terminal and C-terminal amino acids: the N-terminal one has a free amino group and the C-terminal one has a free carboxyl group. These free ends prove useful for sequencing.

How proteins are made up of amino acids. (Source)

Developments in Protein Sequencing

Sanger’s Method: Finding the Terminals

Frederick Sanger was an illustrious biochemist whose contributions to the field earned him two Nobel Prizes in Chemistry. His work on protein sequencing in the late 1940s and early 1950s was groundbreaking.

At that time, the scientific community knew far less about protein structure than we do now. There was no known method that could reliably identify even the ends (“terminals”) of a protein, so sequencing was a far-off dream.

Sanger worked on the problem with other scientists under Albert Chibnall, studying how amino acids make up proteins. His protein of interest was insulin, specifically bovine (cow) insulin.

His early work in the late 1940s involved finding a suitable chemical that could identify the N-terminal amino acids of proteins. After several fruitless searches, he found that FDNB (1-fluoro-2,4-dinitrobenzene) attaches to the N-terminal amino acid of a protein and so labels it.

The complete procedure involved labeling the protein with FDNB, breaking the polypeptide down into its individual amino acids, and then using separation methods such as chromatography to identify the labeled one.

The success of this method let Sanger turn his attention to finding the entire sequence of the protein.

Sanger Again: Unravelling the Protein

Sanger realized that some enzymes could be used to break the protein only at specific places. Let us use the string of beads analogy to understand this.

Consider that beads of five colors – red (R), blue (B), green (G), yellow (Y), and pink (P) – form the string. Let us represent the string as GRPBGYPBRPBGBBRYP.

Sanger cut the protein at different places using different enzymes. Because each enzyme cuts at its own specific spots, each one gave a different set of fragments.

Using the beads analogy again, suppose enzyme 1 cuts the string only between R and P, in that order. This would give the pieces GR, PBGYPBR, and PBGBBRYP.

Now suppose enzyme 2 cuts the string only between P and B, in that order. This gives GRP, BGYP, BRP, and BGBBRYP.
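The two cutting rules above can be checked mechanically. The Python sketch below is a toy model of the bead analogy only: `digest` is a hypothetical helper, and each "enzyme" is reduced to a single letter pair, whereas real enzymes follow more complicated chemical rules.

```python
def digest(beads, left, right):
    """Cut the bead string between every adjacent (left, right) pair, in that order.

    Toy model: a real enzyme's cutting rule is more complex than one letter pair.
    """
    pieces = []
    start = 0
    for i in range(len(beads) - 1):
        if beads[i] == left and beads[i + 1] == right:
            pieces.append(beads[start:i + 1])  # this piece ends with the left bead
            start = i + 1                       # the next piece starts with the right bead
    pieces.append(beads[start:])  # whatever remains after the last cut
    return pieces


bead_string = "GRPBGYPBRPBGBBRYP"
print(digest(bead_string, "R", "P"))  # enzyme 1 -> ['GR', 'PBGYPBR', 'PBGBBRYP']
print(digest(bead_string, "P", "B"))  # enzyme 2 -> ['GRP', 'BGYP', 'BRP', 'BGBBRYP']
```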

These steps are repeated with several enzymes, producing different sets of pieces, which are then separated and compared using chromatography.

Because the terminal amino acids are already known from the previous method, one can work out which sections overlap.

For example, in the case above, PBGYPBR (from enzyme 1) and BRP (from enzyme 2) are overlapping sections: both contain BR.

Sanger’s method for protein sequencing: an overall view. (Source)

Each overlapping piece is then broken down further and analyzed by chromatography, and the results are used to work out the sequence of each piece.

Finally, knowing which pieces overlap reveals the order in which they fit together, which gives the sequence of the whole protein.
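The logic of fitting the pieces back together can be sketched in Python. This is a simplified stand-in, not Sanger’s actual procedure: rather than reasoning about overlaps by hand, the hypothetical `reconstruct` function tries every ordering of the enzyme-1 pieces, keeps those that start with the known N-terminal bead, and checks that each candidate would produce exactly the observed pieces for both enzymes. It defines its own toy `digest` helper that cuts between a given letter pair.

```python
from collections import Counter
from itertools import permutations


def digest(beads, left, right):
    """Toy enzyme: cut between every adjacent (left, right) pair, in that order."""
    pieces, start = [], 0
    for i in range(len(beads) - 1):
        if beads[i] == left and beads[i + 1] == right:
            pieces.append(beads[start:i + 1])
            start = i + 1
    pieces.append(beads[start:])
    return pieces


def reconstruct(pieces1, cut1, pieces2, cut2, n_terminal):
    """Return every ordering of the enzyme-1 pieces that starts with the known
    N-terminal bead and is consistent with both sets of observed pieces."""
    solutions = set()
    for order in permutations(pieces1):
        candidate = "".join(order)
        if not candidate.startswith(n_terminal):
            continue  # ruled out by the end-group (N-terminal) result
        if (Counter(digest(candidate, *cut1)) == Counter(pieces1)
                and Counter(digest(candidate, *cut2)) == Counter(pieces2)):
            solutions.add(candidate)
    return solutions


# The pieces can be listed in any order; only their contents matter.
print(reconstruct(["PBGBBRYP", "GR", "PBGYPBR"], ("R", "P"),
                  ["BRP", "GRP", "BGBBRYP", "BGYP"], ("P", "B"), "G"))
# -> {'GRPBGYPBRPBGBBRYP'}
```

Trying every ordering grows factorially with the number of pieces, so this only suits a handful of fragments; the point is to show why two overlapping sets of cuts pin down a single order.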

Using this method, Sanger worked out the complete sequence of bovine insulin in the first half of the 1950s.

Arriving at Different Solutions: Edman Degradation

Pehr Edman was a Swedish biochemist who, in the same era (the 1950s), designed a different procedure for sequencing proteins. It relied on chemical reactions instead of enzymes.

His procedure used a chemical (phenyl isothiocyanate) that attaches itself to the N-terminal amino acid of the protein. Conditions were then set up so that only this one terminal amino acid was split off, leaving the rest of the protein undamaged. The released amino acid was then identified separately.

Meanwhile, the shortened protein has a new N-terminal amino acid in place of the one that was removed.

The procedure was repeated, removing each new N-terminal amino acid in turn, so the protein was sequenced one amino acid at a time.
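The repeated cycle can be written as a short loop. In this toy Python model (the function name `edman_degradation` is hypothetical), all of the chemistry is reduced to taking the first letter of a string; each pass through the loop stands for one cycle of labeling, cleaving, and identifying the N-terminal amino acid.

```python
def edman_degradation(peptide, cycles):
    """Toy model: each cycle removes and identifies the current N-terminal residue.

    Returns the residues identified so far and the shortened chain that remains.
    """
    identified = []
    remaining = peptide
    for _ in range(cycles):
        if not remaining:
            break  # nothing left to cleave
        n_terminal, remaining = remaining[0], remaining[1:]  # split off the first residue
        identified.append(n_terminal)  # the released residue is identified separately
    return "".join(identified), remaining


print(edman_degradation("GRPBGYP", 3))  # -> ('GRP', 'BGYP')
```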

Edman degradation: an overview of the method. (Source)

While Sanger’s method used enzymes to break the protein into various parts and then relied on deduction to reconstruct the original sequence, Edman’s procedure was more direct. It did not combine multiple methods; it used a single step repeated over and over.

It also gave the sequence directly, because the procedure started at one end and removed the amino acids one at a time.

Because the procedure is just one step repeated many times, it lent itself to automation: in 1967, Edman and Geoffrey Begg built an automatic protein sequencer (called a sequenator) that carried out the cycles by machine.

A downside of this method is that it is not reliable for polypeptides (chains of amino acids) longer than about 50 amino acids.

So in practice, large proteins are often first broken into smaller pieces (as in Sanger’s method), and Edman degradation is then used on each piece.
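A commonly cited reason for the length limit is that no cycle works perfectly: in each cycle a small fraction of chains fails to react and falls out of step, so the fraction still in step shrinks geometrically with the number of cycles. The Python sketch below assumes a hypothetical per-cycle efficiency of 98% and a hypothetical cut-off below which the signal counts as unreadable; neither number comes from this article.

```python
def in_phase_fraction(cycle_yield, cycles):
    """Fraction of chains that have completed every cycle correctly so far,
    assuming each cycle succeeds independently with probability cycle_yield."""
    return cycle_yield ** cycles


def readable_length(cycle_yield, min_fraction):
    """Largest number of cycles before the in-phase fraction drops below min_fraction."""
    if not (0 < cycle_yield < 1 and 0 < min_fraction <= 1):
        raise ValueError("need 0 < cycle_yield < 1 and 0 < min_fraction <= 1")
    n = 0
    while in_phase_fraction(cycle_yield, n + 1) >= min_fraction:
        n += 1
    return n


ASSUMED_YIELD = 0.98  # hypothetical per-cycle efficiency
for n in (10, 30, 50, 70):
    print(n, round(in_phase_fraction(ASSUMED_YIELD, n), 2))
print(readable_length(ASSUMED_YIELD, 0.5))  # hypothetical 50% cut-off
```

Under these assumptions about 36% of chains are still in step after 50 cycles, while a 10-residue piece keeps over 80%, which is why breaking a protein into shorter pieces first helps.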

More Advanced Methods

These early methods made protein sequencing a real possibility, enabling scientists to study the primary structures of proteins of various sizes. As technology and scientific knowledge developed, new methods arose to deal with larger proteins.

One such method is mass spectrometry. In this process, the protein is broken into small pieces whose masses (strictly, mass-to-charge ratios) are measured by an instrument called a mass spectrometer. The procedure can handle very large proteins and gives fast, accurate results. More recently, it has been combined with large online databases of protein sequences, which let researchers arrive at the correct sequences very quickly.

Protein sequencing by mass spectrometry. (Source)
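The database idea can be illustrated with a simplified Python sketch: compute the mass of each candidate peptide from standard monoisotopic residue masses and keep the candidates that match a measured mass. This is a toy under stated assumptions, not how any particular instrument or search engine works: it ignores charge (real spectrometers measure mass-to-charge ratios), lists only ten amino acids, and the three-peptide "database" and the `match_mass` function are hypothetical. The letters here are standard one-letter amino acid codes, not the bead colors used earlier.

```python
# Monoisotopic residue masses in daltons (one-letter amino acid codes); only a subset is listed.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "L": 113.08406, "K": 128.09496, "E": 129.04259, "F": 147.06841, "R": 156.10111,
}
WATER = 18.01056  # one water per peptide accounts for the free N- and C-terminal ends


def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_mass(observed_mass, database, tolerance=0.01):
    """Return the candidate peptides whose computed mass lies within tolerance (Da)."""
    return [p for p in database if abs(peptide_mass(p) - observed_mass) <= tolerance]


database = ["GASP", "PEKR", "VLF"]  # hypothetical candidate peptides
print(match_mass(528.302, database))  # -> ['PEKR']
```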

Another method relies on the link between DNA/RNA and proteins. A basic principle of molecular biology is that DNA guides the production of RNA, which in turn directs the production of proteins. If we know the DNA sequence of a gene and certain other details, such as where its protein-coding region begins, we can predict the amino acid sequence of the protein it produces.

This procedure often involves using Edman degradation on part of the protein, identifying that short sequence, and then matching it against the DNA sequence to find out the rest. Incidentally, Sanger also developed one of the earliest DNA sequencing methods, work that earned him his second Nobel Prize.
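The DNA-to-protein step can be sketched in Python using the standard genetic code. The sketch assumes a simple case: one DNA strand read in its three forward reading frames, no introns, and the standard codon table; `translate` and `predict_from_fragment` are hypothetical helper names. Given a short fragment identified by Edman degradation, it finds the reading frame whose translation contains that fragment and returns the predicted sequence up to the next stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate every complete codon, writing '*' for stop codons."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def predict_from_fragment(dna, fragment):
    """Find the forward frame whose translation contains the Edman fragment and
    return the predicted protein from that point up to the next stop codon."""
    for frame in range(3):
        protein = translate(dna[frame:])
        start = protein.find(fragment)
        if start != -1:
            return protein[start:].split("*")[0]
    return None  # the fragment does not appear in any forward frame


print(predict_from_fragment("CCATGGCTTGGAAACGTTAAGG", "MAW"))  # -> MAWKR
```

A very short fragment may match in more than one frame, which is one reason a longer Edman read helps.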

Protein sequencing is a very important tool for biochemists, revealing the primary structure of a protein. This in turn helps them work out the protein’s higher-level structure and its function, a very exciting field of research.
