Lock and key

ScienceDuuude
Apr 27, 2020

Updated: May 4, 2020

The SARS-CoV-2 coronavirus causing the current COVID-19 pandemic uses its spike protein complex like a key to enter the human cell.

You can see the spikes in the electron micrograph creating the corona or crown surrounding the blobish membrane-bound body of the virus itself. This electron microscope image is by the National Institute of Allergy and Infectious Diseases (NIAID) at the Rocky Mountain Lab. This false-color image shows the spikes in green and the membrane-bound viral body in yellow.

I’m nearsighted. No matter how far you zoom in on the image, it will always remain frustratingly fuzzy, which is how I see the world at any distance.

But we do know in molecular detail what that spike looks like. In March of this year, folks from the University of Texas at Austin and NIAID used a method called cryo-electron microscopy (cryo-EM) to build a model of the spike, something that previously would have required x-ray crystallography. This is what the spike protein, the viral key to our human cells, looks like.

That’s a side view on the left and a top-down view on the right. The colored bar at the top is like a map of the protein’s amino acid sequence, relating each sequence position to the physical location of that amino acid in the actual 3D protein.

We can generate an image not too far from that structure ourselves from the comfort of our home. Check it out.

The spike protein is encoded within the RNA genome of the virus, and the viral sequences are available to us at the National Center for Biotechnology Information (NCBI):

https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs/

If you follow the link above and scroll down the page a bit, you run into a table with more links in its left-hand column.

Each of those links takes you to another page which contains the details of a specific isolate of the SARS-CoV-2 virus.

Say you click on the first link in that table labeled “MN908947” and it takes you to this page:

https://www.ncbi.nlm.nih.gov/nuccore/MN908947

At the very bottom of that page are about 30,000 letters of the RNA code, made up of the letters A, G, C and T. Yes, I know, RNA is made of the letters A, G, C and U, but in order to sequence RNA we first need to convert it to DNA (a step called reverse transcription). So the NCBI database represents the viral RNA code in the form of the DNA code that was actually read.
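If you want to see what the database is doing for us, here is a minimal Python sketch, assuming the standard genetic code (NCBI translation table 1) and a reading frame that starts at the very first letter. It swaps T back to U to recover the RNA letters, then reads the DNA letters three at a time (each triplet, or codon, stands for one amino acid) until it hits a stop codon. The short input is a made-up stretch chosen so that it encodes MFVFLVL, the first seven amino acids of the spike sequence quoted below; it is not copied from the NCBI record, and the function names are my own.

```python
# Standard genetic code (NCBI translation table 1). Codons are listed in
# TCAG order at each position: TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def dna_to_rna(dna: str) -> str:
    """Swap thymine (T) back to uracil (U) to recover the RNA letters."""
    return dna.upper().replace("T", "U")


def translate(dna: str) -> str:
    """Translate DNA letters into a one-letter amino acid string.

    Reads codons from the first letter onward and stops at the first stop
    codon ('*'). Codons with ambiguous letters (e.g. N) become 'X'.
    """
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[pos:pos + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


fragment = "ATGTTTGTTTTTCTTGTTTTA"  # made-up example, not the NCBI record
print(dna_to_rna(fragment))  # AUGUUUGUUUUUCUUGUUUUA
print(translate(fragment))   # MFVFLVL
```

Real viral genomes are messier (the code has to start in the right place, and some viral genes are read in unusual ways), which is exactly why it's nice that NCBI annotates each protein for us.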

Anyway, the database is nice enough to translate the RNA code into each protein’s amino acid sequence, and to label the sequence of the spike protein, called “S”. It’s buried in the middle of that page, so let me copy it for you here:

"MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFR

SSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFASTEKSNIIR

GWIFGTTLDSKTQSLLIVNNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVY

SSANNCTFEYVSQPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQ

GFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFL

LKYNENGTITDAVDCALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITN

LCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCF

TNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYN

YLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPY

RVVVLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQFG

RDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLYQDVNCTEVPVAI

HADQLTPTWRVYSTGSNVFQTRAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPR

RARSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTKTSVDCTM

YICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQEVFAQVKQIYKTPPIKDFG

GFNFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFN

GLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRFNGIGVTQN

VLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFGA

ISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMS

ECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPAICHDGKAH

FPREGVFVSNGTHWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQPELD

SFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELG

KYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSCGSCCKFDEDDSE

PVLKGVKLHYT"
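Rather than pasting this into a word processor to count the letters, a few lines of Python can do it. This minimal sketch assumes you paste the quoted sequence in as a string, quotes, line breaks and all; it keeps only the letters, checks them against the 20 standard one-letter amino acid codes, and tallies how often each one appears. Pasting the full sequence above should give 1,273 letters; the example uses only its first line to keep things short, and the function names are my own.

```python
from collections import Counter

# The 20 standard amino acids, by their one-letter codes.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Drop quotes, spaces and line breaks, keeping only the letters."""
    return "".join(ch for ch in raw if ch.isalpha()).upper()


def summarize(raw: str):
    """Return (length, set of non-standard letters, per-amino-acid counts)."""
    seq = clean_sequence(raw)
    return len(seq), set(seq) - STANDARD_AMINO_ACIDS, Counter(seq)


pasted = '"MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFR\n\n'  # first line only
length, unknown, counts = summarize(pasted)
print(length)   # 44
print(unknown)  # set()
print(counts.most_common(3))
```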

That wasn’t so bad, was it? Microsoft Word says there are 1,273 letters in that sequence. Since each capital letter represents an amino acid, there are 1,273 amino acids making up the viral protein we call S. The question is: can we determine from that sequence alone how the protein will scrunch and fold up, and will it look anything like the actual image of the protein taken using cryo-EM?

Well, there’s a program called Phyre (Protein Homology/Analogy Recognition Engine) that we can access online, and it can help us here:

http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index

We can copy and paste the entire amino acid sequence of our viral protein of interest into that tool, and Phyre will email us back a .pdb file containing the proposed structure. We can visualize that .pdb file with one of many free online structure visualization tools; the one I used is called EZmol:

http://www.sbg.bio.ic.ac.uk/~ezmol/
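A .pdb file is just plain text in a fixed-column format, with one ATOM line per atom and its x, y, z coordinates in ångströms. As a rough sketch, assuming a standard PDB-format model like the one Phyre returns (the helper names here are my own), this Python reads the ATOM lines, keeps only each amino acid’s central “alpha carbon” (atom name CA), counts amino acids per chain, and measures the spacing between neighboring alpha carbons, which in real protein structures is almost always about 3.8 Å.

```python
import math
from collections import Counter


def read_alpha_carbons(pdb_text):
    """Return (chain, residue_number, residue_name, (x, y, z)) for each CA atom.

    PDB is a fixed-column format, so fields are sliced by column position.
    Alternate atom locations (column 17) are not handled in this sketch.
    """
    alpha_carbons = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip headers, HETATM ligands, TER/END records, etc.
        if line[12:16].strip() != "CA":
            continue  # keep only the alpha carbon of each amino acid
        chain = line[21]
        residue_number = int(line[22:26])
        residue_name = line[17:20].strip()
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        alpha_carbons.append((chain, residue_number, residue_name, xyz))
    return alpha_carbons


def residues_per_chain(alpha_carbons):
    """One alpha carbon per amino acid, so counting CAs counts residues."""
    return Counter(chain for chain, _, _, _ in alpha_carbons)


def neighbor_distances(alpha_carbons):
    """Distances (Å) between CAs of consecutive residues in the same chain."""
    distances = []
    for prev, cur in zip(alpha_carbons, alpha_carbons[1:]):
        if prev[0] == cur[0] and cur[1] == prev[1] + 1:
            distances.append(math.dist(prev[3], cur[3]))
    return distances
```

To try it on a real model, read the downloaded file into a string and pass it to read_alpha_carbons. A trimer model like the one in the Science paper would show three chains.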

Here is a screenshot of Phyre’s proposed S protein structure in the form of a ribbon diagram, similar to what we saw from the Science paper.

“Wait, that looks nothing like the cryo-EM structure,” I hear y’all saying.

But wait.

The spike protein forms what is called a trimer: three individual S proteins bind together to form the more complex unit visualized in the Science paper. In that figure from the paper, one of the three S proteins in the trimer is shown as the color-coded ribbon structure, one is shaded gray, and the third is shaded white. Each trimer is an individual spike embedded in the membrane of the virus.

Do you now see how the two arms of the “Y” in this structure resemble the blue and green domains in the cryo-EM structure? And how the telephone-wire-like coils in the stem of the Phyre structure look a lot like the yellow and red domains in the cryo-EM image?

I think that is not too bad for some couch-potato structural biochemistry. I do think we got lucky here; in many cases a program like Phyre gives at best a first guess at what a protein structure might look like. Experimental methods like x-ray crystallography, nuclear magnetic resonance and, more recently, cryo-EM are still necessary to really understand protein structure.

We’ve been focused on the key so far. How about the lock?

I should say here that the lock and key analogy, although ubiquitous in discussions of protein biochemistry, is a very imperfect one: nothing here is hard, metallic or solid. Proteins are blobular, wiggly, jello-like things, and although they operate in an almost automated, rapid-fire, machine-like manner, we have to remember that everything is moving and jiggling about. Even the machine.

The lock is a protein called ACE2 (angiotensin-converting enzyme 2), which protrudes from many of our cells. The physical interaction between the viral spike protein and the human ACE2 protein somehow triggers the cell to engulf the virus, letting it inside.

Wikipedia says that ACE2 is an enzyme that lowers blood pressure by cleaving a short peptide that makes blood vessels constrict (vasoconstriction) into another short peptide that makes them open up (vasodilation). ACE2 is located on the surface of cells in a wide range of organs such as the lungs, arteries, heart, kidney and intestines, but also in the smooth muscle cells that surround most organs. Importantly for making sense of new data on the clinical effects of SARS-CoV-2, ACE2 is also found in the cells lining the blood vessels, as well as in the brain.
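In one-letter amino acid code, the vessel-constricting peptide, angiotensin II, is DRVYIHPF. ACE2 works as a carboxypeptidase: it clips off the last amino acid (F, phenylalanine), leaving the vessel-relaxing peptide angiotensin 1-7, DRVYIHP. The toy Python function below just mimics that one cut on a string; it is a sketch of the bookkeeping, not a model of the enzyme’s chemistry, and the function name is my own.

```python
ANGIOTENSIN_II = "DRVYIHPF"  # Asp-Arg-Val-Tyr-Ile-His-Pro-Phe (vasoconstrictor)


def ace2_cut(peptide: str):
    """Mimic ACE2's carboxypeptidase step: remove the C-terminal amino acid.

    Returns (shortened peptide, released amino acid).
    """
    if len(peptide) < 2:
        raise ValueError("need at least two amino acids to cut one off")
    return peptide[:-1], peptide[-1]


angiotensin_1_7, released = ace2_cut(ANGIOTENSIN_II)
print(angiotensin_1_7, released)  # DRVYIHP F
```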

You may have heard that a few young people infected with SARS-CoV-2, some with only mild COVID-19 symptoms, are suffering catastrophic strokes:

https://www.sfgate.com/news/article/Healthy-people-in-their-30s-and-40s-barely-sick-15224874.php

There is a morbidly fascinating article on the Science Magazine website that discusses new information on what the novel coronavirus does to some of its victims:

https://www.sciencemag.org/news/2020/04/how-does-coronavirus-kill-clinicians-trace-ferocious-rampage-through-body-brain-toes

We’re seeing evidence that SARS-CoV-2 attacks the blood vessels as well as the lungs, for which it is known and named (SARS stands for severe acute respiratory syndrome), most likely thanks to the presence of ACE2 in all those cells.

Do we know what ACE2 looks like? Partially, it turns out. Check it out here.

A few things to note.

First. You see two almost identical images next to each other. Almost, because they are not exactly identical. Pairs of images like these are called stereoscopic images: with special glasses, or if you squint juuust right, the two images should blurrily merge into a semblance of a 3D image. Try it.

Second. The pink structure in A is the same as the blue and red structure in B, just drawn differently. A traces the path you would get by connecting a particular central carbon atom in each amino acid (called the alpha carbon) like links in a chain. B shows the ribbon diagram we already saw above.

Third. This data was collected using the more traditional method of x-ray crystallography (the study was published in 2004). X-ray diffraction is also the technique behind the data, collected by Rosalind Franklin, that Jim Watson and Francis Crick used to work out the structure of DNA. In order to get x-ray data on a protein, you need to form a crystal. Protein crystals are notoriously difficult to grow: proteins are large, floppy, irregularly shaped molecules that do not readily pack into the neat, orderly lattices that give salt or sugar crystals their beautiful planar faces. And proteins such as ACE2, which have a portion embedded in the cell membrane, are even more difficult to crystallize.

One way to think about it: the cell membrane is made of lipids with long fatty-acid tails, so it’s oily and repels water (hydrophobic). For part of a protein to sit embedded in the cell membrane, that domain must be similarly hydrophobic, and so must contain a large number of hydrophobic amino acids, like leucine, isoleucine, valine and phenylalanine. Proteins with significant hydrophobic domains tend to clump together during crystallization, much like oil forms droplets in water, and that clumping prevents nice clean crystals from forming. By the way, cryo-EM sidesteps this problem by not requiring crystals at all. Cryo-EM is still a difficult method, but growing crystals is even harder, especially for membrane proteins like ACE2.

So, the structure of ACE2 that we have above is a truncated form of the protein missing the domain that sticks into the cell membrane. The dotted line towards the bottom of each structure of ACE2 represents the part of the protein that was cut off. Removing the oily transmembrane domain of ACE2 makes it possible to form crystals.
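You can get a rough feel for how those oily stretches stand out with a classic technique, the Kyte-Doolittle hydropathy plot: each amino acid gets a score (positive means water-hating), and the scores are averaged over a sliding window. Kyte and Doolittle suggested that a 19-amino-acid window averaging above about 1.6 hints at a membrane-spanning segment. The full ACE2 sequence isn’t quoted in this post, so this sketch uses the spike protein’s own tail end (the second-to-last line of the sequence quoted earlier), since the spike is also anchored in a membrane, the virus’s. Treat the output as a hint, not a prediction; dedicated prediction tools are far more careful, and the function names are my own.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Average hydropathy of every window-length stretch of the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Start positions (0-based) of windows averaging above the threshold."""
    return [
        start
        for start, avg in enumerate(hydropathy_profile(seq, window))
        if avg > threshold
    ]


spike_tail = "KYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSCGSCCKFDEDDSE"
hits = hydrophobic_windows(spike_tail)
print(hits)  # window start positions that look membrane-like
```

A water-loving stretch from elsewhere in the spike, such as SFKEELDKYFKNHTSPDVD, scores well below zero, which is the contrast the plot is meant to reveal.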

What we see in the figure above is called the extracellular domain, the part sticking out of the cell, the lock that the viral key matches up with.

So we might as well ask – do we know what it looks like when the lock and key are bound together?

It turns out we do.

This is a screenshot from a preprint (hence the watermark across the image) of a paper that looks at the SARS-CoV-2 spike protein bound to the human ACE2 protein. ACE2 is green, and the viral spike protein is in blue and red below. This is not a stereoscopic image, but front and back views of the complex.

This data was obtained by x-ray crystallography, so it uses only the extracellular parts of the viral S and human ACE2 proteins. These ribbon diagrams might look about as precise as a pile of wood shavings from your basement shop, and they are somewhat misleading in that regard: the resolution of these techniques is at the atomic level (at least for the larger atoms like carbon and oxygen). This paper, for example, was able to zoom right in on the interface between the two proteins for us. Check it out here. The gray mesh represents the boundary of the electron-dense regions of the protein and can loosely be thought of as the protein’s “surface”. The green (ACE2) and red (S protein) lines on the left again represent the alpha-carbon trace of the amino acid sequence. The thinner green and red lines on the right are a stick model of the complete amino acids making up the proteins in the field of view.
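Once you have atomic coordinates, working out which amino acids sit at an interface like this is, at heart, a distance calculation. A common rule of thumb (an assumption here, not necessarily what this paper used) counts two residues as in contact if any of their atoms come within about 4 Å of each other. The minimal Python sketch below takes each protein’s atoms as (residue label, (x, y, z)) pairs and returns the residue pairs that pass that cutoff; the labels and coordinates in the example are made up.

```python
import math
from itertools import product


def interface_contacts(atoms_a, atoms_b, cutoff=4.0):
    """Residue pairs (one from each protein) with any atoms closer than cutoff Å.

    atoms_a, atoms_b: lists of (residue_label, (x, y, z)) tuples.
    Brute force over all atom pairs, which is fine for small examples.
    """
    contacts = set()
    for (res_a, xyz_a), (res_b, xyz_b) in product(atoms_a, atoms_b):
        if math.dist(xyz_a, xyz_b) < cutoff:
            contacts.add((res_a, res_b))
    return sorted(contacts)


# Made-up coordinates (Å) for a handful of atoms on each side.
receptor_atoms = [("R1", (0.0, 0.0, 0.0)), ("R1", (1.5, 0.0, 0.0)), ("R2", (10.0, 0.0, 0.0))]
key_atoms = [("K1", (4.5, 0.0, 0.0)), ("K2", (20.0, 0.0, 0.0))]
print(interface_contacts(receptor_atoms, key_atoms))  # [('R1', 'K1')]
```

For a real structure you would feed in every atom of each protein chain; with tens of thousands of atoms, a spatial index would be much faster than this all-pairs loop.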

Although we are not there yet, this atomic-level resolution of the protein locks and keys should help us develop therapies, whether small-molecule drugs that block this interaction or antibodies that target unique features of the key’s ACE2-binding surface.

This is not a slam dunk, though. It has been a couple of decades since the genomic revolution triggered by the Human Genome Project, and neither that, nor atomic-resolution protein structures, nor bioinformatics, nor a dozen other advances has led to a gold rush of new drugs. We have had some successes, but far fewer than the golden age of gene therapies and medical technologies that I think many of us were imagining years ago.

Biology continues to be a hard subject.

COVID-19 shows us how hard it can be.

Update: May 4, 2020

Here is a really nice article in Chemical & Engineering News about structural biochemistry in the coronavirus age, and how the technology of obtaining protein crystal structures has evolved over the years:

https://cen.acs.org/analytical-chemistry/structural-biology/structural-biologists-revealed-new-coronaviruss/98/i17

