I. Looking backwards to when life began
“…all the organic beings which have ever lived on this earth may be descended from some one primordial form.” — C. Darwin
"In the beginning...". The search for origins is a universal quest that probably predates modern humans. Where did we come from? How did life begin? Who built our world? Until recently, the answers to these fundamental questions have always taken the form of stories and myths, expressed in song and dance and inspiring reverence and awe.
Only recently have we had the technology to approach the scientific truth of the matter. Our understanding of human origins and of the origin of the universe progressed rapidly in 20th-century science, thanks to fossil bones and ancient radiation, respectively. But the origin of life on Earth has not yielded as readily to scientific inquiry, owing to a lack of fossil and other geological evidence.
One special beneficiary of technology is our search for LUCA, the last universal common ancestor. Technology gives us a way to look backwards from today towards the origin of life. LUCA is a hypothetical construct, reconstructed from many thousands of genes, each composed of hundreds to thousands of nucleotides (the letters of the genetic code), whose sequences we have extracted from thousands of different species living today. This data-heavy, computationally intensive search is just one of the fascinating paths in our search for life’s origins. Gee-whiz technical aspects aside, this seemingly quixotic search has surprisingly deep roots.
The most prescient and influential beginning of the scientific search for life's origins came in the 1920s from the Soviet biochemist Alexander Oparin and the British scientist J. B. S. Haldane. Working independently, they proposed that natural abiotic (non-living) processes, acting on carbon dioxide, ammonia, water, methane, and other simple gases in a primitive, oxygen-less atmosphere, led incrementally to life.
Stanley Lloyd Miller, a Ph.D. candidate in Harold Clayton Urey’s chemistry laboratory at the University of Chicago, conducted his famous abiotic synthesis experiment in the early 1950s. The experiment recreated the primitive Earth as Miller and Urey imagined it. A flask of boiling water simulated the primitive ocean, electrical discharges simulated lightning, and a mixture of simple gases, including hydrogen, water vapor, ammonia, and methane, simulated the primitive atmosphere. All were sealed within a closed system of glass tubes and flasks. Within a week, Miller found that his tiny ocean had turned a deep red with accumulated organic compounds, including simple amino acids, the building blocks of proteins.
Miller and Urey provided, for the first time, an experimental basis for the speculations Oparin and Haldane had made three decades earlier. Their experiment inspired countless other labs to follow suit, and it remains one major arm of the search for the origin of life today. One weakness of this approach is that a benchtop simulation can show only that some abiotic organic synthesis is feasible; it cannot tell us that this is how life actually arose.
In 1977, a team led by John Corliss, Richard von Herzen, and Robert Ballard used the deep-sea submersible Alvin to explore the Galapagos Rift in the eastern Pacific, where they discovered submarine hydrothermal vents. Corliss was one of the crew aboard Alvin when they discovered bizarre and amazing creatures, such as giant tube worms, clams, mussels, and limpets, thriving around deep-sea volcanic vents that spewed hot, poisonous, gas- and metal-laden water. (Ballard, another member of the team, later went on to fame by discovering the Titanic and other historic wrecks.) Importantly, the team found high concentrations of hydrogen sulfide dissolved in the waters surrounding these vents, along with enormous numbers of sulfur-metabolizing bacteria. These bacteria, thriving in and on the hot minerals within the vents, spilled out and fed the masses of unique vent creatures. Corliss linked these discoveries to a possible origin of life in the deep ocean’s volcanic vents, rather than in the surface environment that Miller had simulated.
With such a variety of possible terrestrial origin-of-life scenarios, a question recurs: did life’s constituents arise wholly on Earth?
The first simple organic (carbon-containing) compounds in interstellar space were discovered in the 1930s. Since then, scientists have discovered organic compounds of increasing number and complexity, both in interstellar space and on comets and meteorites within our solar system. Most importantly, extraterrestrial organic compounds directly relevant to life have been found, including amino acids (building blocks of proteins), nucleobases (components of DNA and RNA), an essential sugar, ribose (part of the backbone of RNA), and fatty acids (which can form membranes like the one that may have enclosed the first protocell).
The ubiquity of complex organics in space is compelling, but questions remain about whether meteoritic debris was an important source of life’s building blocks. Did enough organic material survive the heat of atmospheric entry and impact to seed the oceans, ponds, mudflats, or wherever life arose?
Traditionally, evidence for evolutionary processes and ancient life has been geological. But a geological record of life’s origins faces two problems: the depth of time in which life must have arisen, and the geological activity of our Earth. We have debatable evidence of life as far back as 3.85 billion years ago. Clearly, our search for origins has to look farther back in time than the oldest evidence of life. But we cannot go so far back that Earth was not yet conducive to life, and one of the critical requirements for life is water. We have evidence that water existed on Earth as far back as 4.3 billion years ago. Life must therefore have originated between these dates, within a window of roughly 450 million years. The oldest rock we know of is 4.02 billion years old. Over this span, mountain ranges have risen and fallen like waves beating against a shore. This combination of age and activity erases not only the rare fossils that might have preserved evidence of the origin of life, but also the very rocks that contained them.
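There may be evidence to scrape up nonetheless. Carbon isotopes offer one way to infer a biological origin for carbon deposits within the oldest rocks: living organisms preferentially take up the lighter isotope, carbon-12, so carbon that has passed through life tends to be depleted in the heavier carbon-13.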
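Isotope geochemists usually express this depletion in delta notation, in parts per thousand (per mil, ‰) relative to a reference standard. The conventional definition, shown here as a reading aid rather than as part of any specific study, is:

```latex
\delta^{13}\mathrm{C} = \left( \frac{\left(^{13}\mathrm{C}/^{12}\mathrm{C}\right)_{\mathrm{sample}}}{\left(^{13}\mathrm{C}/^{12}\mathrm{C}\right)_{\mathrm{standard}}} - 1 \right) \times 1000
```

Because life favors carbon-12, organic carbon typically has negative values, often around −20 to −30 per mil for carbon fixed by the enzyme RuBisCO. A strongly negative value in an ancient carbon deposit is therefore suggestive of biology, though not proof, since some non-biological processes can also fractionate carbon isotopes.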
Another line of evidence lies in zircon geochemistry. Zircons are practically indestructible little crystals of zirconium silicate that preserve evidence of the environment in which they crystallized, typically within igneous rocks. A plume of magma may solidify into a basalt or granite containing these zircons. When that rock is lifted into mountains and worn down to make a layer cake of sedimentary rock, say a sandstone, the zircons are still in there. When those sedimentary layers are buried under miles of debris from generations of worn-down mountain ranges, and the temperatures and pressures of burial metamorphose the sediments into perhaps a quartzite, the zircons are still in there. When tectonic pressures thrust that quartzite back up into the thin reaches of the atmosphere, only for it to be worn back down by glaciers and rivers and time and compressed under miles of delta mud, the zircons are still in there. They are tiny time capsules in which the conditions of their birth are frozen, with the clock of radioactive isotopes still ticking, waiting for us to learn how to extract that buried knowledge.
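The ticking clock in a zircon is usually uranium decaying to lead. As a rough sketch, assuming the zircon contained no lead when it crystallized (zircon tends to exclude lead from its crystal lattice) and has lost none since, an age follows from the measured ratio of radiogenic lead-206 to remaining uranium-238: t = ln(1 + 206Pb/238U) / λ. The function names below are illustrative only; real geochronology also cross-checks the separate uranium-235 to lead-207 decay.

```python
import math

# Decay constant of uranium-238 (per year); the value commonly used in geochronology.
LAMBDA_U238 = 1.55125e-10


def u_pb_age(pb206_per_u238: float) -> float:
    """Age in years from a measured radiogenic 206Pb/238U ratio.

    Assumes the zircon held no lead when it crystallized and has neither
    gained nor lost uranium or lead since (a closed system).
    """
    # t = ln(1 + 206Pb/238U) / lambda; log1p keeps precision for small ratios
    return math.log1p(pb206_per_u238) / LAMBDA_U238


def expected_ratio(age_years: float) -> float:
    """Inverse of u_pb_age: the 206Pb/238U ratio that builds up after age_years."""
    return math.expm1(LAMBDA_U238 * age_years)


# Round trip for a hypothetical zircon about 4 billion years old.
ratio = expected_ratio(4.0e9)
print(f"206Pb/238U after 4.0 billion years: {ratio:.3f}")
print(f"Recovered age: {u_pb_age(ratio) / 1e9:.2f} billion years")
```

The exponential form reflects that each uranium atom decays independently, so the lead-to-uranium ratio grows faster than linearly with age.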
Last, but not least, there is the evidence lying within living things today. This is the evidence we use to find and build a model of LUCA.
II. The tree of life
“The affinities of all the beings of the same class have sometimes been represented by a great tree. I believe this simile largely speaks the truth.” — C. Darwin
In 1837, more than two decades before On the Origin of Species was published, Darwin sketched a now-famous tree in one of his notebooks. It was a conceptual representation of the evolutionary relationships between organisms, living and extinct. The use of a tree as a model for relationships is much older, dating at least to the medieval period, when family trees were used to document ancestral or genealogical relationships. Today, biologists call this type of diagram a phylogenetic tree: each branch point represents the most recent common ancestor of the lineages that diverge from it, and the splayed branches are its descendants. The tips of the tree can represent living species (imagine each species name written on a leaf attached to the smallest twig) or extinct species (a truncated, leafless branch that does not reach as far as the living twigs).
The usefulness of the phylogenetic tree as a model for evolutionary relationships has changed over time as our understanding of species and of evolution has deepened.
From the very beginning, even before Darwin delineated the process of evolution by variation and natural selection, there was broad disagreement in the field over the problem of how to define a species, much less an ancestral species. When do differences between two living pigeons represent merely inherent variation within a species, and when do they define a new species? Even if we are able to agree on two living organisms being different but closely related species, how do we decide if an extinct species was ancestral to them? The tree diagram had an inherent ambiguity based on the uncertainty over basic definitions of species, past and present.
Despite these clear weaknesses, the phylogenetic tree has been a powerful tool for illustrating and driving our understanding of evolutionary relationships.
The difficulty of defining species continues even today, despite the use of molecular genetics to augment Darwin’s discussion of speciation, and it becomes acute in microbes.
In 1928, Frederick Griffith reported on his efforts to create a pneumonia vaccine in the wake of the 1918 "Spanish flu" pandemic at the end of World War I. In this study, Griffith heat-killed a virulent strain of pneumococcus bacteria before injecting it into mice. He also injected a live, non-virulent strain of pneumococcus, which a normal immune system could defeat. The virulent strain produced a capsule that shielded it from the host’s immune system, a capsule the non-virulent strain lacked. Neither the dead virulent strain nor the live non-virulent strain killed the mice. But when Griffith injected a combination of the two, the mice died, and he recovered live virulent bacteria from them. This was the first demonstration that bacteria were able to transform from one type to another.
Today, we know that this transformation of bacterial types is caused by a transfer of DNA from one type (the dead virulent strain) into the other (the live non-virulent strain), transforming the latter into a new type (a live virulent strain). We call this horizontal gene transfer (HGT). It contrasts with canonical vertical gene transfer, in which DNA from a mother cell is passed to its daughter cells.
We also know now that HGT has played a tremendous role in the evolution of prokaryotes (microbes with no nucleus, which comprise two broad domains, bacteria and archaea). The mechanisms of HGT are diverse. Bacteria and archaea can take up free DNA floating in their surroundings. Viruses that infect these microbes (those that infect bacteria are called bacteriophages) can carry DNA from one cell to another. And one bacterium can pass mobile genetic elements directly to another through a structure called a pilus (plural pili), a process known as conjugation.
In the early 1970s, Lynn Margulis published a then-astonishing theory: that evolution also proceeded by symbiosis, and that the eukaryotic cell’s various organelles arose through a series of symbiotic events. For example, mitochondria, the power plants of eukaryotic cells, had been recognized as possible bacterial endosymbionts (organisms that live within another organism) as early as the 1920s. Likewise, the chloroplast, the photosynthetic engine of plants, derives from a cyanobacterial (blue-green algal) endosymbiont.
HGT and symbiosis have further muddied the definition of species and dramatically complicated the image of a phylogenetic tree. Darwin’s sketch of a tree with a single trunk seems quaint from the modern perspective of genomic data. Nonetheless, this imperfect tool, the phylogenetic tree, remains a powerful means of analyzing species relationships and uncovering possible common ancestors.
III. Bushwhacking towards the trunk of the tree
“…species are not immutable; but that those belonging to what are called the same genera are lineal descendants of some other and generally extinct species…” — C. Darwin
In 1958, Emile Zuckerkandl, a former refugee from Nazi Germany, applied to Linus Pauling for a research position. Zuckerkandl learned chemistry in Pauling’s lab and applied that new knowledge to analyzing hemoglobin from a variety of primates and other animals. Zuckerkandl and Pauling noticed that the number of amino acid differences between these proteins appeared to track the evolutionary relationships between the species. This led to their important 1965 paper, which helped found the field of molecular evolution. The image above is a figure from their paper showing a phylogenetic tree based on hemoglobin sequences.
In 1977, Carl Woese published a seminal paper announcing a new group of organisms that had previously been taken for bacteria. At the time, life was divided at its broadest level into eukaryotes (anything with cells that have a nucleus, like ours) and prokaryotes (anything without a nucleus). Woese's new group, the archaea, looked like small single-celled bacteria and, like bacteria, lacked a nucleus. Rather than compare protein sequences as Zuckerkandl and Pauling had, Woese painstakingly analyzed a ribosomal RNA (rRNA) found in all cells. This rRNA is part of a cellular machine called the ribosome, which translates messenger RNA into protein. Woese compared the rRNA of representatives of the three groups, and the comparison showed that the archaea were as different from bacteria as eukaryotes were.
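The core idea behind such comparisons can be sketched by counting differences between aligned sequences. The minimal example below computes a p-distance (the fraction of aligned positions that differ), skipping gap positions. Woese's own 1977 comparison used a different technique based on catalogs of rRNA fragments, and the three short "rRNA" strings here are invented for illustration, not real sequences.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two sequences differ.

    Positions where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # no information at an alignment gap
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differences / compared


# Hypothetical aligned fragments, invented for this example.
aligned = {
    "bacterium": "AUGC-GAUCCA",
    "archaeon": "AUCCAGAACCU",
    "eukaryote": "UUCCAGAUGCU",
}

for (name_a, seq_a), (name_b, seq_b) in combinations(aligned.items(), 2):
    print(f"{name_a} vs {name_b}: {p_distance(seq_a, seq_b):.2f}")
```

Raw p-distances undercount change over long timescales, because a single position can mutate more than once; published analyses apply substitution models to correct for this.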
To this day, rRNAs remain one of the important cellular molecules used to infer evolutionary relationships and to build phylogenetic trees. But with the rapidly increasing ability to sequence various molecules such as proteins, DNA, and RNA, the number of phylogenetic trees has exploded.
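To show how a table of pairwise distances becomes a tree, here is a sketch of UPGMA (average-linkage clustering), one of the simplest tree-building methods, chosen here for brevity; it is not necessarily what any study mentioned in this article used. The taxon names and distances in the example are hypothetical.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA (unweighted pair group method with arithmetic mean).

    names: list of taxon labels.
    dist: dict mapping frozenset({x, y}) -> distance for every pair of labels.
    Returns the rooted tree as nested tuples, e.g. ("C", ("A", "B")).
    """
    # Every taxon starts as its own cluster of size 1.
    sizes = {name: 1 for name in names}
    d = dict(dist)
    while len(sizes) > 1:
        # Join the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        # Distance from the new cluster to each other cluster is the
        # size-weighted average of the distances from its two parts.
        for other in sizes:
            if other in (a, b):
                continue
            d[frozenset((merged, other))] = (
                sizes[a] * d[frozenset((a, other))]
                + sizes[b] * d[frozenset((b, other))]
            ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes))


# Hypothetical distances, invented for this example.
names = ["bacterium_1", "bacterium_2", "archaeon", "eukaryote"]
dist = {
    frozenset(("bacterium_1", "bacterium_2")): 0.10,
    frozenset(("bacterium_1", "archaeon")): 0.40,
    frozenset(("bacterium_1", "eukaryote")): 0.45,
    frozenset(("bacterium_2", "archaeon")): 0.42,
    frozenset(("bacterium_2", "eukaryote")): 0.44,
    frozenset(("archaeon", "eukaryote")): 0.30,
}
print(upgma(names, dist))
```

UPGMA assumes that all lineages change at the same rate, the "molecular clock" suggested by Zuckerkandl and Pauling's work. When rates differ, methods such as neighbor joining or maximum likelihood are generally preferred.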
In 1998, Woese published a proposal about the universal ancestor. In this remarkable paper, Woese speculated that horizontal gene transfer, a secondary evolutionary mechanism in modern prokaryotes, was the primary mechanism of evolution in the universal ancestor. His image of that ancestor is striking, and Woese’s own words explain it best:
“The ancestor cannot have been a particular organism, a single organismal lineage. It was communal, a loosely knit, diverse conglomeration of primitive cells that evolved as a unit, and it eventually developed to a stage where it broke into several distinct communities, which in their turn become the three primary lines of descent. The primary lines, however, were not conventional lineages. Each represented a progressive consolidation of the corresponding community into a smaller number of more complex cell types, which ultimately developed into the ancestor(s) of that organismal domain. The universal ancestor is not an entity, not a thing. It is a process characteristic of a particular evolutionary stage.”
Meanwhile, even the disruptor was being disrupted. Woese's three-domain classification of life has more recently been reorganized by many researchers into two primary domains: bacteria and archaea. Close examination of core genes (genes involved in the essential processing of DNA, RNA, and proteins) indicates that eukaryotes arose from within the archaea (as had long been suspected from analyses of rRNA) and reflect a symbiotic merger of the two primary domains: an archaeal host and a bacterial endosymbiont, the ancestor of mitochondria.
Removing eukaryotes from the base of the phylogenetic tree simplifies any search for a common ancestor, which must lie at that base. However, HGT remains a major obstacle and threatens to uproot the tree itself.
Perhaps the tree of life is better represented as the shrubbery of life. Not so majestic. If we want to use some form of a phylogenetic tree as a tool to work backwards from today and understand something of the last common ancestor to all living things, then we need a method to trim the undergrowth, the thick tangle of interlocked branches from symbiosis and HGT that obscures the base of the tree.
The question is whether we can extract a clear and useful signal about LUCA despite the noise of HGT.
In 2009, Eugene Koonin’s lab analyzed thousands of phylogenetic trees, each representing the evolutionary relationships for a particular gene. They confirmed that HGT had indeed sullied the original concept of the tree of life and made the simple view obsolete. However, a clear phylogenetic signal remained amid the noise, and a modified form of the tree could still be applied with caution.
IV. A good look at LUCA
“…the largeness of any group shows that its species have inherited from a common ancestor some advantage in common.” — C. Darwin
William F. Martin is a former carpenter. Born in Bethesda, Maryland, and educated in Texas, he hammered nails in Dallas before moving to Hanover, Germany, to study biology, and then to the Max Planck Institute in Cologne for his Ph.D.
In July 2016, Madeline Weiss et al. from Martin’s lab published a paper in Nature Microbiology that worked backwards from today’s organisms to reconstruct what they described as the last universal common ancestor of all life on Earth: LUCA. They concluded that LUCA was anaerobic (did not use oxygen), obtained energy from hydrogen, converted carbon dioxide and nitrogen into essential organic compounds, and was heat loving. Extremely heat loving. They proposed that LUCA lived in an environment much like the black smoker hydrothermal vents at the bottom of the ocean.
Weiss et al. looked at 1,981 complete genomes (the total compendium of genes in a given organism), of which 134 genomes were archaeal and 1,847 were bacterial.
Among these 1,981 genomes were 6.1 million protein-coding genes, which were grouped into 286,514 protein families, or clusters. Of all those protein families, only about 11 thousand (roughly 4 percent) were shared by bacteria and archaea. This matters because the point of looking for a common ancestor is that we expect its descendants to share homologies (common features, or in this case genes). The vast majority of protein families were therefore of no use to this analysis, since each was found only in bacteria or only in archaea.
The team then built a phylogenetic tree for each of these shared families by comparing the amino acid sequences of its members and grouping the most closely related ones. Martin’s group was cognizant of the HGT problem and put filters in place to reduce its influence. Of the 11 thousand shared protein families, only 355 passed these filters and were inferred to have been present in LUCA.
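To make the filtering step concrete, here is a hypothetical sketch of the simplest kind of presence test: keep only protein families found in both bacteria and archaea, and require each to appear in at least two distinct phyla within each domain, on the reasoning that a family confined to a single bacterial phylum could more plausibly have been acquired by horizontal transfer. The function name, input format, and two-phyla threshold are assumptions for illustration, not the published pipeline, which also examined each family's phylogenetic tree.

```python
from collections import defaultdict


def candidate_luca_families(gene_table, min_phyla_per_domain=2):
    """Return protein families present in both bacteria and archaea,
    each found in at least `min_phyla_per_domain` distinct phyla per domain.

    gene_table: iterable of (family_id, domain, phylum) tuples, one per gene.
    (Hypothetical input format for illustration.)
    """
    # family -> domain -> set of phyla in which the family occurs
    phyla = defaultdict(lambda: defaultdict(set))
    for family, domain, phylum in gene_table:
        phyla[family][domain].add(phylum)
    return sorted(
        family
        for family, by_domain in phyla.items()
        if all(
            len(by_domain[domain]) >= min_phyla_per_domain
            for domain in ("Bacteria", "Archaea")
        )
    )


# Toy data: family IDs are hypothetical.
genes = [
    ("famA", "Bacteria", "Firmicutes"),
    ("famA", "Bacteria", "Aquificae"),
    ("famA", "Archaea", "Euryarchaeota"),
    ("famA", "Archaea", "Crenarchaeota"),
    ("famB", "Bacteria", "Firmicutes"),  # bacteria only
    ("famB", "Bacteria", "Proteobacteria"),
    ("famC", "Bacteria", "Firmicutes"),  # one bacterial phylum: possible HGT
    ("famC", "Archaea", "Euryarchaeota"),
    ("famC", "Archaea", "Crenarchaeota"),
]
print(candidate_luca_families(genes))
```

Only famA survives here: famB is missing from archaea entirely, and famC's narrow bacterial distribution is the kind of pattern a transfer from archaea into one bacterial lineage could produce.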
A quick word on what LUCA was not. LUCA was the last universal common ancestor of bacteria and archaea, not the first cell or bit of life. Your parents are the last common ancestors of you and your siblings, but they are not the first humans. We need to work our way backwards one step at a time, and LUCA is our first step.
The Weiss paper suggested that LUCA was only “half alive”, since it appeared to be missing many basic metabolic genes. LUCA depended on geochemistry to provide many critical biochemicals, such as amino acids (the raw materials of proteins) and nucleotides (the raw materials of DNA and RNA). Most organisms can either synthesize these basic biochemicals or obtain them from their food. LUCA, by contrast, would likely have harvested them from the soup in which it swam.
LUCA was a thermophile, an organism adapted to thrive in the superheated volcanic sea-floor vents that today we call black smokers or hydrothermal vents. This inference rests largely on an enzyme in LUCA called reverse gyrase, which among modern organisms is found almost exclusively in hyperthermophiles (organisms that thrive at very high temperatures). Reverse gyrase is thought to help protect DNA at high temperatures. Other enzymes in LUCA are also consistent with organisms that thrive in hydrothermal vents.
Weiss et al. inferred that LUCA obtained energy from inorganic compounds and lived without oxygen. Hydrogen, carbon dioxide, and nitrogen gases were its essential inputs, unlike us, who need oxygen and the organic compounds we call food to survive.
The team also showed that LUCA contained a number of ancient enzymes. Biologists have long considered modern proteins that require iron-sulfur (FeS) clusters to be relics of ancient enzymes, based on comparisons of their amino acid sequences across organisms from bacteria to eukaryotes. FeS clusters are among the most common cofactors (non-protein components required for an enzyme’s activity) in LUCA’s proteins.
V. Where do we go from here
“…whilst this planet has gone circling on according to the fixed law of gravity, from so simple a beginning endless forms most beautiful and most wonderful have been, and are being evolved.” — C. Darwin
We have not made tremendous progress in our search for the origin of life by working backwards from living organisms to LUCA. If the results of the Weiss paper stand the test of time and of challenging experiments, they represent an important but very small step back in time, from the world of bacteria and archaea some 3.5 billion years ago to the common ancestor of both. We need to move significantly farther back in time and complexity from LUCA to approach the origin of life.
In the late 1960s, Francis Crick, Leslie Orgel, and Carl Woese proposed that RNA was the original genetic material. They also speculated that RNA might have served as an early catalyst, an idea that gained experimental support in the early 1980s, when Thomas Cech and Sidney Altman discovered RNA enzymes (ribozymes). One of the most important enzymatic activities in cells is the translation of RNA into protein. Today, the cellular machine that performs this translation in all living organisms, from bacteria to humans, is the ribosome, and at the heart of the ribosome is an RNA enzyme. Almost all other enzymes are proteins.
The ribosome is a large molecular machine made of both RNA and proteins. It comes in two pre-assembled parts, a small sub-unit and a large sub-unit. The ribosomal proteins play largely structural, accessory, or regulatory roles, and almost none of them take part directly in the ribosome's core chemistry. The ribosome's two core functions are carried out by RNA.
The small sub-unit has a tiny region at its core called the decoding center (DCC), a stretch of ribosomal RNA that helps read the messenger RNA. The messenger RNA carries the genetic code copied from a gene in our DNA: the instructions for making a protein. How to make us.
The large sub-unit has at its core a region in its ribosomal RNA called the peptidyl transferase center (PTC) which catalyzes the joining of an amino acid to the growing protein chain. That is the enzymatic heart of the ribosome. And arguably, that is the enzymatic heart of life.
When the small and large sub-units come together, they form a little chamber that accepts incoming transfer RNAs (tRNAs), each carrying a single amino acid to be added to the growing protein. If a tRNA’s anticodon matches the codon being read on the messenger RNA, the tRNA and its amino acid move on to the next step. The PTC then catalyzes formation of a peptide bond that links the incoming amino acid to the growing protein chain. The sub-units also form an exit tunnel through which the growing protein chain leaves the ribosome.
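As a toy illustration of the decoding and peptide-bond steps just described, the sketch below reads an mRNA string codon by codon using a small, deliberately partial subset of the standard genetic code. The dictionary lookup stands in for the decoding center's check that a tRNA's anticodon matches the codon, and appending a letter stands in for the PTC adding an amino acid to the chain. Starting at the first AUG is a simplification; real ribosomes use additional signals to choose the start codon.

```python
# A deliberately partial table from the standard genetic code (a full table has
# 64 codons). Amino acids use one-letter codes; None marks a stop codon.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "UUU": "F", "UUC": "F",  # phenylalanine
    "GGC": "G", "GGA": "G",  # glycine
    "AAA": "K",  # lysine
    "UGG": "W",  # tryptophan
    "UAA": None, "UAG": None, "UGA": None,  # stop codons
}


def translate(mrna: str) -> str:
    """Read an mRNA three bases at a time from the first AUG, adding one
    amino acid per codon until a stop codon or the end of the message."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, no protein
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon not in CODON_TABLE:
            raise KeyError(f"codon {codon} is not in this partial table")
        amino_acid = CODON_TABLE[codon]
        if amino_acid is None:  # stop codon: release the finished chain
            break
        protein.append(amino_acid)  # stands in for peptide-bond formation
    return "".join(protein)


print(translate("GGAAUGAAAUGGUGA"))
```

Reading in fixed steps of three from a chosen start point is what makes the reading frame matter: shifting the start by one base would produce an entirely different set of codons.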
RNA molecules form these key functional cores of the ribosome. Proteins stud the outside of the ribosome and thread through it, but they largely stay clear of its enzymatic heart.
In the images above, from a 2020 paper by Bowman et al. in Chemical Reviews, we can see representative ribosomes from a bacterium, an archaeon, and a eukaryote. The lower image overlays their DCC and PTC regions, with the bacterium in red, the archaeon in blue, and the eukaryote in yellow. Despite some variation in overall structure, especially in the eukaryote, the enzymatic core remains essentially identical, as shown by the almost perfect overlap of the RNA strands there.
The ribosome may have been essentially complete in LUCA. Once passed on to bacteria and archaea, its core has stayed almost exactly the same in both groups over the past several billion years. In eukaryotes, the ribosome has continued to evolve and expand, but mostly on its external surfaces, away from the enzymatic core.
We may be approaching the informational event horizon for the origin of life, the point behind which all data is irretrievably lost from this direction, from the direction of today looking back to the past. Whether LUCA is like a black hole or the Big Bang, it is unlikely that the data at hand among living organisms today can give us any insight into organisms that predated LUCA, or back towards the origin of life.
However, if there is a portal through which we can move back farther than LUCA, perhaps it will be through the catalytic and decoding RNAs at the heart of the ribosome. But how that portal will be designed and built is not clear now, any more than the design of a time machine is clear to us.