Posted on 03/23/2004 5:04:32 PM PST by js1138
March 22, 2004
Evolution Encoded
New discoveries about the rules governing how genes encode proteins have revealed nature's sophisticated "programming" for protecting life from catastrophic errors while accelerating evolution
On April 14, 2003, scientists announced to the world that they had finished sequencing the human genome--logging the three billion pairs of DNA nucleotides that describe how to make a human being. But finding all the working genes amid the junk in the sequence remains a further challenge, as does gaining a better understanding of how and when genes are activated and how their instructions affect the behavior of the protein molecules they describe. So it is no wonder that Human Genome Project leader Francis S. Collins has called the group's accomplishment only "the end of the beginning."
Collins was also alluding to an event commemorated that same week: the beginning of the beginning, 50 years earlier, when James D. Watson and Francis H. Crick revealed the structure of the DNA molecule itself. That, too, was an exciting time. Scientists knew that the molecule they were finally able to visualize contained nothing less than the secret of life, which permitted organisms to store themselves as a set of blueprints and convert this stored information back into live metabolism. In subsequent years, attempts to figure out how this conversion took place captivated the scientific world. DNA's alphabet was known to consist of only four types of nucleotide. So the information encoded in the double helix had to be decoded according to some rules to tell cells which of 20 amino acids to string together to constitute the thousands of proteins that make up billions of life-forms. Indeed, the entire living world had to be perpetually engaged in frenetic decryption, as eggs hatched, seeds germinated, fungus spread and bacteria divided. But so little was understood at the time about the cellular machinery translating DNA's message that attempts to crack this genetic code focused on the mathematics of the problem. Many early proposals proved wrong, a few spectacularly, although their sheer ingenuity and creativity still provide fascinating reading. In fact, when the actual code was finally deciphered during the 1960s, it nearly disappointed. Nature's version looked less elegant than several of the theorists' hypotheses.
Certain codons were just redundant. Many came to view nature's real code as little more than a random accident.
When we speak of the "code" and "decoding," we are being quite literal. Genetic instructions are stored in DNA and RNA, both made of one type of biochemical molecule, nucleic acid. But organisms are mostly built from (and by) a very different type of molecule, protein. So although a gene is traditionally defined as the sequence of nucleotides that describes a single protein, the genetic sentence containing that description must first be translated from one system of symbols into an entirely different kind of system, rather like converting from Morse code to English.
Cracking the Code
Little else was obvious at the time about how genes might be translated into proteins. Today we understand that gene sequences do use three-letter codons to specify individual amino acids and that several steps are needed for the gene's sequence of bases to be converted into a sequence of amino acids. The DNA gene is first copied and edited into a transcript made of RNA, employing similar nucleic acid bases, except that DNA's thymine is replaced by uracil. This messenger RNA (mRNA) version of the gene is then read by cellular machinery, three letters at a time, while tiny cellular butlers known as transfer RNAs (tRNA) fetch the specified amino acids to be strung together. But in the early 1950s this process was a black box, leaving only an intriguing mathematical puzzle. And the first proposed solution came not from a biologist but from physicist George Gamow, better known as an originator of the big bang theory. His "diamond code," published in 1954, elegantly combined the arithmetic of getting 20 amino acid meanings from a four-nucleotide alphabet with the physical structure of DNA itself. Gamow theorized that at every turn in the double helix there was a diamond-shaped space bounded at its four corners by nucleotides. These gaps would allow DNA to act as a template against which amino acids would line up, determined by the nucleotide combinations present at each twist.
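The decoding pipeline described above, in which DNA is copied into mRNA (thymine becomes uracil) and the transcript is then read three letters at a time, can be sketched in a few lines of Python. The compact table below is the standard genetic code in one-letter amino acid abbreviations, with "*" marking stop codons; the 12-base example gene is made up purely for illustration.

```python
# Standard genetic code, laid out in the classic textbook order:
# first base U,C,A,G (rows of 16), second base U,C,A,G (groups of 4),
# third base U,C,A,G. One-letter amino acid codes; "*" marks stop.
BASES = "UCAG"
AMINOS = ("FFLLSSSSYY**CC*W"   # UUU..UGG
          "LLLLPPPPHHQQRRRR"   # CUU..CGG
          "IIIMTTTTNNKKSSRR"   # AUU..AGG
          "VVVVAAAADDEEGGGG")  # GUU..GGG
CODON_TABLE = {a + b + c: AMINOS[16*i + 4*j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def transcribe(dna):
    """Copy a DNA coding-strand sequence into mRNA: T becomes U."""
    return dna.replace("T", "U")

def translate(mrna):
    """Read the mRNA three letters at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino = CODON_TABLE[mrna[i:i + 3]]
        if amino == "*":   # stop codon ends the protein
            break
        protein.append(amino)
    return "".join(protein)

# A made-up 12-base gene: Met-Phe-Leu followed by a stop codon.
print(translate(transcribe("ATGTTTCTGTAA")))   # MFL
```

The dictionary comprehension exploits the regularity of the textbook codon chart: each codon's position in the 64-letter string is determined by the indices of its three bases, which is why the whole code fits in four 16-character rows.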
His model eliminated one corner of each diamond, then sorted the 64 possible three-nucleotide codons into chemically related groups. It also allowed meaningful codons to overlap, depending on the "reading frame," or where one began reading the sequence of letters along the length of the DNA molecule. This kind of data compression was an efficiency prized by coding theorists of the day. Unfortunately, amino acid chains were soon discovered that could not be accounted for by Gamow's or any other overlapping codes.
The code can evolve, which means that it probably did evolve. Nature's amino acid assignments are no accident.
At the same time, evidence was suggesting that DNA and amino acids were not interacting with one another directly. Crick developed a hypothesis that so-called adaptor molecules could be serving as intermediaries, and in 1957 he put forth a set of rules by which they might operate. Simply put, Crick's adaptors recognized only 20 meaningful codons designating each of the 20 amino acids, making the remainder of the 64 possible triplets "nonsense." Rather than overlapping, Crick's code was "commaless": meaningless codons were effectively invisible to the adaptors, so nature needed no figurative punctuation to designate the start of a reading frame. The commaless concept was so streamlined that it immediately won near universal acceptance--that is, until the data again proved an elegant theory wrong. In the early 1960s experiments showed that even supposed nonsense codons could provoke protein synthesis in a beaker, and by 1965 the actual amino acid meanings of all 64 possible triplet codons had been worked out in the lab. No tidy numerology was apparent: certain codons were just redundant, with some individual amino acids specified by two, four, even six different codons. After all the enthusiastic speculation, many came to view nature's real code as little more than a random accident of history.
Frozen Accident?
Darwinian natural selection rests on the premise that sometimes a small change in a single gene can prove beneficial if it allows organisms to fare better in their environment. But altering an organism's decoding rules would be tantamount to simultaneously introducing changes at countless sites throughout its genetic material, producing an utterly dysfunctional metabolism. It would be the difference between introducing a single typo and rewiring the entire typewriter keyboard. This attractively straightforward reasoning, however, has since proved simplistic. Although most living systems do employ the standard genetic code, scientists now know of at least 16 variants, distributed across a diverse array of evolutionary lineages, that assign different meanings to certain codons. The underlying system remains the same: triple-nucleotide codons are translated into amino acids. But where most organisms would read the RNA codon "CUG" to mean the amino acid leucine, many species of the fungus Candida translate CUG as serine. Mitochondria, the tiny power generators within all kinds of cells, have their own genomes, and many have also developed their own codon assignments. For instance, in the mitochondrial genome of baker's yeast (Saccharomyces cerevisiae), four of the six codons that normally encode leucine instead encode threonine. As discoveries of these variations proliferated during the 1990s, it became clear that the code is not frozen at all. It can evolve, which means that it probably did evolve. So nature's standard codon-amino acid assignments, refined and preserved by billions of years of natural selection, are no accident. In fact, their arrangement does an excellent job of minimizing the impact of accidents.
Damage Control
In a living organism, errors come in many forms. Sometimes the original DNA version of a gene changes (a mutation).
Sometimes the wrong adaptor (tRNA) binds to the mRNA transcript of a gene, misincorporating an amino acid into a nascent protein. But even when scientists considered the code a product of chance, they noticed that it did seem to be arranged well in terms of ensuring that individual errors are of little consequence. As early as 1965 Carl R. Woese, then at the University of Illinois, observed that similar codons (those sharing two of three letters) usually specify similar amino acids, so a mistake here or there does not greatly affect the resulting protein. Defining "similar" with regard to amino acids can be complex: the 20 amino acids differ from one another in all sorts of properties, from size to shape to electric charge. What Woese and others noted is that codons sharing two out of three bases tend to code for amino acids that are much alike in the extent to which they are repelled by or attracted to water. This property is crucial to the ultimate functioning of the protein. A newly made amino acid chain folds into a distinctive shape depending on the positioning of hydrophobic amino acids, which like to cluster together away from the cell's watery cytoplasm, leaving hydrophiles to form the protein's surface. The remarkable feature of the genetic code is that when a single-nucleotide error occurs, the actual and intended amino acids are often similar in hydrophobicity, making the alteration in the final protein relatively harmless. But just how efficient is the code in this regard? This is where, in 1998, we stepped in to develop the observations of earlier scientists.
Testing the Code
To generate hypothetical alternative codes for comparison with nature's, we had to begin with certain assumptions about realistic restrictions under which a code would operate in a world made of DNA, RNAs and amino acids. One observation is that mistakes in translation of mRNA into a corresponding amino acid occur most frequently at the codon's third position. This spot is simply where the binding affinity between the mRNA and tRNA is weakest, which is why Crick dubbed the phenomenon "wobble." But synonymous codons--those coding for the same amino acid--usually differ by only their last letters, so such mistranslations often yield the same amino acid meaning.
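The protective effect of grouping synonymous codons by their third letter is easy to quantify directly from the standard codon table (reconstructed below in its textbook layout, with "*" for stop). This sketch counts what fraction of single-base substitutions at each codon position leave the encoded meaning unchanged:

```python
# Standard genetic code: first base U,C,A,G (rows of 16),
# second base (groups of 4), third base; "*" marks stop codons.
BASES = "UCAG"
AMINOS = ("FFLLSSSSYY**CC*W"
          "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR"
          "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINOS[16*i + 4*j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def synonymous_fraction(pos):
    """Fraction of single-base substitutions at codon position `pos`
    (0, 1 or 2) that do not change the codon's meaning."""
    same = total = 0
    for codon, amino in CODON_TABLE.items():
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            total += 1
            same += (CODON_TABLE[mutant] == amino)
    return same / total

for pos in range(3):
    print(f"position {pos + 1}: {synonymous_fraction(pos):.1%} synonymous")
```

Running this shows the wobble-friendly structure the text describes: fully two thirds of all third-position substitutions are silent, versus only a few percent at the first position and almost none at the second.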
By minimizing the effects of any mutation, the code maximizes the likelihood that a gene mutation will improve the resulting protein.
Although this grouping of synonymous codons in itself reduces the error value of the code, the mechanics of wobble make the arrangement more likely to be a biochemical limitation rather than an evolutionary adaptation. Thus, to err on the side of caution when deriving our measure, we should consider only alternative codes that share this feature. Moreover, it is impossible to put a hydrophobicity value on the codons assigned to the "stop" signal, so we kept their number and codon assignments the same in all alternative codes. Using these technical assumptions, we generated alternatives by randomizing the 20 meanings among the 20 codon blocks. This still defined some 2.5 × 10^18 possible configurations (approximately equal to the number of seconds that have elapsed since the earth formed). So we took large random samples of these possibilities and found that from a sample of one million alternative codes only about 100 had a lower error value than the natural code. Still more striking was our finding when we incorporated additional restrictions to reflect observed patterns in the way DNA tends to mutate and the ways in which genes tend to be mistranscribed into RNA. Under these "real world" conditions, the natural code's error value appeared orders of magnitude better still, outperforming all but one in a million of the alternatives. A straightforward explanation for the genetic code's remarkable resilience is that it results from natural selection. Perhaps there were once many codes, all with different degrees of error susceptibility. Organisms whose codes coped best with error were more likely to survive, and the standard genetic code simply won in the struggle for existence. We know that variant codes are possible, so this assumption is reasonable. Evidence for error minimization as the driving evolutionary force behind the arrangement of the code has its critics, however.
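The sampling experiment described above can be reproduced in miniature. The sketch below shuffles the 20 amino acid meanings among the standard code's synonymous codon blocks (stop codons held fixed, as in the text), scores each code by the mean squared change in hydrophobicity over all single-nucleotide substitutions, and counts how many random codes beat nature's. Two assumptions to note: Kyte-Doolittle hydropathy values stand in for the polar-requirement scale the original work used, and every substitution is weighted equally rather than with the "real world" mutation biases mentioned above.

```python
import random

# Standard genetic code in textbook order; "*" marks stop codons.
BASES = "UCAG"
AMINOS = ("FFLLSSSSYY**CC*W"
          "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR"
          "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINOS[16*i + 4*j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Kyte-Doolittle hydropathy index, a stand-in hydrophobicity scale.
KD = {"I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9,
      "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3,
      "P": -1.6, "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5,
      "K": -3.9, "R": -4.5}

def error_value(table):
    """Mean squared hydropathy change over all single-nucleotide
    substitutions between sense codons (stop codons are skipped)."""
    cost = count = 0
    for codon, amino in table.items():
        if amino == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = table[codon[:pos] + base + codon[pos + 1:]]
                if mutant == "*":
                    continue
                cost += (KD[amino] - KD[mutant]) ** 2
                count += 1
    return cost / count

def random_code():
    """Shuffle the 20 amino acid meanings among the code's synonymous
    blocks; the three stop codons keep their assignments."""
    aminos = sorted(set(CODON_TABLE.values()) - {"*"})
    relabel = dict(zip(aminos, random.sample(aminos, len(aminos))))
    relabel["*"] = "*"
    return {c: relabel[a] for c, a in CODON_TABLE.items()}

natural = error_value(CODON_TABLE)
trials = 2000
wins = sum(error_value(random_code()) < natural for _ in range(trials))
print(f"{wins} of {trials} random codes beat the natural code")
```

Even with this crude hydropathy scale and unweighted error measure, random codes that outperform nature's are rare, which is the qualitative result the authors report; their full analysis with realistic mutation and mistranslation biases makes the natural code look rarer still.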
Sophisticated computer searches can certainly improve on nature's choice, even when they accept the premise that a "good" code is one that minimizes the change in amino acid hydrophobicity caused by genetic errors. But computer predictions for an optimal code are limited to the criteria provided by the programmer, and most of the "better" codes that have been described so far are based on oversimplified assumptions about the types of errors that a code encounters in the real world. For example, they ignore the wobble phenomenon, which prevents their algorithms from perceiving the advantage of having synonymous codons differ only in their third letter. This shortcoming emphasizes a second problem with designer-optimized codes. Natural selection is a "blind designer," in that it can only grope toward an ideal by choosing the best alternative within a population of variants at each generation. When we simulate natural selection in this manner, we find that the degree of error minimization achieved by the standard genetic code is still rather impressive: typically less than 3 percent of random theoretical codes can evolve under selection to match its resilience.
In other words, the diamond and commaless codes once looked superior to nature's own, and computers may generate yet more mathematically idealized codes. But merely demonstrating the possibility of better codes without taking into account the evolutionary process is of dubious relevance to understanding the strength of natural selection's choice. Indeed, the standard code is not only a product of natural selection; it may act as a search algorithm to speed evolution. The impact-minimizing properties of the code, with its blocks of both synonymous codons and those specifying biochemically similar amino acids, achieve more than damage control. "Smaller" mutations, in contrast with extreme alterations, are statistically more likely to be beneficial, so by minimizing the effects of any mutation, the code maximizes the likelihood that a gene mutation will lead to an improvement in the resulting protein.
Using the Code
Sifting through reams of raw genome sequence data to find the actual genes is a priority in molecular biology, but current searches are limited to matching the characteristics of genes that we already know about. Taking into account the way that the genetic code filters gene mutations can enhance these searches by allowing scientists to recognize highly diversified genes and perhaps infer the function of the proteins they encode. Researchers can even derive clues about the folded protein shape that an amino acid sequence dictates by looking at the error-minimizing properties of its codons and how substitutions might affect amino acid size, charge or hydrophobicity. Biologists can also apply our awareness of organisms that deviate from the standard code to "disguise" genes for research. Because a single code is nearly universal to all life, it has become common practice to take a gene of interest, such as a human cancer gene, and insert it into an organism, such as Escherichia coli, that will churn out the protein the gene encodes.
But occasionally the organism fails to express the gene at all, or it produces less of the protein than expected or a slightly different version of the protein found in humans. This problem can play havoc with biology research, but we now realize that sometimes the failure arises because the organisms exhibit different preferences among synonymous codons. For example, the standard code contains six codons for the amino acid arginine, and human genes tend to favor using the codons AGA and AGG. E. coli, however, very rarely uses AGA and often mistranslates it. Knowing these variations and preferences enables us to design versions of the human gene that will work reliably when moved between different organisms. One of our labs (Freeland's) is developing software applications to help molecular biologists turn such theoretical observations about the code into practical tools for genetic engineering, gene finding, and predicting protein shapes. And both of us, along with other researchers, are investigating how the code itself came to be--how RNA first started interacting with amino acids, how their association developed into a system of formal coding and how the amino acid alphabet expanded during early evolution. This approach may allow inroads into many additional unresolved questions: Why 20 and only 20 standard amino acids? Why are some amino acids assigned six codons, whereas others have just one or two? Could this pattern have anything to do with minimizing error? Cracking the code has proved merely the start to understanding its meaning.
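The arginine example lends itself to a small illustration. The sketch below swaps the AGA and AGG codons, which E. coli translates poorly, for CGU, a synonymous arginine codon it handles reliably. The short "human" mRNA and the two-entry preference table are made up for illustration only; a real codon-optimization tool would draw on full codon-usage tables for both organisms.

```python
# Rare E. coli arginine codons mapped to a well-translated synonym.
# Illustrative only; real tools use genome-wide codon-usage tables.
PREFERRED = {"AGA": "CGU", "AGG": "CGU"}

def recode(mrna):
    """Replace disfavored codons with synonymous preferred ones,
    leaving the encoded protein unchanged."""
    codons = (mrna[i:i + 3] for i in range(0, len(mrna), 3))
    return "".join(PREFERRED.get(c, c) for c in codons)

# A made-up mRNA: Met-Arg-Leu-Arg-stop, using the AGA/AGG codons
# that human genes favor but E. coli mistranslates.
print(recode("AUGAGACUGAGGUAA"))   # AUGCGUCUGCGUUAA, same protein
```

Because the substitution is synonymous, the recoded gene still directs E. coli to build exactly the protein the human gene describes, just via codons the host's tRNA pool serves well.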
STEPHEN J. FREELAND and LAURENCE D. HURST use bioinformatics to study evolutionary biology. Freeland is assistant professor of bioinformatics at the University of Maryland, Baltimore County, where he is working to convert insights about genetic code evolution into practical approaches to exploring genome data. His students are currently testing his theories by reengineering a human cancer gene to express in Escherichia coli according to that organism's codon preferences. Freeland earned his Ph.D. in evolutionary theory at the University of Cambridge, studying under Royal Society Research Fellow Laurence Hurst, who is now professor of evolutionary genetics at the University of Bath in England. Hurst's research concentrates on understanding the structure and evolution of genetic systems, particularly the evolutionary origin of phenomena such as sexes, the order of genes on chromosomes, genomic imprinting and the genetic code itself.
Indeed He is; and evolution is one of His greatest creations, for through it He produced the only beings capable of understanding His handiwork.
Then why did he constrain himself to only the types of biological "designs" that could have been produced by evolutionary means?
A "master programmer" would have chosen from a wider range of design choices.
(Working scientists excluded. You know who you are)
That is why "he/she/it" is God and the rest of us mere mortals can only struggle towards comprehension.
Amen to that bro. I don't understand why some religious people pooh-pooh science. To me, science is what happens when the Holy Spirit inspires the human mind to uncover the wonders of His creation.
LOL with chocolate on top!
I read the article through after posting, so I knew who posted before reading. One was a ping, which doesn't count; another was from someone who could have written the article, or come close. No extra points for guessing who's who.
You will find evolution proponents posting regularly on this site who do not argue against this (except to say that it is a matter of belief, since there is no way to prove or disprove it).
But the more we learn about biochemistry, the more confident we become in the historical correctness of evolution.
How it got started is still a matter of faith.
So that you could ask that very question.
Not exactly fine tuning, but even at that, this number assumes that the potential "genetic landscape" hasn't changed since its inception. Perhaps early on, fewer amino acids were used, requiring codons of only two letters, and that system was optimized by evolution before the current three-letter codon vocabulary developed. Or perhaps there was a four-letter codon system that was optimized before being pruned to three.
But such speculations aside, it seems as if they are saying that three percent of the potential genetic landscape drains into a minimum this deep or deeper. To me, that's just not a miracle crying out for an explanation.
Evolution by God, HeHeHe....