Systemic determinants of gene evolution and function

What determines a gene's evolutionary rate? In particular, does it depend solely on functional constraints imposed on the structure of the encoded protein or are there higher-level factors related to the selection at the organismal level? These questions seem to be among the most fundamental ones in biology because comprehensive answers will reveal the nature of the links between genome evolution and the phenotypes of organisms. A recent study by Wall et al (2005) proves more convincingly than ever before that systemic determinants of gene evolution rate do exist, and an intriguing paper by Fraser (2005) sheds light on some of the underlying mechanisms. However, a recent report by Coulomb et al (2005) issues an important warning by showing that some of the intuitively plausible connections discovered by Systems Biology may be due to biases in the data.

Nearly 30 years ago, Wilson et al (1977) put forward a general proposition that may be called the rate-dispensability conjecture: the evolutionary rate should be a function of, firstly, the constraints on the function of the given gene (protein) and, secondly, the 'importance' (fitness effect of knockout or dispensability) of the gene for the organism. In symbols, R_i = f(P_i) f(Q_i), where R_i is the rate of evolution of the given protein, P_i is the probability that a substitution is compatible with the function of this protein, and Q_i is the probability that the organism survives and reproduces without this protein.

The prediction, thus, is that essential (indispensable) genes, on average, should evolve slower than nonessential genes. This conjecture generally follows from Kimura's neutral theory of evolution but is nontrivial given the broad variance of structural−functional constraints on proteins, regardless of their dispensability; in principle, this variance could completely explain the distribution of evolutionary rates among genes without invoking the fitness connection. Thus, empirical tests of the conjecture are of interest, and such tests were conducted as soon as the combination of genome sequences and genome-wide knockout fitness effect data became available. The results, however, were ambiguous. The first attempt by Hurst and Smith (1999), involving only 100 orthologous human and mouse genes for which knockout effect data in mouse were available, failed to detect the predicted connection. A subsequent study by Hirsh and Fraser (2001) dealt with 300 yeast genes, with quantitative fitness effect data taken from the results of a genome-wide measurement in yeast and the rates derived from a comparison with the nematode orthologs. These authors reported a weak but statistically significant negative correlation between the knockout fitness effect and evolution rate, in accord with the Wilson conjecture. However, when the genes were classified into two categories, essential and nonessential, no significant difference in rates was detected. In contrast, Jordan et al (2002) analyzed much larger sets of orthologous genes in bacteria for which knockout data were available and concluded that essential genes, indeed, on average, evolved slower than nonessential ones. The issue has been further confounded by two studies that examined partial correlations between evolution rate, fitness effect, and expression level of a gene and concluded that the link between evolution rate and fitness effect vanished once expression level was taken into account (Pal et al, 2003; Rocha and Danchin, 2004).

A recent study by Wall et al (2005) makes major strides to finally settle the issue. These authors produced robust estimates of short-term evolutionary rates for >3000 orthologous gene sets from four yeast species of the genus Saccharomyces and compared them with two independent data sets on the phenotypic effects of yeast gene knockouts and two measures of gene expression (experimentally determined mRNA abundance and codon adaptation index). Now, partial correlation analysis gave an unequivocal answer: a gene's evolutionary rate significantly depends both on its dispensability and on expression level, and the contributions of these two variables are, largely, independent. Thus, 'important' genes and genes that are highly expressed tend to evolve slowly, supporting and extending Wilson's conjecture.
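
As a rough illustration of what a partial correlation analysis does in this setting, the sketch below computes first-order partial correlations from pairwise Pearson coefficients. It is not the procedure or the data of Wall et al: the gene values are synthetic, and the names `pearson`, `partial_corr`, `expression`, `dispensability`, and `rate` are assumptions made for the example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic genes (assumption): rate rises with dispensability and falls
# with expression, and the two predictors are drawn independently.
rng = random.Random(0)
n_genes = 2000
expression = [rng.gauss(0, 1) for _ in range(n_genes)]
dispensability = [rng.gauss(0, 1) for _ in range(n_genes)]
rate = [0.3 * d - 0.5 * e + rng.gauss(0, 1)
        for d, e in zip(dispensability, expression)]

r_disp = partial_corr(rate, dispensability, expression)
r_expr = partial_corr(rate, expression, dispensability)
print(f"rate vs dispensability | expression: {r_disp:+.2f}")
print(f"rate vs expression | dispensability: {r_expr:+.2f}")
```

If both partial correlations stay clearly away from zero, each variable contributes on its own. If one of them collapsed toward zero once the other was controlled for, that would correspond to the earlier claim that expression level alone accounts for the link.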

This is not the final word on the connection between evolutionary rate, dispensability, and expression, as much work remains to be carried out to obtain reliable quantitative estimates of the strength of the dependences involved. It does seem, however, that, at least for yeast, the reality of these links is now established beyond reasonable doubt. The simple and not particularly new methodological lesson from this work is that, in many cases, careful analysis of improved data sets will do more to resolve a fundamental scientific issue than sophisticated theoretical considerations.

Gene dispensability and expression level are not the only functional variables that have been linked to the evolution rate. In the current era of Systems Biology, many researchers have been particularly intrigued by the possibility that gene evolution is affected by the topology of various interaction networks. In particular, a negative correlation has been reported between a gene's evolutionary rate and its node degree in protein−protein interaction networks (Fraser et al, 2002) and in coexpression networks (Jordan et al, 2004). In other words, genes that interact with many other genes either at the level of coexpression or through physical interaction between their protein products tend to evolve slowly.

However, at least the connection between a protein's position in the interaction network and evolutionary rate has been no less contentious than the link with dispensability. Subsequent to the original report on the correlation, one re-analysis failed to confirm the overall connection although the most prolific interactors (network hubs) did seem to evolve slowly (Jordan et al, 2003), whereas another study denied the link altogether, suggesting that it was an artifact of protein abundance (Bloom and Adami, 2003).

A recent study by Fraser (2005) seems to clarify the issue and provides an intriguing insight into the evolutionary forces that may be at play in network evolution. Fraser partitioned the interaction network hubs into two classes and showed that they dramatically differ in terms of the connection with the evolutionary rate (or, more precisely, the strength of purifying selection measured as the ratio of the rates for synonymous and nonsynonymous positions in coding sequences).

It turns out that hubs that interact with numerous partners within a network module (intramodule hubs, also known under the more appealing name of 'party hubs'; Han et al, 2004), indeed, are strongly constrained and evolve much slower than either proteins that have no partners at all or intermodule hubs ('date hubs'; Han et al, 2004) that interact with partners from different modules. The intermodule hubs are only slightly more constrained than noninteractors. This observation leads to the intuitively plausible hypothesis that organization and functions of network modules tend to be conserved during evolution, whereas intermodule hubs are involved in network rewiring and could be foci of innovation.
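
The constraint measure referred to above is conventionally written dN/dS, the nonsynonymous rate divided by the synonymous rate, with values well below 1 indicating strong purifying selection. The sketch below is a deliberately crude version of that ratio: it divides raw substitution counts by site counts and ignores the corrections for multiple substitutions that real estimators apply. The function name `dn_ds` and all counts are made up for illustration and are not taken from Fraser (2005).

```python
def dn_ds(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Crude dN/dS: per-site nonsynonymous rate over per-site synonymous rate."""
    dn = nonsyn_subs / nonsyn_sites
    ds = syn_subs / syn_sites
    if ds == 0:
        raise ValueError("no synonymous substitutions; ratio is undefined")
    return dn / ds


# Hypothetical counts for two proteins with the same synonymous divergence.
party_hub = dn_ds(nonsyn_subs=6, nonsyn_sites=700, syn_subs=60, syn_sites=300)
date_hub = dn_ds(nonsyn_subs=20, nonsyn_sites=700, syn_subs=60, syn_sites=300)
print(f"party hub dN/dS: {party_hub:.3f}")
print(f"date hub dN/dS:  {date_hub:.3f}")
```

With these invented numbers the intramodule hub has the lower ratio, which is the direction of the difference described in the text.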

Taken together, these recent studies make, perhaps, relatively small but concrete inroads into the domain of Evolutionary Systems Biology (Medina, 2005). This area of inquiry is just taking its first steps, and the road ahead will be long and hard. This is demonstrated by the recent analysis of Coulomb et al (2005), which, while not dealing directly with evolution, is an important note of caution for systems biologists. These authors take on the connection between essentiality and a gene's position in biological networks, in particular, genome-wide networks of protein−protein interactions. It seems intuitively almost obvious that genes with many connections (network hubs) are 'important' and should be essential more often than poorly connected genes; of course, this is perfectly compatible with the observations on slow evolution of both network hubs and essential genes discussed above. Indeed, such a connection between 'centrality and lethality' has been reported by several groups (Jeong et al, 2001); apparent links between a gene's essentiality and other topological characteristics of networks, such as clustering coefficient, also have been reported (Yu et al, 2004). However, Coulomb et al (2005) argue that these effects were caused by biases in the analyzed interaction data, which contained a greater number of valid interactions for essential genes. When a supposedly unbiased data set (Ito et al, 2001) was analyzed, only a marginal correlation between node degree (centrality) and essentiality was detected, and no dependence at all was seen for other topological features of networks (Coulomb et al, 2005).
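
To see how a sampling bias of this kind could manufacture a link between degree and essentiality, consider the toy simulation below. It is only a sketch under stated assumptions, not the analysis of Coulomb et al: every gene gets its true degree from the same distribution, and the one difference between the classes is an assumed higher probability that interactions of essential genes were recorded. The function `simulate_degree_bias` and all parameter values are hypothetical.

```python
import random


def simulate_degree_bias(n_genes=4000, p_essential=0.2,
                         detect_essential=0.6, detect_other=0.3, seed=1):
    """Return mean true and observed degree for essential and other genes."""
    rng = random.Random(seed)
    true_deg = {True: [], False: []}
    observed_deg = {True: [], False: []}
    for _ in range(n_genes):
        essential = rng.random() < p_essential
        # True degree is drawn the same way for every gene (Binomial(20, 0.5)),
        # so there is no real link between degree and essentiality.
        true_degree = sum(rng.random() < 0.5 for _ in range(20))
        # Biased sampling (assumption): interactions of essential genes are
        # more likely to have been detected and recorded.
        p_detect = detect_essential if essential else detect_other
        observed = sum(rng.random() < p_detect for _ in range(true_degree))
        true_deg[essential].append(true_degree)
        observed_deg[essential].append(observed)

    def mean(values):
        return sum(values) / len(values)

    return {
        "true_essential": mean(true_deg[True]),
        "true_other": mean(true_deg[False]),
        "observed_essential": mean(observed_deg[True]),
        "observed_other": mean(observed_deg[False]),
    }


result = simulate_degree_bias()
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The true mean degrees of the two classes come out nearly equal, while the observed mean degree is markedly higher for essential genes, so an apparent 'centrality and lethality' signal arises from the sampling alone.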

The current state of Evolutionary Systems Biology is typical of any burgeoning discipline: it is clear that there are important signals out there, but our ability to discern and understand these signals is hampered both by inaccuracies and biases in the data and by the inadequacy of the existing theoretical models. These difficulties notwithstanding, we should be motivated by the (I believe, reasonable) hope that, as this field matures, our one-dimensional understanding of genome evolution will develop into a multidimensional picture of evolution of organisms as systems.

Here's the press release version. (hat tip to Ichneumon) It's very cool stuff actually:

Scientists Uncover Rules that Govern the Rate of Protein Evolution

PASADENA, Calif.--Humans and insects and pond scum, and all other living things on Earth, are constantly evolving. The tiny proteins these living things are built from are also evolving, accumulating mutations mostly one at a time over billions of years. But for reasons that hitherto have been a mystery, some proteins evolve quickly, while others take their sweet time, even when they reside in the same organism.
Now, a team of researchers at the California Institute of Technology, applying novel data-mining methods to the now-completed sequence of the yeast genome, have uncovered a surprising reason why different proteins evolve at different rates.
Reporting in the September 19 edition of the journal Proceedings of the National Academy of Sciences (PNAS), lead author Allan Drummond and his coauthors from Caltech and the Keck Graduate Institute show that the evolution of proteins is governed by their ability to tolerate mistakes during their production. This finding disputes the longstanding assumption that functionally important proteins evolve slowly, while less-important proteins evolve more quickly.
"The reason proteins evolve at different rates has been a mystery for decades in biology," Drummond explains. But with the recent flood of sequenced genomes and inventories of all the pieces and parts making up cells, the mystery deepened. Researchers discovered that the more of a protein that was produced, the slower it evolved, a trend that applies to all living things. But the reason for this trend remained obscure, despite many attempts to explain it.
Biologists have long known that the production machinery that translates the genetic code into proteins is sloppy. So much so, in fact, that on average about one in five proteins in yeast is mistranslated, the equivalent of translating the Spanish word "Adios" as "Goofbye." The more copies of a protein produced, the more potential errors. And mistakes can be costly: some translation errors turn proteins into useless junk that can even be harmful (like miscopying a digit in an important phone number), while other errors can be tolerated. So the more protein copies per cell, the more potential harm, unless those abundant proteins themselves can evolve to tolerate more errors.
"That was the 'Aha!'" says Drummond. "We knew from our experiments with manipulating proteins in the lab that some had special properties that allowed them to tolerate more changes than other proteins. They were more robust." So, what if proteins could become robust to translation errors? That would mean fewer harmful errors, and thus a more fit organism.
To test predictions of this hypothesis, the team turned to the lowly baker's yeast, a simple one-celled organism that likes to suck up the nutrients in bread dough, and then expels gas to give baked bread its fluffy texture. Baker's yeast is not only a simple organism, it is also extraordinarily well understood. Just as biologists have now sequenced the human genome, they have also sequenced the yeast genome. Moreover, the numbers of every type of protein in the yeast cell have been painstakingly measured.
For example, there's a protein in the yeast cell called PMA1 that acts as a transformer, converting stored energy into more useful forms. Since nothing living can do without energy, this is a very fundamental and important component of the yeast cell. And every yeast cell churns out about 1.26 million individual PMA1 molecules, making it the second-most abundant cellular protein.
The old assumption was that PMA1 changed slowly because its energy-transforming function was so fundamental to survival. But the Caltech team's new evidence suggests that the sheer number of PMA1 molecules produced is the reason that the protein doesn't evolve very quickly.
"The key insight is that natural selection targets the junk proteins, not the functional proteins," says Drummond. "If translation errors turned 5 percent of the PMA1 proteins in a yeast cell into junk, those junk proteins would be more abundant than 97 percent of all the other proteins in the cell. That's a huge amount of toxic waste to dispose of."
So instead, Darwinian evolution favors yeast cells with a version of PMA1 that continues to function despite errors, producing less junk. That version of PMA1 evolves slowly because the slightest changes destroy its crucial ability to withstand errors.
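
The arithmetic behind Drummond's example can be spelled out in a few lines. This is only an illustrative sketch: the copy number and the 5 percent figure come from the release, while the function `junk_copies` and its `tolerated_fraction` parameter (the share of erroneous copies that still work in an error-tolerant protein) are hypothetical.

```python
def junk_copies(copies_per_cell, error_fraction, tolerated_fraction=0.0):
    """Number of nonfunctional copies produced by translation errors.

    tolerated_fraction is a hypothetical parameter: the share of erroneous
    copies that still function because the protein is robust to errors.
    """
    return round(copies_per_cell * error_fraction * (1.0 - tolerated_fraction))


PMA1_COPIES = 1_260_000  # copies per yeast cell, as quoted in the release

fragile = junk_copies(PMA1_COPIES, 0.05)
robust = junk_copies(PMA1_COPIES, 0.05, tolerated_fraction=0.5)
print(fragile)
print(robust)
```

Five percent of 1.26 million is 63,000 junk molecules per cell; in this toy model, a version of the protein that tolerates half of its errors would cut that to 31,500. The comparison with the abundance of other yeast proteins would need the measured protein counts, which are not reproduced here.
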
Consider two competing computer factories. Both make the same number of mistakes on their assembly lines, but one company's computers are designed such that the inevitable mistakes result in computers that still work, while with the other company's design, one mistake and the computer must be tossed on the recycling heap. In the cutthroat marketplace, the former company, with lower costs and higher output, will quickly outcompete the latter.
Likewise, viewing yeast cells as miniature factories, the yeast whose most-abundant proteins are least likely to be destroyed by production mistakes will outcompete its less-efficient rivals. The more optimized those high-abundance proteins are (the more rigid the specifications that make them so error-resistant), the slower they evolve. Hence, high abundance means slow evolution.
The team is now exploring other predictions of this surprising hypothesis, such as what specific chemical changes allow proteins to resist translation errors. "It's the tip of the iceberg," Drummond says.
Drummond is a graduate student in Caltech's interdisciplinary Computation and Neural Systems program. The other authors of the paper include his two advisors: Frances Arnold, the Dickinson Professor of Chemical Engineering and Biochemistry at Caltech, and Chris Adami, an expert in population genetics who is now at the Keck Graduate Institute in Claremont, California. The other authors are Jesse D. Bloom, a graduate student in chemistry at Caltech; and Claus Wilke, a former postdoctoral researcher of Adami's who has recently joined the University of Texas at Austin as an assistant professor.
The title of the PNAS paper is "Why highly expressed proteins evolve slowly."
