Posted on 03/10/2002 12:38:04 PM PST by Phaedrus
I usually avoid threads like this - cuts a little too close to what I do for a living - but I think Mr. Commoner is doing people a disservice by misrepresenting what molecular biologists believe and what the object of the Human Genome project was. Not that he was alone in the latter regard...
I don't really resent that -- I use specially prepared copper foil which reflects negative vibrations of all kinds -- but I do think such comments reflect an incomplete view of the modern world. At the risk of having Phaedrus think even less of me, I want to make a fast comment about "conspiracies" in general.
Have you ever noticed that the same face will appear on the cover of many different magazines during the same month? Even though the magazines come from different publishers, even though the magazines have lead times ranging from a few months to many, many months, and even though the magazines may have completely different target audiences, they still manage very often to get the same face on the cover during the same month.
Random chance? Hardly.
Conspiracy? Not exactly...
It's just a bunch of people working together to accomplish something.
Advertising specialists and public relations specialists and media specialists routinely work months and even years in advance to co-ordinate a zillion different components of promoting a star or a movie or a message. That's what such people do.
It's not a conspiracy, per se. But anybody who doesn't know modern terminology could be excused for slipping up and calling it one.
There's no reason to think that "modern POP science" is any different.
It's a bunch of people who basically live in the media, revolve around half a dozen or so publishers, and get a heck of a lot of funding from a relatively small number of money sources which all have rather defined agendas.
Conspiracy? Not per se. Just people living in the modern world. A world of media specialists who work together to promote a consistent message (and discourage discordant notes).
Mark W.
The arrogant implications of the way this question is posed are not surprising to me. Neither is this portion of the answer.
Over longer periods of time, I have some confidence that most people sense the right from the wrong.
This is from your Dr. West link.
Besides the explicit arguments for starting the genome project, there were many more valid implicit arguments, and their benefits have since been realized.
I have worked on bioinformatics projects that used the output from the genome project in near real time to identify potential genetic matches for further research. The genome project helped reduce the search field by orders of magnitude compared with any other method then in use.
Even if the foundations of a theory contain flaws, those flaws don't necessarily invalidate all associated research and results. We just have to be extra careful in applying those results.
It is notable that the opinions expressed in the following links are not as definite as yours.
Link 1
Link 2
Link 3
Link 4
And I like the comment contained in the following where a member of the OSU team hedges on its count. IOW-- "I strongly suspect our numbers are high"
Link 5
"Some researchers are unsettled by the certainty with which the Human Genome Consortium is presenting its lower gene count," said Fred Wright of Ohio State University. "In my view, the final number of genes - when it is known - will lie somewhere between their high of 40,000 and our value of 70,000."
It isn't much more now. We know that DNA codes for proteins and we know that proteins are involved in the expression of traits. What we don't know is how many proteins are involved per trait, but it's looking like that varies from trait to trait, which makes a lot of common sense. The assertion that it was a one-to-one ratio was really more a product of science popularization than of real science: it makes the idea easy to visualize, but it's a major oversimplification of the kind science popularization is prone to. I don't think any working molecular biologist has been proceeding under that assumption for decades, if ever.
The Human Genome Project's actual product was a listing of DNA sequences, and nothing more. There are a number of crude methods of estimating protein expression from this (simply enumerating stop codons is one) but no precise one, and no method at all of estimating traits, hence genes. Anyone who ever said it could do that was either misunderstanding the process or deliberately misrepresenting it. Without pointing any fingers I will observe that science journalists are prone to the first and scientists hungry for grant money to the second. Of course, the same is true of "debunkers."
The general problem that I see with his and others' research into the origins of life is that they make reasonable assumptions that are still not empirically proven and then forget that their knowledge foundation rests on many of these unproven "reasonable assumptions".
I am all for the expansion of human knowledge, but we are wrong to turn a blind eye to the potential damage caused by our gains and to allow products onto the market with unknown long-term implications. What if, in the future, we were to empirically prove that life really starts at conception? Can it now be empirically proven that it does not? The same goes for the genetically engineered plants we eat: can we empirically prove today that they have no negative long-term consequences for the human species, or for the ecosystems they participate in?
I am not saying that we should be frozen with non-progress when faced with uncertainties, just that we need to seriously weigh the consequence of guessing wrong.
Politically, does anyone think that our regulation via the FDA and USDA is adequate, inadequate, too invasive to business, or not invasive enough? Should government tend to the liberal, socialist, conservative, or libertarian (in that order on purpose) viewpoint when it comes to our food and medicine production?
Yes, that is one reason I particularly cited Fred Wright, who is the hedger on the 70,000 gene count. I am not sure whether the particular method used by the team has been verified. But in any case it is Fred Wright who is allowing that the final number of genes in the human genome is less than the number in his paper.
OHIO STATE GENOME MAP REVEALS MANY ADDITIONAL PROBABLE GENES
"Some researchers are unsettled by the certainty with which the Human Genome Consortium is presenting its lower gene count," said Fred Wright, assistant professor of human cancer genetics and lead author of the paper.
"In my view, the final number of genes - when it is known - will lie somewhere between their high of 40,000 and our value of 70,000."
Commentary 2. I had to reformat this because it did not come across cleanly from the double-column source. The figures and tables did not come across either; for figures, tables, and legends, see the original article.
March 2001 Volume 19 Number 3 p 196
Piecing together the significance of splicing
Rotem Sorek & Mor Amitai
Rotem Sorek is a scientist (e-mail: rotem.sorek@cgen.com) and Mor Amitai (e-mail: mor.amitai@cgen.com) is the president and chief executive officer at Compugen, 72 Pinchas Rosen St., Tel-Aviv, Israel, 69512.
Alternative splicing increases protein diversity by allowing multiple, sometimes functionally distinct, proteins to be encoded by the same gene. It can be specific to tissues, stress conditions, and developmental and pathological states. In many cases, it serves as an on/off regulation mechanism by introducing a premature stop codon (ref. 1). We still do not understand how alternative splicing is regulated, but the following fact is now quite clear: in metazoans, it happens very often.
Computational analysis of expressed sequences can teach us a great deal about alternative splicing, because aligning expressed sequences of different splice variants of the same gene usually produces a characteristic gapped alignment pattern. The recent information explosion in nucleotide databases makes it possible to carry out this analysis at the level of the transcriptome (the set of messenger RNAs). We can take the entire human expressed sequence tag (EST) database (http://www.ncbi.nlm.nih.gov/dbEST/) (currently containing more than 3.1 million ESTs), together with the known complementary DNAs and genomic sequence data, and cluster them by alignment overlaps.
Such large-scale analyses were conducted in the past few years by four independent groups using different methods (refs. 2-5). They estimated that 33-59% of human genes have at least two splice variants, with the highest estimate being the most recent one (ref. 2). All four groups also pointed out that these figures are probably underestimates, because the EST database does not cover the entire repertoire of tissues or developmental states, and the precautions taken to avoid false positives were extremely stringent. Because the latest estimate of the total number of human genes is 30,000-40,000 (ref. 2), one must bear in mind that at least 10,000 of our genes, and probably many more, undergo alternative splicing.
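The clustering step mentioned above can be sketched roughly as follows. This is a minimal illustration, not the method of any of the cited groups: it assumes each EST has already been aligned to a single genomic location (the hypothetical `Alignment` tuple) and simply merges ESTs whose genomic spans overlap. Real pipelines also deal with strand, unaligned ESTs and paralogs; note that two distinct genes whose spans overlap would be lumped together by this sketch.

```python
from typing import List, Optional, Tuple

# Hypothetical input: each EST already aligned to the genome, given as
# (est_id, chromosome, start, end) with 0-based, half-open coordinates.
Alignment = Tuple[str, str, int, int]


def cluster_by_overlap(alignments: List[Alignment]) -> List[List[str]]:
    """Group ESTs whose genomic spans overlap, using a single sorted sweep."""
    clusters: List[List[str]] = []
    current_chrom: Optional[str] = None
    current_end = -1
    # Sort by chromosome, then start, so overlapping spans become adjacent.
    for est_id, chrom, start, end in sorted(alignments, key=lambda a: (a[1], a[2])):
        if chrom == current_chrom and start < current_end:
            # Overlaps the cluster being built: add it and extend the span.
            clusters[-1].append(est_id)
            current_end = max(current_end, end)
        else:
            # Different chromosome or a gap: open a new cluster.
            clusters.append([est_id])
            current_chrom, current_end = chrom, end
    return clusters


ests = [
    ("est1", "chr22", 100, 400),
    ("est2", "chr22", 350, 700),
    ("est3", "chr22", 900, 1200),
    ("est4", "chr1", 50, 300),
]
print(cluster_by_overlap(ests))  # [['est4'], ['est1', 'est2'], ['est3']]
```

A sort-and-sweep keeps the sketch linear after sorting; EST-to-EST overlap graphs (used when no genome is available) would need a union-find or graph traversal instead.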
Understanding alternative splicing and gaining knowledge of the transcriptome are crucial for the design and interpretation of expression profiling experiments, in particular DNA chip experiments. Such experiments enable comparisons between transcriptomes of different cell types or under different conditions. Designing DNA chips that will effectively report on the transcriptional levels of genes must take into account their alternative splicing patterns, even if alternative splicing is not the subject being studied.
To demonstrate this, let us assume a putative gene X that has three exons (A, B, and C) and two splice variants, ABC and AC. To design a chip that will measure the transcriptional levels of gene X in different tissues, we must use a probe from exon A or C, to which both variants will hybridize in the assay. Taking the probe from exon B will cause only variant ABC to hybridize, and will not correctly measure the transcriptional level of the gene. The accuracy of the experiment can be increased by measuring the level of each variant separately; hence, two probes will have to be taken, one from exon B to measure variant ABC, and one from exon A or C to measure both variants.
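The probe-design reasoning in the gene X example can be made concrete with a toy model. This is an illustrative sketch only: the `VARIANTS` table and the assumption that probe signals are directly comparable are hypothetical simplifications (real probes differ in hybridization efficiency), and the subtraction step simply spells out what the two-probe design implies.

```python
# Hypothetical encoding of the gene X example: each splice variant is the
# ordered list of exons it retains.
VARIANTS = {
    "ABC": ["A", "B", "C"],
    "AC": ["A", "C"],
}


def variants_detected(probe_exon: str) -> set:
    """Variants that contain the exon a probe was taken from, i.e. that will hybridize."""
    return {name for name, exons in VARIANTS.items() if probe_exon in exons}


def estimate_levels(shared_signal: float, exon_b_signal: float) -> dict:
    """Split the gene-level signal into per-variant levels.

    Idealized assumption: both probes report abundance on the same scale,
    so the exon A/C probe measures ABC + AC and the exon B probe measures ABC.
    """
    return {"ABC": exon_b_signal, "AC": shared_signal - exon_b_signal}


for exon in "ABC":
    print(exon, sorted(variants_detected(exon)))
# A ['ABC', 'AC']
# B ['ABC']
# C ['ABC', 'AC']
print(estimate_levels(10.0, 4.0))  # {'ABC': 4.0, 'AC': 6.0}
```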
Another illustration of the importance of awareness of alternative splicing comes from the field of gene prediction. One thing gene prediction programs do not predict is alternative splicing, because sequences that regulate alternative splicing are generally unknown. Because alternative splice sites often correspond weakly to the splice consensus sites, gene prediction programs will probably frequently fail to identify alternative exons or introns.
Alternatively spliced genes are likely to take center stage as drug targets, therapeutic agents, and diagnostic markers in the next decade. First, there are many splice variants of pharmaceutically important genes that have been detected but not yet studied in depth. The function of the known variant gives us a clue to the function of the new variant, especially if we know which domain was added or removed. For example, we have identified some 60 kinase enzymes that undergo alternative splicing that eliminates their catalytic domains (E. Levanon et al., unpublished data). Although many of them have not been biochemically studied, our educated guess is that they function as competitive inhibitors of the known kinases.
Second, it has been estimated that 15% of the point mutations that cause genetic diseases in humans alter the normal splicing pattern (ref. 6). Splice variants that are disease specific can be excellent diagnostic markers for these and other human diseases, because they are easily identified by PCR.
Alternative splicing is no longer considered an esoteric twist of nature. Articles with the phrase "alternative splicing" or "splice variants" in their title or abstract are published at the rate of two a day (according to a Medline query). In many ways, the concept is breaking our iron-clad rules: exons are not always exons, and introns are sometimes expressed.
Indeed, the very definition of "gene" should be reconsidered in light of discoveries of unusual alternative splicing events, such as the one yielding a novel splice variant of PSA (prostate-specific antigen), the standard prostate cancer marker (ref. 7). This variant shares only the first exon with PSA, and that exon encodes only a signal peptide, leaving the two mature proteins with no common protein sequence. The only connection between them is that they are coded by the same genomic region and probably share the same transcriptional regulation (A. David et al., unpublished data).
Even more extreme is the example of the p19 and p16 protein products of the INK4a/ARF locus (ref. 8). The two transcripts are synthesized from different promoters and have different first exons, but share exons 2 and 3, which are translated in two distinct reading frames, yielding two entirely different protein products. Although the p19 and p16 proteins are clearly the products of the same genomic locus, can we say that these two unusual and entirely distinct splice variants are coded by the same gene?
Clearly, increasing protein diversity does not simply correlate with increasing gene number. It is dependent both on the number of genes in the genome and on the rate of alternative splicing of those genes. Work is now needed to characterize in greater detail the molecular basis for this process and its regulation. This will likely uncover a host of new targets for drug discovery, yield new diagnostic markers for disease, and perhaps even help us unravel the mechanisms underlying biological complexity.
REFERENCES
1. Smith, C.W. & Valcarcel, J. Trends Biochem. Sci. 25, 381-388 (2000).
2. International Human Genome Sequencing Consortium. Nature 409, 860-921 (2001).
3. Brett, D. et al. FEBS Lett. 474, 83-86 (2000).
4. Mironov, A.A., Fickett, J.W. & Gelfand, M.S. Genome Res. 9, 1288-1293 (1999).
5. Croft, L. et al. Nat. Genet. 24, 340-341 (2000).
6. Cooper, T.A. & Mattox, W. Am. J. Hum. Genet. 61, 259-266 (1997).
7. Diamandis, E.P. Trends Endocrinol. Metab. 9, 310-316 (1998).
8. Sharpless, N.E. & DePinho, R.A. Curr. Opin. Genet. Dev. 9, 22-30 (1999).
Nature Genetics 30, January 2002, p. 13
A genomic view of alternative splicing
Barmak Modrek & Christopher Lee
Departments of Chemistry and Biochemistry,
University of California Los Angeles,
Los Angeles, California 90095-1570, USA.
Correspondence should be addressed to C.L.
(e-mail: leec@mbi.ucla.edu).
Recent genome-wide analyses of alternative splicing indicate that 40-60% of human genes have alternative splice forms, suggesting that alternative splicing is one of the most significant components of the functional complexity of the human genome. Here we review these recent results from bioinformatics studies, assess their reliability and consider the impact of alternative splicing on biological functions. Although the 'big picture' of alternative splicing that is emerging from genomics is exciting, there are many challenges. High-throughput experimental verification of alternative splice forms, functional characterization, and regulation of alternative splicing are key directions for research. We recommend a community-based effort to discover and characterize alternative splice forms comprehensively throughout the human genome.
Introduction
The sequencing of the human genome has raised important questions about the nature of genomic complexity. It was widely anticipated that the human genome would contain a much larger number of genes (estimates based on expressed-sequence clustering ran as high as 150,000 genes) than Drosophila (14,000 genes) or Caenorhabditis elegans (19,000 genes) (refs. 1-3). The report of only 32,000 human genes thus came as a surprise (refs. 4,5). This basic disparity indicated that the number of human expressed-sequence (mRNA) forms was much higher than the number of genes, suggesting a major role for alternative splicing in the production of complexity. Many groups have recently presented genomic analyses of alternative splicing that strongly support this hypothesis, raising intriguing questions about the identification, functional roles and regulation of alternative splice forms across the whole genome.
The study of alternative splicing has long been a valuable subfield of molecular biology, but it has received comparatively little attention relative to major fields such as the discovery of new genes or transcriptional regulation. Only several hundred alternatively spliced genes have been identified so far by molecular biologists (see Table 1 for database resources). After the discovery of exons and introns in the adenovirus hexon gene in 1977 (ref. 6), Walter Gilbert proposed that different combinations of exons could be spliced together ('alternative splicing') to produce different mRNA isoforms of a gene (ref. 7). By the early 1980s, alternative splicing was well documented in several genes (refs. 8,9), and researchers estimated that 5% of genes in higher eukaryotes might be alternatively spliced (ref. 10). A range of processes from sex determination to apoptosis use alternative splicing (refs. 11,12), and its regulatory mechanisms have recently been discovered in several genes (refs. 11,13).
Genome-scale analyses of alternative splicing
High-throughput sequencing of the human genome and especially of expressed sequence tag (EST) sequences has enabled a completely different approach based on bioinformatics. Because ESTs are derived from fully processed mRNA (after 5′ capping, splicing and polyadenylation), they provide a broad sample of mRNA diversity, which can be analyzed computationally. In the last two years, bioinformatics studies have identified an order of magnitude more alternatively spliced genes than were found in the past 20 years and are beginning to provide a global view of alternative splicing in humans. We will first describe these studies and then assess the evidence.
Bioinformatics approaches.
Most bioinformatics studies (refs. 4,14-18; Table 2) rely on identifying ESTs that come from the same gene and looking for differences between them that are consistent with alternative splicing, such as a large insertion or deletion in one EST (Fig. 1a). Each candidate splice can be further assessed by aligning the ESTs exactly to their gene sequence in the draft genome (Fig. 1b). This reveals candidate exons (matches to the genomic sequence) separated by candidate splices (large gaps in the EST-genomic alignment; Fig. 1b). As intronic sequences at splice junctions are highly conserved (99.24% of introns have GT and AG at their 5′ and 3′ ends, respectively), they can be used to verify candidate splices (ref. 19).
In the earliest large-scale discovery of new alternative splicing, Mironov et al. (ref. 14) aligned ESTs to genomic sequence for 392 known genes and found alternative splicing in 133 of them. Croft et al. (ref. 15) took a different approach that did not rely on aligning ESTs to the complete genomic sequence: they created a database of individual intron sequences annotated in GenBank and searched for EST sequences that matched intronic sequence. They found matches to introns from 582 genes, suggesting an alternative splice. Brett et al. (ref. 16) looked for insertions or deletions in ESTs relative to a set of known mRNAs, indicative of alternative splices, but without aligning ESTs to the genomic sequence. This work identified 3,011 alternatively spliced genes. The International Human Genome Sequencing Consortium reported 145 alternatively spliced genes from a comprehensive analysis of chromosome 22 based on aligning ESTs to the genomic sequence (ref. 4). Modrek et al. (ref. 18) aligned available human EST and mRNA sequences (2.1 million) to the whole draft genome, applying strict matching, splice-site and alternative-splice detection criteria, to identify 6,201 alternative splices in 2,272 genes.
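The EST-to-genome check described above (large alignment gaps as candidate splices, verified by the conserved GT-AG boundaries) can be sketched as below. It assumes the alignment has already been computed and is supplied as a list of aligned genomic blocks; the 30-nt minimum gap is a hypothetical threshold, and the check covers only the plus strand and the canonical GT-AG pair, ignoring the rarer non-canonical introns (for example GC-AG). It is an illustration, not the procedure of any cited study.

```python
from typing import List, Tuple

# Hypothetical representation: an EST-to-genome alignment as a list of
# aligned genomic blocks (start, end), 0-based half-open, plus strand only.
Block = Tuple[int, int]


def candidate_introns(blocks: List[Block], min_gap: int = 30) -> List[Block]:
    """Return genomic gaps between consecutive aligned blocks as candidate introns.

    min_gap is an assumed threshold separating introns from small indels.
    """
    introns = []
    for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]):
        if next_start - prev_end >= min_gap:
            introns.append((prev_end, next_start))
    return introns


def has_gt_ag(genome: str, intron: Block) -> bool:
    """Check for the canonical GT...AG dinucleotides at the intron ends."""
    start, end = intron
    return genome[start:start + 2] == "GT" and genome[end - 2:end] == "AG"


# Toy genome: 6-nt exon, 34-nt intron (GT ... AG), 6-nt exon.
genome = "ATGCCC" + "GT" + "A" * 30 + "AG" + "TTTGGG"
blocks = [(0, 6), (40, 46)]
for intron in candidate_introns(blocks):
    print(intron, has_gt_ag(genome, intron))  # (6, 40) True
```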
Alternative splicing frequency.
These studies have consistently reported a high rate of alternative splicing in the human genome, with 35-59% of human genes showing evidence of at least one alternative splice form (refs. 4,14,16-18). Moreover, given that only a few ESTs have been sequenced for most genes, it seems possible that even more alternative splicing exists that is not yet detectable in the available ESTs. These studies indicate that alternative splicing is far more abundant, ubiquitous and functionally important than previously thought. Alternative splicing is also not the only source of mRNA isoforms: for example, bioinformatics studies have reported that about 25% of genes have alternative polyadenylation forms, that is, mRNAs that are cleaved and polyadenylated at different sites (refs. 4,20).
Functional impact.
How do these newly discovered alternative mRNA forms affect protein function? Despite an early report that most alternative splices occur within the 5′ untranslated region (ref. 14), recent studies indicate that 70-88% of alternative splices change the protein product (refs. 4,17,18). The majority of these changes appear to be functionally interesting, such as replacement of the amino or carboxy terminus, or in-frame addition and removal of a functional unit (Fig. 2b; ref. 18). Only 19% of the alternative protein forms were shortened due to frameshift (ref. 18). Fig. 2c shows an alternative isoform of a new Fc receptor-like protein, whose C-terminal transmembrane (TM) domain and cytoplasmic tail (important for signal transduction in this class of receptors) are neatly replaced with a new TM domain and tail by alternative polyadenylation (ref. 18).
What is the functional pattern of alternative splicing across the genome? A random sample of 50 alternatively spliced genes showed that over three-quarters were involved in signaling and regulation (such as receptors, signal transduction, transcription factors, and so on). Moreover, the systemic categories most highly represented in this sample were genes specific to the immune and nervous systems (ref. 18). This should be interpreted cautiously, as the overall breakdown of gene functions in the whole genome is still unclear. However, alternative splicing may be most important in complex systems where information must be processed differently at different times (such as immune tolerance or development) or where a very high level of diversity is required (such as axonal guidance). Notable examples of combinatorial alternative splicing of multiple cassettes of exons, generating up to 40,000 isoforms of a single gene, have recently been discovered in the nervous system, including Dscam (an axonal guidance receptor in Drosophila) and neurexin (neuropeptide receptor) (ref. 21).
Fig. 2 Types of alternative splicing and possible effects on protein. a, Alternative splicing can lead to either the inclusion or exclusion of an exon, use of a different 5′ splice site, or use of a different 3′ splice site. b, Alternative splicing can lead to use of a different site for translation initiation (alternative initiation), a different translation termination site due to a frameshift (truncation or extension), or the addition or removal of a stop codon in the alternative coding sequence (alternative termination). Alternative splicing can also change the internal region because of an in-frame insertion or deletion. c, Alternative splicing of Hs.11090, a putative Fc receptor chain homolog: genomic structure and two alternatively spliced (and polyadenylated) mRNA forms. The differential RNA processing substitutes one transmembrane domain for another. However, one form has a different cytoplasmic tail (involved in signaling in this family), whereas the other does not.
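The distinction in panel b between in-frame changes, frameshifts and alternative termination comes down to simple arithmetic on the alternatively spliced segment. A minimal sketch, assuming the segment is given as DNA on the coding strand and begins at a codon boundary (a simplifying assumption; real segments can start mid-codon, which changes which codons are read). The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_coding_segment(segment: str) -> str:
    """Classify an alternatively included segment inside a coding sequence.

    Assumes the segment starts at a codon boundary (phase 0). A length that is
    not a multiple of 3 shifts the downstream reading frame; otherwise the
    segment is in frame and may still carry an in-frame stop codon.
    """
    if len(segment) % 3 != 0:
        return "frameshift"
    codons = [segment[i:i + 3] for i in range(0, len(segment), 3)]
    if any(codon in STOP_CODONS for codon in codons):
        return "in-frame, contains stop codon"
    return "in-frame"


for seg in ["GCTGCAGCA", "GCTTAAGCA", "GCTGC"]:
    print(seg, classify_coding_segment(seg))
# GCTGCAGCA in-frame
# GCTTAAGCA in-frame, contains stop codon
# GCTGC frameshift
```

Including a segment that contains an in-frame stop corresponds to alternative termination; skipping it restores the longer product.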
Bioinformatics evidence for alternative splicing.
It is essential that biologists understand the forms of evidence and problems that underlie this new 'big picture' view of alternative splicing. Bioinformatics is an automated analysis of high-throughput experimental data and follows a very different process from traditional molecular biology. It can be simultaneously 'more rigorous' (much more detailed, mathematical measures of evidence are required for a computer to do this analysis at all) and much less rigorous (bioinformaticists typically cannot order a new set of experimental tests for all the isoforms they detect, as is common in molecular biology labs studying a specific isoform). Two kinds of problems must be distinguished: (i) a false negative, the failure to detect a real splice form, and (ii) a false positive, a reported result that is not a true, functional splice form. Analyzing the causes of these problems during cDNA library construction, EST sequencing and sequence comparison suggests many interesting questions for the next stage of this research (Table 3).
Detection of alternative splicing through bioinformatics depends on finding deviant EST forms within the mass of data produced by undirected EST sequencing. This raises a fundamental question: when an analysis looks for some form of deviation in a very large data set, other causes of deviation, even if infrequent, could add up to a substantial fraction of the result. How can we be sure this is real alternative splicing?
The bioinformatics studies have tried carefully to screen out many possible sources of false positives. Simple forms of EST deviation, such as random variation in where a given EST sequence begins or ends within a gene, and potential vector contamination at the ends of ESTs, are excluded. The most important screen is provided by mapping (aligning) ESTs to the draft human genome sequence. Chimeric ESTs can be easily excluded by requiring that each EST align completely to a single genomic locus.
The genomic location found by homology search and alignment can often be checked against radiation hybrid mapping data. As the genomic regions that match the ESTs should be exons and the alignment gaps between them should be introns, the putative splice sites at their boundaries can be carefully checked. Because the splice-site motifs (GT-AG, polypyrimidine tract, and so on) are primarily in the intron, this provides a validation that is independent of the EST evidence. Reverse transcriptase artifacts or other problems causing imperfect cDNA construction may be screened out in this way.
Improper inclusion of genomic sequence in ESTs (due to either mRNA purification problems or incomplete splicing) can also be excluded by requiring pairs of mutually exclusive splices in different ESTs. Observing a given splice in one EST but not in a second EST may be insufficient, because the latter could be an unspliced EST rather than a biologically significant intron inclusion. This problem can be eliminated by focusing on mutually exclusive splices: two different splices, seen in different ESTs, that overlap in the genomic sequence. One can make this even stricter by requiring that the two splices share one splice site but differ at the other. This approach detects the classic forms of alternative splicing, such as alternative exon usage and alternative 5′ or 3′ splicing (Fig. 2a). Detection of valid intron inclusions will probably require further statistical analysis.
The presence in the human genome of many pseudogenes and paralogous genes resembling other genes greatly complicates the problem. Correct alternative splice detection depends on clustering the EST data into separate groups representing individual genes. EST clustering (such as UniGene) is well known to suffer both from excessive 'splitting' of genes (there are 80,000 UniGene clusters, versus the estimate of 32,000 human genes) and from excessive 'lumping', in which paralogous gene sequences are mixed together (refs. 4,22).
This mixing can suggest spurious alternative splices that are actually just differences between similar but distinct genes (ref. 23). Methods that map the ESTs onto genomic sequence with a high level of identity (95-98%) probably exclude much of this paralog mixing, but not all. Ultimately, mapping ESTs to their unique gene location in the genomic sequence is the only way to sort out paralogs. Requiring that the consensus sequence for an EST cluster match its genomic contig completely, over its full length, can help exclude artifacts where the genomic sequence has been misassembled. With this requirement, misassembly produces false negatives (the EST cluster is simply not mapped) rather than false positives (incorrect alternative splices). A high rate of false negatives is the greatest disadvantage of methods that require mapping ESTs to the draft genome sequence.
Despite these sources of uncertainty, the agreement among many studies on a high frequency of alternatively spliced genes (35-60%) suggests that this result is valid. These studies support each other persuasively, because they differ not only in the sets of genes sampled (ranging from well-characterized mRNAs, to specific chromosomes, to a whole-genome study), but also in their specific criteria for reporting an alternative splice. It is important, however, to emphasize that there has only been one study so far verifying alternative splices detected by bioinformatics. Twenty genes with putative alternative splices were amplified from a multiple tissue cDNA panel by RT-PCR, with primers flanking the alternative splice (Fig. 3a). Sixteen were confirmed to be alternatively spliced, although thirteen of them were already recognized in the literature (ref. 16).
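The mutually-exclusive-splice criterion described in this section reduces to interval comparisons. A minimal sketch, assuming each splice is represented (hypothetically) as the genomic (donor, acceptor) coordinates of the removed intron on the plus strand; on the minus strand the labels of the two ends would swap. Which biological event a shared-site pair reflects (alternative 5′ or 3′ site, or part of an exon-skipping event) needs more context than this check provides.

```python
from typing import Optional, Tuple

# Hypothetical representation: a splice (intron) as its genomic
# (donor, acceptor) coordinates on the plus strand, donor < acceptor.
Splice = Tuple[int, int]


def overlaps(a: Splice, b: Splice) -> bool:
    """True if the two introns share any genomic positions."""
    return a[0] < b[1] and b[0] < a[1]


def mutually_exclusive(a: Splice, b: Splice) -> bool:
    """Two different overlapping splices cannot both occur in one transcript."""
    return a != b and overlaps(a, b)


def shared_site(a: Splice, b: Splice) -> Optional[str]:
    """Stricter criterion: mutually exclusive splices sharing exactly one site."""
    if not mutually_exclusive(a, b):
        return None
    if a[0] == b[0]:
        return "same 5' site, different 3' site"
    if a[1] == b[1]:
        return "same 3' site, different 5' site"
    return None


print(shared_site((100, 200), (100, 250)))  # same 5' site, different 3' site
print(shared_site((100, 200), (150, 200)))  # same 3' site, different 5' site
print(shared_site((100, 200), (120, 220)))  # None (overlap, no shared site)
print(shared_site((100, 200), (300, 400)))  # None (no overlap)
```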
Future Challenges
High-throughput validation.
Large-scale experimental verification of alternative splicing will be needed to assess the accuracy of the bioinformatics-based analyses. One promising technology is inkjet printing of long probes (up to 60 nt) to make rapidly customizable microarrays. Shoemaker et al. (ref. 24) used this technology to monitor the coordinate expression of 8,183 exons annotated on chromosome 22q. This technology could easily be adapted to detect alternative splicing by designing probes that span specific exon-exon junctions. As alternative splicing of a given gene creates different exon-exon junctions, it can be detected by measuring hybridization of mRNA samples from different tissues to these probes (Fig. 3b). Whereas the hybridization ratios of most exon-exon junction probes for a given gene will be constant, alternative splicing will cause some junctions to be up- or down-regulated in different tissues. Rapid printing of such 'splicing chips' will enable cataloging of splice forms for all genes, in different tissues, developmental states and conditions. Combined with the human genome sequence, these data can in turn be used to identify cis elements that regulate these forms.
Recently, the Affymetrix microarray design has also been used to identify potential alternative splices within the rat genome. The Affymetrix array uses 20 probe pairs (25 nt) representing different exons of a gene. Whereas the intensities of most probes for a gene varied together in different tissues, probes for certain exons were anomalously depressed in some tissues, indicating potential alternative splices (ref. 25). Other methodologies that use microarray technology to assess alternative splicing have also been developed (X.-D. Fu and M. Ares Jr, personal communication).
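The junction-probe idea can be sketched for the gene X example from the Sorek and Amitai commentary (exons A, B, C; variants ABC and AC). This is a simplified illustration under stated assumptions: the 60-nt length echoes the inkjet probes mentioned above, centring each probe on the junction is an assumption, the exon sequences are made up, and real probe design also weighs melting temperature, cross-hybridization and uniqueness in the genome.

```python
from typing import Dict, List, Tuple


def variant_junctions(variant: List[str]) -> List[Tuple[str, str]]:
    """Adjacent exon pairs (junctions) in one splice variant."""
    return list(zip(variant, variant[1:]))


def junction_probes(exons: Dict[str, str], junctions: List[Tuple[str, str]],
                    probe_len: int = 60) -> Dict[str, str]:
    """One probe per exon-exon junction, centred on the junction.

    Takes probe_len // 2 nt from the end of the upstream exon and the rest
    from the start of the downstream exon; shorter exons give shorter probes.
    """
    half = probe_len // 2
    probes = {}
    for up, down in junctions:
        upstream = exons[up]
        left = upstream[max(0, len(upstream) - half):]
        right = exons[down][:probe_len - half]
        probes[f"{up}-{down}"] = left + right
    return probes


# Hypothetical 40-nt exon sequences for the gene X example.
EXONS = {"A": "ACGT" * 10, "B": "GGCC" * 10, "C": "TTAA" * 10}
junctions = variant_junctions(["A", "B", "C"]) + variant_junctions(["A", "C"])
probes = junction_probes(EXONS, junctions)
print(sorted(probes))  # ['A-B', 'A-C', 'B-C']
```

Only the A-C probe is specific to variant AC, while A-B and B-C report on ABC alone, so comparing their signals across tissues is what would reveal a change in splicing.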
Rigorous measures of evidence.
It should be emphasized that microarray approaches will not settle the question of identifying alternative splices independently of bioinformatics analysis. If anything, these data are likely to increase the need for bioinformatics, to measure rigorously the strength of the evidence for alternative splices in all the raw experimental data (ESTs, microarrays, and so on). For example, the original inkjet microarray paper treated differences in probe hybridization among exons in a gene as indicators that low-expressed probes were not real exons but simply gene prediction errors. By contrast, the Affymetrix study treated such differences as evidence of alternative splicing. Assessing these competing interpretations is a bioinformatics analysis problem. This will require moving beyond simple 'rules' for filtering out potentially misleading data to probabilistic measurement of the relative strength of the evidence for the competing interpretations.
Cataloguing alternative splice forms.
Although the new bioinformatics results are based on data from the whole genome, it is important to understand that they are highly incomplete. They detect many new splice forms but miss many known isoforms. This is a result both of the incompleteness and fragmentation of the EST and genomic sequence data and of the many causes of false negatives in the bioinformatics methods (Table 3). In Modrek et al. 18, at least 50% of the EST data (and their potential alternative splices) were excluded by these problems. These studies are just the beginning of an accelerating process of mRNA isoform discovery. The EST sequence data are growing rapidly, the draft genome sequence is being completed and new streams of high-throughput data (such as splice-detection microarrays) are beginning. Thus, a worthwhile goal is simply to build a catalog of alternative splice forms, just as the human genome sequence is being used to build a catalog of the genes. The development of new high-throughput technologies for detecting the protein products of alternative splicing will be needed to streamline this process.
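A simple sampling model shows why sparse EST coverage produces false negatives. The sketch assumes, for illustration only, that each EST from a gene is an independent draw that reports a given isoform with probability equal to that isoform's share of the gene's transcripts. Real ESTs are also fragmentary and biased toward transcript ends, so actual detection rates would likely be lower still. The function and parameter names are hypothetical.

```python
from math import comb


def detection_probability(isoform_fraction, n_ests, min_support=1):
    """Probability that at least `min_support` of `n_ests` ESTs come from the isoform.

    Binomial model: each EST independently reports the isoform with
    probability `isoform_fraction`.
    """
    p = isoform_fraction
    p_too_few = sum(
        comb(n_ests, k) * p**k * (1 - p) ** (n_ests - k) for k in range(min_support)
    )
    return 1.0 - p_too_few


# A minor isoform making up 5% of a gene's transcripts, at three sequencing depths.
for n in (5, 10, 100):
    print(n, round(detection_probability(0.05, n), 3))
```

Under this model, halving the usable data from 10 to 5 ESTs (compare the 50% exclusion noted above) drops the chance of seeing the isoform even once from about 40% to about 23%, while 100 ESTs push it above 99%. Requiring two supporting ESTs (`min_support=2`) lowers these numbers further.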
What is truly functional?
Although bioinformatics and high-throughput experiments can have a key role in building a catalog, in our view this can only succeed as a community annotation process involving all molecular biology researchers. For example, how can one prove that a particular splice form is actually carrying out an important biological function? Even with strong evidence that a form is real (that it was actually made by the spliceosome in a living cell), it does not seem safe to assume that it has a biological function. If the spliceosome had a 0.1% rate of mis-splicing, it could produce about 4,000 meaningless 'alternatively spliced' ESTs among the approximately 4 million ESTs. Bioinformatics can partly address this: a large subset of alternative splice forms (47%) are observed in multiple ESTs (often from different libraries) and thus are unlikely to be low-frequency error products 18. At the same time, it is also not safe to dismiss a given form as 'functionless' simply because it has no obvious function. For example, even an alternative splice form that causes early translational termination (and an inactive protein product) can act as an important form of regulation of biological activity 13. Only detailed functional studies can resolve these questions. Bioinformatics can infer likely functional impacts, however, by detecting the addition or removal of known domains, and can predict how experimenters could verify the presence of these forms and their likely disease or tissue specificity. Biologists interested in some of these putative forms could then use a variety of techniques (PCR, northern and western blots) to test these predictions. This process will be best served by a central repository for both the bioinformatics predictions and subsequent experimental verification and functional studies, which would act as a community annotation database (Fig. 4). We hope this process can evolve rapidly into an active partnership between prediction and experiment.
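The arithmetic behind the mis-splicing estimate, and a support filter of the kind the 47% figure implies, can be sketched as follows. The filter is an illustrative assumption, not the exact criterion of Modrek et al.: it keeps a splice form only if ESTs from at least two distinct cDNA libraries report it, on the reasoning that a rare error product is unlikely to recur independently in separate libraries. The form, EST and library identifiers are hypothetical.

```python
from collections import defaultdict

MIS_SPLICING_RATE = 0.001   # hypothetical 0.1% error rate from the text
TOTAL_ESTS = 4_000_000      # approximate EST count from the text
expected_error_ests = MIS_SPLICING_RATE * TOTAL_ESTS  # about 4,000 spurious ESTs


def supported_forms(observations, min_libraries=2):
    """Return splice forms seen in ESTs from at least `min_libraries` libraries.

    observations: iterable of (splice_form_id, est_id, library_id) tuples.
    """
    libraries_per_form = defaultdict(set)
    for form, _est, library in observations:
        libraries_per_form[form].add(library)
    return {form for form, libs in libraries_per_form.items() if len(libs) >= min_libraries}


ests = [
    ("formA", "est1", "brain_lib"),
    ("formA", "est2", "liver_lib"),
    ("formB", "est3", "brain_lib"),
    ("formB", "est4", "brain_lib"),  # two ESTs, but from the same library
    ("formC", "est5", "testis_lib"),
]
print(round(expected_error_ests))
print(supported_forms(ests))
```

Here formB is dropped despite having two ESTs, because both come from one library. As the text cautions, passing such a filter suggests a form is probably real; it says nothing about whether the form is functional.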
Alternative splicing regulation.
One intriguing new area is the study of alternative splicing regulation. Regulation of splicing could be involved in 15% of genetic diseases 26 and may contribute to cancer: mis-splicing of exon 18 in BRCA1, for example, is caused by a polymorphism in an exonic enhancer 27. If alternative splicing is as widespread as bioinformatics studies indicate, how different splice forms are turned on and off may become a major research area, like transcriptional regulation. So far, molecular biology has identified some cis regulatory elements (such as exonic splicing enhancers) and trans factors (SR proteins, PTB, and so on) 11,13. Bioinformatics could make important contributions, for example, in the identification of cis regulatory elements 28-31. Recently, Brudno et al. 31 analyzed intronic sequence upstream and downstream of 25 alternatively spliced brain-specific exons. They detected the motif UGCAUG at a much higher frequency downstream of alternatively spliced exons (relative to constitutive exons), for both brain-specific and muscle-specific alternative splicing 31. This motif had previously been implicated in the alternative splicing of several genes, including c-src, fibronectin, calcitonin/CGRP and nonmuscle myosin II heavy chain-B 32-35, so this result is very suggestive. It bodes well for genome-wide studies that combine the flood of new alternative splicing data with complete genome sequences for multiple organisms.
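A motif-enrichment comparison of the kind described for UGCAUG can be sketched as a presence/absence count with a one-sided Fisher exact test. This is a simplified illustration, not the statistics Brudno et al. actually used; the toy sequences are hypothetical, and DNA input (T) is converted to RNA (U) before searching.

```python
from math import comb


def has_motif(seq, motif="UGCAUG"):
    """True if the motif occurs in seq (DNA 'T' is treated as RNA 'U')."""
    return motif in seq.upper().replace("T", "U")


def motif_enrichment(alt_downstream, const_downstream, motif="UGCAUG"):
    """Compare motif presence downstream of alternative vs constitutive exons.

    Returns (fold enrichment of the fraction containing the motif,
             one-sided Fisher exact p-value for enrichment in the alternative set).
    """
    a = sum(has_motif(s, motif) for s in alt_downstream)
    c = sum(has_motif(s, motif) for s in const_downstream)
    n_alt, n_const = len(alt_downstream), len(const_downstream)
    if c:
        fold = (a / n_alt) / (c / n_const)
    else:
        fold = float("inf") if a else float("nan")

    # Hypergeometric tail: P(at least `a` motif-bearing sequences fall in the
    # alternative group, given `a + c` motif-bearing sequences overall).
    total, with_motif = n_alt + n_const, a + c
    tail = sum(
        comb(with_motif, k) * comb(total - with_motif, n_alt - k)
        for k in range(a, min(with_motif, n_alt) + 1)
    )
    return fold, tail / comb(total, n_alt)


alt = ["AAUGCAUGAA", "CCUGCAUGCC", "CCCCCC"]
const = ["AAAAAA", "CCCCCC", "GGTGCATGGG"]
print(motif_enrichment(alt, const))
```

With toy sets this small, the twofold enrichment is nowhere near significant (p = 0.5). A real analysis needs many exons on each side, and ideally control for sequence composition, before a difference in motif frequency carries weight.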
Acknowledgments
We are grateful to D. Black, S. Galbraith and K. Ke for their critical comments and suggestions. C.L. was supported by a grant from the Department of Energy. B.M. was supported by a National Science Foundation Integrative Graduate Education and Research Training award. Received 16 August; accepted 20 November 2001.
1. Pennisi, E. Human genome project: and the gene number is...? Science 288, 1146-1147 (2000).
2. Adams, M.D. et al. The genome sequence of Drosophila melanogaster. Science 287, 2185-2195 (2000).
3. The C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282, 2012-2018 (1998).
4. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860-921 (2001).
5. Venter, J.C. et al. The sequence of the human genome. Science 291, 1304-1351 (2001).
6. Sambrook, J. Adenovirus amazes at Cold Spring Harbor. Nature 268, 101-104 (1977).
7. Gilbert, W. Why genes in pieces? Nature 271, 501 (1978).
8. Early, P. et al. Two mRNAs can be produced from a single immunoglobulin μ gene by alternative RNA processing pathways. Cell 20, 313-319 (1980).
9. Rosenfeld, M.G. et al. Calcitonin mRNA polymorphism: peptide switching associated with alternative RNA splicing events. Proc. Natl Acad. Sci. USA 79, 1717-1721 (1982).
10. Sharp, P.A. Split genes and RNA splicing. Cell 77, 805-815 (1994).
11. Lopez, A.J. Alternative splicing of pre-mRNA: developmental consequences and mechanisms of regulation. Annu. Rev. Genet. 32, 279-305 (1998).
12. Boise, L.H. et al. bcl-x, a bcl-2-related gene that functions as a dominant regulator of apoptotic cell death. Cell 74, 597-608 (1993).
13. Smith, C.W.J. & Valcarcel, J. Alternative pre-mRNA splicing: the logic of combinatorial control. Trends Biochem. Sci. 25, 381-388 (2000).
14. Mironov, A.A., Fickett, J.W. & Gelfand, M.S. Frequent alternative splicing of human genes. Genome Res. 9, 1288-1293 (1999).
15. Croft, L. et al. ISIS, the intron information system, reveals the high frequency of alternative splicing in the human genome. Nature Genet. 24, 340-341 (2000).
16. Brett, D. et al. EST comparison indicates 38% of human mRNAs contain possible alternative splice forms. FEBS Lett. 474, 83-86 (2000).
17. Kan, Z., Rouchka, E.C., Gish, W.R. & States, D.J. Gene structure prediction and alternative splicing analysis using genomically aligned ESTs. Genome Res. 11, 889-900 (2001).
18. Modrek, B., Resch, A., Grasso, C. & Lee, C. Genome-wide analysis of alternative splicing using human expressed sequence data. Nucleic Acids Res. 29, 2850-2859 (2001).
19. Burset, M., Seledtsov, I.A. & Solovyev, V.V. Analysis of canonical and non-canonical splice sites in mammalian genomes. Nucleic Acids Res. 28, 4364-4375 (2000).
20. Beaudoing, E., Freier, S., Wyatt, J.R., Claverie, J. & Gautheret, D. Patterns of variant polyadenylation signal usage in human genes. Genome Res. 10, 1001-1010 (2000).
21. Graveley, B.R. Alternative splicing: increasing diversity in the proteomic world. Trends Genet. 17, 100-107 (2001).
22. Wheeler, D.L. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 28, 10-14 (2000).
23. Burke, J., Wang, H., Hide, W. & Davison, D.B. Alternative gene form discovery and candidate gene selection from gene indexing projects. Genome Res. 8, 276-290 (1998).
24. Shoemaker, D.D. et al. Experimental annotation of the human genome using microarray technology. Nature 409, 922-927 (2001).
25. Hu, G.K. et al. Predicting splice variant from DNA chip expression data. Genome Res. 11, 1237-1245 (2001).
26. Krawczak, M., Reiss, J. & Cooper, D.N. The mutational spectrum of single base-pair substitutions in mRNA splice junctions of human genes: causes and consequences. Hum. Genet. 90, 41-54 (1992).
27. Liu, H.X., Cartegni, L., Zhang, M.Q. & Krainer, A.R. A mechanism for exon skipping caused by nonsense or missense mutations in BRCA1 and other genes. Nature Genet. 27, 55-58 (2001).
28. Stamm, S., Zhang, M.Q., Marr, T.G. & Helfman, D.M. A sequence compilation and comparison of exons that are alternatively spliced in neurons. Nucleic Acids Res. 22, 1515-1526 (1994).
29. Kent, W.J. & Zahler, A.M. Conservation, regulation, synteny, and introns in a large-scale C. briggsae-C. elegans genomic alignment. Genome Res. 10, 1115-1125 (2000).
30. Stamm, S. et al. An alternative-exon database and its statistical analysis. DNA Cell Biol. 19, 739-756 (2000).
31. Brudno, M. et al. Computational analysis of candidate intron regulatory elements for tissue-specific alternative pre-mRNA splicing. Nucleic Acids Res. 29, 2338-2348 (2001).
32. Modafferi, E.F. & Black, D.L. A complex intronic splicing enhancer from the c-src pre-mRNA activates inclusion of a heterologous exon. Mol. Cell. Biol. 17, 6537-6545 (1997).
33. Huh, G.S. & Hynes, R.O. Regulation of alternative pre-mRNA splicing by a novel repeated hexanucleotide element. Genes Dev. 8, 1561-1574 (1994).
34. Hedjran, F., Yeakley, J.M., Huh, G.S., Hynes, R.O. & Rosenfeld, M.G. Control of alternative pre-mRNA splicing by distributed pentameric repeats. Proc. Natl Acad. Sci. USA 94, 12343-12347 (1997).
35. Kawamoto, S. Neuron-specific alternative splicing of nonmuscle myosin II heavy chain-B pre-mRNA requires a cis-acting intron sequence. J. Biol. Chem. 271, 17613-17616 (1996).
36. Dralyuk, I., Brudno, M., Gelfand, M.S., Zorn, M. & Dubchak, I. ASDB: database of alternatively spliced genes. Nucleic Acids Res. 28, 296-297 (2000).
37. Ji, H. et al. AsMamDB: an alternative splice database of mammals. Nucleic Acids Res. 29, 260-263 (2001).
38. Spingola, M., Grate, L., Haussler, D. & Ares, M. Jr. Genome-wide bioinformatic and molecular analysis of introns in Saccharomyces cerevisiae. RNA 5, 221-234 (1999).
39. Kent, W.J. & Zahler, A.M. The intronerator: exploring introns and alternative splicing in Caenorhabditis elegans. Nucleic Acids Res. 28, 91-93 (2000).
© 2002 Nature Publishing Group http://genetics.nature.com
Is Biologist Barry Commoner a Mutant?
January 30, 2002
By Ronald Bailey
Ronald Bailey is Reason's science correspondent and the editor of Earth Report 2000: Revisiting the True State of the Planet (McGraw-Hill).
Genes are not neat orderly sequences of DNA bases that are simply read off one by one. Instead, the DNA bases that make up a gene--called exons--are often interrupted by other DNA bases called introns that have nothing to do with the gene. In the first step in transcribing DNA into RNA, both exons and introns are read off to produce pre-messenger RNA. To get the proper recipe for a protein, the introns must be removed. That feat is accomplished by an editing machine composed of RNA and protein called the spliceosome that removes the introns and splices together the exons into mature messenger RNA that now embodies the proper recipe for a specific protein.
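The editing step Bailey describes, cutting introns out of the pre-mRNA and joining the exons, can be illustrated with a short Python sketch. It assumes the intron coordinates are already known (0-based, end-exclusive) and checks only the common GU...AG intron boundaries; real spliceosomes recognize more sequence than this, and a minority of introns use other boundary dinucleotides. The sequences are made up.

```python
def splice(pre_mrna, introns):
    """Remove introns from a pre-mRNA and join the flanking exons.

    introns: list of (start, end) positions, 0-based and end-exclusive.
    Raises ValueError for overlapping or out-of-range introns, or for
    introns that do not start with GU and end with AG.
    """
    exons = []
    pos = 0
    for start, end in sorted(introns):
        if start < pos or end > len(pre_mrna) or end - start < 4:
            raise ValueError(f"bad intron coordinates: {(start, end)}")
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"non-canonical intron at {(start, end)}")
        exons.append(pre_mrna[pos:start])  # exon upstream of this intron
        pos = end
    exons.append(pre_mrna[pos:])  # final exon
    return "".join(exons)


pre = "AUGGCC" + "GUAAGUUUUCAG" + "UUUGGA"  # exon 1 + intron + exon 2
print(splice(pre, [(6, 18)]))  # AUGGCCUUUGGA
```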
Alternative splicing occurs when regulatory elements in the genome perhaps tell the spliceosome to treat some introns as exons or some exons as introns, thus changing the protein recipe. As University of Georgia biologist Wayne Parrott notes, to a certain extent this is all a matter of nomenclature -- is it the "same" gene that is specifying different proteins or are they really different genes that happen to share overlapping DNA sequences? The fact is "there is still one DNA sequence per protein," says Parrott.
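Parrott's nomenclature point can be made concrete: one gene sequence with optional ("cassette") exons can yield several mature mRNAs. This hypothetical sketch models only exon skipping, one of several kinds of alternative splicing, and enumerates every include/skip combination, so k cassette exons give up to 2^k isoforms.

```python
from itertools import product


def isoforms(exons):
    """Enumerate mRNAs produced by exon skipping.

    exons: ordered list of (name, sequence, is_cassette); cassette exons may
    be included or skipped, all others are always included.
    Returns {exon path such as "E1-E3": mRNA sequence}.
    """
    cassette = [i for i, (_name, _seq, optional) in enumerate(exons) if optional]
    result = {}
    for choice in product((True, False), repeat=len(cassette)):
        keep = dict(zip(cassette, choice))
        chosen = [(name, seq) for i, (name, seq, _opt) in enumerate(exons) if keep.get(i, True)]
        result["-".join(name for name, _ in chosen)] = "".join(seq for _, seq in chosen)
    return result


gene = [("E1", "AUGGCC", False), ("E2", "UUUGGA", True), ("E3", "UAA", False)]
print(isoforms(gene))
```

Both transcripts come from the same DNA, yet they encode different peptides (Met-Ala-Phe-Gly with exon E2, Met-Ala without it), which is why "same gene or different genes?" turns into a question of definitions.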
. . . there simply aren't any "failures" in commercial plant biotechnology he can cite.
and: Has Commoner any evidence that [problems] occur frequently or at all in commercial biotech crops? If he does, he doesn't cite it in Harper's.
This is a bit of an exaggeration -- note the use of the word "frequently" and see the article.
To produce a commercial biotech crop variety, biotechnologists typically begin by producing hundreds and thousands of plants in which they are trying to insert a particular gene. Over the years they grow and select the ones in which the trait they are seeking -- say, pest resistance--is stable. Only after years of testing and research will they commercialize the selected crop variety.
. . . says biologist Parrott. "Transgenics cannot be different from conventional varieties." Biotech crops must be "substantially equivalent" to conventional varieties before they can be marketed. In every case, biotech companies have submitted reams of information to the Food and Drug Administration (FDA) on things like nutrient profiles and feeding values before marketing genetically enhanced crops.
"substantially equivalent"? Lots of wiggle room there.
Interestingly, scores of varieties of crops being grown today were produced through mutations induced by radiation and caustic chemicals in the 1940s and 1950s. No one knows what proteins these random genetic mutations produced, but people have been eating them for half a century without ill effects.
Yes, this is interesting.
Parrott points out that plant genomes are filled with DNA fragments called retrotransposons that naturally jump randomly from one part of a plant's genome to another. These jumps occur billions of times every growing season. They often disrupt gene expression in plants and may well sometimes induce the production of novel proteins. But this is no cause for alarm, since people have been eating these crops with their jumping genomes for centuries. It is evident that such disruptions in plant genomes have an extremely low probability of producing any dangerous proteins.
Well, Jumping Genomes!!!
A year ago, it was fairly consistently reported that the human genome projects had counted some 30,000 - 40,000 human genes with, at that time, seemingly little dissent. This leads me to believe that there was a generally agreed-upon usage of the term "gene" and that what we may be seeing today is a redefinition of terms to more conform to expectations. But you all tell me . . .
The foregoing is not in any way the picture of simplicity, and I would be most interested in hearing about the "guiding mechanism" that tells the spliceosomes what to do and when. I think it's clear that there is much we do not understand, and that alone would appear to be sufficient reason for caution.
ME TOO!!!! YEAH, YEAH!!!!! me too!!
...but you just can't bring yourself to say, God, I guess.
What an interesting thread! Thanks for posting the story at the top. What the Human Genome Project's results thus far most remind me of is the persistent failure of the "science" of alchemy to transmute dross metal into gold. In the Mediaeval ages and later (and maybe sooner), some of the most brilliant and penetrating minds were devoted to this object. But nobody ever got anywhere. After centuries of trying, the answer always came up: "No, you can't do that. It is in the nature of things that that should be so." So gambling is all you've got left to satisfy such cravings.
Some passages from Voegelin on the general context in which the present hush may be whispered:
* * * * * *
We must remind the reader that at the end of the sixteenth century Giordano Bruno had formulated clearly the issue between speculation on the infinite substance of the cosmos and a mathematized science of the "accidences of accidences." Bruno's speculation, on the one hand, found no immediate succession. The "accidences of accidences," on the other hand, had become the absorbing interest of scholars as well as of a wider public in the centuries of the rising natural sciences. The impressive spectacle of the advancement of science and of the Newtonian system created attitudes and sentiments that have become a decisive ingredient in modern man and modern civilization. One element in this new complex of sentiments [is] scientism: the belief in mathematized science as the model science to the methods of which all other sciences should conform. We must now deal with the complex as a whole, and we shall call it phenomenalism in order to indicate the preoccupation of man with the phenomenal aspects of the world, as they appear in science, and the atrophy of awareness of the substantiality of man and the universe.
Phenomenalism has nothing to do with the method of the advancement of science itself; the term is supposed to designate sentiments, imaginations, and speculations, as well as patterns of conduct determined by them, which originate on occasion of the advancement of mathematized science. Furthermore, we must beware of the assumption that the advancement of science is the one and only cause of the rise of phenomenalism. The new sentiments and attitudes, while hardly conceivable without the prodigious advancement of science, are not necessitated by it. That phenomenalism could gain the importance that it actually has is primarily due to the atrophy of Christian spirituality and the growth of intramundane sentiments. The advancement of science is a contributing factor in the process, insofar as its success is apt to fortify intramundane sentiments; and insofar as phenomenalism, grafted on science, has become an important instrument for their expression. [Eric Voegelin, The History of Political Ideas, Volume VII: The New Order and Last Orientation. Columbia: University of Missouri Press, 1999.]
* * * * * *
EV is daunting. (Personally, I almost had a heart attack the first time I encountered his term, "hypostasization of reality.") Some notes on the above text might be helpful. (Please beware: this is according to my interpretation.)
First, Giordano Bruno was an "Italian philosopher, b. at Nola in Campania, in the Kingdom of Naples, in 1548; d. at Rome, 1600. At the age of eleven he went to Naples, to study "humanity, logic, and dialectic", and, four years later, he entered the Order of St. Dominic, giving up his worldly name of Filippo and taking that of Giordano. He made his novitiate at Naples and continued to study there. In 1572 he was ordained priest." [Catholic Encyclopaedia on-line] In 1600, however, he was burned at the stake as a heretic of pantheist and Unitarian persuasions.
"The infinite substance of the cosmos" refers ultimately to the life of God and its manifestation in man and nature. It grapples with the questions, "why does anything exist? And why is a given thing the way it is, and not some other way?" A student of culture and history knows that such ultimate questions have resonated with intelligent human beings for millennia by now. They constitute the formal philosophical discipline called ontology: the study of Being. I conclude that "being" and "substance" are virtually synonymous terms in the contexts of Bruno and EV.
With the Greeks and the Christians, the result of such questions has been the development of a "science of man," an anthropology, that is premised on man being a "natural creature," but also a "spiritual creature." That is to say, man lives in the space-time dimension that conditions empirical reality; but he is not completely contained, constrained, or determined by empirical conditions. (This is why man is said to have Free Will. But again, this development derives from classical/Christian premises.) The "intramundane man" has extension into the infinite; that is, there is a native capacity for transcendence in the nature of man. He lives in at least two time orders, the "natural," spatio-temporal order in which we all "naturally" live; and also an order that is "outside" or "beyond" time. The mystery is that both orders interleave, or "play" more or less simultaneously, whether we are specifically conscious of this or not. (But this would be the subject of a whole 'nother thread. I'd love to get back to it some time; but right now, we're out of time and bandwidth.)
"Accidences of accidences" is Bruno's quaint way of signifying a chain of causation that never reaches out beyond the intramundane dimension. That is, it confines itself to the study of causal relations among observable phenomena the way science must.
But that supposition suggests to me that a very great deal of the human picture must be deliberately erased in order to make this "understanding" turn out "right." If you properly understand the point of the phenomenalist exercise, you know it seeks to account for man and the universe without reference to anything lying beyond the time and space that sensory perception can register. Its basic definition (it doesn't even have an anthropology) is that man is an abstract individual, with no ties to the past, the most tenuous ties to the present (living in TV land and admiring hard-core sophistry as much as he seems to do), no expectations of the future, and no interest in understanding his own existence as having extension and expression beyond a world which itself is condemned to intramundane existence. I figure you get "accidences of accidences" problems anytime you get your fundamental premise wrong. Arguably, the Human Genome Project has done exactly this. But then, the point and purpose of the Human Genome Project from the beginning was to factually establish the theory of "accidences of accidences," not to refute it. It's late. Must be time to stop. Anyone who wants to continue with anything above, please just give me a yell. Thank you for a wonderful discussion, Phaedrus. Peace and love, bb.
#73 posted by betty boop
ME TOO!!!! YEAH, YEAH!!!!! me too!!
-----------------------------------
Ahhhh yes, - NO one can refute BB on Voegelin, because NO one can understand the man to begin with.
He is a master of gibberish, bombast & grandiloquent nonsense. -- A rhetorical fraud, imo.
And please, -- run off another few hundred words, telling me about his 'brilliant' career. -- I love this sort of pretentious BS.
Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.