Posted on 11/22/2002 9:09:10 PM PST by forsnax5
NSF awards grants to discover the relationships of 1.75 million species
One of the most profound ideas to emerge in modern science is Charles Darwin's concept that all of life, from the smallest microorganism to the largest vertebrate, is connected through genetic relatedness in a vast genealogy. This "Tree of Life" summarizes all we know about biological diversity and underpins much of modern biology, yet many of its branches remain poorly known and unresolved.
To help scientists discover what Darwin described as the tree's "everbranching and beautiful ramifications," the National Science Foundation (NSF) has awarded $17 million in "Assembling the Tree of Life" grants to researchers at more than 25 institutions. Their studies range from investigations of entire pieces of DNA to assemble the bacterial branches; to the study of the origins of land plants from algae; to understanding the most diverse group of terrestrial predators, the spiders; to the diversity of fungi and parasitic roundworms; to the relationships of birds and dinosaurs.
"Despite the enormity of the task," said Quentin Wheeler, director of NSF's division of environmental biology, which funded the awards, "now is the time to reconstruct the tree of life. The conceptual, computational and technological tools are available to rapidly resolve most, if not all, major branches of the tree of life. At the same time, progress in many research areas from genomics to evolution and development is currently encumbered by the lack of a rigorous historical framework to guide research."
Scientists estimate that the 1.75 million known species are only 10 percent of the total species on earth, and that many of those species will disappear in the decades ahead. Learning about these species and their evolutionary history is epic in its scope, spanning all the life forms of an entire planet over its several billion year history, said Wheeler.
Why is assembling the tree of life so important? The tree is a picture of historical relationships that explains all similarities and differences among plants, animals and microorganisms. Because it explains biological diversity, the Tree of Life has proven useful in many fields, such as choosing experimental systems for biological research, determining which genes are common to many kinds of organisms and which are unique, tracking the origin and spread of emerging diseases and their vectors, bio-prospecting for pharmaceutical and agrochemical products, developing databases for genetic information, and evaluating risk factors for species conservation and ecosystem restoration.
The Assembling the Tree of Life grants provide support for large multi-investigator, multi-institutional, international teams of scientists who can combine expertise and data sources, from paleontology to morphology, developmental biology, and molecular biology, says Wheeler. The awards will also involve developing software for improved visualization and analysis of extremely large data sets, and outreach and education programs in comparative phylogenetic biology and paleontology, emphasizing new training activities, informal science education, and Internet resources and dissemination.
-NSF-
For a list of the Assembling the Tree of Life grants, see: http://www.nsf.gov/bio/pubs/awards/atol_02.htm
According to the non-numeric data, I guess some, including OJ, think he is innocent. The blood of the two victims in his Bronco, established by some numerical standard, was insufficient for the jury, which had a preconception that he was innocent. Kinda like the preconception that the genes are ancestrally connected.
My interest in these debates lies much more with the physics side (incl. math, information theory, cosmology, etc.) and your discussion with AndrewC has really made me curious as to intelligence and the genome itself.
As you can see from this article, it has not been easy to simulate physical intelligence: What Is Intelligence?
Give us a number, then. If 43% correspondence in an equivalent section isn't enough, what is? 50%? 75%? 99%?
It is there in the expect value.
Sequences producing significant alignments:

Name      Score (bits)   E Value
motA               567   e-162
BS_motA             91   1e-18
TM0676              74   1e-13
HP0815              68   6e-12
TP0725              68   6e-12
jhp0751             68   6e-12
BB0281              62   3e-10
aq_1003             58   6e-09
BS_ytxD             43   2e-04
PH0632              31   1.1
CT365               31   1.1
MTH1022             30   1.4
srlE                30   1.4
MTH924              30   1.8
slr0301             30   1.8
HI1728              29   3.1
Rv3689              29   4.0
jhp0817             29   4.0
acrB                29   4.0
BS_ylmB             28   5.2
acrF                28   5.2
TM1385              28   6.8
YEL031w             28   6.8
ydiS                28   6.8
acrD                28   6.8
yhiV                28   6.8
AF0134              28   6.8
Rv3091              28   8.9
BS_braB             28   8.9
sll0537             28   8.9
Notice the significant change? (Although apparently those in the know accept 10 as the cutoff for acceptance of "similarity".) My phrase got 2.5. The sequence of the motA from D. vulgaris and MTH1022, when reduced to the range of 55, was IIRC ~e-05.
Sequences producing significant alignments:

Name      Score (bits)   E Value
MTH1022            506   e-144
sll0477             71   7e-13
RP309               62   5e-10
sll1404             61   9e-10
slr0677             60   2e-09
aq_1988             58   6e-09
HP1339              48   6e-06
jhp1258             48   6e-06
tolQ                47   1e-05
CT596               46   2e-05
CPn0785             46   3e-05
HI0253              45   4e-05
exbB                45   4e-05
aq_1757             45   5e-05
HI0385              45   7e-05
TM0676              44   1e-04
aq_1003             43   2e-04
MTH671              42   4e-04
HP1445              42   6e-04
jhp1338             42   6e-04
BB0281              42   6e-04
HP1130              40   0.002
jhp1058             40   0.002
HP0815              37   0.018
jhp0751             37   0.018
BS_ytxD             33   0.20
TP0725              33   0.26
PH0361              30   1.7
CT874               30   1.7
motA                30   1.7
Rv0545c             29   2.8
slr0531             29   3.7
sll0223             29   3.7
BS_motA             28   4.9
YNL189w             28   4.9
sfcA                28   4.9
ftsW                28   4.9
jhp0723             28   6.3
HP0786              28   8.3
PH0504              28   8.3
AF1017              28   8.3

Comparing only the 55 (don't blame me, this is what the Cognitor returns):

# >aq_1003
# Length = 254
#
# Score = 39.3 bits (90), Expect = 5e-04
# Identities = 21/46 (45%), Positives = 29/46 (62%), Gaps = 3/46 (6%)
#
# Query: 2   PMLGLIGTVIGIWYTFRALGVNADPAAMAEGIYVALITTILGLAVA 47
#            P  G+IGT+IG+    R L    DP+A+  G+ VALITT+ G  +A
# Sbjct: 155 PAFGMIGTLIGLIQMLRNLN---DPSALGPGMAVALITTLYGAILA 197
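For anyone who wants to check the "Identities" and "Gaps" figures in the aq_1003 alignment above by hand, here is a minimal Python sketch. The two strings are copied from the Query and Sbjct lines; the function name alignment_stats is made up for illustration and is not part of BLAST or the Cognitor. It does not attempt the "Positives" count, which would need the BLOSUM62 matrix.

```python
# Aligned strings copied from the aq_1003 alignment quoted in the thread.
QUERY = "PMLGLIGTVIGIWYTFRALGVNADPAAMAEGIYVALITTILGLAVA"
SBJCT = "PAFGMIGTLIGLIQMLRNLN---DPSALGPGMAVALITTLYGAILA"


def alignment_stats(query, sbjct, gap="-"):
    """Count identical columns and gap columns in a pairwise alignment.

    Hypothetical helper for illustration only.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned strings must have equal length")
    length = len(query)
    # A column is an identity when both residues match and neither is a gap.
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != gap)
    # A column is a gap column when either side holds the gap character.
    gaps = sum(1 for q, s in zip(query, sbjct) if q == gap or s == gap)
    return identities, gaps, length


identities, gaps, length = alignment_stats(QUERY, SBJCT)
# Floor division matches the percentages shown in the report for these two fields.
print(f"Identities = {identities}/{length} ({100 * identities // length}%)")
print(f"Gaps = {gaps}/{length} ({100 * gaps // length}%)")
```

The point of the exercise: 45% identity over 46 columns is a count anyone can reproduce, while the Expect value attached to it depends on the scoring statistics and the size of the database searched.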
The Expect value (E) is a parameter that describes the number of hits one can "expect" to see just by chance when searching a database of a particular size. It decreases exponentially with the Score (S) that is assigned to a match between two sequences. Essentially, the E value describes the random background noise that exists for matches between sequences. For example, an E value of 1 assigned to a hit can be interpreted as meaning that in a database of the current size one might expect to see 1 match with a similar score simply by chance. This means that the lower the E value, or the closer it is to "0", the more "significant" the match is. However, keep in mind that hits in searches with short sequences can be virtually identical and still have a relatively high E value. This is because the calculation of the E value also takes into account the length of the query sequence: shorter sequences have a higher probability of occurring in the database purely by chance. For more details please see the calculations in the BLAST Course.
The Expect value can also be used as a convenient way to create a significance threshold for reporting results. You can change the Expect value threshold on most main BLAST search pages. When the Expect value is increased from the default value of 10, a larger list with more low-scoring hits can be reported.
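To make the threshold idea concrete, here is a small Python sketch that filters a few of the hits from the first list in this thread by Expect value. The helper names parse_evalue and filter_hits are invented for illustration, and the 1e-3 cutoff is an arbitrary stricter value chosen for the example, not a recommendation from the BLAST documentation. One practical wrinkle it handles: the report abbreviates 1e-162 as "e-162", which Python's float() will not accept as written.

```python
# A subset of (name, E value) pairs copied from the first hit list in the thread.
HITS = [
    ("motA", "e-162"),
    ("BS_motA", "1e-18"),
    ("aq_1003", "6e-09"),
    ("BS_ytxD", "2e-04"),
    ("PH0632", "1.1"),
    ("MTH1022", "1.4"),
    ("acrB", "4.0"),
    ("sll0537", "8.9"),
]


def parse_evalue(text):
    """Convert an E-value string from the report to a float.

    The report writes 1e-162 as 'e-162', so a leading '1' is restored first.
    """
    text = text.strip()
    if text.startswith("e"):
        text = "1" + text
    return float(text)


def filter_hits(hits, threshold):
    """Return the names of hits whose E value is at or below the threshold."""
    return [name for name, e in hits if parse_evalue(e) <= threshold]


# With the default threshold of 10, every hit in the subset is reported.
print(filter_hits(HITS, 10.0))
# With a stricter, illustrative threshold, only the low-E hits survive.
print(filter_hits(HITS, 1e-3))
```

Note that MTH1022, with E = 1.4, is reported under the default threshold of 10 but drops out under the stricter cutoff.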
----------
BLAST 2 SEQUENCES RESULTS VERSION BLASTP 2.2.4 [Aug-26-2002]
Sequence 1: gi 15606306, flagellar motor protein MotA [Aquifex aeolicus], Length 254 (1 .. 254)
Sequence 2: gi 21399546, MotA_ExbB, MotA/TolQ/ExbB proton channel family [Bacillus anthracis A2012], Length 254 (1 .. 254)
Score = 138 bits (348), Expect = 9e-32
Identities = 82/237 (34%), Positives = 136/237 (56%), Gaps = 6/237 (2%)
Query: 7   IGIIAAFLLILISILIGG----SITAFINVPSIFIVVGGGMAAAMGAFPLKDFIRGVLAI 62
           +GII  F +++ +I++GG    +   F++V SI IV+GG  A  + A+   +  +   +I
Sbjct: 1   MGIIVGFAIVIAAIMLGGGGIKAFKNFLDVSSILIVIGGTTATIVVAYRFGEIKKYTKSI 60

Query: 63  KKAFLWKPPDLNDVIETIGEIASKVRKEGILALEGDIELYYQKDPLLGDMIRMLVDGIDI 122
                 +  DL  + +   + + K +K G+L+LE D E     +P +   IR+++ G D
Sbjct: 61  FTVLHRREEDLEQLTDLFVDFSKKSKKHGLLSLEVDGEQV--DNPFIQKGIRLMLSGYDE 118

Query: 123 NDIKATAEMALAQLDEKMSTEVAVWEKLADLFPAFGMIGTLIGLIQMLRNLNDPSALGPG 182
            ++K      +     ++    A+ +K+ D  PA+GMIGTLIGLI ML+NL D S +G G
Sbjct: 119 EELKEVLMKDVETEVYELRKGAALLDKIGDFAPAWGMIGTLIGLIIMLQNLQDTSQIGTG 178

Query: 183 MAVALITTLYGAILANAFAIPVANKLKKAKDMEVLVKTIYIEAIEKIQKGENPNVVK 239
           MAVA++TTLYG++LAN  AIP+A K+ +  +     K   IEAI ++ +G+ P+ +K
Sbjct: 179 MAVAMLTTLYGSVLANMIAIPLAEKVYRGIEDLYTEKKFVIEAISELYRGQIPSKLK 235
CPU time: 4.58 user secs. 0.58 sys. secs 5.16 total secs.

Lambda   K        H
0.321    0.141    0.383

Gapped
Lambda   K        H
0.267    0.0410   0.140

Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 604
Number of Sequences: 0
Number of extensions: 59
Number of successful extensions: 3
Number of sequences better than 300.0: 1
Number of HSP's better than 300.0 without gapping: 1
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 0
Number of HSP's gapped (non-prelim): 1
length of query: 254
length of database: 396,279,676
effective HSP length: 123
effective length of query: 131
effective length of database: 396,279,553
effective search space: 51912621443
effective search space used: 51912621443
T: 9
A: 40
X1: 16 ( 7.4 bits)
X2: 129 (49.7 bits)
X3: 129 (49.7 bits)
S1: 41 (21.8 bits)
S2: 60 (27.7 bits)
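The statistics above are enough to recompute the reported Expect value by hand. The sketch below uses the standard Karlin-Altschul form, E = K * (search space) * exp(-lambda * S), with the gapped Lambda and K, the raw score of 348 from the Score line, and the effective search space from the report. The function names are made up for illustration, and because Lambda and K are printed rounded to three digits, the result should only be expected to agree approximately with the report.

```python
import math

# Values copied from the BLAST 2 Sequences report quoted in the thread.
GAPPED_LAMBDA = 0.267       # "Gapped Lambda"
GAPPED_K = 0.0410           # "Gapped K"
RAW_SCORE = 348             # the number in parentheses in "Score = 138 bits (348)"
SEARCH_SPACE = 51912621443  # "effective search space"


def bit_score(raw, lam, k):
    """Normalize a raw score to bits: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


def expect(raw, lam, k, space):
    """Karlin-Altschul expectation: E = K * space * exp(-lambda * S)."""
    return k * space * math.exp(-lam * raw)


bits = bit_score(RAW_SCORE, GAPPED_LAMBDA, GAPPED_K)
e_value = expect(RAW_SCORE, GAPPED_LAMBDA, GAPPED_K, SEARCH_SPACE)
print(f"bit score ~ {bits:.1f}")
print(f"Expect ~ {e_value:.1e}")
```

With these inputs the Expect value comes out near 9e-32 and the bit score lands between 138 and 139, in line with the reported "Score = 138 bits (348), Expect = 9e-32" given the rounding of Lambda and K.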
Protein-coding genes were predicted using GeneMarkS program (kindly provided by M. Borodovsky). Conserved domains were detected using reverse-position-specific BLAST search against the NCBI conserved domain database (CDD). Functional annotation is based on CDD assignments, it has not yet been subject to manual review. Method: conceptual translation.
Anthrax and A. aeolicus have very similar proton transport systems, in the form of motA, ExdB/ExbB, and TolQ/TolR. In the absence of more information, we cannot say what the exact relationship between the two is, but we cannot rule out the hypothesis that they are related somehow. What did you expect?
I'm ready. Bring on the exam.
Soooooo, one match is significant, and the other is not. So here's the pop-quiz: What can we say about the relationships (if any) between motA and MTH1022, and motA and the string "imadethesethingssmartlyinfiveyearsintheskyfindthesecret"?
I would make a stronger statement, there is a relationship, presently unknown. We can rule out the no relationship verdict.
I did not expect anything until the numbers were shown. If the expectation were 1.0, I would not make my first two statements.
I would make a stronger statement, there is a relationship, presently unknown. We can rule out the no relationship verdict.
I would agree. As I said before, we can try to make some educated guesses about the relationship by using the degrees of difference to try and place them in a relative taxonomy. That's not conclusive, of course, but it can point us in a productive direction. And we can compare our results to morphological/cladistic taxonomies, to give us another factor in deciding the relationship.
And eventually, we can accumulate enough evidence to begin to lean in one direction or another about what the relationship is - do they share common ancestry? Did one of them just scarf up the genes from the other? Maybe they both obtained the same gene from a third source?
We'll make a materialist out of you yet.... ;)