Posted on 08/07/2024 10:39:30 AM PDT by Red Badger
GROVER, a new large language model trained on human DNA by researchers at Dresden University of Technology’s Biotechnology Center, can decode complex genomic information by treating DNA as a language. This innovative tool holds the potential to revolutionize genomics and accelerate personalized medicine.
===================================================================
DNA is crucial for life, and its organization has been a significant scientific challenge. GROVER, a model developed by BIOTEC, decodes DNA like text, promising advancements in genomics and personalized medicine.
DNA holds the essential information required to sustain life. Deciphering how this information is stored and organized has been one of the greatest scientific challenges of the past century. Now, with GROVER, a new large language model trained on human DNA, researchers can attempt to decode the intricate information concealed within our genome. Developed by a team at the Biotechnology Center (BIOTEC) of Dresden University of Technology, GROVER treats human DNA as text, learning its rules and context to extract functional information about DNA sequences. Published in Nature Machine Intelligence, this innovative tool has the potential to revolutionize genomics and accelerate personalized medicine.
Since the discovery of the double helix, scientists have sought to understand the information encoded in DNA. Seventy years later, it is clear that the information hidden in the DNA is multilayered. Only 1-2% of the genome consists of genes, the sequences that code for proteins.
“DNA has many functions beyond coding for proteins. Some sequences regulate genes, others serve structural purposes, and most sequences serve multiple functions at once. Currently, we don’t understand the meaning of most of the DNA. When it comes to understanding the non-coding regions of the DNA, it seems that we have only started to scratch the surface. This is where AI and large language models can help,” says Dr. Anna Poetsch, research group leader at the BIOTEC.
DNA as a Language
Large language models, like GPT, have transformed our understanding of language. Trained exclusively on text, the large language models developed the ability to use the language in many contexts.
“DNA is the code of life. Why not treat it like a language?” says Dr. Poetsch. The Poetsch team trained a large language model on a reference human genome. The resulting tool named GROVER, or “Genome Rules Obtained via Extracted Representations”, can be used to extract biological meaning from the DNA.
“GROVER learned the rules of DNA. In terms of language, we are talking about grammar, syntax, and semantics. For DNA this means learning the rules governing the sequences, the order of the nucleotides and sequences, and the meaning of the sequences. Like GPT models learning human languages, GROVER has basically learned how to ‘speak’ DNA,” explains Dr. Melissa Sanabria, the researcher behind the project.
The team showed that GROVER can not only accurately predict the DNA sequence that follows a given stretch but can also be used to extract contextual information that has biological meaning, e.g., to identify gene promoters or protein binding sites on DNA. GROVER also learns processes that are generally considered to be “epigenetic”, i.e., regulatory processes that happen on top of the DNA rather than being encoded in the sequence.
“It is fascinating that by training GROVER with only the DNA sequence, without any annotations of functions, we are actually able to extract information on biological function. To us, it shows that the function, including some of the epigenetic information, is also encoded in the sequence,” says Dr. Sanabria.
The DNA Dictionary
“DNA resembles language. It has four letters that build sequences and the sequences carry a meaning. However, unlike a language, DNA has no defined words,” says Dr. Poetsch. DNA is written in four letters (A, T, G, and C), but there are no predefined sequences of different lengths that combine to build genes or other meaningful sequences.
To train GROVER, the team had to first create a DNA dictionary. They used a trick from compression algorithms. “This step is crucial and sets our DNA language model apart from the previous attempts,” says Dr. Poetsch.
“We analyzed the whole genome and looked for combinations of letters that occur most often. We started with two letters and went over the DNA, again and again, to build it up to the most common multi-letter combinations. In this way, in about 600 cycles, we have fragmented the DNA into ‘words’ that let GROVER perform the best when it comes to predicting the next sequence,” explains Dr. Sanabria.
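The procedure described in that quote resembles byte-pair encoding, a technique borrowed from compression. Below is a minimal Python sketch of that idea, assuming a plain string of A/T/G/C as input. The function names (`most_frequent_pair`, `merge_pair`, `build_vocabulary`) are made up for illustration and are not taken from the GROVER code; the real pipeline runs on a whole reference genome and may differ in details.

```python
from collections import Counter


def most_frequent_pair(tokens):
    """Return the most common adjacent token pair, or None if there are no pairs."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return None
    return pairs.most_common(1)[0][0]


def merge_pair(tokens, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged token."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def build_vocabulary(sequence, cycles):
    """Start from single letters and merge the most frequent pair once per cycle.

    Hypothetical helper: `cycles` plays the role of the "about 600 cycles"
    mentioned in the article.
    """
    tokens = list(sequence)
    vocabulary = sorted(set(tokens))
    for _ in range(cycles):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        tokens = merge_pair(tokens, pair)
        new_word = pair[0] + pair[1]
        if new_word not in vocabulary:
            vocabulary.append(new_word)
    return vocabulary, tokens


if __name__ == "__main__":
    vocab, words = build_vocabulary("ATATATGC", cycles=2)
    print(vocab)
    print(words)
```

Each cycle adds at most one new "word" to the dictionary, so the number of cycles sets the vocabulary size. Joining the resulting tokens always gives back the original sequence, which is why the trick works for compression as well.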
The Promise of AI in Genomics
GROVER promises to unlock the different layers of genetic code. DNA holds key information on what makes us human, our disease predispositions, and our responses to treatments.
“We believe that understanding the rules of DNA through a language model is going to help us uncover the depths of biological meaning hidden in the DNA, advancing both genomics and personalized medicine,” says Dr. Poetsch.
Reference:
“DNA language model GROVER learns sequence context in the human genome” by Melissa Sanabria, Jonas Hirsch, Pierre M. Joubert and Anna R. Poetsch, 23 July 2024, Nature Machine Intelligence.
DOI: 10.1038/s42256-024-00872-0
Trying to decipher the hidden language of (non coding) DNA is like aliens landing on earth knowing nothing about earth or humans, entering a public library, discovering racks and racks of printed text and trying to derive meaning out of it. There is meaning of course. But making sense of it is another matter. Maybe AI will provide that starting point.
Do you want acid-bleeding, face-hugging, stomach-bursting aliens? Because this is how you get acid-bleeding, face-hugging, stomach-bursting aliens.
And Democrats
Thanks Red Badger. Our old pal Grover.
At some point, and sooner rather than later, DNA will be read by computers and you will have a printout and picture of the person just from a DNA sample, with no comparative sample needed.
The computers will be able to tell you everything from eye color to shoe size.....................
Cliff Quote #71 “No Help Wanted”
Cliff: So as we see, the roots of physical aggression in the male of the species is found right here in the old DNA molecule itself. Yeah, right up here at about 1:00, as I recall.
Diane: Fascinating, Cliff.
Cliff: Oh, yes, Diane. Fascinating. You, hold onto your hat, too, because the very letters “DNA” are an acronym for the words “dames are not aggressive.”
Diane: They stand for deoxyribonucleic acid.
Cliff: Ah, yes, but parse that in the Latin declension, and my point is still moot.
However, this clearly shows that the human brain beats AI. For man is the one that determines what is needed for the AI to function correctly (and it still falls short on many occasions). The same holds true with this endeavor as well.
Interesting endeavor though I must admit.
Will they be worse than Democrats? If not, then I'll take two please.
It just looks for patterns and tries to correlate those patterns with known areas of genetic responsibility.
Much more efficient at it than a human is............
“...researchers can attempt to decode the intricate information concealed within our genome.”
Good luck with that. The retards can’t even define what a woman is or which restroom they should use.
Fascinating stuff.
The great filter which will remove humans from the universe, as it has many other species throughout the universe...
My DNA was just decoded. It translated into “Be sure to drink your Ovaltine”.
I imagine that this AI model still cannot decipher “noncoding DNA”. This was once classified as “junk”, but now in many cases has been found to have important functions.
https://medlineplus.gov/genetics/understanding/basics/noncodingdna/
“At some point, and sooner rather than later, DNA will be read by computers and you will have a printout and picture of the person just from a DNA sample, with no comparative sample needed.”
They did that years ago to show Andrea Canning how it worked on ‘48 Hours’ or one of those true-crime shows. Pretty cool. The print-out was a very close likeness of her. It even showed the one ‘discoloration’ spot in one of her irises!
Wonder if AI will settle the MYSTERY of whether people are born Male or Female? Now THAT would be a scientific breakthrough!
*SMIRK*
Like I said, it's an interesting endeavor. But it will still require a human brain to analyze what the computer spits out for accuracy. Much like the disclaimer of AI, or at least the ones I have casually used, that instructs the user that they should verify AI's results. 😋
That said, it may still lead to quicker improvements to the application. But it could also muddy the water just as easily.
The problem with using a machine is that it tends to degrade the human mind. It also tends to create a dependency upon the machine.
The Left wants there to be a ‘Gay Gene’ so bad ......................
Real Frankenstein stuff. They’ll conjure unimaginable horrors.