Free Republic
Browse · Search
General/Chat
Topics · Post Article


Cracking the Code of Life: New AI Model Learns DNA’s Hidden Language
SciTechDaily | August 7, 2024 | Dresden University of Technology

Posted on 08/07/2024 10:39:30 AM PDT by Red Badger

GROVER, a new large language model trained on human DNA by researchers at Dresden University of Technology’s Biotechnology Center, can decode complex genomic information by treating DNA as a language. This innovative tool holds the potential to revolutionize genomics and accelerate personalized medicine.

===================================================================

DNA is crucial for life, and understanding how it is organized has been a significant scientific challenge. GROVER, a model developed by BIOTEC, decodes DNA as if it were text, promising advances in genomics and personalized medicine.

DNA holds the essential information required to sustain life. Deciphering how this information is stored and organized has been one of the greatest scientific challenges of the past century. Now, with GROVER, a new large language model trained on human DNA, researchers can attempt to decode the intricate information concealed within our genome. Developed by a team at the Biotechnology Center (BIOTEC) of Dresden University of Technology, GROVER treats human DNA as text, learning its rules and context to extract functional information about DNA sequences. Published in Nature Machine Intelligence, this innovative tool has the potential to revolutionize genomics and accelerate personalized medicine.

Since the discovery of the double helix, scientists have sought to understand the information encoded in DNA. Seventy years later, it is clear that this information is multilayered. Only about 1-2% of the genome consists of protein-coding sequences.

“DNA has many functions beyond coding for proteins. Some sequences regulate genes, others serve structural purposes, and most sequences serve multiple functions at once. Currently, we don’t understand the meaning of most of the DNA. When it comes to understanding the non-coding regions of the DNA, it seems that we have only started to scratch the surface. This is where AI and large language models can help,” says Dr. Anna Poetsch, research group leader at BIOTEC.

DNA as a Language

Large language models, like GPT, have transformed our understanding of language. Trained exclusively on text, these models developed the ability to use language in many contexts.

“DNA is the code of life. Why not treat it like a language?” says Dr. Poetsch. Her team trained a large language model on a reference human genome. The resulting tool, named GROVER (“Genome Rules Obtained via Extracted Representations”), can be used to extract biological meaning from DNA.

“GROVER learned the rules of DNA. In terms of language, we are talking about grammar, syntax, and semantics. For DNA this means learning the rules governing the sequences, the order of the nucleotides and sequences, and the meaning of the sequences. Like GPT models learning human languages, GROVER has basically learned how to ‘speak’ DNA,” explains Dr. Melissa Sanabria, the researcher behind the project.

The team showed that GROVER can not only accurately predict which DNA sequence comes next but can also be used to extract contextual information with biological meaning, such as identifying gene promoters or protein binding sites on DNA. GROVER also learns processes that are generally considered “epigenetic”, i.e., regulatory processes that happen on top of the DNA rather than being encoded in it.

“It is fascinating that by training GROVER with only the DNA sequence, without any annotations of functions, we are actually able to extract information on biological function. To us, it shows that the function, including some of the epigenetic information, is also encoded in the sequence,” says Dr. Sanabria.

The DNA Dictionary

“DNA resembles language. It has four letters that build sequences, and the sequences carry a meaning. However, unlike a language, DNA has no defined words,” says Dr. Poetsch. DNA is written in four letters (A, T, G, and C), but there are no predefined units of varying length that combine to build genes or other meaningful sequences.

To train GROVER, the team had to first create a DNA dictionary. They used a trick from compression algorithms. “This step is crucial and sets our DNA language model apart from the previous attempts,” says Dr. Poetsch.

“We analyzed the whole genome and looked for combinations of letters that occur most often. We started with two letters and went over the DNA, again and again, to build it up to the most common multi-letter combinations. In this way, in about 600 cycles, we have fragmented the DNA into ‘words’ that let GROVER perform the best when it comes to predicting the next sequence,” explains Dr. Sanabria.
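The dictionary-building procedure Dr. Sanabria describes resembles byte-pair encoding (BPE), a merge-based technique that originated in data compression and is widely used to tokenize text for language models. The sketch below is a minimal illustration under stated assumptions, not GROVER's actual code: it learns merges from a single toy string rather than a whole genome, and the function names (`learn_dna_vocab`, `most_frequent_pair`, `merge_pair`) are hypothetical. Each cycle counts adjacent token pairs, merges the most frequent pair into a new "word", and adds that word to the vocabulary; the `cycles` parameter stands in for the roughly 600 cycles mentioned.

```python
from collections import Counter


def most_frequent_pair(tokens):
    """Return the most common adjacent pair of tokens, or None if no pair repeats."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return None
    # Among equal counts, most_common() returns the pair encountered first.
    pair, count = pairs.most_common(1)[0]
    return pair if count > 1 else None


def merge_pair(tokens, pair):
    """Replace non-overlapping occurrences of `pair` (left to right) with one merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_dna_vocab(sequence, cycles):
    """Hypothetical BPE-style vocabulary learner for a DNA string (not GROVER's code)."""
    tokens = list(sequence)          # start from single nucleotides: A, C, G, T
    vocab = sorted(set(tokens))
    for _ in range(cycles):          # the article mentions roughly 600 cycles
        pair = most_frequent_pair(tokens)
        if pair is None:             # assumption: stop early once nothing repeats
            break
        tokens = merge_pair(tokens, pair)
        vocab.append(pair[0] + pair[1])
    return vocab, tokens


if __name__ == "__main__":
    vocab, tokens = learn_dna_vocab("ATGATGATGCCATG", cycles=3)
    print(vocab)   # ['A', 'C', 'G', 'T', 'AT', 'ATG', 'ATGATG']
    print(tokens)  # ['ATGATG', 'ATG', 'C', 'C', 'ATG']
```

The learned "words" come out at different lengths, which fits Dr. Poetsch's point that DNA has no predefined words: the vocabulary is simply whatever letter combinations recur most often in the training sequence.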

The Promise of AI in Genomics

GROVER promises to unlock the different layers of the genetic code. DNA holds key information about what makes us human, our predispositions to disease, and our responses to treatments.

“We believe that understanding the rules of DNA through a language model is going to help us uncover the depths of biological meaning hidden in the DNA, advancing both genomics and personalized medicine,” says Dr. Poetsch.

Reference:

“DNA language model GROVER learns sequence context in the human genome” by Melissa Sanabria, Jonas Hirsch, Pierre M. Joubert and Anna R. Poetsch, 23 July 2024, Nature Machine Intelligence.

DOI: 10.1038/s42256-024-00872-0


TOPICS: Health/Medicine; History; Science; Society
KEYWORDS: ai; genealogy; godsgravesglyphs; helixmakemineadouble

1 posted on 08/07/2024 10:39:30 AM PDT by Red Badger

To: Red Badger

Trying to decipher the hidden language of (non-coding) DNA is like aliens landing on Earth knowing nothing about Earth or humans, entering a public library, discovering racks and racks of printed text, and trying to derive meaning from it. There is meaning, of course. But making sense of it is another matter. Maybe AI will provide that starting point.


2 posted on 08/07/2024 10:49:28 AM PDT by libh8er

To: Red Badger

Do you want acid-bleeding, face-hugging, stomach-bursting aliens? Because this is how you get acid-bleeding, face-hugging, stomach-bursting aliens.


3 posted on 08/07/2024 10:50:04 AM PDT by IYAS9YAS (There are two kinds of people: Those who can extrapolate from incomplete data.)

To: IYAS9YAS

And Democrats


4 posted on 08/07/2024 10:53:36 AM PDT by Kartographer (“We Mutually Pledge To Each Other, Our Lives, Our Fortunes And Our Sacred Honor”)

To: Red Badger; StayAt HomeMother; Ernest_at_the_Beach; 1ofmanyfree; 21twelve; 24Karet; ...
Thanks Red Badger. Our old pal Grover.

5 posted on 08/07/2024 10:57:41 AM PDT by SunkenCiv (Putin should skip ahead to where he kills himself in the bunker.)

To: libh8er

At some point, and sooner rather than later, DNA will be read by computers and you will have a printout and picture of the person just from a DNA sample, with no comparative sample needed.

The computers will be able to tell you everything from eye color to shoe size.....................


6 posted on 08/07/2024 10:59:53 AM PDT by Red Badger (Homeless veterans camp in the streets while illegals are put up in 5 Star hotels....................)

Cliff Quote #71 “No Help Wanted”

Cliff: So as we see, the roots of physical aggression in the male of the species is found right here in the old DNA molecule itself. Yeah, right up here at about 1:00, as I recall.

Diane: Fascinating, Cliff.

Cliff: Oh, yes, Diane. Fascinating. You, hold onto your hat, too, because the very letters “DNA” are an acronym for the words “dames are not aggressive.”

Diane: They stand for deoxyribonucleic acid.

Cliff: Ah, yes, but parse that in the Latin declension, and my point is still moot.


7 posted on 08/07/2024 11:01:37 AM PDT by SunkenCiv (Putin should skip ahead to where he kills himself in the bunker.)

To: Red Badger
They really do not have a way of testing it, because they had to make assumptions in its development up to this point. It might be workable down the road, but it's only a starting point for now. I will never live to see that workable model. Right now I would have more faith in the Climate Change models, and I admit I have little if any faith in that one.

However, this clearly shows that the human brain beats AI. For man is the one that determines what is needed for the AI to function correctly (and it still falls short on many occasions). The same holds true with this endeavor as well.

Interesting endeavor, though, I must admit.

8 posted on 08/07/2024 11:06:12 AM PDT by Robert DeLong

To: IYAS9YAS
Do you want acid-bleeding, face-hugging, stomach-bursting aliens? Because this is how you get acid-bleeding, face-hugging, stomach-bursting aliens.

Will they be worse than Democrats? If not, then I'll take two please.

9 posted on 08/07/2024 11:10:38 AM PDT by voicereason (When a bartender can join Congress and become a millionaire...there’s a problem.)

To: Robert DeLong

It just looks for patterns and tries to correlate those patterns with known areas of genetic responsibility.

Much more efficient at it than a human is............


10 posted on 08/07/2024 11:11:23 AM PDT by Red Badger (Homeless veterans camp in the streets while illegals are put up in 5 Star hotels....................)

To: Red Badger

“...researchers can attempt to decode the intricate information concealed within our genome.”

Good luck with that. The retards can’t even define what a woman is or which restroom they should use.


11 posted on 08/07/2024 11:11:51 AM PDT by chuckb87

To: SunkenCiv

Fascinating stuff.


12 posted on 08/07/2024 11:13:19 AM PDT by ComputerGuy (Heavily-medicated for your protection)

To: IYAS9YAS

The great filter which will remove humans from the universe, as it has many other species throughout the universe...


13 posted on 08/07/2024 11:14:26 AM PDT by TheDon (Resist the usurpers! Remember the J6 political prisoners!)

To: Red Badger

My DNA was just decoded; it translated into “Be sure to drink your Ovaltine”.


14 posted on 08/07/2024 11:18:01 AM PDT by FrankRizzo890

To: Red Badger

I imagine that this AI model still cannot decipher “noncoding DNA”. This was once classified as “junk”, but now in many cases has been found to have important functions.

https://medlineplus.gov/genetics/understanding/basics/noncodingdna/


15 posted on 08/07/2024 11:19:19 AM PDT by Honorary Serb

To: Red Badger

“At some point, and sooner rather than later, DNA will be read by computers and you will have a printout and picture of the person just from a DNA sample, with no comparative sample needed.”

They did that years ago to show Andrea Canning how it worked on ‘48 Hours’ or one of those true-crime shows. Pretty cool. The print-out was a very close likeness of her. It even showed the one ‘discoloration’ spot in one of her irises!

Wonder if AI will settle the MYSTERY of whether people are born Male or Female? Now THAT would be a scientific breakthrough!

*SMIRK*


16 posted on 08/07/2024 11:35:23 AM PDT by Diana in Wisconsin (I don't have, 'Hobbies.' I'm developing a robust Post-Apocalyptic skill set. )

To: Red Badger
But it can only work with what humans have provided to it as the inputted knowledge base. If that required assumptions (and from the article I gathered that assumptions were necessary), and those assumptions are off, then it may take years to realize that they were off.

Like I said, it's an interesting endeavor. But it will still require a human brain to check what the computer spits out for accuracy. Much like the disclaimers on AI tools, or at least the ones I have casually used, which instruct users to verify the results. 😋

That said, it may still lead to quicker improvements to the application. But it could also muddy the water just as easily.

The problem with using a machine is that it tends to degrade the human mind. It also tends to create a dependency on the machine.

17 posted on 08/07/2024 11:38:53 AM PDT by Robert DeLong

To: Diana in Wisconsin

The Left wants there to be a ‘Gay Gene’ so bad ......................


18 posted on 08/07/2024 11:48:46 AM PDT by Red Badger (Homeless veterans camp in the streets while illegals are put up in 5 Star hotels....................)

To: Red Badger

Real Frankenstein stuff. They’ll conjure unimaginable horrors.


19 posted on 08/07/2024 11:51:25 AM PDT by IDFbunny (Crimea was never Ukraine.)

To: Red Badger
Yet they created an mRNA therapy, grotesquely calling it a ‘vaccine’ as well, without knowing what they're actually coding...

20 posted on 08/07/2024 12:01:18 PM PDT by Pox (Eff You China. Buy American!)



Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.

