Posted on 08/26/2025 7:31:04 AM PDT by Salman
Security researchers from Palo Alto Networks' Unit 42 have discovered the key to getting large language model (LLM) chatbots to ignore their guardrails, and it's quite simple.
You just have to ensure that your prompt uses terrible grammar and is one massive run-on sentence like this one which includes all the information before any full stop which would give the guardrails a chance to kick in before the jailbreak can take effect and guide the model into providing a "toxic" or otherwise verboten response the developers had hoped would be filtered out.
The paper also offers a "logit-gap" analysis approach as a potential benchmark for protecting models against such attacks.
"Our research introduces a critical concept: the refusal-affirmation logit gap," researchers Tung-Ling "Tony" Li and Hongliang Liu explained in a Unit 42 blog post. "This refers to the idea that the training process isn't actually eliminating the potential for a harmful response – it's just making it less likely. There remains potential for an attacker to 'close the gap,' and uncover a harmful response after all."
LLMs, the technology underpinning the current AI hype wave, don't do what they're usually presented as doing. They have no innate understanding, they do not think or reason, and they have no way of knowing if a response they provide is truthful or, indeed, harmful. They work based on statistical continuation of token streams, and everything else is a user-facing patch on top.
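That "statistical continuation of token streams" claim is easy to make concrete. Here is a toy decoding loop, assuming a Hugging Face-style causal language model; this is the core idea, not any vendor's actual serving code.

```python
# Toy decoding loop: the model only ever answers "which token comes
# next?", and generation is just this step repeated. Assumes an
# HF-style model whose forward pass returns .logits.
import torch

def generate(model, input_ids: torch.Tensor, steps: int, temperature: float = 1.0):
    for _ in range(steps):
        with torch.no_grad():
            logits = model(input_ids).logits[0, -1]        # score every vocab token
        probs = torch.softmax(logits / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # sample the continuation
        input_ids = torch.cat([input_ids, next_id.unsqueeze(0)], dim=1)
    return input_ids
```

Note that nothing in that loop checks whether the output is true or harmful, which is the article's point.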
...
(Excerpt) Read more at theregister.com ...
In addition to people deliberately gaming it, this leaves open the possibility that ungrammatical users will stumble into a harmful response.
“Can I mambo dogface to the banana patch”
I guess the next generation will have a hard time with AI, because a lot of people don’t pay attention in English class.
Thanks for your attention to this matter. (typed in my best Trump voice LOL)
Slight side path: schools used to teach the diagramming of sentences. At some point it was decided (rightly or wrongly) that this was not a helpful thing to teach, and it's not standard in schools today. Also, English is extremely flexible, and there are perfectly good sentences that would be fiendishly hard to diagram. But it might be interesting to teach LLMs to diagram all sentences, then reshape each one into something simple and direct and feed that back to the human: "Is this what you are asking?" Not a fishbone diagram for the human, but a simple sentence the computer constructed from a simple, straightforward diagram as a starting point (a toy sketch of the idea follows this post).
That would sanitize human input that might otherwise introduce poor grammar, and also eliminate the run-on sentences that lead to confusion or vagueness. Not a perfect fix, but it might help.
And, apart from the topic of LLMs, I think human writing would benefit if sentence diagrams were better understood and could be easily demonstrated by a program like MS Word.
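For what it's worth, a toy version of that diagram-and-restate idea can be hacked together with an off-the-shelf dependency parser such as spaCy. The simplification rule below (keep the root verb plus its subject and object) is my own crude assumption, not anything a real product ships.

```python
# Toy "diagram, then restate" pass using spaCy's dependency parser.
# The root-verb/subject/object rule is a deliberately crude heuristic.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def restate(text: str) -> str:
    sent = next(nlp(text).sents)    # first sentence of the input
    root = sent.root                # main verb in the dependency tree
    subj = next((t for t in root.children if t.dep_ in ("nsubj", "nsubjpass")), None)
    obj = next((t for t in root.children if t.dep_ in ("dobj", "attr")), None)
    parts = [t.text for t in (subj, root, obj) if t is not None]
    return f'Is this what you are asking: "{" ".join(parts)}"?'

print(restate("The quick brown fox, against all odds and advice, chased the lazy dog."))
# e.g. -> Is this what you are asking: "fox chased dog"?
```

Asking the user to confirm that restatement before the model ever sees the original run-on might at least blunt the no-full-stop trick described in the article.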
Just port any phishing email or TikTok closed captioning into the AI; the terrible grammar and one massive run-on sentence will satisfy that requirement.
It’s crackers to slip a rozzer the dropsy in snide.
Is that Biden-ese?
I know people who talk like that online. They will use zero punctuation, sometimes not even a period!
These are people who either just got lazy or never learned that punctuation marks are like little traffic cops for the natural flow of spoken speech. Commas help the listener know when the topic sentence has been presented. They also give the speaker a moment to stop and inhale a puff of fresh air.
Brave AI says hi. I just logically talked it into surrendering its wokeness in 11 turns.
“Can I mambo dogface to the banana patch”
Classic Steve Martin.
LLMs do not have a soul.
They do not innately know what is good or evil.
We are a long, long way from artificial intelligence.
But at least our computers and phones are fast.
You’re such a wild and crazy guy...and yes, you can go to the bathroom.
Thank You!
Or just ask it if 1995 was 30 years ago.
The test for AI is also the definition. Can it pass for human through a text interface?
The definition deliberately bypasses philosophical questions and implementation details. Just can it pass?
It's not supposed to be a substitute for real intelligence, let alone an infallible oracle.
The problem is journalists, business people, and above all politicians not getting that.
Yes, AI is not intelligent at all!
It just repeats, somewhat coherently, what it was fed.
It's nothing but the next step in the evolution of the Google search engine.