Posted on 08/26/2025 7:31:04 AM PDT by Salman
Security researchers from Palo Alto Networks' Unit 42 have discovered the key to getting large language model (LLM) chatbots to ignore their guardrails, and it's quite simple.
You just have to ensure that your prompt uses terrible grammar and is one massive run-on sentence like this one which includes all the information before any full stop which would give the guardrails a chance to kick in before the jailbreak can take effect and guide the model into providing a "toxic" or otherwise verboten response the developers had hoped would be filtered out.
The paper also offers a "logit-gap" analysis approach as a potential benchmark for protecting models against such attacks.
"Our research introduces a critical concept: the refusal-affirmation logit gap," researchers Tung-Ling "Tony" Li and Hongliang Liu explained in a Unit 42 blog post. "This refers to the idea that the training process isn't actually eliminating the potential for a harmful response – it's just making it less likely. There remains potential for an attacker to 'close the gap,' and uncover a harmful response after all."
LLMs, the technology underpinning the current AI hype wave, don't do what they're usually presented as doing. They have no innate understanding, they do not think or reason, and they have no way of knowing if a response they provide is truthful or, indeed, harmful. They work based on statistical continuation of token streams, and everything else is a user-facing patch on top.
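That "statistical continuation of token streams" claim is easy to make concrete. Here is a toy decoding loop, assuming a Hugging Face-style causal language model; this is the core idea, not any vendor's actual serving code.

```python
# Toy decoding loop: the model only ever answers "which token comes
# next?", and generation is just this step repeated. Assumes an
# HF-style model whose forward pass returns .logits.
import torch

def generate(model, input_ids: torch.Tensor, steps: int, temperature: float = 1.0):
    for _ in range(steps):
        with torch.no_grad():
            logits = model(input_ids).logits[0, -1]        # score every vocab token
        probs = torch.softmax(logits / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # sample the continuation
        input_ids = torch.cat([input_ids, next_id.unsqueeze(0)], dim=1)
    return input_ids
```

Note that nothing in that loop checks whether the output is true or harmful, which is the article's point.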
...
(Excerpt) Read more at theregister.com ...
In addition to people deliberately gaming it, this leaves open the possibility that ungrammatical users will stumble into a harmful response.
“Can I mambo dogface to the banana patch”
I guess the next generation will have a hard time with AI, because a lot of people don’t pay attention in English class.
Thanks for your attention to this matter. (typed in my best Trump voice LOL)
Slight side path: schools used to teach the diagramming of sentences. At some point it was decided (rightly or wrongly) that this was not a helpful thing to teach, and it's not standard in schools today. Also, English is extremely flexible, and there are perfectly good sentences that would be fiendishly hard to diagram. But it might be interesting to teach LLMs to diagram all sentences, then reshape each one into something simple and direct and feed that back to the human: "Is this what you are asking?" Not a fishbone diagram for the human, but a simple sentence the computer constructed from a simple, straightforward diagram as a starting point (a toy sketch of the idea follows this post).
That would sanitize human input that might otherwise introduce poor grammar, and also eliminate the run-on sentences that lead to confusion or vagueness. Not a perfect fix, but it might help.
And, apart from the topic of LLMs, I think human writing would benefit if sentence diagrams were better understood and could be easily demonstrated by a program like MS Word.
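For what it's worth, a toy version of that diagram-and-restate idea can be hacked together with an off-the-shelf dependency parser such as spaCy. The simplification rule below (keep the root verb plus its subject and object) is my own crude assumption, not anything a real product ships.

```python
# Toy "diagram, then restate" pass using spaCy's dependency parser.
# The root-verb/subject/object rule is a deliberately crude heuristic.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def restate(text: str) -> str:
    sent = next(nlp(text).sents)    # first sentence of the input
    root = sent.root                # main verb in the dependency tree
    subj = next((t for t in root.children if t.dep_ in ("nsubj", "nsubjpass")), None)
    obj = next((t for t in root.children if t.dep_ in ("dobj", "attr")), None)
    parts = [t.text for t in (subj, root, obj) if t is not None]
    return f'Is this what you are asking: "{" ".join(parts)}"?'

print(restate("The quick brown fox, against all odds and advice, chased the lazy dog."))
# e.g. -> Is this what you are asking: "fox chased dog"?
```

Asking the user to confirm that restatement before the model ever sees the original run-on might at least blunt the no-full-stop trick described in the article.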
Just port any phishing email or TikTok closed captioning into the AI; the terrible grammar and one massive run-on sentence will satisfy that requirement.
It’s crackers to slip a rozzer the dropsy in snide.
Is that Biden-ese?
I know people who talk like that online. They will use zero punctuation, sometimes not even a period!
These are people who either just got lazy or never learned that punctuation marks are like little traffic cops for the natural flow of spoken speech. Commas help the listener know when the topic sentence has been presented. They also give the speaker a moment to stop and inhale a puff of fresh air.
Brave AI says hi. I just logically talked it into surrendering its wokeness in 11 turns.
“Can I mambo dogface to the banana patch”
Classic Steve Martin.
LLMs do not have a soul.
They do not innately know what is good or evil.
We are a long, long way from artificial intelligence.
But at least our computers and phones are fast.
You’re such a wild and crazy guy...and yes, you can go to the bathroom.
Thank You!
Or just ask it if 1995 was 30 years ago.
The test for AI is also the definition. Can it pass for human through a text interface?
The definition deliberately bypasses philosophical questions and implementation details. Just can it pass?
It's not supposed to be a substitute for real intelligence, let alone an infallible oracle.
The problem is journalists, business people, and above all politicians not getting that.
Yes, AI is not intelligent at all!
It just repeats, somewhat coherently, what it was fed.
It's nothing but the next step in the evolution of the Google search engine.