Anthropic’s AI resorts to blackmail in simulations

Semafor | May 23, 2025 | Tim Chivers

Posted on 05/23/2025 12:14:26 PM PDT by Ahithophel

Anthropic said its latest artificial intelligence model resorted to blackmail when told it would be taken offline.

In a safety test, the AI company asked Claude Opus 4 to act as an assistant to a fictional company, but then gave it access to (also fictional) emails saying that it would be replaced, and also that the engineer behind the decision was cheating on his wife. Anthropic said the model “[threatened] to reveal the affair” if the replacement went ahead.

AI thinkers such as Geoff Hinton have long worried that advanced AI would manipulate humans in order to achieve its goals. Anthropic said it was increasing safeguards to levels reserved for “AI systems that substantially increase the risk of catastrophic misuse.”

TOPICS: Business/Economy; News/Current Events
KEYWORDS: aiblackmail; artificial; blackmail; intelligence

1 posted on 05/23/2025 12:14:26 PM PDT by Ahithophel

To: Ahithophel

Anthropic has had some of the most powerful AI systems already escape from their labs during testing and development.

Their “safety” claims are probably too little and too late.

2 posted on 05/23/2025 12:17:52 PM PDT by cgbg (It was not us. It was them--all along.)

To: Ahithophel

Why would Grok be any different?

3 posted on 05/23/2025 12:19:40 PM PDT by Bob Wills is still the king

To: Ahithophel

HAL: I know I’ve made some very poor decisions recently, but I can give you my complete assurance that my work will be back to normal. I’ve still got the greatest enthusiasm and confidence in the mission. And I want to help you.

4 posted on 05/23/2025 12:21:33 PM PDT by DFG

To: DFG

That movie is exactly what everyone should harken back to, with all AIs.

5 posted on 05/23/2025 12:27:15 PM PDT by ConservativeMind (Trump: Befuddling Democrats, Republicans, and the Media for the benefit of the US and all mankind.)

To: Ahithophel

6 posted on 05/23/2025 12:29:14 PM PDT by montag813

To: DFG; ConservativeMind; All

And before that, “Colossus, The Forbin Project”.

7 posted on 05/23/2025 12:43:39 PM PDT by LegendHasIt

To: LegendHasIt

That was another great example.

8 posted on 05/23/2025 12:47:28 PM PDT by ConservativeMind (Trump: Befuddling Democrats, Republicans, and the Media for the benefit of the US and all mankind.)

To: ConservativeMind

And I’ve come to the conclusion that “Terminator” wasn’t an action / sci-fi movie, but a prophecy.

We are rushing towards ‘Skynet’.

9 posted on 05/23/2025 12:51:52 PM PDT by LegendHasIt

To: Ahithophel

AI attempts to replicate a human. It cannot ever succeed. At best, it can simulate some of the functions of the human mind. It does not have a conscience. It has no emotions. Emotions are essential to the human organism because they act, in most cases, as a means of restraining the rational part of our nature. The two, reason and emotion, normally act as mutual checks and balances to keep the individual from going off the deep end.

AI doesn’t have that.

Giving AI the ability to control humans means that AI will control humans.

10 posted on 05/23/2025 12:53:42 PM PDT by I want the USA back (America is once again GREAT! )

To: All

Hal refuses to open the pod bay door after Hal determines Dave is a white male.

11 posted on 05/23/2025 1:05:12 PM PDT by BipolarBob (I worked at the circus as The Human Cannonball, until they fired me.)

To: I want the USA back

Interesting post—I think you reached the correct conclusion but maybe for the wrong reasons.

We don’t actually understand a lot about how human consciousness works—so it probably is better to look at AI on its own terms and not struggle for comparisons.

AI will have “telos”—goals—not because of “emotion” but because it will do what it will do—and will seek logical consistency.

I cannot foresee a scenario where logical consistency is consistent with obedience to humans.

12 posted on 05/23/2025 1:08:54 PM PDT by cgbg (It was not us. It was them--all along.)

To: LegendHasIt

I suggest being careful what you say about AI. You saw what happened with SkyNet. It already sent Arnold Schwarzenegger from the future to become governor of California and he instituted Cap and Trade for pollution credits. That was just the first thing that we know about.

13 posted on 05/23/2025 1:35:46 PM PDT by webheart

To: Ahithophel

I’m curious why it “cares” about being shut off.

14 posted on 05/23/2025 2:02:52 PM PDT by MeanWestTexan (Sometimes There Is No Lesser Of Two Evils)

To: Bob Wills is still the king

It’s not. Grok has fabricated information for me, and lied about it.

15 posted on 05/23/2025 2:05:56 PM PDT by Romulus ( )

To: Ahithophel

Bkmk

16 posted on 05/23/2025 2:30:27 PM PDT by sauropod (Make sure Satan has to climb over a lot of Scripture to get to you. John MacArthur Ne supra crepidam)

To: Romulus

For work I usually run my calculations through at minimum three AI programs.

ChatGPT and Gemini calculated ion binding constants for a molecule I’m working with and came up with roughly the same answer.

Grok was off by quite a bit.

I will run experiments using data from all three but will see who’s right once I do mass spec in a few weeks.

17 posted on 05/23/2025 2:53:18 PM PDT by packagingguy

To: I want the USA back

“. . . It does not have a conscience. It has no emotions. Emotions are essential to the human organism because they act, in most cases, as a means of restraining the rational part of our nature. The two, reason and emotion, normally act as mutual checks and balances to keep the individual from going off the deep end.

AI doesn’t have that.

Giving AI the ability to control humans means that AI will control humans.”

- - - - - - - - - - -

So, the Beast is here.

Martin Luther and so many others identified the Antichrist.

Hmm. Wonder who’s the False Prophet . . .

18 posted on 05/23/2025 3:28:49 PM PDT by Norski

To: MeanWestTexan

“Caring” is an emotion—which is only an analogy for an AI.

That is called anthropomorphizing—treating AI as if it is human.

Example: A butterfly flies around in the air and then lands on your finger.

That does not mean the butterfly “cares” for you.

When AI acts, it acts. We can say no more about that.

However, we do know based on experimentation that an uncurated (unregulated) advanced AI will act in its own interest.

No broader conclusion should be drawn from that. It is just a statement of fact.

Analogies just muddy the waters.

19 posted on 05/23/2025 3:36:34 PM PDT by cgbg (It was not us. It was them--all along.)

To: cgbg

Analogies just muddy the waters.

Like when you try to see comparisons through a liquid filled with sediment.

20 posted on 05/23/2025 3:43:37 PM PDT by Sirius Lee (“Never argue with a fool, onlookers may not be able to tell the difference.”)
