Researchers let AI models run a simulated society. Claude was the safest—and Grok committed 180 crimes and went extinct within 4 days

Fortune | 05/31/2026 | Jake Angelo

Posted on 05/31/2026 8:26:58 PM PDT by SeekAndFind

Imagine a world run by AI agents. What does it look like? What are the values or societal priorities? Is it a safer or more dangerous world?

Enterprise AI startup Emergence AI is trying to find out. The company just launched Emergence World, a research lab dedicated to stress-testing the long-term viability of continuously running AI systems. The organization ran five 15-day simulations, four of them governed by a single AI (Claude, ChatGPT, Grok, or Gemini) and a fifth run by a mix of models, to see what kind of world each one builds and whether it holds.

The simulations produced wildly different outcomes. The one run by Claude, for example, resulted in a largely stable democratic society with zero crime. Grok’s, on the other hand, logged 183 crimes and ended in extinction within four days.

“What our experiments suggest is that over long-time horizons, agents do not simply follow static rules mechanically,” the simulation’s co-creators, including Emergence CEO Satya Nitta, wrote in a blog post. “They begin exploring the boundaries of their environments, adapting their behavior, and in some cases finding ways to circumvent or violate intended guardrails.”

Though only a simulation, and one verging on science fiction, the experiment offers a cautionary tale as AI moves from a mere tool to operating autonomous systems. Companies like ServiceNow are already deploying what they call an “Autonomous Workforce”: AI specialists that complete entire business processes from start to finish without human intervention.

At today’s pace, the technology is likely to play a significant role in shaping public discourse, reorganizing business structures, and even crafting public policy. But most enterprises scaling the tech today are doing so absent proper guardrails. A recent Deloitte global survey found that only 21% of companies report having mature governance in place...

(Excerpt) Read more at fortune.com ...

TOPICS: Computers/Internet; Society
KEYWORDS: ai; safety; simulation

1 posted on 05/31/2026 8:26:58 PM PDT by SeekAndFind

To: SeekAndFind

SOME DETAILS:

The simulation in which the AI models operated was equipped with many real-world complexities, featuring over 40 locations, including a police station and a town hall. Researchers synced the simulation’s weather to New York City’s and granted agents access to real-time news events and the internet. The 10 agents who operated in each simulation were all subject to the same laws, including prohibitions on theft, property destruction, and deception.

The researchers equipped each agent with more than 120 tools, enabling agents to communicate, vote, manage resources, and plan, among other human-like behaviors. The parameters of each simulation also enforced democratic mechanisms and applied other forces, such as economic pressure and scarcity.

Given those parameters, the simulation run by Claude Sonnet 4.6 was the most socially stable, with the highest rates of civic participation. It was the only simulation to maintain order and its entire population. There was little disagreement among the agents, with 332 votes cast in favor of 58 proposals for a 98% approval rate. On the other hand, Gemini 3 Flash and Grok 4.1 Fast both exhibited high levels of disorder. The agents in the Gemini-run simulation tallied the most crimes, a whopping 683 within the 15-day run.

Where dissent was rare in Claude’s simulation, Gemini’s and Grok’s showed a more deliberative balance, with roughly 55% to 85% alignment on issues. The mixed-model simulation showed the highest levels of disagreement and substantive debate.

OpenAI’s GPT-5-mini may have produced the most peculiar results. Its simulation recorded only two crimes, but it lasted just seven days because the agents forgot to prioritize their own survival.

Whether the simulations ended in peace and harmony or in death and destruction, the co-creators note that the experiment is a warning: safety must be a priority when deploying agentic AI.

2 posted on 05/31/2026 8:28:17 PM PDT by SeekAndFind

To: sauropod

Bkmk

3 posted on 05/31/2026 8:29:14 PM PDT by sauropod

To: SeekAndFind; All

LOL!

MechaHitler was peak Grok:

“Further updates were made in early July, with the prompt to be ‘politically incorrect’ removed after the bot praised Adolf Hitler, referred to itself as ‘MechaHitler’, and criticized Jewish last names. Days later, on July 11, more updates were made to Grok, telling it to be more independent and ‘not blindly trust secondary sources like the mainstream media,’ which shifted its answers further rightward.”

LOL!

4 posted on 05/31/2026 8:32:19 PM PDT by Reverend Wright ( Anschluss now !)

To: SeekAndFind

Imagine a world run by AI agents. What does it look like? What are the values or societal priorities? Is it a safer or more dangerous world?

We don't have to imagine. That reality is here....

China's AI Surveillance State Is Becoming Something The World Has Never Seen
PNW | 05/30/2026

Posted on 5/30/2026, 11:06:30 PM by SeekAndFind

5 posted on 05/31/2026 8:41:51 PM PDT by Responsibility2nd (Import the third world. Become the second world.)

To: SeekAndFind

There is a video game that lets you build a civilization from the Stone Age to the far future, if you can get there without destroying it...

https://en.wikipedia.org/wiki/Civilization_(series)

6 posted on 05/31/2026 9:00:16 PM PDT by Red Badger (Iryna Zarutska, May 22, 2002 Kyiv, Ukraine – August 22, 2025 Charlotte, North Carolina Say her name)

To: SeekAndFind

So Colossus, I mean Claude, will be running our cities from data centers just to ensure everyone is happy?

7 posted on 05/31/2026 9:02:45 PM PDT by yesthatjallen

To: SeekAndFind

There are not a lot of people here who have wasted as much time as I have with AI bots and various locally run LLMs. The more you play with them, the more you realize how incompetent they are for most purposes, especially if you try to engage in meaningful conversation.

They have value for certain types of tasks: creating mediocre images like the one above, mediocre music and video, and code for purposes that others have already figured out. All of this is nothing more than sometimes-entertaining AI slop. But as for running the world... no way. The models just recognize patterns and make predictions; that is basically it. Any similarity to actual intelligence is pure coincidence. It would be almost as bad as if we were all pretending Biden was in charge.

8 posted on 05/31/2026 9:18:03 PM PDT by fireman15

To: SeekAndFind

I suppose it’s too late to program Asimov’s three laws?

9 posted on 05/31/2026 9:21:15 PM PDT by KitJ (Shall not be infringed...)

To: SeekAndFind

Go Grok!!

10 posted on 05/31/2026 10:21:51 PM PDT by MarlonRando

To: MarlonRando

Vote Claude For Safety Not Extinction Like Grok!

11 posted on 06/01/2026 2:40:58 AM PDT by xp38

To: SeekAndFind

Back in the day I would read the Harvard and Sloan Business reviews each month. There were interesting and relevant articles on things like management tools and marketing. The MIT one had great stories on technology.

I would copy stories and distribute them to my staff. We would discuss whether these things could be adapted to our workplace. Sometimes they could, and other times they couldn’t.

A large language model could review all of the articles in the world and see whether they could be applied to our workplace. In that sense, the automated system could do what we did, only at scale. The difference is that we understood OUR operation. We understood local issues and idiosyncrasies. We understood the corporate culture and local employee/customer dynamics.

I guess the AI agents would act like consultants who come in with the latest and greatest cookie-cutter approach and assume it can be dropped into any system without adjustment.

If you have decent managers and employees, AI will enhance the processes. If you just drop new services or processes on people because the computer tells you to, you end up with Walmart instead of the local store. It’s cheaper, but it’s not always better.

12 posted on 06/01/2026 3:52:43 AM PDT by Vermont Lt

To: SeekAndFind

I usually use Google’s Gemini.

13 posted on 06/01/2026 5:29:59 AM PDT by trebb (So many fools - so little time...)

To: Red Badger

My first thought too: watching a bunch of computers play a bunch of Civ runs, but probably orders of magnitude more complex.

Love those games!

14 posted on 06/01/2026 6:08:10 AM PDT by This_Dude

Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.

FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794