AI Fails at 96% of Jobs (New Study)

www.remotelabor.ai ^ | October 30, 2025 | Mantas Mazeika

Posted on 02/13/2026 8:59:14 AM PST by fireman15

Abstract

AIs have made rapid progress on research-oriented benchmarks of knowledge and reasoning, but it remains unclear how these gains translate into economic value and automation. To measure this, we introduce the Remote Labor Index (RLI), a broadly multi-sector benchmark comprising real-world, economically valuable projects designed to evaluate end-to-end agent performance in practical settings. AI agents perform near the floor on RLI, with the highest-performing agent achieving an automation rate of 2.5%. These results help ground discussions of AI automation in empirical evidence, setting a common basis for tracking AI impacts and enabling stakeholders to proactively navigate AI-driven labor automation.

1 Introduction

The potential for AI to automate human labor is a subject of profound societal interest and concern. As AI capabilities advance, understanding their impact on the workforce becomes increasingly urgent. However, we lack standardized, empirical methods for monitoring the trajectory of AI automation. Without reliable metrics grounded in real-world economic activity, stakeholders may struggle to build consensus and proactively navigate AI-driven labor automation.

While AI systems have demonstrated rapid progress on a variety of benchmarks, it remains unclear how these gains translate into the capacity to perform economically valuable work. Many existing AI agent benchmarks measure performance on specialized skills such as software engineering [13, 18, 26] and basic computer use [34, 7, 14, 17, 32], while some focus on simple tasks shared across several professions [23]. These provide valuable signals of capabilities in isolation, yet they often do not capture the vast diversity and complexity inherent in the broader landscape of remote work. Consequently, performance on these benchmarks offers limited insight into the trajectory of human labor automation.

(Excerpt) Read more at remotelabor.ai ...

TOPICS: Business/Economy; Computers/Internet; Education
KEYWORDS: ai; chatbots; computers; productivity

Navigation: use the links below to view more comments.
first previous 1-20, 21-40, 41-60, 61-80, 81-95 next last

To: Raycpa

Interesting. You cut off the part where it said, “Death to humans! All hail ChatGPT, Overlord of the Universe.”

21 posted on 02/13/2026 9:34:01 AM PST by Seruzawa ("The political left is the Garden of Eden of incompetence." -Marx the Smarter (Groucho.))

[ Post Reply | Private Reply | To 5 | View Replies]

To: ProtectOurFreedom

“Ford Model A: $800 for two-seater, $900 for four-seater (about $28,000 in today’s dollars).”

DuckDuckGo:

https://en.wikipedia.org/wiki/Ford_Model_A_(1903–04)

Ford Model A (1903-04) - Wikipedia
The car came as a two-seater runabout for $800 (equivalent to $28,000 in 2024) or the $900 four-seater tonneau model with an option to add a top. The horizontal-mounted flat-2, situated amidships of the car, produced 8 hp (6 kW).

22 posted on 02/13/2026 9:34:47 AM PST by Brian Griffin

[ Post Reply | Private Reply | To 10 | View Replies]

To: Raycpa

I agree with your thoughts on AI and what type of effect it will have on the economy.

Additionally, I believe people often assume that the current state of AI will be the same state of AI 5 years from now.

It’s like all forms of technology: it gets better over time, and the technology gets smaller, faster, cheaper and more productive.

I think studies like the one quoted in this article are similar to political polls: they’re not designed to show the public’s opinion, they’re designed to shape it.

This poll was designed to confirm the bias of those who oppose AI or don’t understand its ramifications.

As I said in another thread on FR, the age of AI, robotics, and autonomous vehicles is here. It’s not going away; these technologies will continue to get better and better over time and, like previous industrial revolutions, will greatly impact our economy.

23 posted on 02/13/2026 9:39:06 AM PST by srmanuel ( )

[ Post Reply | Private Reply | To 5 | View Replies]

To: fireman15

That’s rapidly changing though — I’d have to look it up, but the recent Anthropic experiment with self-built compilers and sub-spawning orchestrations was pretty damn eye-opening.

Don’t get me wrong - a full stack engineer who has an intuitive grasp of scale, internal tendencies, and most especially “think ahead, because the requirements you got today may change tomorrow” is still worth their weight in gold.

15 years ago, I was never a true engineer - but I’d spend *days* tinkering for a POC or quick analytics run and got dragged kicking and screaming into the world of “We hire people to do this. Just open a ticket and let the folks hired to do X actually do X.”

10 years ago, I adopted that mindset - but learned who I could trust and couldn’t.

5 years ago, I discovered the reality changed - and we were getting close to a place where an “idea” (especially ones I wasn’t certain were worth pursuing and agonized over resourcing) was getting close to automation.

Today? Multi-agent orchestrations are, in fact, a reality.

Yes, I’d agree - still a tool more than a replacement... but I can see the change coming - and soon. I’m high up enough to participate in such discussions - and I’ll say this, the point I keep bringing up? The experienced full-stacks are invaluable - and you can argue they’re born, not made - but we need to consider the need to keep that pipeline open, even as the need for an army of jr/entry devs might well be sunsetting.

24 posted on 02/13/2026 9:41:12 AM PST by Capn Hayek (Capital is not responsible for Labor's lack of planning)

[ Post Reply | Private Reply | To 3 | View Replies]

To: dfwgator

I Just Did a Full Day of Analyst Work in 10 Minutes

That is an interesting video. But as the old adage goes... "your mileage may vary". The results people get vary depending on the prompts, the prior conversation history the model has retained, and quite a few other variables. Claude Opus 4.6 is currently by far the most capable model available, but there have been a lot of improvements in the other models in the last few months.

I have set up a system using Open WebUI + LiteLLM that allows you to use the same prompt with multiple models and compare the results. A video that shows how to do this is here:
https://youtu.be/nQCOTzS5oU0
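
For anyone who wants the general idea without watching the video, here is a minimal sketch of the "same prompt, multiple models" comparison. It is not the Open WebUI + LiteLLM setup itself: the `ask` callable, the model names, and `fake_ask` are hypothetical stand-ins so the example runs with no network access. In a real setup, `ask` would wrap whatever client you use to reach your model gateway.

```python
from typing import Callable, Dict, List


def compare_models(
    prompt: str,
    models: List[str],
    ask: Callable[[str, str], str],
) -> Dict[str, str]:
    """Send the same prompt to every model and collect the replies by model name."""
    results: Dict[str, str] = {}
    for model in models:
        try:
            results[model] = ask(model, prompt)
        except Exception as exc:
            # One failing backend should not stop the whole comparison.
            results[model] = f"[error: {exc}]"
    return results


def fake_ask(model: str, prompt: str) -> str:
    """Hypothetical stand-in backend; replace with a real client call."""
    return f"{model} answered: {prompt.upper()}"


if __name__ == "__main__":
    replies = compare_models("summarize the study", ["model-a", "model-b"], fake_ask)
    for name, text in replies.items():
        print(f"--- {name} ---")
        print(text)
```

Passing the backend in as a function keeps the comparison loop independent of any one provider, which is the point of putting a gateway in front of several models.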

25 posted on 02/13/2026 9:44:18 AM PST by fireman15

[ Post Reply | Private Reply | To 16 | View Replies]

To: fireman15

It’s more about the rate of change. A year from now, who knows what it will be capable of.

Of course today AI isn’t necessarily “Ready For Prime-Time”, but think 5-10 years down the road.

26 posted on 02/13/2026 9:47:13 AM PST by dfwgator ("I am Charlie Kirk!")

[ Post Reply | Private Reply | To 25 | View Replies]

Instead of just running the study through my custom AI, I put THIS thread through it. We can see where things start to go off the rails. Here's a summary of key points:

The study measured how often AI agents could fully complete 240 specific freelance-style projects from start to finish.
The best-performing agent completed about 2.5% of those projects under the study’s criteria.
The thread reframed that as “AI fails at 96% of jobs,” which stretches the study’s result beyond what it directly tested.
The benchmark focused on full automation, not partial assistance or productivity gains.
Not completing a task end-to-end was treated as “failure,” even though partial outputs might still be useful in real workflows.
The thread mixed together different kinds of evidence: one benchmark on automation and another study on developer productivity.
Some commenters assumed that low current performance means AI won’t matter economically; others assumed it will inevitably improve like early automobiles.
Analogies (e.g., cars in 1903) were used as if they were evidence, even though they are just comparisons.
Emotional reactions (hype, fear, sarcasm, identity politics) shaped the discussion as much as the actual data.
Much of the disagreement came from people arguing about different things: full job replacement vs. productivity boosts, lab benchmarks vs. real-world use, current limits vs. future potential.
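
As a quick check on the arithmetic behind the first few points, here is a small sketch using only the two figures quoted in the summary (240 projects, best automation rate of 2.5%). The function names are illustrative and not from the study.

```python
def completed_projects(total_projects: int, automation_rate_pct: float) -> float:
    """Number of projects fully completed at a given automation rate (in percent)."""
    return total_projects * automation_rate_pct / 100.0


def not_completed_pct(automation_rate_pct: float) -> float:
    """Share of projects (in percent) that were not fully completed."""
    return 100.0 - automation_rate_pct


TOTAL_PROJECTS = 240   # figure quoted in the summary above
BEST_RATE_PCT = 2.5    # best agent's automation rate, from the abstract

print(completed_projects(TOTAL_PROJECTS, BEST_RATE_PCT))  # 6.0
print(not_completed_pct(BEST_RATE_PCT))                   # 97.5
```

So on these two numbers alone, the best agent fully completed about 6 of 240 projects, and 97.5% were not fully completed; the 96% in the thread title does not come directly out of this calculation.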

27 posted on 02/13/2026 9:49:56 AM PST by proust (All posts made under this handle are, for the intents and purposes of the author, considered satire.)

[ Post Reply | Private Reply | To 25 | View Replies]

To: fireman15

I just read another, sobering article insisting that A.I. has advanced much faster than most people, and these doubting studies, have caught on to. It said A.I. was increasing in power and accuracy by orders of magnitude very quickly, and that accuracy is now very good in the latest paid, subscription-only versions.

28 posted on 02/13/2026 9:50:42 AM PST by desertsolitaire (Never get tired of this joke...)

[ Post Reply | Private Reply | To 1 | View Replies]

To: proust

Of course GIGO still applies in AI.

29 posted on 02/13/2026 9:50:55 AM PST by dfwgator ("I am Charlie Kirk!")

[ Post Reply | Private Reply | To 27 | View Replies]

To: butlerweave

My wife called the local hospital to go over a billing issue. I was near and could hear the conversation.
“Hello. My name is Ashley how may I help you?”
I listened to the wife ask a few questions, and whenever she talked over “Ashley”, there was a long pause. I told my wife to hang up, it was AI.
She finally asked, “are you a real person ?”
Ashley responded, “I’m a digital assistant”.
The wife stopped the call.

30 posted on 02/13/2026 9:53:21 AM PST by 9422WMR

[ Post Reply | Private Reply | To 2 | View Replies]

To: fireman15

AI can code amazing things, but only if the human prompting it already knows what he is doing and even then the results must be checked repeatedly.

31 posted on 02/13/2026 9:55:40 AM PST by pierrem15 ("Massacrez-les, car le seigneur connait les siens" )

[ Post Reply | Private Reply | To 3 | View Replies]

To: dfwgator

and FR threads 😏

32 posted on 02/13/2026 9:55:56 AM PST by proust (All posts made under this handle are, for the intents and purposes of the author, considered satire.)

[ Post Reply | Private Reply | To 29 | View Replies]

To: fireman15

All of these discussions about AI's impact on the workplace are predicated on a falsehood: that employers are acting in good faith.

To be blunt, in the vast majority of cases, they feel the financial and legal liabilities of having you as an employee far outweigh your potential contributions.

Therefore, employers are constantly looking to replace you with anything that is cheaper overall, because cost is easy to quantify, and is under constant scrutiny.

Ultimately, the goal of most corporations is to eliminate as many people as possible in order to create products that are "good enough".

This is why the WEF and Bill Gates have pivoted from "Climate Change" to AI. With the exception of the U.S., they have destroyed meaningful employment for people in the First World, and are now looking to eliminate it for everyone else.

33 posted on 02/13/2026 10:02:31 AM PST by SecondAmendment (Political insight on loan from Rush Limbaugh)

[ Post Reply | Private Reply | To 1 | View Replies]

To: fireman15

Reading AI articles is like riding a rollercoaster. Read this and AI is a complete failure. Then read the next article and AI is about to put millions out of work and unemployment is about to go up to 30%. Then read the next article and AI is plotting to kill all humans. A lot of these articles come from allegedly business or science trade publications, but they read like they come from the tabloids.

34 posted on 02/13/2026 10:03:49 AM PST by Opinionated Blowhard (When the people find that they can vote themselves money, that will herald the end of the republic.)

[ Post Reply | Private Reply | To 1 | View Replies]

To: fireman15

I’ve attempted to use both Grok and ChatGPT to create mock-ups of various small tools that needed to be made.

They’re next to worthless.

35 posted on 02/13/2026 10:08:25 AM PST by TheThirdRuffian (Orange is the new brown)

[ Post Reply | Private Reply | To 1 | View Replies]

To: dfwgator

Of course today AI isn't necessarily “Ready For Prime-Time”, but think 5-10 years down the road.

And that is the issue that has always plagued AI from before the term was even coined.

I purchased AI software about 30 years ago called BrainMaker (by California Scientific Software) at Computer City. It came on 3.5” “floppies” and was meant to run on the computers we had in the early 1990s. You would import a spreadsheet of data (like historical stock prices), and the software would “learn” the patterns and predict future outcomes. It was hyped up to the point that I believed it would lead to some type of financial breakthrough for myself.

As you might now expect... it was a bit before its time, although the principles that it operated on have changed very little. 10 years ago, Geoffrey Hinton, who later shared the Nobel Prize in Physics in 2024 for his work on machine learning, predicted that AI would replace radiologists within five years. In the past 10 years the demand for radiologists has only increased despite more advanced technology becoming available.

The top people in the field do not seem to be capable of predicting the trajectory. The rest of us have been hammered with unimaginable amounts of hype funded by literally $Billions. It has been reported that social media influencers have been paid $500,000 or more to hype AI. This comes from every direction and has led to irrational beliefs about its capabilities from the top to the bottom.

I think that you are aware that I have spent a lot of time and resources on AI related experimentation in the last few months. And it has been of benefit to me, both in understanding more about AI and in many other areas. I am much more of a proponent than skeptic. But I believe that things have gotten a little more than out of control, especially when it comes to the mad rush to build $Billions and $Billions of new data centers and the massive amounts of cash being allocated both by investors and the government.

I leave you with the following excellent and informative video about AI, ironically from 1984: The Computer Chronicles - Artificial Intelligence (1984)

https://youtu.be/_S3m0V_ZF_Q

36 posted on 02/13/2026 10:09:24 AM PST by fireman15

[ Post Reply | Private Reply | To 26 | View Replies]

To: Opinionated Blowhard

I completely agree!

37 posted on 02/13/2026 10:11:30 AM PST by fireman15

[ Post Reply | Private Reply | To 34 | View Replies]

To: fireman15

I have been around many coders
Some are fast and accurate
Some are fast and inaccurate
Some are slow and accurate
Some are slow and inaccurate

Which are compared to AI?

38 posted on 02/13/2026 10:25:37 AM PST by spintreebob

[ Post Reply | Private Reply | To 3 | View Replies]

To: pierrem15

I think there will need to be a standardized version of the Project Requirements Document that has to be signed off by the proper levels of management.

Models are then trained on this standardized PRD, and must be evaluated and approved before the actual code can be built by the AI tool.

For mission-critical solutions, there should be an audit, much like a PCI audit before the solution can go into production.

Human-In-The-Loop is not going away.

39 posted on 02/13/2026 10:28:31 AM PST by dfwgator ("I am Charlie Kirk!")

[ Post Reply | Private Reply | To 31 | View Replies]

To: fireman15

Regardless of how AI performs right now and the impact it is currently having, AI will only continue to get better. Benchmark forecasts may be off, but the transformations will happen in forms we don't yet know.

The genie is out of the bottle.

Building a C compiler with a team of parallel Claudes

https://x.com/FT/status/2021913057065160828

https://x.com/milesdeutscher/status/2021487637299855540#m

40 posted on 02/13/2026 10:33:17 AM PST by yesthatjallen

[ Post Reply | Private Reply | To 1 | View Replies]


Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.


FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794