Posted on 02/06/2026 6:02:22 AM PST by yesthatjallen
I've been experimenting with a new approach to supervising language models that we’re calling "agent teams."
With agent teams, multiple Claude instances work in parallel on a shared codebase without active human intervention. This approach dramatically expands the scope of what's achievable with LLM agents.
To stress test it, I tasked 16 agents with writing a Rust-based C compiler, from scratch, capable of compiling the Linux kernel. Over nearly 2,000 Claude Code sessions and $20,000 in API costs, the agent team produced a 100,000-line compiler that can build Linux 6.9 on x86, ARM, and RISC-V.
The compiler is an interesting artifact on its own, but I focus here on what I learned about designing harnesses for long-running autonomous agent teams: how to write tests that keep agents on track without human oversight, how to structure work so multiple agents can make progress in parallel, and where this approach hits its ceiling.
Enabling long-running Claudes
Existing agent scaffolds like Claude Code require an operator to be online and available to work jointly. If you ask for a solution to a long and complex problem, the model may solve part of it, but eventually it will stop and wait for continued input—a question, a status update, or a request for clarification.
SNIP
(Excerpt) Read more at anthropic.com ...
You can run LLMs on your local machine. 80% of what most people need can be accommodated by running the model on your machine.
Then use the proprietary models when you need the power. But most simple tasks can be done with an open-source model running under Ollama.
Use the AI to refine specs and test cases and thoroughly review them.
I’ve been looking at the BMAD method, which looks promising as an approach to developing software, since it focuses up front on building the requirements documents.
Good questions. However, I think you’re on target with the academic exercise comment.
After every other input, the AI will play a 90 second video ad.
I’ve been wondering that myself. Considering the money being spent on AI data centers, how much work will they actually generate that is billed at a profitable rate?

As long as we subsidize the power generation it will be profitable.
Exactly. It would just create bloated executables.
Fun article.
Now I know why my lights were browning out the other night!
Claude is working on documentation; when he gets around to it?
“Over nearly 2,000 Claude Code sessions across two weeks, Opus 4.6 consumed 2 billion input tokens and generated 140 million output tokens, a total cost just under $20,000.”
I would like to see a detailed, line-item budget covering total energy usage, cooling cost, percentage of total costs, and administration.
A great C compiler produces outstanding assembly code, complete with an understanding of the cache, out-of-order execution, and all the intricacies of the target CPU. As a guy who learned BASIC and assembly at the same time, I get what you’re saying, but yes, modern C compilers are good enough.
The Feds will bail them out and print more money to cover it.
Weimar and collapse.
[Luckily, I will expire before the wave comes ashore. I have been in software for 20+ years, and this latest push with AI is junk. Every time they task me with using it, my velocity slows down because I am constantly correcting the AI. So I turn it off or avoid it, except for looking up simple syntax issues that I have forgotten over the years.]
There is no way in this world or the next that they can generate that kind of revenue.
I saw a similar article touting the $1.5T CapEx. The OpEx was also in the few hundreds of billions already with limited revenue for it. They seem to be following the Build It and They Will Come model.
It seems more like a port of the existing compilers to Rust, which, while a worthwhile task, isn't the same as the claim that the AI tool wrote a compiler from scratch.
If the task was "write a C compiler that will compile the Linux sources" without any restrictions on the use of existing source code, then any barely competent coder could take the GNU or Clang source and "write" a C compiler.
A more interesting test for the AI tools is to write something original.
"The fix was to use GCC as an online known-good compiler oracle to compare against. I wrote a new test harness that randomly compiled most of the kernel using GCC, and only the remaining files with Claude's C Compiler. If the kernel worked, then the problem wasn’t in Claude’s subset of the files. If it broke, then it could further refine by re-compiling some of these files with GCC. This let each agent work in parallel, fixing different bugs in different files, until Claude's compiler could eventually compile all files."
So much for the AI tool; its feedback is coming from a known-correct implementation of the same functionality. Good for porting, but probably not so good for writing original code.
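The oracle strategy in the quoted passage amounts to a divide-and-conquer search over the set of kernel source files. A minimal Python sketch of the idea, where `build_works(subset)` is a hypothetical stand-in for the real harness step of building and booting the kernel with only `subset` compiled by Claude's compiler and everything else by GCC (it also assumes miscompiled files fail independently, which the real harness may not guarantee):

```python
def find_faulty(files, build_works):
    """Return the set of files the new compiler miscompiles.

    build_works(subset) -> True if the kernel still works when only
    `subset` is compiled by the new compiler and the rest by the
    known-good oracle (e.g. GCC). This predicate is hypothetical:
    in the real harness it would be a full kernel build-and-boot test.
    """
    if build_works(set(files)):
        return set()           # no miscompiled file in this group
    if len(files) == 1:
        return set(files)      # isolated one miscompiled file
    mid = len(files) // 2      # split the suspects and recurse on each half
    return find_faulty(files[:mid], build_works) | find_faulty(files[mid:], build_works)
```

Each half can be probed independently, which is presumably how multiple agents could chase different miscompiled files in parallel.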

"As one particularly challenging example, Opus was unable to implement a 16-bit x86 code generator needed to boot into 16-bit real mode. While the compiler can output correct 16-bit x86 via the 66/67 opcode prefixes, the resulting compiled output is over 60kb, far exceeding the 32k code limit enforced by Linux. Instead, Claude simply cheats here and calls out to GCC for this phase (This is only the case for x86. For ARM or RISC-V, Claude’s compiler can compile completely by itself.)"
Hmm, I think most programmers using the GCC code as a base could "write" a C compiler that "calls out" to GCC when needed. I think I could "write" that compiler in bash.
Suicidal software, the new frontier.
I definitely chuckled at that when I read the piece (which blows my mind, BTW).
Shows my age - and I was never an engineer anyway, but got somewhat forced into a role I wasn’t right for in a company making a digital move without hiring the right people. Also shows how horrid our infrastructure was.
But anyway... There was a constant little help agent/process that people abused - basically, used to dump logs to an email box (and at the time? MINE!)
I don’t mind helping but I got extremely annoyed to get the same damn error from the same damn people - especially when I took the time to explain what the error meant and how they could solve it themselves without dumping it to my team.
Anyway... root process I couldn’t touch.
But one day? I discovered I actually DID have a permission level to kill user sessions!
You can see where I’m going with this. I wasn’t cruel: I created a pretty simple log check against user IDs and some common codes. You got *two* passes in a week; hit the *third* on the same error and your userID session just got terminated rather than spamming me. I also managed to work in a trap to force a re-login to download the log and then challenge with “Unread log”.
Like I said - shows both my age and how horrid our infrastructure is... but it took months until someone said “What the hell is spmxcushion.sh and who did this??!”
Spam-Executioner... It was a process to punish users that abused the “send log for help” option.
The best part was they’d have never found out if I hadn’t said “I did that” and pointed out: 1) whoever had control of permission levels totally screwed up, 2) count yourself lucky I’m not nefarious and didn’t abuse #1, 3) I’d have done it “better” if not for #1, and 4) here is the list of people who waste my team’s time by not listening.
Fun stuff.
Disclaimer: Opinions posted on Free Republic are those of the individual posters and do not necessarily represent the opinion of Free Republic or its management. All materials posted herein are protected by copyright law and the exemption for fair use of copyrighted works.