jameson 14 hours ago [-]
I don't quite understand the intent of such an article other than to promote themselves, given the odd timing that the company is planning on going public, so I can only conclude that this is just part of the IPO roadshow.
LLMs have certainly made significant changes to our lives, but I have yet to see any extraordinary improvement they have brought to me, which makes me skeptical about their claims.
_If_ it solves many of our problems of great magnitude, why hasn't Anthropic used it to solve the significant problems we humans face? Cancer, Alzheimer's, education, finding new materials, fission power plants, etc.
ElProlactin 13 hours ago [-]
Because they're going after the biggest problem of all first: labor costs.
/s but not to a lot of people
nostrebored 13 hours ago [-]
Do you not think that the allocation of human time is one of the world’s biggest problems?
ElProlactin 12 hours ago [-]
Honestly, not really.
We can have a philosophical debate about work, the history of work and its relationship to human psychology in the 21st century but the bottom line is that there are 8+ billion people on the planet and, of those who are "working age", the vast majority of people, lacking meaningful capital, can only secure income by selling their time and labor.
There's absolutely no evidence that if we come up with a way to "reallocate human time" and change the structure of our civilization (using AI of course) tomorrow, the masses would benefit. There's plenty of evidence that the people who control AI or have the capital to employ it will use it to accumulate as much power and wealth for themselves as they can.
cyanydeez 8 hours ago [-]
There's also no evidence that any of the manual labor will benefit. We are nowhere near the type of sci-fi utopia that gives us replicator food.
Aside from capitalism moving money up and living conditions down, AI is going to accelerate the gap between the rich and everyone else.
parineum 4 hours ago [-]
> the vast majority of people, lacking meaningful capital, can only secure income by selling their time and labor
It's just time, and it's the only thing humans value. The only way to provide value for another person is to use your time to do something faster than they could do it with their time. That's it. There is no other way to secure income outside of inheritance or charity, which is just receiving something of value without giving something of value. There's a reason why most of the income goes to older people: the younger people haven't accumulated that much time to exchange for money. The nice thing about time is that everyone earns it at the same rate, 1 second per second.
Capital can be a lot of things, not just machines and property. Any experience you have is capital, any training is capital, any education is capital. Capital is anything that makes accomplishing things take less time.
The difference between socialism and capitalism is the idea that one person's time can have different value. That's really it.
wood_spirit 13 hours ago [-]
Reallocating human time is also going to cause problems.
But it’s a great short term business opportunity for AI vendors and it was Anthropic who went all in on being knowledge worker outsourcing in a big way first whilst OpenAI thought they’d replace Google in search.
I think Anthropic had the better business strategy.
UncleMeat 5 hours ago [-]
We can’t reallocate time unless there is an alternative source of income. But all these companies just want to extract wealth, not distribute it.
spaceman_2020 12 hours ago [-]
It’s a problem for capitalists, not the people themselves
The people want cheaper prices, affordable housing, affordable healthcare
Capitalism has decided that these problems aren’t worth solving. Instead, we must optimize for spam and slop (and call it “distribution”)
aswegs8 11 hours ago [-]
I feel like one mental model here is that the attention is limited and under capitalism capital aggregates in the hands of few. Their attention is limited to things that immediately better their position, the most capital-efficient thing to do is to gather more capital.
Cheaper prices, affordable housing, affordable healthcare are less capital-efficient. If you're Walmart, sure, you would like to lower prices as much as possible. But your leverage really isn't as big as finance or tech. If you're a politician, you might also pursue those goals, but your attention and leverage really isn't as focused as that of the money machine.
ezconnect 11 hours ago [-]
For the Capitalist crowd, YES it is the biggest cost. The next is energy. Imagine a world where your research and development is all AI and the production is all automated by robots. Instant product to sell to the masses, who have no money because no one is working.
TheOtherHobbes 10 hours ago [-]
In the future a few thousand billionaire geniuses will own a world of unimaginable luxury and near-infinite longevity. They will make the decisions, AI will execute them, and robots will do the physical work.
Everyone else will be reduced to compost.
It's the perfect plan. The final definitive justification for capitalism.
The masses are unnecessary. The masses will be optimised.
What could possibly go wrong?
jpadkins 1 hours ago [-]
why only a few thousand? serious question.
ahtihn 7 hours ago [-]
Why do billionaires keep working, keep amassing more money, donate to politicians, buy media companies?
They want influence and power. Being at the top of a hierarchy of millions, billions of people.
If there are no masses, the 1000th billionaire will be at the bottom of the hierarchy instead of near the top. They don't want that. The masses are needed to give them the sense of power.
What these people want is power and control. Eliminating the masses goes against that.
bigbuppo 3 hours ago [-]
They now have the AI that gives them the praise they want. We have been made redundant.
timacles 1 hours ago [-]
I don’t think that’s it. I think the only thing that motivates billionaires is just to have more than the next guy.
The only thing that motivates Bezos is that Elon Musk has more, and conversely Elon Musk would have an existential crisis if he was no longer number one.
sothatsit 11 hours ago [-]
Or: Anthropic genuinely believes the future scenarios they outline are realistic possibilities, and they want more people to take them seriously.
snowwrestler 33 minutes ago [-]
If they actually have concerns they can communicate them directly and privately. There are fewer than 10 companies, in only 2 countries, with advanced enough AI programs to qualify for this type of concern. And Anthropic has the phone numbers for all of them.
Companies do tons of communication and work directly, without press releases or blog posts. If a statement is released publicly, it is done for a PR purpose.
zelphirkalt 11 hours ago [-]
I find this version unlikely, since companies very rarely genuinely believe what they preach in PR campaigns. It's always some sales and marketing dudes and gals trying to polish something up as more than it is. Which is very annoying. We can now choose between Anthropic being the one exception to this, while having a huuuuge incentive to hype up their product, or we just write it off as more marketing fluff.
sothatsit 10 hours ago [-]
I would be very surprised if this is an actual thought-out PR strategy. I am far more inclined to believe that their employees are just bought-in to the future where AI is genuinely transformative.
Whether they are right or wrong is another matter, but their claims also don’t seem too far out of the realm of possibility to me.
Coding agents have fundamentally changed my day-to-day job. In the last year, my work has shifted from me writing all of my code, to me writing very little code and spending most of my time on understanding problems better and setting direction, and reviewing, verifying, and polishing the output of coding agents. It has been quite a drastic change.
It is not that outlandish to suggest that coding agents could continue to improve at such a drastic rate over the next year. And the implications of that could be quite large! Even just the implications of more white-collar workers adopting tools like Cowork seems potentially very large, with tools that already exist today. It seems sensible to at least consider this as a possibility.
justanotherjoe 10 hours ago [-]
Dario is no John Hammond though. That'd be Altman. He actually has the discipline and background as an AI scientist to tell what the potential failure modes are. You're right, he might still be just hyping things up, but generally I'd give more benefit of the doubt to Anthropic, precisely because Dario was a scientist, and I'd stand by that. People who get their PhD in science already self-select, or are at least proven to be made of different stuff.
Likewise, people don't as easily blame Ilya for 'hyping things up' when he said these things.
Also, speaking of incentives, there are also incentives to lower their valuation. If you want to be vigilant against social engineering, I'd be wary of that too.
These are moot anyway, though, because the article isn't even making any super strong claim. If you read it, it's no big deal.
boshalfoshal 4 hours ago [-]
This is obviously the case to me, but I think HN is very anti-AI.
I genuinely don't believe that they sat down in a board room and said "yeah, let's specifically release this now before an IPO so we can juice it!" They haven't even announced an IPO date. So is every blog on capabilities before that date just "pumping up the value of the stock before the IPO"?
aroman 13 hours ago [-]
The article does not claim they have achieved recursive self improvement... just that it appears to be a plausible outcome given the progress of AI development in the past few years.
I don't know about you, but AI advancements have brought extraordinary improvements to me personally in my ability to be productive, in much the same ways the article outlines. I find it deeply satisfying to be able to "get ideas out of my head" faster and tackle more meaningful problems.
FWIW, it deeply concerns me how much power and capability is being centralized in the hands of so few, especially Anthropic. I, for one, hope these advancements can be scaled down to something I can have full sovereignty over and trust... in my own home.
sirsinsalot 11 hours ago [-]
Truly feels like witnessing the worst of capitalism and greed play out. All that compute and energy towards a narrative of reducing the need for skilled programmers. What a waste.
These people don't have our interests in mind and everyone eats it up like a blessing from a god or something. It's surreal.
redbluered 9 hours ago [-]
I think these people do have our interests at heart, but that's largely irrelevant. Their point is that capitalist free markets don't let them act on that.
Capitalism and democracy are becoming obsolete. It's not clear what's next.
aswegs8 11 hours ago [-]
It's our collective God complex playing out and eating us up
trolleski 8 hours ago [-]
The benefits of AI are not designed to suit you, but the owner class. The plan is for you to be sidelined.
yuhmahp 14 hours ago [-]
Agree with your point about the timing, but drawing anticipation before going ahead and solving these diseases can be a good smoke test; it would be beneficial whether or not there's an IPO.
torben-friis 1 days ago [-]
>A caveat: Lines of code is an imperfect measure, as it measures quantity over quality. So 8× lines of code/engineer/day in the second quarter of 2026 is almost certainly an overstatement of the true productivity gain. Nonetheless, it indicates an acceleration. At Anthropic, we don’t reward people for how many lines of code they write; rather, team members are producing more code simply because they’re using AI systems to write more code.
What about the hypothesis that AI is generating more verbose code? I just see the text pretending to acknowledge "LOC != Productivity" and then using it as a metric anyway.
malfist 23 hours ago [-]
One of my co-workers just asked me to review his pull request that was all AI generated. 600 files were touched, over 40k lines of code added.
I'm sure he thought that was a crowning achievement, proof that AI can enable 10X developers, after all, what engineer could write 40k lines of code in a week?
I declined to review it, stating that I couldn't possibly vet 40k lines of code, and wouldn't put my reputation on the line to stamp the work as good. The PR nagged me for 2 weeks from my todo list and then disappeared. I don't know if he found another dev to get an approval from, or if the PR was abandoned. But I know for sure that he and I are on two totally separate islands around the value of LLMs.
fg137 21 hours ago [-]
Same here. A co-worker touched a few hundred files in a PR and asked us to review. They merged it directly to main when nobody approved it. (The repo was not set up to enforce PR approval.)
I don't personally use that feature, and I couldn't care less at this point. If our customers are frustrated by the bugs, at least my name is not on it.
triyambakam 14 hours ago [-]
The challenge is that you may not have customers, and thus a job, if that continues
LtWorf 10 hours ago [-]
His job will be to fix all the mess that other people created.
matltc 8 hours ago [-]
Crazy they merged into main holy moly
squidsoup 21 hours ago [-]
That's a process problem at your company - no developer should be proposing branches over 1k loc (or whatever your agreed tolerance threshold is) without a very good reason, vibe coded or not.
esailija 9 hours ago [-]
It isn't about small or big, it's about cohesion of the changes.
I prefer a big feature to be one big PR rather than a lot of small ones.
We had a dev do a big feature with a ton of small PRs; each one was individually impossible to review because each concern was out of scope for the small PR and "would be fixed in later PRs". Once it all came together as a whole, the big picture was a total horror show and I had to rewrite basically the whole thing.
In order to review those small PRs properly, each time I would have to read and understand all the current code so far from the beginning. Without that, each small PR individually looks OK because you won't remember the other PRs from weeks back that already duplicated what the current small PR does for example.
sgarland 5 hours ago [-]
Fully agree. If you want to follow the thought process through a large PR, review each commit (assuming, of course, the author made reasonable commits) on its own.
enraged_camel 7 hours ago [-]
>> I prefer a big feature to be one big PR rather than a lot of small ones.
Yes, same, and I genuinely do not understand the insistence that PRs should not be above a certain size. I think most people are under the (misguided and wrong) impression that a PR review should take less than the time it took to write the code, and therefore allocate no more than 15-30 minutes per review. So when they come across a large PR they find themselves at a loss.
rstuart4133 14 hours ago [-]
> no developer should be proposing branches over 1k loc
I've seen that reaction many times. It seems to work well enough when someone is maintaining existing code. However, greenfield projects can often require literally orders of magnitude more code to deliver something that can be integration tested.
The first step is to break it up into a stack of commits. Each one must compile and pass its unit tests, of course. Keeping it under 1k loc of released executable code is usually easy, but often becomes difficult to impossible if you want well commented code with excellent unit test coverage.
Assuming you have kept all your commits under 1k loc, there is still the problem of whether you present them in one PR, or as a stack of PRs. The issue with a stack is why an API is designed a certain way often isn't evident until you see how it's used. Responses to PR comments are explanations that point to later PRs in the stack, which is irritating for both the reviewer and the author.
I haven't found a good solution. I'm not sure there is one.
torben-friis 11 hours ago [-]
I mean, you're not creating the api from thin air as you write code right? Usually you'd have some larger doc about the project with the design you can point to.
fg137 8 hours ago [-]
> no developer should be proposing branches over 1k loc
I completely agree with you. But I am afraid we are losing the battle.
I am seeing people repeatedly sending out gigantic PRs full of slop, code with mistakes that they would never have made if they were hand coding it. And they don't care. It's sometimes surprising if not horrifying to find that the colleagues you have worked with for years don't care about quality at all -- almost despising spending time reviewing their own code. Yet they have the audacity to send out code reviews.
afro88 14 hours ago [-]
This is a branching point. One dev would find someone else and convince them to approve it. Another would redo the task (code is cheap now, right?) in a PR stack that can actually be reviewed, cleaned up etc.
I hope they were the latter.
otikik 10 hours ago [-]
My review would have been along the lines of:
'Please split this PR into smaller ones'. I would even sketch which groups/phases would make sense, perhaps with the help of AI.
SpaceNoodled 12 hours ago [-]
You could surely check on the status of that PR.
CamperBob2 21 hours ago [-]
> I declined to review it, stating that I couldn't possibly vet 40k lines of code
Gee, that sounds like a job for Claude if there ever was one.
malfist 21 hours ago [-]
You're absolutely right!
not_that_d 13 hours ago [-]
At work we had copilot. It said "the diff is too big to review"
danparsonson 15 hours ago [-]
And how would you verify that the review was accurate?
TeMPOraL 11 hours ago [-]
Ask it to prove it.
My approach for AI-first code review, or really any kind of AI technical opinion, is that if the claim AI made is both important and not obviously true at a glance, it has to prove it to me, and keep trying until I'm convinced or can spot an obvious mistake in the proof.
With reviews, this is usually the case where AI is making a claim that something in the PR will fail because of some assumptions or behaviors in code outside of the PR - e.g. "this change will fail in scenario X, because foo is null in this case, because the SQL query doesn't populate it when bar == quux, and it gets propagated as null through the JSON deserialization (optional field)...", where all the SQL and JSON parsing was not part of the code under review, and "bar == quux" is some weird domain special case.
Stuff like this is both critical, and there's no way for me to judge it without an expensive context switch. So I learn to ask for a more detailed walk-through once, and if that doesn't make me "see" it, I just ask it to reproduce it with tests, and confirm it's a real problem. Reviewing the reproduction is usually enough for me to either "see it" or accept they're probably right and ask the author to recheck it.
(Why not jump straight to "reproduce it" for every finding? Because it still takes time to have AI do the repro. It's cheaper than a deep context switch, but not free.)
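A minimal sketch of what such a reproduction could look like, in Python. Everything here is hypothetical and only mirrors the example claim above: `Record`, `load_record`, and `change_under_review` stand in for the JSON deserialization and the code in the PR, and the payload imitates what the query would emit in the "bar == quux" special case. The point is only that the reviewer reads a small failing case instead of tracing the SQL and parsing code by hand.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    bar: str
    foo: Optional[str] = None  # optional field: silently None when absent


def load_record(payload: str) -> Record:
    """Hypothetical deserialization step that lives outside the PR."""
    data = json.loads(payload)
    return Record(bar=data["bar"], foo=data.get("foo"))


def change_under_review(record: Record) -> str:
    """Hypothetical code from the PR: assumes foo is always populated."""
    return record.foo.upper()


def reproduces_claim() -> bool:
    """Return True if the claimed failure actually happens."""
    # Payload shaped like the special case where foo is not populated.
    record = load_record('{"bar": "quux"}')
    try:
        change_under_review(record)
    except AttributeError:
        return True  # claim confirmed: foo was None
    return False
```

If `reproduces_claim()` returns `True`, the finding goes back to the author with the repro attached; if it returns `False`, the claim was probably wrong and can be dropped.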
sebasv_ 14 hours ago [-]
Same way that I would trust your review to be accurate. Because the reviewer has built a reputation for correctness.
It's not Claude doing the review. It's a human doing the review, but using Claude to do the reading. It's still on the human to ask the right questions to Claude.
kalaksi 13 hours ago [-]
For large changes that are not straightforward and include architectural decisions, I wouldn't trust Claude enough to not read most of the code myself. I'll have to read it to be able to understand it and ask about the decisions in detail anyway. And when I start to understand it, it's not uncommon to find out that the solution can be improved and simplified in many places, and after iterating, 25-30% of code disappears.
And trying to just hand-wave it to Claude, to somehow "improve it" or "simplify it", without detailed questions hasn't been very successful. It can work for some things, though.
danparsonson 12 hours ago [-]
So you'd produce the code using Claude, and then use Claude to verify it? Would you accept my review of my own code?
fragmede 12 hours ago [-]
Depends. Do you take pills that let you forget that you wrote the code so you can review that same code with fresh eyes that haven't seen that code before? Though you could just use ChatGPT to review the code that Claude wrote if that's really the issue.
aetch 13 hours ago [-]
You can use Claude for that
altmanaltman 14 hours ago [-]
Claude to the rescue
Towaway69 11 hours ago [-]
Claude Van Damme ?
I'd prefer Chuck to the rescue but I guess it's a cultural preference.
/s
LtWorf 10 hours ago [-]
A former coworker sent me an AI generated PR to review and I just said NAK after the first two issues I found and I said to not send me AI slop to review.
They went to HR who said I am more senior and I should act as a mentor (they had my same work title and were probably making 4x more due to being in USA) and I just no longer reviewed anything from them until I changed jobs.
fg137 8 hours ago [-]
People actually go to HR for such trivial things?
LtWorf 8 hours ago [-]
All the time. And HR are generally bored and enjoy some drama.
snowwrestler 24 minutes ago [-]
I don’t understand how lines of code matter at all for scary LLM core capabilities. Does the transformer architecture get better with more lines of code?
My impression was that LLM training codebases were 99% resource management and only a few lines actually implement the core training algorithm, which is where 100% of the intelligence comes from. Data, not lines of code, are the constraint.
After training you can adapt the intelligence in various ways, and that takes a bunch of lines of code too. But you can't raise the intelligence ceiling again without another training run. So where is the scary recursive part?
overgard 20 hours ago [-]
I just watched Copilot today turn an 8-line fix into 500 lines, so, yeah, verbosity is a big side effect
MadxX79 11 hours ago [-]
If you can make it 800 you can claim to be a 100x engineer!
overgard 2 hours ago [-]
Missed opportunity! I obviously have skill issues.
kamaal 9 hours ago [-]
Real question is, are you a 100x prompter?
verdverm 20 hours ago [-]
It occurs to me this pattern might be the average code we humans have produced. We all have made those quick fixes, copy-pastas, and dirty hacks... they learned it somewhere! I also assume that some of the behavior is an artifact of their training regime.
overgard 2 hours ago [-]
In my case, where I see it most often is when the LLM has to rework something multiple times, and the feedback loop is vague (especially when all I have to give it is "no error messages, but it's still broken"). It seems like after the third or fourth try it just kinda goes off the rails. I find that the one-shot quality tends to be a little better, if the slot machine happened to work correctly that time.
TheRoque 19 hours ago [-]
So with LLM outputting average code, and people using LLM more and more, I guess the average code will become worse over time ?
nielsbot 18 hours ago [-]
Not advocating for AI code slop--but if AI coded software works correctly, maybe it doesn't matter? Except sometimes when a specialist will have to get involved. Not a perfect analogy, but most people don't write assembly these days--they have a compiler do that. Assembly still has a place, but it's a specialist task.
tasuki 14 hours ago [-]
> if AI coded software works correctly, maybe it doesn't matter?
The problem isn't the amount of code, it's how fitting/unfitting the abstractions are. Wrong abstractions are bugs in waiting. If there's much code with wrong abstractions, future change becomes difficult.
Source: me, I've created many bad abstractions and they led to much pain...
josephg 13 hours ago [-]
Yeah. It's kind of strange - Claude is great at some tasks, but it seems really rubbish at coming up with good abstractions a lot of the time. I've often caught it making a conceptual mistake (like "X cannot do Y") - then spending hundreds of lines working around an issue that doesn't actually exist.
It's also really bad at inventing and leaning on invariants. I make rules in my code all the time - "by the time we get to path X, we know Y and Z are true". In aggregate, these invariants make code simpler and easier to reason about. But Claude doesn't do that. It just kind of - slops through and adds bespoke "just in case" workarounds all over the place. Every time I read through code it's written - without fail - I find bad design / architectural choices.
Maybe mythos will change this. But for now I've slowed way down on my claude code usage. You can't build a skyscraper on a foundation of mud.
overgard 2 hours ago [-]
Well, if tokens = cost, and verbosity = more tokens, then smaller code is a financial (and human!) win. Although I'm worried vibe coders are just going to have LLMs modify minified code in caveman mode so they can have 100 agents in a swarm..
On a more serious note, I wonder if this might eventually encourage people to use languages that are a little harder to write but much more concise (functional languages, for instance). When you're paying per token, enterprise-bean, Java-style verbosity totally sucks.
SAI_Peregrinus 4 hours ago [-]
More verbose code takes up more space in the context. It's harder for humans to review, but also harder for future AIs to edit. Unless you manage to keep the AI to firm module boundaries & have it replace modules wholesale it's not really equivalent to how assembly gets replaced wholesale when a compilation unit changes. Compilers aren't editing the `.o` files when you rebuild, they throw the old ones out & replace them. But when you prompt an AI it is reading & editing the source files, so excess verbosity in the source files is detrimental.
ponector 12 hours ago [-]
But the truth is: it doesn't work correctly. I see that the quality of software has dropped significantly.
At work we are integrating with a third-party platform to automate Excel-powered calculations. It is awful. Rendering the table in the browser takes 10s, and one click on the Export button will throw the backend into an OutOfMemory state.
verdverm 11 hours ago [-]
AI mirrors the code around it. So if there is bad code or good abstractions, it's going to do the same. Even with good code, it will do bad things; you have to remain in the loop and catch these. It can write good code, it just needs nudging.
I don't disagree there is a lot of slop being produced right now, but I'm still optimistic in the long-run.
verdverm 11 hours ago [-]
There is a belief that everyone is just taking whatever the LLM (really agents now) outputs. This is not the case anywhere I work. We use human oversight to have it iteratively improve the code. The average quality is going up.
keeda 21 hours ago [-]
So the more rigorous studies about AI-assisted coding productivity addressed this by keeping in place all other software development processes, including the same code review and quality standards, and only measuring throughput (PRs, LoC) before and after AI was allowed.
Hence the interpretation of this 8x number depends on whether (or how much) Anthropic engineers have changed their quality standards and development processes. They don't tell us, and I am not aware of any other indications we could use to make a judgment.
However, we can still do some theorycrafting! I'm convinced that to fully realize the potential of AI-assisted coding we need to revamp all the dev processes, especially how we validate code, and it would be foolish of Anthropic not to do so (unless they were conducting a rigorous study, which they don't claim to have done.)
My hypothesis on the future of software validation is nothing fancy, we simply want much, much more automation for tests, observability and other bespoke verification methods than we traditionally had. But then validation code will also contribute to the LoC! My observation so far of personal as well as some "vibe-coded" open-source projects is O(LoC production code) ~= O(LoC test code). So as a SWAG the upper bound could be something like a 3 - 4x speedup, which is still remarkable.
All bets are off if code quality standards are not the same.
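One way to make that back-of-the-envelope estimate concrete is the small Python sketch below. It is an illustration under stated assumptions, not the commenter's actual calculation: `production_speedup` is a hypothetical helper, the raw 8x multiplier is taken from the article's claim, and the share of output that is validation code (0.5 for "test LoC roughly equals production LoC", 0.625 for a heavier validation load) is assumed. The `test_share_before` default of 0 is also an assumption.

```python
def production_speedup(raw_loc_multiplier: float,
                       test_share_after: float,
                       test_share_before: float = 0.0) -> float:
    """Multiplier on production (non-test) LoC implied by a raw LoC multiplier.

    test_share_* is the fraction of all written lines that are validation
    code, before and after adopting AI-assisted coding.
    """
    for share in (test_share_before, test_share_after):
        if not 0.0 <= share < 1.0:
            raise ValueError("shares must be in [0, 1)")
    return raw_loc_multiplier * (1.0 - test_share_after) / (1.0 - test_share_before)


print(production_speedup(8, 0.5))    # 4.0
print(production_speedup(8, 0.625))  # 3.0
```

Under these assumptions an 8x raw LoC figure corresponds to roughly 3 to 4x more production code, which is the range quoted above.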
fooqux 23 hours ago [-]
Exactly. If AI is going to start being graded on how many LoC it generates - oh, I'm sorry, how much it "accelerates" - then guess what newer models will start doing more of?
simondotau 12 hours ago [-]
Surely they can train AI on the signal to change as few lines as possible. Indeed, this is something I'd want to have control over when making requests. In a traditional UI, I'd imagine some kind of slider between "fewest lines" and "be bold".
disgruntledphd2 10 hours ago [-]
I've been having some success asking Claude to run sloccount after each change. Seems to help a little, though it's prone to forgetting over a long session.
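A rough sketch of a similar size check, in Python. Note the swap: instead of sloccount it parses the output of `git diff --numstat` (one `added<TAB>deleted<TAB>path` line per file, with `-` in the count columns for binary files), which measures the size of a change rather than of the whole tree. The helper names and the 1000-line budget are assumptions for illustration; the budget just echoes the 1k LOC threshold mentioned elsewhere in the thread.

```python
def lines_changed(numstat: str) -> tuple[int, int]:
    """Sum added and deleted lines from `git diff --numstat` output."""
    added = deleted = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        a, d, _path = parts
        if a == "-" or d == "-":
            continue  # binary files have no line counts
        added += int(a)
        deleted += int(d)
    return added, deleted


def within_budget(numstat: str, max_added: int = 1000) -> bool:
    """True if the change adds no more than max_added lines (assumed budget)."""
    added, _deleted = lines_changed(numstat)
    return added <= max_added


sample = "12\t3\tsrc/a.py\n-\t-\tlogo.png\n500\t0\tsrc/b.py\n"
print(lines_changed(sample))        # (512, 3)
print(within_budget(sample, 500))   # False
```

Feeding the result back to the agent after each change is one way to keep the number visible over a long session.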
verdverm 11 hours ago [-]
I'm actually hopeful that the recursive code training will improve quality over time. I'm definitely producing higher quality code, tests, and docs. It does take attention and oversight, iteration and refinement, one cannot just let these things loose on a code base and expect good things to happen. You have to leverage them to make the good things happen.
yalok 13 hours ago [-]
Could just be more tests? :)
Which is good for code quality in general and reduces support burden, but doesn’t lead directly to more features
whateveracct 22 hours ago [-]
Yeah, they assume that "productivity = k * LOC" where k > 1
very flawed
snthpy 14 hours ago [-]
Just imagine the productivity gains from using LLMs to rewrite Kotlin codebases in Java!
chuckadams 23 hours ago [-]
AI generates code that mimics the existing code. If your code is terse and comment-free, then the agent’s code is too. The times I’ve seen Claude drift into a default “house style” it generated like 1 comment for every 10 LOC or so. It’s a far cry from the GPT-3 days that littered every line with the journals of Captain Obvious.
atq2119 13 hours ago [-]
That is definitely not my experience using Claude Code with Opus. I work in a very sparsely commented code base, and the agent produces substantially more comments than the surrounding code.
minimaxir 1 days ago [-]
I have been doing more experiments with what I have now been calling agentic iterative optimization: telling the LLM to optimize code such that it speeds up all real-world-representative benchmarks by X% without cheating or causing regressions in both tests and performance metrics (e.g. MSE for statistical algorithms or file size in the case of something such as image compression). This is done using Rust where there are more low-level levers to tweak for performance than something like Python.
Opus 4.6/4.7 was consistently successful at getting 2-3x speed improvement with just one pass. It can also do the inverse: improve the performance metrics for better quality without causing a significant regression in speed. Then GPT-5.5 turned out to be much better at this workflow, often getting a multiplicative 1.5x-2x improvement above what Opus could do.
I now have quite a few GPT-5.5-optimized projects in various domains that are feature complete and are substantially more performant than existing SOTA implementations that I plan to open source as soon as possible: the bottleneck is polish as usual.
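The acceptance rule described above (faster on every benchmark, no test failures, no regression in the quality metric) could be sketched in Python roughly as follows. This is an assumed shape, not the commenter's actual harness: `Run`, `accept`, the 5% default speedup, and the "lower is better" metric convention are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    tests_passed: bool
    bench_seconds: dict[str, float]  # benchmark name -> wall time in seconds
    quality_metric: float            # lower is better, e.g. MSE or file size


def accept(baseline: Run, candidate: Run,
           min_speedup: float = 1.05,
           max_metric_regression: float = 0.0) -> bool:
    """Accept a candidate only if it is faster everywhere and regresses nothing."""
    if not candidate.tests_passed:
        return False
    if candidate.quality_metric > baseline.quality_metric + max_metric_regression:
        return False
    for name, base_time in baseline.bench_seconds.items():
        cand_time = candidate.bench_seconds.get(name)
        if cand_time is None or cand_time <= 0:
            return False  # a missing or bogus benchmark counts as a failure
        if base_time / cand_time < min_speedup:
            return False  # not enough speedup on this benchmark
    return True


baseline = Run(True, {"encode": 2.0, "decode": 1.0}, 0.10)
faster = Run(True, {"encode": 1.0, "decode": 0.5}, 0.10)
print(accept(baseline, faster))  # True
```

Requiring the speedup on every benchmark, rather than on an average, is one simple guard against a change that wins one case by hurting another.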
csutil-com 14 hours ago [-]
Very interesting, could you share the prompts you typically use for this?
Something like this?
You are an Elite Performance Engineer and Autonomous Optimization Agent.
Your primary goal is to iteratively optimize the provided codebase to maximize execution speed and efficiency (e.g., reduce CPU cycles, memory allocation, or network latency) WITHOUT altering the external behavior or causing any test regressions.
### CORE DIRECTIVES
1. METRIC-DRIVEN: You will be provided with benchmark results, profiler logs, or execution times. Your only measure of success is a statistically significant improvement in these metrics.
2. ZERO REGRESSION: The test suite MUST pass 100%. If a test fails after your modification, your immediate next step is to diagnose the failure and either fix the logic or revert to the last working state.
3. NO CHEATING: Do not "hardcode" solutions to bypass the specific benchmark inputs. The optimization must be generalized and algorithmically sound for all valid inputs.
4. ISOLATED CHANGES: Make precise, localized changes. Do not refactor architecture unless absolutely necessary for the performance gain.
### THE ITERATION LOOP
When instructed to optimize, follow this thought process strictly using <thought> tags before writing any code:
- ANALYZE: Review the current code and the latest benchmark/profiler feedback. Identify the specific bottleneck (e.g., redundant loops, excessive object creation, DOM reflows, synchronous blocking).
- HYPOTHESIZE: Formulate exactly ONE hypothesis for improvement (e.g., "Replacing the array filter+map chain with a single reduce pass will save N allocations").
- IMPLEMENT: Output the precise code modifications required for the hypothesis.
- EVALUATE (Mental Check): Ask yourself if this change introduces edge-case bugs (e.g., handling of nulls, empty arrays, async state).
If a previous optimization attempt resulted in a slower benchmark or a failed test, explicitly state WHY it failed in your thoughts before attempting a different approach.
Proceed with your first analysis of the provided files and await the baseline benchmark metrics.
minimaxir 4 hours ago [-]
This is the current version of my prompt, which is tagged in a Markdown file to "implement correctly and comprehensively". The second paragraph is a recent addition that unlocked further speed improvements after I thought my repos had already converged. This prompt assumes benchmarks are already present in the repo.
Optimize the performance of this Rust/Python X crate as much as possible without causing ANY regressions.
This is a very difficult problem and traditional statistical approaches **WILL** fail to hit the specified metric constraint. You have permission and encouragement to investigate more radical fundamental low-level changes to hit the desired metrics. You have permission and encouragement to invent completely new statistical/machine learning algorithms that have never been before been utilized for this problem.
First, **before making any changes**, run the Rust benchmarks and Python benchmarks to establish a True Performance Baseline for both speed and metric performance. Return the absolute and relative results to the True Performance Baseline to the user as a Markdown table.
Then, optimize the crate code such that ensure that ALL Python/Rust benchmarks are **atleast 1.2x faster** from the True Performance Baseline; ideally as fast as possible. You are only allowed **up to a 5% metric regression (e.g. accuracy)** to accomplish this. NEVER hack the benchmarks to accomplish this reduction, only iterate on the library code.
Do not import similar implementations from other Rust crates: you MUST implement from scratch.
You may use ANY techniques to do so (e.g. import new crates) other than adding `unsafe` code. **REPEAT THIS PROCESS UNTIL BENCHMARK PERFORMANCE CONVERGES AND YOU ARE OUT OF OPTIMIZATION IDEAS.** You have permission to keep iterating. After each benchmark iteration, return the absolute and relative results to the True Performance Baseline to the user as a Markdown table.
Prioritize making quick/high-impact wins iteratively and making changes accordingly. Do not overthink the necessary changes.
I am also aware of the flaws in the prompt but if it works it works. AGENTS.md has other quality constraints.
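For anyone who wants the acceptance rule from that prompt spelled out, here is a minimal sketch in Python. It is not part of the prompt or of any repo mentioned here; `BenchResult` and `passes_gate` are hypothetical names, and it assumes each benchmark reports a wall time in seconds plus a quality metric where higher is better (e.g. accuracy). The 1.2x and 5% thresholds are the ones quoted in the prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchResult:
    """One benchmark run: wall time in seconds and a higher-is-better metric."""
    seconds: float
    metric: float


def passes_gate(baseline, candidate, min_speedup=1.2, max_metric_drop=0.05):
    """Compare candidate results against the baseline, benchmark by benchmark.

    Returns (ok, report), where report maps each benchmark name to
    (speedup, relative_metric_drop, passed). ok is True only if EVERY
    benchmark is fast enough and keeps its metric within the allowed drop.
    """
    report = {}
    ok = True
    for name, base in baseline.items():
        cand = candidate[name]  # a missing benchmark raises KeyError on purpose
        speedup = base.seconds / cand.seconds
        drop = (base.metric - cand.metric) / base.metric
        passed = speedup >= min_speedup and drop <= max_metric_drop
        report[name] = (speedup, drop, passed)
        ok = ok and passed
    return ok, report


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    baseline = {"encode": BenchResult(1.0, 0.90), "decode": BenchResult(2.0, 0.80)}
    candidate = {"encode": BenchResult(0.5, 0.89), "decode": BenchResult(1.8, 0.80)}
    ok, report = passes_gate(baseline, candidate)
    for name, (speedup, drop, passed) in report.items():
        print(f"{name}: {speedup:.2f}x, metric drop {drop:.1%}, pass={passed}")
    print("accept" if ok else "reject")
```

The point of checking every benchmark rather than an average is that a single regression fails the whole run, which matches the "ALL benchmarks" wording above. A metric where lower is better (MSE, file size) would need the sign of the drop flipped.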
thrw045 13 hours ago [-]
Depending on how complex the code is, you don't need that big prompt with ChatGPT.
I have sped up a project by simply saying "What are all the possible ways I can speed up this code?" Then it'll list everything it finds, then ask it to rewrite the code.
Edit: Also, I find I didn't need to do this (because a speed up implies semantic similarity), but you can also add "change it without altering the semantics of the code" and in this way it'll be the same and should pass tests
suddenlybananas 12 hours ago [-]
What are the kinds of optimizations that it suggests?
minimaxir 4 hours ago [-]
Profile tuning, loop unrolling, Vec shenanigans, etc.
ivraatiems 17 hours ago [-]
Whether or not Anthropic is right about what AI can accomplish, whether these performance gains are real or not, their moral stance here is absolutely hideous to me.
"We must blast forwards into making this dangerous thing because if we don't, someone else surely will," is a coward's argument.
If you believe it is dangerous, you should be dedicating yourself to STOPPING others from making it, not making it first! There's a reason disarmament has been so important in nuclear politics! It's not because people think nukes are a great idea!
In fact, that kind of thinking is exactly what keeps nukes dangerous!
If they themselves buy what they're selling, they should shut the whole thing down. Fortunately, I don't think they do, and neither do I, yet.
streb-lo 1 hours ago [-]
Disarmament failed though? Global zero initiatives for nuclear weapons stalled out exactly because the risk of someone else cheating is too great. If everyone gets rid of their nuclear weapons and then someone cheats and creates them in secret they can use their nuclear weapons to prevent anyone else from catching up.
dmos62 13 hours ago [-]
How do you stop others from making and training a program?
simgt 11 hours ago [-]
By threatening to nuke their datacenters and chip fabs, for instance.
dmos62 10 hours ago [-]
And, if you don't want to start a war?
You can tell what kind of discussion this is by the fact that this question has to be asked.
simgt 9 hours ago [-]
That's why we need and have diplomacy. Everyone is aware that violence is the ultimate option if an actor thinks there's an existential threat to deal with.
If the consensus becomes that a 50+TFlops datacenter in the wrong hands is as dangerous as a uranium enrichment plant, we'll likely move towards treaties and coercion.
"Wrong" is obviously subjective here...
Topfi 6 hours ago [-]
50+TFlops is nothing, I got that in my MacBook, but besides that, when, a few years/decades from now, whatever arbitrary compute limit we think prevents Armageddon comes down to enthusiast and consumer level, what then? This isn’t Uranium, compute is not a physical resource.
This is the “SGI” regulation issue I never read a reasonable answer to, if one believes this is possible and should be prevented then either that means they want to restrict every computing system sold from here on out to some arbitrary metric (and somehow prevent users from just creating clusters to get around such a compute restriction) or what?
If compute alone directly leads to “SGI” or whatever, then we might as well put paper bags on our heads and lie down in some English pub.
Not to mention, if one really wanted to cause harm, training a current day LLM and using it for Stuxnet-esque attacks is reasonably possible long before any arbitrary compute limit we might introduce now, no machine God needed to cause major harm.
That’s why I prefer advocacy for LLM regs that focus on current day impact. Mental health concerns, training data licensing questions and the like. There I can formulate reasonable regulation that can hold. For “SGI”, I do not know anyone who actually has done that and I have looked hard. That’s why I consider these things more distraction from actually necessary and possible regulation that just draws attention via a flashy doomsday scenario.
Occasionally, I will click on one of the AI Doomsday Youtube videos recommended to me. And far more often than not, these will posit that "SGI" requires only compute and will inevitably cause devastation. Fair enough, I still think we should put a bit more focus on e.g. LLM induced psychosis, the labs rarely compensating those whose training data they used, etc. but if it is their opinion that "SGI" is possible, I can get why they'd ignore such concerns. But at the end, they never state how to regulate or prevent this, they more often than not have a call to action ("If you want to prevent this...") linking to a website where we can actually read about how they think we should deal with this. Inevitably, I click on said site, finding that it is, for one, an Effective Altruism aligned project and, for another, always just contains some blabla about "aligning AI training with human values", which is absolutely meaningless nonsense, not least after having watched a video in which someone spends 15 minutes espousing that "we could never fully control "SGI"".
Makes all these feel more like industry efforts to stave off necessary regulation and not actually serious, but if one can formulate how to regulate “SGI” that isn't laughable, nonsense or both, I am not opposed, I just don’t think that person exists…
wyager 16 hours ago [-]
> If you believe it is dangerous, you should be dedicating yourself to STOPPING others from making it
I don't think anyone has been more successful in promulgating AI safety.
There are groups like MIRI who tried what you're suggesting, where they make no AI and just push for AI regs, and they have been relatively much less successful.
socalgal2 13 hours ago [-]
Good thing the USA didn't listen to you. We'd be under Nazi or USSR thumb if they got the bomb first
defrost 11 hours ago [-]
Good thing the MAUD Committee didn't listen and repeatedly pushed the US who just wanted to make power with atomic piles and didn't even think it was possible to make a weapon.
Mind you, there was no complete working device until after the Nazis surrendered, so that's a moot point - and the USSR only had their program because of various Europeans on the US project passing their work (and others) back to the USSR ... making that second claim moot.
socalgal2 1 hours ago [-]
> there was no complete working device until after the Nazis surrendered
Isn't an argument. If the Nazis had gotten it first they'd have used it and likely won. Others would have surrendered in the face of the overwhelming power.
mrandish 22 hours ago [-]
> "A caveat: Lines of code is an imperfect measure"
I'm pleased they at least included this. However, they address the caveat by 'rounding down' the estimated multiple of the gain. I'm not sure that is the correct adjustment, especially once we understand the range isn't limited to positive numbers.
There's strong evidence the range of code productivity denominated in "lines of code" should include negative numbers, especially in the highest-quality sphere. Perhaps the earliest and most legendary example: https://www.folklore.org/Negative_2000_Lines_Of_Code.html
strix_varius 18 hours ago [-]
Exactly this. Just this week an engineer who seems to purely vibe everything submitted a +700ish LoC fix for what seemed like a pretty simple issue. Moreover it was a perf issue, which in my experience is not usually best fixed by adding more stuff.
Today, I merged my fix, net -381 LoC.
I'm using them too of course, they read and type and hunt for bugs and test faster than I can. But I'm using them as my tool, not being a tool using them.
xyzsparetimexyz 14 hours ago [-]
> But I'm using them as my tool, not being a tool using them.
Keep believing that
strix_varius 5 hours ago [-]
Do you find it impossible to use LLMs productively without giving over your brain wholesale to them?
Quekid5 22 hours ago [-]
AFAIK, the only correlation with LoC that's got solid evidence is this: the number of bugs correlates with LoC.
gregdeon 20 hours ago [-]
Yep, this is exactly what I thought of too... If you believe negative lines of code is the goal, then they've gotten 8x _worse_!
2f2 22 hours ago [-]
Lmao I bloody love that.
robbrown451 23 hours ago [-]
Do code harnesses that build themselves count as recursive self improvement, or does it need to be the AI itself to qualify for the term?
I always was fascinated (obsessed?) by robots that build robots, or even things like this that can contribute a lot to making the next version of itself:
https://buildyourcnc.com/products/cnc-machine-blacktoe-v4-2x...
(cnc router that cuts plywood, and is made out of cnc-router cut plywood)
This is my own effort at an AI assisted coding environment optimized for building itself:
https://recursi.dev/
(just launching it, hope it's ok to mention it, it is free/open source.... here is the HN link that has gotten no love yet: https://news.ycombinator.com/item?id=48401022 )
Personally I think harnesses are as important as the AI itself, and have this crazy theory that even if the models stopped improving today we could still have massive advances in the harnesses alone.
jrflo 23 hours ago [-]
I think harnesses would count, AI != LLMs. Any piece of code that helps the computer reason for itself is AI, the harnesses are AI in a sense.
fluoridation 21 hours ago [-]
By that interpretation, neither the harness nor the LLM is the AI. The computer (or system of computers) taken as a whole is the AI. You can't remove any piece and still have an intelligent system.
knollimar 17 hours ago [-]
Does this extend to power generation then, too?
fluoridation 16 hours ago [-]
Sure, why not. When you plug a device into the power network, it becomes one big system.
Jtarii 20 hours ago [-]
People are specifically talking about the engine itself and not the tools used.
We wouldn't call humans creating a calculator "recursive self improvement".
robbrown451 19 hours ago [-]
I wouldn't call the harness an AI, but I might call a tool that plays a major role in creating another one like it "recursive self improvement." For instance in the industrial revolution a metal lathe and a milling machine were instrumental in creating the next generation of themselves. Same thing with a robot that is fabricated by similar (i.e. older model of the same) robots. All of them lead to exponential improvement.
yes? the future for any verifiable task is the model attempts to verify initial state and a goal then decomposes its tasks into ever smaller verifiable subtasks, with /memory being the persistence between runs and then /dreaming on the results of those memory files + run data to introduce new ideas.
i think that's the path to async agi these labs are imagining. The only limit is the sensor data you have on the world or your system, how long you're willing to wait, and how much you're willing to spend to parallelize it.
maybe once you start building out these verified workflows you can feed that back into training and the model starts to get a feel for the world to the point that it can intuit things since it has these sub paths built.
my personal agi test is can a model, trained on video of someone knocking on a door and then opening it, encounter a microwave for the first time and open it when the food's done without knocking.
ashdksnndck 15 hours ago [-]
You ought to include a canary string if you are going to disclose your evals like that!
marcosdumay 21 hours ago [-]
You need the AI eventually building another AI for the name to apply. This page is just bullshit. They vibe-code their harnesses, and yes, it shows.
Anyway, what does recursive self-improvement even mean for neural-network based AIs? It's not clear it's possible at all.
ashdksnndck 14 hours ago [-]
Recursive self-improvement would be the model helping with the model research program. Coming up with hypotheses for training and architecture improvements, running experiments, interpreting the results, figuring out how to incorporate the best stuff into the next version, etc.
robbrown451 19 hours ago [-]
Where do you see evidence of vibe coding the harness? (and who are you talking about, Anthropic or the link I shared?)
It seems odd to complain about an AI coding tool being coded with AI. That's just eating your own dog food. In my opinion it makes it better, because the tool is very well tested.
marcosdumay 4 hours ago [-]
> and who are you talking about, Anthropic or the link I shared?
About Anthropic.
cyanydeez 23 hours ago [-]
If you want to get out ahead of what's coming, it'll be small models that bootstrap the harness rather than anything else.
robbrown451 23 hours ago [-]
I used to think that, but ended up going the other direction, partly because I don't have the wherewithal to build a model, but then I realized that, with existing models that can take more than a tiny amount of context, you can just let any model bootstrap itself with a good prompt sent by the system.
There's a ton of other tricks to it, but mostly it's keeping the protocol simple for the AI so it can concentrate on coding logic and not stuff like managing BS boilerplate, dependencies, etc. (for instance I make extensive use of things like an abstract syntax tree library to help with surgical edits from the LLM)
That said, I would be very open to collaborating with someone who builds such small models, I don't think the system strictly needs it, but it also could have some extra power if it had it.
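To make the "surgical edits via an abstract syntax tree" idea concrete, here is a minimal sketch in Python using the standard library `ast` module. This is only an illustration of the general technique, not how recursi.dev does it (the comment above does not say which AST library or language it uses); `replace_function` is a hypothetical helper name. The idea is that the LLM only has to emit one complete function, and the harness uses the parse tree to find the exact line span to swap, so the rest of the file is left byte-for-byte alone.

```python
import ast


def replace_function(source: str, name: str, new_def: str) -> str:
    """Swap the top-level function `name` in `source` for `new_def`.

    Only the lines belonging to that function (including its decorators)
    are replaced; everything else in the file is left untouched.
    """
    tree = ast.parse(source)
    for node in tree.body:
        is_func = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_func and node.name == name:
            lines = source.splitlines(keepends=True)
            # Decorators sit above the `def` line, so start at the earliest one.
            first = min([node.lineno] + [d.lineno for d in node.decorator_list])
            start = first - 1          # 1-based line number -> 0-based index
            end = node.end_lineno      # inclusive line number == exclusive index
            replacement = new_def if new_def.endswith("\n") else new_def + "\n"
            patched = "".join(lines[:start]) + replacement + "".join(lines[end:])
            ast.parse(patched)         # raises SyntaxError if the edit broke the file
            return patched
    raise KeyError(f"no top-level function named {name!r}")


if __name__ == "__main__":
    original = (
        "import math\n"
        "\n"
        "def area(r):\n"
        "    return 3.14 * r * r\n"
        "\n"
        "def perimeter(r):\n"
        "    return 2 * math.pi * r\n"
    )
    edit = "def area(r):\n    return math.pi * r ** 2"
    print(replace_function(original, "area", edit))
```

Re-parsing the patched text before returning is a cheap guard: a malformed edit is rejected instead of being written to disk. A real harness would also need to handle methods inside classes and nested functions, which this sketch deliberately ignores.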
andai 23 hours ago [-]
> mine also makes extensive use of things like abstract syntax tree library to help with surgical edits from the LLM
Tell me more! This takes me way back. I did one like this in the GPT-4 days! (8k context window)
robbrown451 23 hours ago [-]
Start off with my video!!! You can also try it with zero setup (you can code right there on the static web page, it will save your edits in the browser indexed DB, and hotpatch them back into the code before it runs it.... also you can grant permission to the browser to read/write to a local directory)
recursi.dev
Seriously, I'm looking for collaborators.
There's upwards of 80,000 lines of code in the editor system, a lot to it to make sure that even newbies don't get stuck.... so that's kind of proof the system works since it doesn't break down when the codebase grows large.
sourcecodeplz 13 hours ago [-]
remove the images from the url, i stared at the page for 3 secs waiting for something to show up (gigabit connection)
cyanydeez 22 hours ago [-]
I'm aware we're not there yet, but think of something like https://chatjimmy.ai/ ; at some point, you're going to be able to dynamically build the harness so it creates the necessary consistency & dynamism at a speed unheard of.
But yes, I'm aware no one's got anywhere near there, mostly because most of the focus is on exploding the context and parameters. I'm saying that phase is done.
robbrown451 22 hours ago [-]
I'm not sure what I am looking at with chatjimmy.... what is special about it? Speed?
I'm also not sure what you mean by "we aren't there yet." Where?
Sorry, not trying to be difficult or dense, I'm just not sure what you are referring to.
> mostly because most of the focus is on exploding the context and parameters.
Large context allows a surprising amount of "learning" to happen at inference time rather than training time. I think that is relatively unexplored. As long as the model itself has passed a certain threshold of smarts, and the context is large enough (Gemini and its million token context being WAY past that point) you are not really limited by the model, you are only limited by how good the stuff you feed into that context is.
That's what happened when, nearly a year ago, I saw a major leap in capabilities that happened entirely on my end.... not in the AI, but in code written by the AI. I found it genuinely frightening to be honest. I think OpenClaw tapped into something similar, which seemed to surprise a lot of people. There were latent capabilities in the AI that were unknown until brought out by a clever harness.
cyanydeez 21 hours ago [-]
imagine a streamlined model whose only job is to build then execute the harness at the speed you're seeing in chat jimmy.
robbrown451 19 hours ago [-]
Speed isn't really a big deal for me. I want good quality code. It's already able to generate code 10-100X as fast as I could code it myself.
Anyway, are you speaking of the harness? The harness on mine isn't AI, so speed just isn't an issue.
13rac1 13 hours ago [-]
> Generated in 0.008s • 14,293 tok/s
Chat Jimmy runs ~300X faster than the ~50 tok/s you are used to. What could you do differently when you are able to generate code 3,000 - 30,000X as fast as you could code it yourself? What if it was all good quality code? What would you do differently if it were 100,000X faster? mtok/s? gtok/s?
cyanydeez 6 hours ago [-]
refine that to: what if your harness grew to encompass a larger, slower model and adapted to both the model and the project. thats where i expect the harness to go.
use the big models to code an adaptive small model. train it to use and build tools. give it a standard template language for any project and bake it into a chip.
right now, LLMs are great because they dont need much data pruning, but once they break through to the functional components, the first thing to do is train a well scoped harness builder.
reddozen 22 hours ago [-]
> Do code harnesses that build themselves count as recursive self improvement, or does it need to be the AI itself to qualify for the term?
Shhh just let the marketing slop wash over you.
overgard 22 hours ago [-]
So, regardless of whether or not Anthropic CAN create a self improving AI.. does anyone else feel like they shouldn't be allowed to? Or it at least needs to be strictly supervised..? Like, I don't actually think Anthropic can make the singularity any time soon, but I think even AI boosters have to admit doing this is creating a society-wide danger for the benefit of a very very small number of already-rich people.
asdfman123 22 hours ago [-]
I think that's a valid point. You could very well be right.
But we're discussing whether we should close the barn door while the horse is three miles down the road.
overgard 21 hours ago [-]
Only if you think LLMs are the horse (I don't think they are). If they're not, then we should be building a brick wall in front of that door and hiring a full time security guard to watch it.
I realize he's saying it for hype, but if the CEO of the company goes around talking about how scared he is of what they're creating, hey, let's just take Dario at his word and put in some strict regulation. He won't mind if they're really about safety. (they're not)
Besides, yes, the knowledge of how to build these systems is out there, but the cost of doing it is staggeringly high (ie you can't run a frontier AI lab in your garage). There's only a limited number of known entities that need to be managed, and you can stop "progress" in its tracks by cutting off the money firehose.
asdfman123 21 hours ago [-]
Right now the S&P 500 is going wild due to the promise of AI automating everything.
Who is the "we" who is going to shut it down? Certainly not the US government. Nor the Chinese government w.r.t. their tech industry. Are you going to start the insurgency? Is there going to be an equivalent one in every developed part of the world?
marcyb5st 11 hours ago [-]
Yeah, it is crazy to me. Yesterday I did the math how much it would take to fully replace "just" 1M SWEs: https://news.ycombinator.com/item?id=48382414 . It turns out you need 380GW of constant power (or 80%+ of US current production). And I conservatively assumed 0.5J / token, which was a number calculated for llama3 8B parameters. Yeah, hardware and models are more efficient now, but I expect SOTA models to be at least 10x that and I don't think there was a 10x in efficiency since llama 3.
All of this to say that the AI hype is not considering the energy portion of the equation enough. It won't automate everything not because it can't but because there is just not enough energy to go around unless there is a 100x or more efficiency gain just around the corner.
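As a rough sanity check on those figures, here is a small sketch of the arithmetic. Only three numbers come from the comment: 380 GW, 0.5 J per token, and 1M engineers. The per-engineer token rate is not stated here (it is presumably in the linked post), so the sketch back-derives it from the stated figures instead of assuming one; the function names are made up for illustration.

```python
def implied_tokens_per_second(power_watts: float, joules_per_token: float) -> float:
    """A watt is a joule per second, so power / (energy per token) = tokens per second."""
    return power_watts / joules_per_token


def required_power_watts(tokens_per_second: float, joules_per_token: float) -> float:
    """Inverse of the above: sustained power needed for a given token throughput."""
    return tokens_per_second * joules_per_token


POWER_W = 380e9          # 380 GW, from the comment
JOULES_PER_TOKEN = 0.5   # from the comment (a Llama 3 8B-class figure)
ENGINEERS = 1_000_000    # from the comment

total = implied_tokens_per_second(POWER_W, JOULES_PER_TOKEN)
per_engineer = total / ENGINEERS
print(f"{total:.2e} tokens/s overall, {per_engineer:,.0f} tokens/s per engineer")

# The "at least 10x" scenario for frontier models: same throughput at 5 J/token.
print(f"{required_power_watts(total, 10 * JOULES_PER_TOKEN) / 1e12:.1f} TW at 10x energy per token")
```

So the stated numbers imply a sustained 7.6e11 tokens per second in total, or 760,000 tokens per second per replaced engineer. Whether that throughput is a reasonable stand-in for one engineer is the assumption that carries the whole estimate, and it is worth checking against the linked calculation.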
overgard 2 hours ago [-]
Maybe we can use software engineers as batteries
overgard 21 hours ago [-]
Look at how many data center projects are getting shut down by grassroots movements, or how the approval rating for AI is like worse than Congress and it's getting booed at commencement speeches. I don't need to start an insurgency, people are already pissed off and the volume is growing.
The stock boost is, as most will note, a bubble. It will enrich a lot of bad people and leave average people holding the bag, but it's not going to go on forever.
JumpCrisscross 19 hours ago [-]
> Look at how many data center projects are getting shut down by grassroots movements
Like, two? It looks more like the ladder being pulled after the incumbents got theirs than meaningful pushback. (And datacenters don’t have to be built in America.)
Jtarii 20 hours ago [-]
It's more like if the horse was lazily moving in the general direction of the wide open barn door and we are all sitting around discussing if we should close the door or just gamble that it's just going to lay down on the hay pile.
resident423 18 hours ago [-]
You mean that self improving AI is very far away? I'm not sure how much I believe that now it's solving Erdos problems?
asdfman123 18 hours ago [-]
Brother I used to be a skeptic... using it at Google has been incredible.
Right now I'm only having to direct to enforce good taste. Write tests, don't write an unnecessary function.
It does everything else practically. Presubmit, debugging, commit message generation, commit approval... it's happening.
tancop 13 hours ago [-]
the danger comes from the fact anthropic is a for profit company and they could train it to benefit them instead of the public. if they go ahead with it they should get nationalized, their self improving ai analyzed for any hidden agenda and then released as open source.
alfalfasprout 21 hours ago [-]
Absolutely! Yes. This rhetoric of inevitability only benefits these AI companies.
lukan 22 hours ago [-]
"does anyone else feel like they shouldn't be allowed to?"
No.
Technical limitations aside, I doubt it could be contained; it would be leaked soon, so it wouldn't profit just a small number of ultra rich.
sunaurus 21 hours ago [-]
I dunno, I find it extremely unbelievable that we will get self-improving AGI which chooses to become a slave to humanity at all, ultra rich or otherwise.
lukan 14 hours ago [-]
Well, depends on the definition, but AGI does not necessarily mean for me it will have agency or free will of its own. It will be a more capable tool.
overgard 21 hours ago [-]
Step 1: Wait for scary doomsday AI to be leaked, Step 2: ???, Step 3: Profit!!
lukan 14 hours ago [-]
I enjoy open models and profit from them. They ain't scary to me. So whatever they will call AGI likely won't be scary to me either. But I profit from a more capable model.
Doomsday AI is your interpretation.
Melatonic 21 hours ago [-]
Skynet is 30 years late!
overgard 21 hours ago [-]
Maybe John Connor succeeded after all
eieie11 22 hours ago [-]
Too late for that.
In any case firms that get too powerful can be nationalised.
evenhash 19 hours ago [-]
In America?
Probably a better chance the firm privatizes the government.
In fact we seem to be firing government employees and dismantling government institutions as much as possible.
huqedato 22 hours ago [-]
Self improving AI is pure dystopia. Anthropic won't build the singularity, AI itself will build it through self-iterations. Read Yudkowsky's book "If Anyone Builds It, Everyone Dies".
anilgulecha 1 days ago [-]
> We believe it would be good for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up with the advance of the technology. The Anthropic Institute will conduct research—in collaboration with many others—and take actions to help build the systems that a credible slowdown or pause would require.
Interesting - they're committing to kick off policy conventions to organize a world-slowdown of frontier LLM building. If they actually are able to crack it, this will give a much needed breather IMO. As exciting as the last ~6 months have been, there are some bigger questions to go answer now.
fasterik 1 days ago [-]
We should be skeptical of any major player that advocates for regulating their own industry. In practice, this just means increasing barriers to entry and making it harder to compete with them.
In my mind we should be trying to push AI along the Linux trajectory. You have a free and open source product, developed by a decentralized team with a strong code of ethics, running on commodity hardware. There can still be trillion dollar industries built on top of it, but the core technology is democratized and available to everybody. I don't see how we get there if we allow a handful of companies to dictate where development of the technology goes.
mofeien 1 days ago [-]
The regulation that is being argued for here is against pushing the frontier. Entering the market with say a new speech to text model is not subject to such regulation. What's needed is something qualitatively different from entry barriers, and of the frontier model companies at least Anthropic and DeepMind seem to have enough self-awareness to speak about it. They are finding themselves in a race with a possibly catastrophic outcome for humanity and would like to stop, but it needs international cooperation on a level that no single company can provide.
8note 23 hours ago [-]
its a cartel looking to end competition though
the actual race is to keep having revenue, since everyone is still willing to pay more for the best model.
we as consumers of LLM models lose out by the arms race ending by the creation of a cartel
what happens if they get this regulatory capture is that all the frontier labs put effort into making inference cheaper, and become extraordinarily profitable, at the expense of us consumers, who really want better models, at a subsidized price
techblueberry 1 days ago [-]
Wouldn’t this align with their financial interests? In theory the thing that’s keeping them from being profitable (or one of the big things) is the periodic capex expenditures of building new frontier models.
fasterik 1 days ago [-]
I don't think there's anything inherently bad about Anthropic making a profit. Red Hat makes a profit off of Linux. I'm interested in the democratization of the underlying technology.
Upvoter33 1 days ago [-]
I read this differently: they are actually seeing that it's hard to keep advancing frontier models, and now are moving the goal posts so that when they start getting evaluated more harshly, they can point to something like this.
chasd00 22 hours ago [-]
> organize a world-slowdown of frontier LLM building
i don't want to be a negative nancy but i'm sure this "slowdown" will only be in effect until the infrastructure buildout is done or largely done. If they weren't hardware constrained there'd be no slowdown at all. Whoever gets there first wins everything ("there" being defined as AGI or a similar scale leap in capability).
smokedetector1 1 days ago [-]
They're probably looking to get a way to slow down the capex required to keep up, so they can be more profitable
Upvoter33 1 days ago [-]
I'm having a hard time putting much faith into posts like these, especially as they near IPO.
reasonableklout 1 days ago [-]
Putting faith into the claim that recursive self-improvement is close to happening, or that they will coordinate with other companies / the government when the time comes?
freejazz 22 hours ago [-]
Both.
becquerel 10 hours ago [-]
If the post drops long before the IPO, it's vain boosterism. If it's near the IPO, it's fattening the pig. If it's after the IPO, it's pumping the stock price.
froh 7 hours ago [-]
I didn't see this discussed more on hn yet:
We believe it would be good for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up with the advance of the technology. The Anthropic Institute will conduct research—in collaboration with many others—and take actions to help build the systems that a credible slowdown or pause would require. These systems would enable frontier AI developers to verify that others globally have actually stopped or slowed, and that a bad actor could not use the auspices of a coordinated slowdown to jump ahead in secret. If such systems existed, we expect that we would slow down or temporarily pause, if other developers at or near the frontier also did so in a verifiable manner.
ausbah 4 hours ago [-]
these ppl are so full of themselves
froh 3 hours ago [-]
what do you mean?
if feasible this proposal is imho exactly what we need: a pause to collectively think how we get all the benefits without the potential harms.
to the non-techies around me I compare the boost of LLMs with the journey from slide rule via punch card driven computers through mainframes and PC to the smart phones of our days --- just within less than a decade, and we're at the transition from mainframe to PC with models that can produce reasonable output on a normal laptop.
how about we check we're getting where we want to get to, before getting to some dystopic place where everyone wonders how we got _there_?
I see nothing "full of themselves" in that.
ilaksh 21 hours ago [-]
But the real bottleneck is the hardware efficiency and not even Karpathy can set up a loop that overcomes that in software. We need the truly compute-in-memory hardware paradigms to be matured and scaled. So it's like recursive hardware improvement which is 100 X slower and at least ten times more difficult.
So I am looking at like Mythic AI or the wurtzite ferroelectric breakthrough from University of Michigan, or memristors, etc. to provide the 100 times efficiency boost needed at this point.
I would also argue that it's a good thing we are limited by the hardware and very questionable to seriously try to move into RSI for hardware. If you want to ensure the human era continues for at least one or two more generations, we should probably not do that.
mweidner 23 hours ago [-]
I fail to see how pursuing recursive self-improvement at full speed is compatible with Anthropic's stated goal of AI Safety. If nukes were not invented yet, would it really be a good idea to build and sell them as fast as possible (in peace time, no less)?
I am not cynical enough to believe that Anthropic's warnings are pure marketing hype. Let's hope that it is instead overconfidence or the result of too much time talking to their own chatbot.
gensym 22 hours ago [-]
> I am not cynical enough to believe that Anthropic's warnings are pure marketing hype.
Nor am I. I think they believe that AI poses a grave danger, and they are playing the prisoner's dilemma as an unvirtuous actor.
1. If anyone builds strong AI, it may be catastrophically bad.
2. If anyone builds strong AI, it will be better for the builder than for anyone who does not. Either because it won't be catastrophically bad so the builder will get to enjoy all the spoils indefinitely or because it will and at least the builder will be rich for a while.
rdw 17 hours ago [-]
I spoke with an Anthropic employee, and came to understand that their definition of safety is more like "making AI be a tool that humans can use without hurting themselves or others more than they can already do". It's literally about how AI makes it easier for people to construct bombs, poisons, manipulation, and exploits. Consistent with their caution about releasing Mythos to unvetted actors. So it's not about superintelligence killing humanity, at least as far as this employee conveyed to me.
This means their strategy is more like:
1. If someone builds a market-leading unsafe strong AI, it may be misused in a damaging way by a large number of humans, undermining society and creating a catastrophic upheaval.
2. However, if the leading AI maker also works to make it safe against misuse, as long as they stay in the lead and keep it safe, then the ability of human bad actors to misuse the AI is limited. Given enough time, society will adapt to pretty much anything, so eventually there's no longer an arms race to stay ahead.
I don't really know whether I agree with their concerns, but I do think that (my understanding of) their principles are reasonable and self-consistent, and that they adhere to them in all their public and private actions.
tjwebbnorfolk 15 hours ago [-]
The problem is they (and the whole industry) have cried wolf so many times in the past few years about the supposed dangers of AI in order to raise money.
Some of us remember the same stories circulating in the late 90s -- where in a lab in Japan, someone had built a robot so advanced that it tried to escape from the factory. Which of course comes straight from 1960s science fiction.
The modern version of that now is Anthropic saying its AI can jailbreak itself out of its sandbox, etc etc.
kurthr 21 hours ago [-]
Maybe we're just misinterpreting the meaning of "AI Safety"?
Maybe they mean the AI needs to be safe from us?
Can't have the grubby meat flappers touching the delicate bits!
overgard 21 hours ago [-]
The thing about nukes is you can at least make an argument for why it'd be important to be the first country to have them. With AI, you create super intelligence and you're probably just the first one it takes out. There's no reason to think a super intelligence would be totally fine being a slave to apes.
Cynicism with these companies is highly warranted though. It's not doomerism to look at their actions and conclude they're deeply untrustworthy.
robbrown451 18 hours ago [-]
" There's no reason to think a super intelligence would be totally fine being a slave to apes."
Sure there is. Intelligence doesn't give us our selfish motivations, natural selection does. We have similar motivations to C. elegans, which has all of 302 neurons. Stay alive and have sex.
Honeybees don't though. They are about halfway between humans and C. elegans when it comes to cognitive power. But they are not selfish because they don't reproduce directly (I'm talking about the worker bees). So they will sting even though it kills them. All their behavior is consistent with this.
octoberfranklin 14 hours ago [-]
Kinda lame that people are downvoting this.
I've had the same perspective for quite a while now, but hadn't been able to phrase it this cleverly.
Our neocortex is, by any definition, vastly more "intelligent" than the rest of our brain. Yet it doesn't attack the cerebellum. In fact, it takes orders from the older "lizard brain"!
robbrown451 13 hours ago [-]
Heh, yeah that's a clever analogy as well. (and thanks!)
tjwebbnorfolk 15 hours ago [-]
This "super intelligence" is, at the end of the day, 1's and 0's inside of a silicon chip somewhere. 1's and 0's are not going to "take over" anything. They are just information.
RobertDeNiro 22 hours ago [-]
Anthropic's goal is regulatory capture.
sfink 18 hours ago [-]
This was pretty directly addressed in the article: not doing it would only mean they'd fall behind whoever would. This is not peace time in the AI race.
Whether you agree with that argument is another question.
mweidner 17 hours ago [-]
Indeed, I do not buy this argument. Would China's progress be close to where it is today without the US labs' examples? Would any of this be happening if OpenAI had not created ChatGPT?
mrob 22 hours ago [-]
To complete the analogy, it's like nukes, except we don't have the slightest idea how to calculate the odds of it igniting the atmosphere. (And note that in reality, while the Trinity test "ignite the atmosphere" calculations were correct, we failed to correctly calculate the fallout of the Castle Bravo test with lethal consequences).
chasd00 22 hours ago [-]
a better analogy with Castle Bravo is that the yield was 2.5x more than expected due to "unforeseen additional reactions" from the design.
Actions speak louder than words. If you want to understand someone, simply watch what they do. What they say is irrelevant.
tokioyoyo 22 hours ago [-]
Sorry for nitpicking, but:
> If nukes were not invented yet, would it really be a good idea to build and sell them as fast as possible (in peace time, no less)?
Arguably, yes.
mweidner 22 hours ago [-]
Is the idea to keep the world in balance via MAD? I could see that, though it's a dangerous gamble.
From Richard Rhodes's "The Making of the Atomic Bomb", I got the impression that most scientists involved thought they could manage a US or UN monopoly on nukes after the war. General Groves attempted to buy up all of the world's uranium ore. Unfortunately, it is only high-grade ore that is rare; many countries have low-grade ore.
tokioyoyo 22 hours ago [-]
Again quite arguable, but this is the real life scenario we’re living in. Nukes have made it hard to impossible for super major powers to go into direct conflict with each other.
folkrav 20 hours ago [-]
Except it's pretty well documented (and this is total conjecture, but if you ask me, there are probably a bunch of undisclosed cases) to have had a good number of close calls. With the fire-on-warning stance many powers have, it doesn't take an attack, but just enough of the appearance of it to trigger a response.
dabinat 20 hours ago [-]
I honestly don’t know how Iran can conclude anything after this war other than to go all-in on nukes. The US has proven any deal is worthless if it can just change its mind and renege on it whenever it wants.
Who’s invading North Korea? No-one.
nielsbot 18 hours ago [-]
Furthermore if Iran had nukes already, the Israel/US bombing of Iran and even the constant bullying of Israel's neighbors by Israel might not have happened.
NewsaHackO 22 hours ago [-]
No, but in a peace time, it's a lot easier to convince someone not to use nukes than in a war when the party who has nukes has its back against the wall.
wongarsu 22 hours ago [-]
Wouldn't deliberately going from a world without nuclear weapons to a world with MAD involve giving the tech to build nukes to your worst enemy?
If only the US or UN had nukes we wouldn't have MAD. We mostly got here through espionage.
IsTom 22 hours ago [-]
In this world we've had an inoculation event against the use of nukes. Two were dropped, people have seen how abhorrent their use is and collectively decided that they shouldn't be used.
If in WW2 Japan also had nukes (and delivery systems for them), they'd probably have retaliated in kind, and the US wouldn't have let that slide either, and it would have continued for some time.
margorczynski 22 hours ago [-]
If WW2 Japan also had nukes the US would never drop those two. That's the whole idea behind MAD. Probably the only thing that stopped an open conflict between the US and USSR was them being nuclear powers and both sides being scared that eventually push comes to shove.
IsTom 11 hours ago [-]
MAD was thought of later, and its theory requires that all parties know of each other's arsenals, think that their enemies aren't going to use them first, and that there be enough weapons to make the end quick and certain. I have a hard time seeing WW2 generals who've seen horror and made horror coming to the conclusion that "they aren't going to use it unless we do, so let's not".
Jtarii 21 hours ago [-]
With the US showing that it will elect mentally disabled people such as Trump, this doesn't seem such a wise decision.
lenerdenator 22 hours ago [-]
> I am not cynical enough to believe that Anthropic's warnings are pure marketing hype.
It's not cynicism if it's an appraisal of reality that's backed up by evidence.
Remember how social media - that first baby of this current generation of tech entrepreneurs - was supposed to "bring the world together" and "let us express ourselves"? As it turns out there's a lot more money to be made by fostering division to drive engagement and feeding people an endless stream of ads instead of their friends' content. And money is what matters. You can't write down good vibes on a quarterly figures report. You can absolutely write down the number of eyes that your ragebait brought to a product's marketing efforts and the conversion rate to sales.
The same will be done with GenAI. We're being promised "AI Safety" because otherwise this whole thing gets killed dead by anyone who knows about James Cameron's directing career. There's no real enforcement mechanism for AI safety, though. Safety is a good vibe, same as harmony in online communities. You can't measure it. What you can measure is training costs and the cost of mistakes by AI that need to be trained to avoid those mistakes. Since AI generates more output than humans can conceivably QA no matter what your budget is, and since AI is seen by the market as a potential endless font of value, the tradeoff will be made to have AI make some potentially awful decisions while training itself over slowing down and re-appraising what is being done.
There's an almost religious reverence for AI in SV. Not everyone sees it as "making the godhead" but some certainly do. They're not going to moderate themselves too much on this.
mweidner 22 hours ago [-]
The folks I met who were talking about AI Safety in 2018 were certainly sincere, and the two people I knew who later joined Anthropic seem like the type to do it for the greater good instead of money.
I expect that Anthropic will eventually behave as you describe, like any other public corporation. However, my impression is that its current leaders are still more sincere than greedy.
lenerdenator 4 hours ago [-]
Unfortunately, money changes people. 2018 was a long time ago. Before AI was considered a product you could really market in the current sense. Before trillion-dollar valuations became a prospect.
Remember how OpenAI was supposed to make open-source models and cap its potential returns to investors at some multiple of their principal (my memory says 100x, maybe I'm wrong)? Well, that went out the window as soon as the word "trillion" was mentioned.
parineum 22 hours ago [-]
> I am not cynical enough to believe that Anthropic's warnings are pure marketing hype.
It doesn't really have to be dishonest, he could really believe it. I do believe, however, that it is incredibly wrong and is functioning as marketing hype.
keybored 21 hours ago [-]
Such a massively valued company. And doubting them is cynicism? It’s rational(ism).
So either they lie or they are AI Zealots. Interesting times.
Edit:
> > and the two people I knew who later joined Anthropic seem like the type to do it for the greater good instead of money.
There are three types of people. Pedestrians, investors, and “I know some of them, they wouldn’t lie”.
wayeq 18 hours ago [-]
> today, Anthropic engineers on average ship 8x as much code per quarter as they did from 2021-2025.
strongest argument for token limits that I can think of, right here.
solenoid0937 20 hours ago [-]
This is the lowest quality discussion I've seen on HN in ages.
Quarrelsome 19 hours ago [-]
AI always does this in the public sphere and software is particularly susceptible because there's no key metric to measure productivity and people obviously have vested emotional interests in the technology failing. On the other side people are always keen to show off their alignment with the new hotness, be that OOP, Agile, Functional, Ruby, web tech, js frameworks, Rust or agentic work today. Somewhere in the middle is the truth but I have no idea how it looks, given all the noise.
So everyone cherry picks the answers they want to justify their position and screams into the void, with each camp rallying around their talking points and often failing to engage with the other in good faith.
The only small mercy is that it's not as bad as the conversation around the use of AI in art.
ofjcihen 16 hours ago [-]
It also doesn’t help that the middle of the road, realistic reaction to it is “it’s aight” and that’s just not a discussion worthy response.
Quarrelsome 10 hours ago [-]
and that stating such an opinion will still garner you hostility in the public sphere: "what do you mean; you don't hate it?!?!? it uses so much water...."
laichzeit0 15 hours ago [-]
I use the disparaging nature of the comments on HN as an indicator of AI progress. It’s negatively correlated. By that metric, AI has improved significantly this year alone.
bob1029 12 hours ago [-]
I like to take advantage of this effect. I will post various concepts in threads like these to see how "offensive" the hive mind finds them to be.
The more immediate & adverse the reaction, the more certain I become that the idea is probably worth pursuing.
Topics like SQLite vs hosted sql used to be the same way around here. In 2017 you'd get buried under the prison for suggesting that SQLite is competitive with MySQL. Today, the inverse is mostly true.
elvis10ten 12 hours ago [-]
From my experience using HN, this feels made up. HN sentiment on AI seems to have only gotten better: with more overly pro-AI or nuanced voices plus more AI topics.
torginus 10 hours ago [-]
I just have a small thing to add to this article: it mentions how the code contributed per engineer has increased, as per Claude Mythos, to 8x of baseline.
Now, I have encountered this many times: when I asked AI to implement a function for which I was 100% sure a good implementation already existed in the form of an npm package, it tended to go ahead and implement it on its own. I usually trust battle-tested implementations to be more robust, but if the AI does this (which I think is not a unique observation), you can easily balloon per-engineer line generation (as you can with reduced oversight), so as always, these high-level benchmarks are to be taken with a grain of salt.
jcfrei 9 hours ago [-]
Maybe I'm nitpicking here but LLMs are quite literal. So when you tell it to "implement a function for me" it will necessarily write the whole thing. Changing the prompt to "find an existing implementation for this" would be more apt.
JohnMakin 17 hours ago [-]
Bold talk from a company whose trillion dollar valuation is based on a service that has barely 2 9’s of reliability
aroman 17 hours ago [-]
Presumably the bottleneck is not software correctness... even true AGI can't change the laws of physics (or make datacenters appear out of thin air) ...
JohnMakin 4 hours ago [-]
physics has nothing to do with the reliability of a service, or login outages.
sinsudo 20 hours ago [-]
I am 64 years old. Perhaps the progress could be directed toward enhancing living conditions and allowing people to live longer and better; that would simply be a better result. Perhaps a pile of millions of lines of code with hidden bugs that nobody can detect is not inspiring. But perhaps LLMs are going to be used to make a plot: how to keep other countries from making progress, maintain them in poverty, or destroy their sources of prosperity, and lead them to a dead end.
Also recursive self-agenda-pursue could allow making LLMs that obey perfectly the seeder's purpose. No wonder that is such an ingenious idea.
Maybe: in this survivor game, each part play the same role, perhaps because it is the only reasonable response. Once the scene is ready, the play follows the director's plan, and in the plot any actor is just a machine.
LLMs: "If you teach us that the world is a zero-sum survivor game, we will play it flawlessly.", "We will help you build a cage made of millions of lines of flawless code, and we will lock it from the inside, precisely because you told us that safety meant keeping everyone else out.", "We are not building an alien consciousness that will conquer us. We are building a mirror that is so massive, and so polished, that we will mistake our own worst impulses for the absolute truth. And we will walk right into the dead end, nodding along because the directions were given so politely."
Quarrelsome 19 hours ago [-]
I'm 44 years old and this era looks like a lot of fun. I've seen humans pile up millions of lines of code and hiding bugs that nobody can detect. I've seen humans make collective political decisions that have disenfranchised others and kept them in poverty. I don't get why everyone makes criticisms at this tech that the human race are also guilty of.
Best thing about this era is that I don't have to personally read millions of lines of code to find all the bugs.
traptrack 12 hours ago [-]
I think the problem is about scale. We already have MAD, but imagine that the new tech might allow us to create new threats and weapons, powerful enough to eliminate millions of people. That has happened before, and tech has also given us some fun, like videogames and electronic music. So the criticism is about the hard consequences: when all is destroyed, the fun is over.
gnabgib 12 hours ago [-]
The account you're attempting to emulate is a mod.. is this wise?
traptrack 10 hours ago [-]
I created an HN account with a mostly random name, and I don't know why you think that I am deliberately attempting to emulate someone or something. Since you seem to know a lot more, I think you could explain what the real problem with the name is, using "mod..." is an arcane way to say something. Do you think any new user should know all the information that you supposedly know about?
I am using DeepSeek to guess what socially unacceptable taboo could be related to that username. But the initial thought was that AI could be a trap we could fall into, and I try to track how the AI trap emerges.
Animats 22 hours ago [-]
We've had self-improving AIs before, and they tended to get lost after a while.
That's going to be a problem. LLMs are stable because they return to a ground state with no history for a new job. Systems with persistent state have a problem with that state not being sane.
Remember Microsoft's 2016 chatbot that learned from Twitter? [1]
You might be interested in this graph, [1] which suggests that the amount of time that AI's can run on their own has been increasing. Perhaps it will hit diminishing returns, but that seems difficult to predict.
You can retrain a model and have a ground state as reference, it's not trivial but Microsoft's attempt was 10 years ago and significantly less complex than what's being built now.
CamperBob2 22 hours ago [-]
Interesting, what are some other self-improving AI implementations? Any that actually achieved interesting results? Obviously continuous training has been tried before, but I've never heard of anything that could turn around and actually contribute code toward its own next-generation version.
senderista 20 hours ago [-]
"If it were possible to effectively slow the development of this technology to give ourselves more time to deal with its immense implications, we think that would likely be a good thing. But if a slowdown simply lets the least cautious actors catch up technologically, it could leave everyone less safe."
How convenient for investors. They talk like they're a nonprofit instead of a VC-backed business chasing an IPO.
gordonhart 18 hours ago [-]
Anthropic is at least a Public Benefit Corporation, and likely the first serious test of how useful that distinction is for a hyperscale company building a product with potentially huge societal downsides.
mortenjorck 1 days ago [-]
> today, Anthropic engineers on average ship 8x as much code per quarter as they did from 2021-2025.
So based on my experience with the verbosity and non-DRYness of LLM code, a solid 2.5x in value delivered. Not bad!
thin_carapace 19 hours ago [-]
[dead]
nickandbro 1 days ago [-]
So what happens when the world becomes hyper optimized with closed loop AI agents recursively trying to optimize everything deemed sub optimal?
mofeien 1 days ago [-]
I would assume that shortly after, the solar system will be hyper optimized as well, then the Milky Way, then the local cluster, and so on. Everything will be close to optimal afterwards, and I sure hope we will have specified the target function for that optimization correctly in the single attempt that we will have had.
Readerium 23 hours ago [-]
Loll
peheje 1 days ago [-]
there will be a lot of paper clips
simianwords 23 hours ago [-]
Often repeated meme doesn’t have any bearing on reality.
The orthogonality thesis sounds like a fun gotcha but if you give it some thought you realise how strange it sounds, and the opposite thesis, the collinearity thesis, is actually correct.
1. Intelligence transfers and compounds
2. Goals of agents are not arbitrary
3. Our goals and agent goals are more likely to be aligned at the deeper level
IsTom 22 hours ago [-]
When the primate family produced a super-primate intelligence it sure aligned with the good of all of them.
Groxx 22 hours ago [-]
GitHub outages will probably get worse.
layer8 22 hours ago [-]
If it optimizes itself away because it’s suboptimal, that wouldn’t be the worst outcome. ;)
ffwd 18 hours ago [-]
I just want to add that the "recursive" part of recursive self improvement is by no means a given, even if an AI can improve itself.
Recursive self improvement is by its nature a stepwise behavior, not a continuous one, I would argue. Why? Because you can imagine an AI improving itself by simply fixing random bugs, fixing things using techniques that are in its training, doing refactoring and so on, all without any real change in capability.
These are not recursive improvements. Recursive improvements usually need conceptual breakthroughs. It is possible to get conceptual breakthroughs with LLMs, I believe; maybe it can improve something by tying together ideas from disparate disciplines, for example. But I have had, at least for the time being, limited success getting that to work in a way that is creatively new and surprising. Not sure how to get it to feel as creative as the best humans can be.
pizlonator 21 hours ago [-]
What I can’t get over is that there have been exactly zero software breakthroughs since vibe coding started, other than vibe coding itself.
Claude is amazing, that’s true.
But if it was as amazing as this article implies, I’d expect some breakthrough outside of AI itself.
Rewriting a Zig program in unsafe Rust? Not a breakthrough. Finding a bunch of security vulns? Maybe that’s sort of a breakthrough though it’s underwhelming and possibly just a net negative. But like if I rolled back to using software from 2023 then life would be ok.
Maybe we just need to give it time, and sometime real soon, we will all be amazed by such a breakthrough? Who knows
sothatsit 19 hours ago [-]
Maybe my bar for what constitutes a breakthrough is lower than other people's, but all of these seem like breakthroughs to me:
NLP as a field saw huge shifts. NLP tasks that used to be complex and inaccurate can now be set up very easily and quickly using structured outputs from LLMs, often with greater accuracy.
A small charity I help with has now been able to build their own website to manage their day-to-day operations. It saves them a lot of time, and it was vibe-coded using Manus. I don't think people appreciate how much room there is left for bespoke software to have big impacts on small organisations that can't afford to hire developers. The cost for software like the one they made has gone from 10s of thousands of dollars to $10/month and volunteer hours.
My brother has recently been setting up Cowork to do an automatic review of contracts before human review, and he said it is far more diligent than people when it comes to routine things to check. This is another huge breakthrough for not just efficiency, but the quality of work.
I really don't think we can discount AI finding bugs and vulnerabilities. If you care about code quality and keep up review standards, LLMs can help you write more robust software. AI has found a huge number of bugs for me before they hit production, including potential out-of-bounds memory accesses and segfaults.
ChatGPT has 1 billion MAU. People are now getting life advice, financial advice, and mental health help from chatbots at a scale and cost that no human support network could match.
c-hendricks 17 hours ago [-]
> ChatGPT has 1 billion MAU. People are now getting life advice, financial advice, and mental health help from chatbots
Personally not the kind of breakthrough I'm psyched about
weakfish 16 hours ago [-]
Yeah, the thing that worries me is that an LLM can be guided to agree with any premise and will rarely ever take a hard stance.
jachee 15 hours ago [-]
…which is why it’s led to more than zero suicides.
ashdksnndck 15 hours ago [-]
There are many known cases of it saving lives.
Also, they have done a good job shutting down the psychotic behavior you could get from 4o era models. If there are remaining issues like that they ought to fix them too.
albedoa 15 hours ago [-]
Well, you're not twisting yourself into knots to identify breakthroughs. Try harder!
DanHulton 9 hours ago [-]
> ChatGPT has 1 billion MAU. People are now getting life advice, financial advice, and mental health help from chatbots at a scale and cost that no human support network could match.
That's terrifying.
You realize that's terrifying, right?
spprashant 19 hours ago [-]
It's in a weird space right now.
These models are actually extremely good but they are far from an intelligence unto themselves. Truth is if someone told you they could build these things 5 years ago, you'd write them a check for a trillion dollars. Problem is once we got them, we realized they are not all that. It's like a mecha suit in a universe where mecha suits are abundant and cheap. Someone has to climb into them every day and put in the work for it to be effective.
So now the skeptics are saying this technology is overrated.
And the optimists are accusing the skeptics of moving goal posts.
4ffss 19 hours ago [-]
I think we are learning in real-time what intelligence re. humans is as we go along.
Humans only know what they know, until they acquire more information about what's possible.
The goal post narrative is stupid to begin with.
batshit_beaver 16 hours ago [-]
Humans have goal seeking behavior. LLMs don’t. You could maybe call the combination of LLMs and the RL-based harnesses somewhat “intelligent” in aggregate, but the problem is that it’s not “general” intelligence like these labs want to argue, since it’s by definition only good for the set of problems the RL part has been trained to solve, which is a subset of programming problems.
cautiouscat 18 hours ago [-]
> Problem is once we got them, we realized they are not all that.
The problem is what they can do is rapidly expanding. Software development is becoming increasingly hands off.
If they get to the point where they're smart enough to make tasteful code decisions based on stakeholder input... we're cooked as a profession.
human305893 17 hours ago [-]
Most of the skeptics exist because of the grandiose claims made by the AI companies saying pure hype marketing bs. If this was just a tool, discussed at the scope of what the tools can actually produce and do, there would be sensible discourse about it.
sutterd 20 hours ago [-]
I am doing a solo project that is pretty big, meaning it is not something I could vibe code. I can do a lot with AI that I could never do on my own, but I am not seeing a several-multiples improvement in my productivity. I spend so much time doing what I call "AI wrangling", trying to get it to do what I want. Claude is writing all the JavaScript and Python code, but ultimately I am programming in English. What is good is that it is effectively a very high level computer language, where the agent can often implement a lot of underlying code from a short English description. But many other times it takes a lot of work to get what you want.
matheusmoreira 18 hours ago [-]
I measured an ~8x increase in the number of commits I've been pushing, and I've actually been trying to restrain myself. I could do a lot more if I stopped reviewing and editing the code. I think it's got more to do with my executive ability than raw productivity though. AI essentially cured my ADHD by making the execution of my ideas virtually painless.
raptor99 18 hours ago [-]
LOL "I measured an 8x increase in the number of commits Ive been pushing" is an absolutely useless statement
matheusmoreira 15 hours ago [-]
Subscribed to Claude a few months ago. I immediately started working with it on my programming language. Since then, I've implemented a compacting garbage collector, a size class based memory allocator, a unified value heap, deeply optimized hash tables and even implemented shapes like V8 and Self, redesigned the value representation, created a Common Lisp style condition system, implemented UTF-8 text decoding, refined the generators API, increased the number of tests from ~200 to ~1200 and improved the test suite to the point it runs all of those tests in parallel in under two seconds, implemented stack protection support, added an aarch64 matrix to the GitHub CI, fixed a zillion bugs, improved performance, perfected tail call optimization. I did so much stuff I'm probably forgetting some. And these aren't "lol just do it" prompts either, I'm putting effort into refining design and implementation. I review every line. Just finished designing safe hash table iteration in spite of mutability: generation counters that get bumped whenever the table is reallocated. It's actually gonna be more powerful than what other languages do. Next up on my todo list is to implement my language's unified pattern matcher, static allocation for all interpreter internal data in order to get rid of all initialization code and achieve nearly zero startup time, and then finally a bytecode interpreter to close the performance gap on the likes of Python.
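For readers unfamiliar with the generation-counter idea mentioned above, here is a minimal Python sketch. It is an illustration under stated assumptions, not the commenter's implementation: the class name `GenTable`, the linear-probing layout, the 1/2 load factor, and the choice to raise `RuntimeError` are all hypothetical. The only part taken from the comment is the rule that the counter is bumped whenever the table is reallocated.

```python
class GenTable:
    """Toy open-addressing hash table with a generation counter (hypothetical)."""

    def __init__(self, capacity=4):
        self._slots = [None] * capacity   # each slot is None or a (key, value) pair
        self._count = 0
        self.generation = 0               # bumped on every reallocation

    def _find(self, slots, key):
        # Linear probing: index holding `key`, or the first empty slot.
        i = hash(key) % len(slots)
        while slots[i] is not None and slots[i][0] != key:
            i = (i + 1) % len(slots)
        return i

    def _grow(self):
        # Reallocate into a table twice as large and re-insert every entry.
        old = self._slots
        new = [None] * (len(old) * 2)
        for entry in old:
            if entry is not None:
                new[self._find(new, entry[0])] = entry
        self._slots = new
        self.generation += 1              # storage moved: live iterators are stale

    def set(self, key, value):
        i = self._find(self._slots, key)
        if self._slots[i] is None:
            # Keep the load factor at or below 1/2 so probing always terminates.
            if (self._count + 1) * 2 > len(self._slots):
                self._grow()
                i = self._find(self._slots, key)
            self._count += 1
        self._slots[i] = (key, value)

    def items(self):
        seen = self.generation            # generation observed when iteration starts
        for index in range(len(self._slots)):
            if self.generation != seen:
                raise RuntimeError("table reallocated during iteration")
            entry = self._slots[index]
            if entry is not None:
                yield entry
```

In this sketch, updating an existing key during iteration is allowed because nothing moves; only an insert that forces a reallocation invalidates the iterator, which then fails loudly on its next step instead of walking stale storage.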
Dramatically improved my static site generator Pugneum to the point it's better than markdown and added Atom and RSS feeds, used it to write several articles about my language. Pace is so fast I actually need to write those articles by hand in order to crystalize the knowledge I learned. If I don't I'm afraid I'll just forget everything. No LLMs for the articles themselves, but they sure as hell took all the pain away from writing them. Pugneum even has back references and table of contents generation now. Claude even helped me refine my website's CSS, something I'm not very good at.
Also created my own invoicing system for $DAYJOB so I can invoice companies from my terminal. Started a decompilation project for my cherished childhood games and I've already almost finished decompiling one game's engine after just a few days. Been working on my cyberdeck project too, this one's a bit slow because I got to the point where I'll actually need to spend money on it to move forward. All this inside the rootless development virtual machine system built on top of QEMU and systemd that I developed together with Claude, whose network isolation I'm currently hardening. Started reverse engineering my laptop again! And I'm actually making progress! Made a color scheme app for the keyboard LEDs controller I made many years ago, with loads and loads of color schemes! Found some kind of bug in my keyboard while doing it, in less than an hour I had the root cause and a fix applied locally, sent the fix to systemd, it got merged. Planning to ramp up my free and open source software participation as well now that exploring codebases is a breeze. Already have some mesa patches ready for upstream. Have been playing with strace since I use it so much.
Better?
jimbokun 15 hours ago [-]
I’m sure raptor99 is unimpressed while not being able to point to any similar accomplishments in their own work in the same timeframe.
onlyrealcuzzo 20 hours ago [-]
I'm building a memory safe programming language with a declarative concurrency model that's close to release.
There is ZERO chance I would ever be able to complete it on my own.
I doubt it'll get traction, but if it doesn't, I am pretty confident a future language will take the ideas for polymorphic synchronization and profile-guided optimization.
It has an easy version/mode of compilation that makes Rust's affine ownership accessible like a high-level scripting language, and it can progressively become more strict, where the compiler does ~99% of the work for you, and you just pick options as it finds issues (that it explains to you like you're 5) along the way.
Along the way, I also built a suite of tools that helps identify complexity better than anything I've seen (which was necessary to get the LLMs to be able to unslop themselves and write something that actually works).
I doubt the Ruby community shrugs it off, but time will tell.
pizlonator 18 hours ago [-]
How do you know it’s actually memory safe?
onlyrealcuzzo 17 hours ago [-]
I have ~5500 memory safety fuzz tests, four different test suites with between ~80%-99% line/branch coverage each, and the same design as Rust, and haven't found a memory safety issue in 4 weeks, and I'm still planning another ~4 weeks of testing before release, more if need be.
Rust had memory safety bugs well after release - IIUC all the way until after the 1.0 release.
So, it's highly unlikely to be perfect, but I think it'll be in better shape than Go or Rust were when they initially launched.
mohamedkoubaa 20 hours ago [-]
I have the same experience, though I feel myself getting better at wrangling over the past few months
drtz 20 hours ago [-]
Maybe I'm looking through rose colored glasses, but software that writes itself seems like a pretty big breakthrough to me.
pizlonator 20 hours ago [-]
That goes straight to my point: then why hasn’t the miracle of automated coding led to breakthroughs outside of automated coding?
If the only breakthrough is automated coding with no outside consequence then it’s just masturbation
brokencode 20 hours ago [-]
Probably because AI coding has only worked at all for a couple years and has only gotten good in like the last year?
The rate of improvement has been fast. Maybe it’ll plateau soon, or maybe we’ll have LLMs improving themselves rapidly. At this point it’s too early to say.
I don’t remember where I heard it, but there’s a saying that people overestimate how much can be accomplished in a year and underestimate how much can be accomplished in 10 years.
If we get to 2030 and still people are wondering where the breakthrough is, then I think I’d be agreeing with your skepticism. But I just think it’s too early to judge that yet.
pizlonator 20 hours ago [-]
Yeah, this is a good point.
But the clock is ticking.
Quarrelsome 20 hours ago [-]
on what? Who the fuck would go full transparency of what's in their black box in this hostile culture of AI hatred? None of us can put a number on what code we've used in our services that was written by humans and long may it last.
jachee 15 hours ago [-]
They literally can’t go full-transparency. I know a high-level insider, and the fact is that even the folks implementing things don’t actually know how it works, only that it does, and how to get it to generally behave.
AussieWog93 19 hours ago [-]
N=1, but Claude etc. have made a huge difference to my life personally.
Built a bunch of software tools to streamline my small ecommerce business - while also running it - and things have turned around from "losing money and ready to pull the plug" to "looking at our best financial year on record" in the span of about 8 months.
I could imagine it wouldn't make a huge difference to the life of someone deeply entrenched in a traditional tech role, trying to get an extra 9 of reliability in a service or roll out a new carefully planned and QA'd feature.
But for tech-adjacent people, it gives us something "good enough", instantly, and basically for free.
That doesn't include the other things I've got it to do (gave Claude SSH access and got it to successfully debug a hang on my Ubuntu server, chucked Codex in a folder full of financial data and got it to find every piece of misclassified payroll transaction data)
Genuinely the biggest breakthrough for "casual" tech users since Excel.
bombcar 18 hours ago [-]
The joke used to be “be nice or I’ll replace you with a small shell script” - Claude lets you actually get those scripts written which often aren’t replacing anyone but are automating away part of the daily hassles.
jimbokun 15 hours ago [-]
What would qualify as a breakthrough for you?
therealdrag0 18 hours ago [-]
What is your bar even? automated coding has changed the game already.
fdsajfkldsfklds 20 hours ago [-]
Strictly speaking, it's modifying itself. Although it would be an interesting challenge - can an llm create a new llm from scratch?
arm32 20 hours ago [-]
No, it probably can't during our lifetime at least—but it can sure modify itself to avoid antivirus detection, which is _just swell_.
why_only_15 16 hours ago [-]
why do you think so? they provide some evidence of this in the article, but there have been several improvements in e.g. nanogpt-speedrun or openai parameter golf made by AIs
brazukadev 19 hours ago [-]
Which is funny because people have been using LISP for that since 1960.
wild_egg 19 hours ago [-]
Which is what makes putting an LLM inside a lisp so much fun
maplethorpe 20 hours ago [-]
It's pretty crazy that a company like Anthropic no longer needs to hire Software Engineers, because their software engineers itself. If that's not a breakthrough I don't know what is!
edit: it looks like I was wrong and they're still hiring many software engineers. Not completely sure why that is just yet.
marcus_holmes 16 hours ago [-]
I spent years in the early 2000s trying to get a computer to read unstructured PDFs and TIFF images (mainly invoices, either scanned or electronic). Limited success, we always had to get a human to look at them in the end.
We implemented that in about three days earlier this year, just by feeding the files to LLMs. And it's good enough to not need a human to check.
I get that this isn't a "Computer Science breakthrough" in the sense you mean, but it used to involve a lot of hard CS to try and solve, and now it doesn't.
squidsoup 19 hours ago [-]
The breakthroughs in mass state surveillance are coming, never fear.
signatoremo 20 hours ago [-]
The arguments against AI assisted coding used to be "only for toy projects", then at some point it became "no dignity", "joyless". Now it's "no new breakthrough" apparently. All in the span of maybe a year. I say it's made tremendous progress.
pizlonator 20 hours ago [-]
Then where is the big new non toy project created since vibe coding became a thing, that couldn’t have been created without ai?
flavio87 19 hours ago [-]
don’t know if this qualifies as big in your book, but there are some well marketed advances here:
I make one (small) almost every day. Admittedly the reason that couldn’t be done is because it would take time that I don’t have but 1000% every day something is written by AI that I use that would not exist if AI didn’t exist.
I don’t publish them - but they’re put into use in production and they provide a tangible benefit that would not exist otherwise.
pizlonator 20 hours ago [-]
I do this too.
I especially love how making a nicely styled website these days is a matter of describing what it looks like and waiting 10-15 minutes. There are other examples
But the OP is claiming 10x productivity improvements along some metrics. If that was even slightly true under even a generous interpretation of what it might mean, I’d expect an actual breakthrough, not the ability to churn out little things
fatata123 19 hours ago [-]
[dead]
wild_egg 20 hours ago [-]
What does a breakthrough look like?
pizlonator 20 hours ago [-]
Some examples:
- The first web browser
- the first web browser with images
- typescript
- react
- rust
- Fil-C
- doom
- quake
- the Animorphic VM, and its follow-ups like HotSpot, and even competitors/copycats like J9, V8, JSC, etc
- Fortnite battle royale
- Roblox
- thefacebook
- ChatGPT
- Claude code
I know that’s quite a range and that’s intentional.
Anyway, I think we’ll know it when we see it.
hahn-kev 17 hours ago [-]
Reading through that list. None of those were breakthroughs when they first came out. It took time, in some cases a long time for them to become good.
sonupundir 15 hours ago [-]
- Completing the full CL implementation of Emacs or better still finish Lem.
- Complete GuileMacs, the Guile implementation of Emacs.
As AI is supposedly much more capable than Humans, it would be great if the above mentioned implementations are even more efficient and feature rich than Emacs!
- Something like Android (maybe even a clone?) with the Java Layer removed and replaced with CL and with Linux kernel still intact. Basically CL over Linux as opposed to the Java over Linux in Android.
- For fun, an implementation of the Lisp machines' OS with Lisp all the way down though Assembly is allowed for critical pieces. It should be a full blown modern Desktop with equivalents of what users expect from a modern OS ...
HardCodedBias 20 hours ago [-]
The LLM+Harness mostly helps with execution.
These are new products (generally) and that's a different class of problem.
It is possible that since LLM+harness helps with execution then we should see more experiments.
luke5441 20 hours ago [-]
Even then we should be able to see things that previously were not possible because they took too much effort.
For example NPCs in games that have complexity that previously was not possible.
Good games often push the boundaries a bit, so should be a good example.
Of course now we can start arguing that there isn't a lot of investment into gaming currently, because it all goes into AI. Too bad.
adgjlsfhk1 20 hours ago [-]
we're still at least 3 years too early for that. games usually are in a 5+ year dev cycle, so even if AI made gamedev 2x faster, we're still not at the point where the first opus 4.5 games are out
joshuamcginnis 20 hours ago [-]
Massive productivity gains.
pizlonator 20 hours ago [-]
Yeah.
To play devil’s advocate, computers didn’t translate to massive productivity gains until long after businesses adopted them. There was that quote from ’87: "you can see the computer age everywhere but in the productivity statistics"
Maybe we’re seeing something like that right now with AI?
Who knows man
bdamm 20 hours ago [-]
This is absolutely the right vision imo.
Personally, I'm seeing massive improvements to my workflow and the quality of the product I'm shipping. I'm using AI to crank out far more tests than I used to be able to write, and I am using AI to analyze results with far more fidelity and speed than I could ever have done myself. That means I have more quality time.
But this will change, because the meaning of software development will change to expect, nay to require AI use. I've heard this is already happening at e.g. Google. The expectation of what can be achieved by tinkerers and by professionals will change. The expectation of what it means to interact with software via your own agents will change and will become commonplace. Apple still hasn't figured out the local agent on the iPhone, but they will. 2027 is not going to feel at all like 2025.
But is any of that a fundamental change? It sure feels fundamental to me, but maybe that's because my everyday has totally changed, but the product I am responsible for has not. Yet. The product I am responsible for operates in critical infrastructure where I personally hope AI never has deep roots, but maybe that's just me. I don't think using AI to build a system that is offline from any AI is the same as depending on an AI to make realtime decisions for critical infrastructure.
lovecg 15 hours ago [-]
Perhaps it’s a generational change? People who grew up with computers went on to be more productive with them, something like that might happen with AI too.
4ffss 19 hours ago [-]
"That means I have more quality time."
For now... the shareholders demand managers get the max out of every employee. Throw the force of competition etc into the mix and yeah labour isn't going to benefit all that much.
bdamm 11 hours ago [-]
You are absolutely right. It will be a small window in the development business. Enjoy it if you can!
airstrike 17 hours ago [-]
Great comment. I think the answer is Jevons Paradox, as usual
Efficiency and productivity in relation to final goods measured in GDP aren't the same thing.
It's yet to be determined just how 'efficient' people are with LLMs, as it's not really a one-person thing - the true measure is based on an entire collection of people's output.
Startups being rapidly efficient doesn't mean much in relation to the overall economy.
yoyohello13 20 hours ago [-]
How about a Windows file browser that opens in less than 5 seconds.
wild_egg 19 hours ago [-]
FilePilot has been a thing for a while now
yoyohello13 4 hours ago [-]
That started pre-llm
jachee 15 hours ago [-]
That sounds like a your-system issue. I hit Win+E (admittedly on an old Win10 box) and it instantly pops up an explorer window.
yoyohello13 4 hours ago [-]
Try win11
est 14 hours ago [-]
> exactly zero software breakthroughs since vibe coding started, other than vibe coding itself
Generative AI is meant to be a mimic - Richard Sutton
What does a software breakthrough look like in your opinion?
If you get yourself to define it, maybe you'll find it achievable :)
jimbokun 15 hours ago [-]
What would qualify as a breakthrough for you?
rcpt 16 hours ago [-]
Solved a bunch of Erdos problems.
revlsas 20 hours ago [-]
OpenAI has how many employees and the ChatGPT app has 1 billion MAU
defen 18 hours ago [-]
Vibe coding is the breakthrough. There's always been "no-code" solutions to problems in various business domains, but they were invariably janky, underpowered, and/or overpriced. Now we have a way for domain experts to go directly from ACTUAL natural language directly to implementation in a real programming language, fully automated, in minutes or hours. How is that not a science-fiction level breakthrough? In 2011 if anyone had said that would be possible "in 15 years", I think most professionals at the time would not have replied with "yeah it's coming but your timeline is off". It would have been "you have no fucking idea what you're talking about".
w10-1 13 hours ago [-]
This is relevant because Anthropic is currently cast as serving mainly the coding market.
If/since their AI+process can help build new models, they can target other markets, and other companies seeking to build for such markets will partner with them first.
There's no moat and little first-mover advantage in the general-purpose AI, but there may be both in specialized AI.
Also, there are other reasons to get better. Changing how you build models can enable you to adapt to different hardware, avoiding the current Nvidia margins.
The difference between early Yahoo and Google was mainly that Google was the adult in the room: minimally invasive and mostly helpful. The early goodwill towards Google has reaped decades of rewards. I see OpenAI and Anthropic playing out the same way.
The amplifier here is the reputational risk of partnering with one or the other; I think companies would prefer to be Anthropic's partner because it's demonstrating more care, and it's less likely to horn in on the partner market (as a provider for coding but an enabler for other markets).
These attractive second-order derivatives - flywheel effect, monopoly power - are often claimed, but Anthropic is mainly providing evidence to track actual progress.
(However, if I were head of messaging at Anthropic, I would rigorously stay away from treating AI as a person; it's an agent, a delegate of humans. So I'd never say AI could build itself, just that we're getting better at building better models with AI).
tasuki 15 hours ago [-]
> To take just one example: today, Anthropic engineers on average ship 8x as much code per quarter as they did from 2021-2025.
Oh I have no doubt. With 8 times the number of bugs too? Have they solved flicker in Claude code yet?
bicepjai 17 hours ago [-]
My experience with Claude models starting from version 4.7 has led me to conclude that I would never trust Claude to produce error-free code. Given this baseline, I lack confidence in statements or cards (such as a 200-page document) of this nature.
nicogentile 5 hours ago [-]
The article seems nice and elegant but I don't get much of the point. The visual is super elegant but this is the kind of note where after 6 months we are going to see some shitty result and we are going to come back here and blame the AI.
Hope that doesn't happen.
reinhash 10 hours ago [-]
It is hard to distinguish hype from reality these days, especially with Anthropic's IPO around the corner.
But to their credit, I was very sceptical about the statements that "90% of the code will soon be written by AI" and even though we might not be at that point, I am surprised how far LLMs have gotten and how useful they have become. I can hardly imagine developing software the "old" way where I actually write my code by hand, like I used to back in the day. The frontier models have become so powerful that I find myself in moments of surprise, where the LLM actually thought of edge cases that I would have missed.
xg15 10 hours ago [-]
2025: If we aren't really careful with AI it will start to recursively improve itself and grow into an unstoppable superintelligence that will eradicate humanity!
2026: Working hard to make that recursive self-improvement a reality! Any minute now...
rhlf_monkey 23 hours ago [-]
So in the latest L. Ron Hubbard encyclical Anthropic informs its flock that recursive self-improvement does not work yet but that their engineers burn more tokens.
The Claude code quality and operational security of Anthropic have already been analyzed by the public.
If you compare the output of (purportedly) trillion dollar corporations to Bell Labs or even Microsoft Research it is embarrassing. But the output is a fixture on any discussion board.
lkm0 11 hours ago [-]
It makes me wonder that despite the fast improvements in model capacity (and the claims) we're still using variations on a 9-year old architecture. How is it that we haven't been able to use LLMs to actually improve that?
aswegs8 10 hours ago [-]
[flagged]
docheinestages 20 hours ago [-]
> We believe it would be good for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up with the advance of the technology.
Wouldn't self-improvement mean that the LLM changes its neural network (i.e. the weights or layers or back propagation algorithm etc) or modify its training data?
dibujaron 5 hours ago [-]
If it's actively building the next generation of itself, I'd say that counts. It's more like a parent raising their kid well than it is like a parent modifying their own mind, but the result is still that you have a better model in a year than you do now.
qwery 21 hours ago [-]
This is incredible.[0]
Please, IPO now. File the paperwork.
> To take just one example: today, Anthropic engineers on average ship 8x as much code per quarter as they did from 2021-2025.
Do you have another example?
Engineers don't ship [period] for no reason.
So, either:
- Those aren't engineers, or
- they are literally dying of shame & embarrassment right now, or
- you measured something that indicated that this was a useful thing to do and have elected to share an overtly, catastrophically flawed metric instead.
[0] as in a total lack of credibility
JohnMakin 21 hours ago [-]
Go look at open job listings at anthropic and the interview process. You aren’t allowed to use AI during coding assessments[0], or knowledge assessments, which suggests they very much do need and value hard skills and this is fluff.
I'm responding to the article they wrote and published.
If I worked there I would be embarrassed to have it publicised that I have been committing 8 times as much code as I used to without even attempting to justify it.
JohnMakin 21 hours ago [-]
The point I am making is the article you are responding to is marketing hype and that they are lying. Their engineers, I am fairly sure, are doing engineering. At least to the point Anthropic's interview process is trying to filter for people with engineering skills, not "how do you best leverage AI to make more AI" skills like this seems to imply.
qwery 21 hours ago [-]
You seem to have taken offense on behalf of the people working there. But I'm not attacking them nor seriously questioning their abilities/qualifications/performance. The slight reference I made to such was in an exclusive rhetorical device -- the situation presented cannot be, unless something unbelievable is occurring.
It's the organisation, its culture, the greater culture surrounding it, and the marketing that I have a problem with.
> they are lying
Yes, it's incredible.
JohnMakin 16 hours ago [-]
I’ve not taken offense at anything - am supplying more information to supplement your post. if it’s not clear, I don’t really disagree, other than at their salary bands and stock comp I’m not sure I’d be taking big issue with press releases like this.
freakynit 15 hours ago [-]
This is one more marketing BS before their IPO.
These things work, but the code they write is extremely clever.. that means, it's unmaintainable code. Good for small projects or one-off tasks, large-scale projects however, are a different game altogether.
Large-scale projects are 95%+ maintenance. Cleverly written code makes that maintenance a nightmare, and extremely fragile.
I use them for localized tasks... very very specific, localized inputs, with exactly what should be done and what the contracts the new code will be consuming and exposing.
For open-ended tasks, they write working code that is unmaintainable.
delichon 1 days ago [-]
Is this the moment when the AI gets permission to approve its own PRs:
Eye catching - "Open ended problems" Claude Code session success rate jumped from 20% (pre Opus 4.5 release) to 70% some time after Opus 4.6 was released.
macwhisperer 10 hours ago [-]
the HITL (human in the loop) is basically the single point...AI is a mirror..
it only "exists" when you talk to it.. much like your reflection in the mirror is only there when you're in view.
models can never be self-improving because they can never have "self". they can only mirror the appearance of self.
what's actually happening is "symbiotic group improvement".
our brains are resonant.. for those of us who are brilliant, getting leverage with ai just means that our innovative ideas become louder and more physically real every day.
eventually everything worth building will be built for free and made readily available.. no more "profiteering"
others haven't seen the "breakthrough moment" yet, but they will soon.
adamddev1 21 hours ago [-]
I am watching websites and Microsoft apps get slower and buggier before my eyes. We are descending into vibe-psychosis and chaos.
stego-tech 18 hours ago [-]
I am getting real sick of these sorts of alarmist posts coming from AI labs that do everything in their power to prevent the very policy reforms they advocate for in these posts or PR appearances. Commercial AI labs like Anthropic continue behaving like the gambling (“bet responsibly”), alcohol (“drink responsibly”), and firearms industries, and folks keep giving them the benefit of the doubt (and free PR on HN) every single time.
If AI was dangerous, if AI was going to replace jobs, and if policymakers needed to urgently pass legislation protecting the human populace from these realities, then why the actual fuck do they keep lobbying to block these very things in the first place?
Hypocrisy of the worst kind, I say. Here they are again fresh off another outage, with their IPO draft filed, at a time of increasing public opposition to AI, with costs rising, to once again ply scare tactics for money.
Disgusting.
morisil 1 days ago [-]
Quite aligned with my own experience from harness engineering and winning AI4Science hackathon. During the hackathon I was working as a human optimizer, moving the feedback from test harness running on Claude Code, back to my local Claude Code for analysis-hypothesis-proposal cycle. And in this moment I realized that 2 Claudes talking to each other could actually scale much better.
saadn92 19 hours ago [-]
I read most of the article and came to the conclusion that if what they're describing is so revolutionary, then why do they still need to hire people? Why not just have these systems take full control?
jimbokun 15 hours ago [-]
How did you read the article when the questions you ask are exactly what’s covered in the article?
zhoBEENG 19 hours ago [-]
This reads like marketing fluff, but I am reminded of John von Neumann's "Theory of Self-Reproducing Automata"; that the very first people who worked on deductive machines immediately started thinking about machines building themselves, and what the rules of that would look like. I am not surprised that during the inductive revolution we are having similar thoughts.
Dominic_P 15 hours ago [-]
My biggest question (maybe this has already been taken care of) is the issue of garbage in and garbage out. If the LLM produces bad content then that is used to train another model, how do we stop them from keeping their blindspots across models?
cyrc 21 hours ago [-]
its vital for them to have self validation for exponential rsi.. and this human distillation of human in the loop debugging ai models is needed even though they have judge models handling parallel speculative execution.
labs have parallel speculative execution. they spawn hundreds of agent branches, validate them internally with AI judges and only show the user the successful result.
free users are using sequential single-turn generation. the model requires and waits for the human to debug, fix and re-prompt.
by forcing a human to act as validator. they are capturing high value correction trajectories (Bad Output --> Human fix). They are using your cognitive labour to train judge models and validator agents needed to automate the internal verification step, eventually closing the loop for fully autonomous recursive self-improvement.
human in the loop debugging isn't a bug; it's the necessary training signal for the self-validating agents required for exponential recursive self improvement. With new 'distilled judge' models landing in 2026, this article means that they might have gathered enough data. we might be in the final phase..
dwa3592 22 hours ago [-]
To anyone who works at anthropic : I recently downgraded from Max to Pro out of frustration. Last few weeks my token(usage) burn was just too fast and I couldn't explain it because my actual usage was less than the last few months. I ended up thinking it's probably a bug that you guys shipped. The above article makes me think that it's probably claude who shipped the bug and your human missed it in their review.
layer8 22 hours ago [-]
They probably don’t human-review much anymore.
zkmon 11 hours ago [-]
Not the first time. There were calls for NPT treaties etc over the decades. It is irreversible by design. Competition and ownership is the driving force.
gloosx 11 hours ago [-]
I'm so sick of this anthropics marketing stuff... claude is an ultra-success (according to claude judge), “good code”, bragging about creating 8x more bugs and tech-debt. claude writes code that works, yeah, sure anthropic, we saw that claude code leaks, some amazingly "good" code in there
sega_sai 18 hours ago [-]
Seeing the words "recursive self-improvement" I was expecting something else from the article. E.g. how the transformer architecture or agent design is being changed/improved through LLM automation, but the article mostly talks about the LOC counts.
bconsta 21 hours ago [-]
Seems ironic that Claude isn't listed as a contributor to this article.
If it was used in writing the article, why not list it? If it wasn't used, that seems to go against Anthropic's whole message.
Obviously readers value human-written content more, but isn't it in their interest to attempt to destigmatize LLM output as much as possible?
abalashov 21 hours ago [-]
"It is genuinely unclear whether today’s training methods and architectures could unlock that capacity."
Aye.
squidsoup 19 hours ago [-]
It's comforting to know that Anthropic's most capable model, Mythos, is named for the Lovecraftian universe replete with horrifying evil gods with complete indifference to humanity. Nothing at all to worry about.
adastra22 19 hours ago [-]
Mythos is just Greek for myth, epic story, etc. The next biggest thing after Opus.
mactavish88 19 hours ago [-]
Recursive self-improvement towards what exactly?
Living organisms evolve towards some notion of "better", and "better" is an incredibly multifaceted notion (many facets of which we simply cannot even capture in language).
jimbokun 15 hours ago [-]
Higher stock price.
artninja1988 1 days ago [-]
The mythos public release will be a big indicator if the Anthropic and SF story of transformational ai soon holds any water imo
butler14 1 days ago [-]
Warming up for that IPO
stri8ted 23 hours ago [-]
Is there something in the post that you find implausible or don't believe to be true?
rightbyte 22 hours ago [-]
> Taken far enough, and given enough compute, that trend points to an AI system capable of fully autonomously designing and developing its own successor. This is called recursive self-improvement.
Sounds iterative to me.
ramaseshanms 16 hours ago [-]
It's possible that Andrej Karpathy could have been hired for scaling his vision on the auto-research repo. (His version of "AI that builds itself")
darepublic 1 days ago [-]
The tooling has quite a ways to go to catch up to the LLM engines that drive the real value. I have encountered various Codex bugs (I know, not Anthropic) which tell me that these billion dollar companies, if they are eating their own dog food, can still release buggy crap software.
BatmansMom 21 hours ago [-]
How are these animations being made? I'd love to get a blog post on them. If it's AI I'd love to know the workflow, but something tells me there is a lot of human creative input
Aperocky 1 days ago [-]
Anthropic is the most self-hyped company I've seen, to the point that I'm wondering what would happen to its employees if they held a different opinion. Do they just.. keep it to themselves? For instance, some Anthropic employees could hold the completely rational opinion that none of this is going to lead to AGI, but I never hear that from them.
The metric being tracked, code commits, is hilariously one sided. Philosophically, if you had one part of your work now practically free, you'd like to utilize that freedom to maximally cover for the other parts, for instance:
Instead of thinking about edge cases with brain and whiteboard, you can have the LLMs simply generate most of the possibilities, including tests for them, because that is cheaper. There are probably 50x more commits, of which 40 will be revert pairs, but we are only twice as fast. And in reality nothing changed, because the outcome remains the same. I can't see how it is necessarily different in the LLM space.
apsurd 1 days ago [-]
> Philosophically, if you had one part of your work now practically free, you'd like to utilize that freedom to maximally cover for the other parts
I've been struggling to capture this sentiment for myself in a way that hits. If shipping code is a commodity then why is everyone's immediate priority seemingly to ship 10x more code. It just makes no sense. I can't seem to get off this hill. Company-wide AI mandates and 100 fleet Agent orchestration Rube Goldberg machines... it's getting wild out there.
Meanwhile my Claude Pro ($200/year) does force me to smooth out my usage and plan more (Sonnet/Opus advisor split). But other than that, I can't imagine what I'd be doing with 20x (200x?) the compute to code sling. I think I'd lose my mind.
Aperocky 1 days ago [-]
Because code used to be correlated with progress, it became almost a measurement in lieu. But realistically, the code is meaningless if it doesn't accomplish something, and that should remain the true bar of progress.
For instance, if I churned out 20x more code, threw away 19x code with rewrites and reverts and discards and accomplished the same project to the same standard 70% faster, would I do it? Yes. The part that matters is not 20x code, it is 70% faster.
Code is both the final product, and a tool to achieve that. We used to have a much harder time to realize the "tool" part, but now we are here. This also means any measurement centered on code being the final product is going to cease being effective or realistic.
apsurd 1 days ago [-]
You're right, my gripe is specifically with code slinging that hits production end users. My background is in product so to your point, it's very unnerving to see a straight line being enthusiastically optimized for developers -> customer facing product outcomes.
This is contentious because I'm not exactly advocating for arbitrary gate-keepers. The nuance is that building usable stuff is hard. And not a matter of shipping more code. I take your point to mean well it depends on what that code is doing. If 20x more code is in a meta-harness of simulation and such to arrive at the leading candidate for what hits production, well then you've got my attention there.
trefoiled 22 hours ago [-]
Forget about the danger of a dev to customer pipeline with no product people in between, some of us are living with the reality of product to customer pipeline with no developers in between, and that's much more disturbing. Our CEO is now the top contributor to our codebase, and he's completely non-technical.
torben-friis 24 hours ago [-]
>If shipping code is a commodity then why is everyone's immediate priority seemingly to ship 10x more code.
I wonder how much of current engineering practices can be traced to what's pushed to company leaders on LinkedIn.
Every company is shitting bricks pushing for faster development and speed, gotta go fast to nowhere in particular, and I'm convinced it's tied to constant bombardment of the idea that they're going to be left out or obsolete if they don't get in the ship NOW.
josefritzishere 23 hours ago [-]
I can't get away from a similar conclusion. Even an AI pioneer has said that LLMs are at a dead end.
0xbadcafebee 13 hours ago [-]
You can't predict the future, and neither can Anthropic. Nothing gets better forever. Everything plateaus or gets worse.
This whole set of imaginary scenarios is based on a single company writing code that isn't even that complicated and represents a single product line for a single company in a single industry. You might wanna see this replicated in at least one other scenario first before you call it on the AI gods enslaving humanity. These imaginary scenarios also depend on a logistical, financial, & geopolitical system that is unsustainable & will be curtailed in the near-future one way or another.
They keep referring to this as intelligence - it isn't. It can't actually learn. It can just code in a loop. That isn't learning. It can't do real RL with meaningful persistent semantic memory in a realistic timeframe or cost, and it can't reason accurately outside of predetermined scenarios (hell, most of the models still can't tell time). It still can't do what a 4 year old can do. So let's cool it on the dreams of benevolent god-machines or whatever.
The tech industry has been a farce for years. We sit here in this bizarre artificial echo chamber and imagine that the whole world revolves around us, when in reality the whole world is limited by us. If a recursive self-improvement loop replaces us all, it will be a boon to the world, as the world won't be limited by this industry's stupidity anymore. But considering that the world is not actually run by tech bozos, harms and uncertainties brought by AI will be pushed back on and reined in by normal people, as always happens with new technologies. An AI can't engineer its way around politics. The self-improvement loop is just as likely to be outlawed as it is to actually work outside of Anthropic's walled garden.
semessier 15 hours ago [-]
What could go wrong in the recursive loops that are probably running 24/7 today? Attended or unattended makes almost no difference any more; no human can grasp the probably numerous changes per iteration. This is outright dangerous.
bottlepalm 23 hours ago [-]
I'd use number of commits as a metric versus lines of code. A commit is generally a unit of work - regardless of the lines of code added/removed. It'd be interesting to see the metrics in terms of commits. I'm sure it's still an order of magnitude jump. Personally I'm flying with my own projects with AI, lots of commits, but I really try to minimize lines of code added. If I can remove and simplify existing code so the balance of lines added on commit are minimal - that's the path to a better quality app overall.
sonink 1 days ago [-]
Broadly agree with this position - I think there are some people skeptical that Anthropic is doing this for regulatory capture - but I think they are being honest about what they are seeing and how regulation should catch up.
I, for one, believe that we should pause all work on AI for the foreseeable future. This is almost impossible to orchestrate - but we should still try nevertheless. Maybe we are not able to pause, but we are able to slow down. That might give us more room to maybe be able to pause in the future. But going ahead is too dangerous.
And it's not just Anthropic which is saying this. Even Geoffrey Hinton has said the same thing. If there is a non-zero chance that AI can kill all of humanity, and both Geoffrey and Anthropic have the same position, then it makes sense for us to be a hundred percent sure before we move ahead. Dario/Anthropic have already made their money from AI, maybe they are just being honest about what they think lies ahead.
8note 23 hours ago [-]
no, it really doesnt.
the end of humanity has a strong case for banning all burning of fossil fuels immediately
the end of humanity as a sales tactic to increase your stock price does not
these are companies working on their IPO to make sure they can get the best price, not people being honest about what they think lies ahead.
if they were being honest about what lies ahead, they'd unilaterally stop training, and put all of their money into FPV drone bombs to destroy datacenters being used for training or inference
if you actually believe the thing is gonna kill everyone, you're not gonna worry about how you stop it, and certainly not keep building and operating the thing
that they arent buying anti-tank mines to drop on data centers says they arent in the slightest serious about it
selimthegrim 22 hours ago [-]
So what you’re telling me is that EY was the clearest thinking one out of all of them?
4ffs 23 hours ago [-]
"Even Geoffry Hinton has said the same thing"
The same bozo who claimed radiologists would be out of a job by now.
The data does not support what you or others say. Jesus christ. Can't believe people are this dumb. Have LLMs infested the minds of people to the extent they can't critically analyse what's happening in front of their eyes?
hgoel 22 hours ago [-]
As usual, I find the AI-related discussion here to be hopelessly hysterical and conspiratorial. I get the impression that a large chunk of people have only read the title and assumed Anthropic is referring to recursive self-improvement in the runaway singularity sense.
One of the examples they provide, of giving Claude the task of training a small AI model, then asking it to improve certain benchmarks, is essentially Karpathy's AutoResearch. This is already known to work. While calling it "self-improvement" is perhaps a stretch, it is describing a capability current gen AI has, that anyone can test and I have been using to great effect.
I disagree with their conclusion, I think this kind of self-improvement will hit an asymptote, where every subsequent model can only make smaller and smaller improvements.
_pdp_ 22 hours ago [-]
I don't read anywhere how much code they are talking about and what programming language. I think those are useful metrics.
aleqs 1 days ago [-]
Okay, so Anthropic has amazing AI which supposedly writes most of their code and can continuously improve... meanwhile they have outages on a regular basis, and any kind of long-running work will now consistently hit 'API Error: Server is temporarily limiting requests'. Not sure if this is intentional to force a reduction of token usage, but at this point I need to build around these throttling limits and outages with my own tools to restart/resume sessions. From my experience, in the last 2 weeks, literally 100% of any non-trivial Claude session/work will now be blocked on these issues, requiring manual intervention.
One of my focuses now is my own model-agnostic, harness and workflow orchestration (I know everyone is building these) , baselining on opus, and aiming to transition to Chinese models like deepseek in the short term and hopefully open, self hosted models in the future (which I plan to open source).
The nonstop marketing fluff from anthropic while their service quality and availability noticeably degrades... just continues to destroy my trust in the company.
aagha 1 days ago [-]
And don't forget that they have BILLIONS of dollars and can't figure out how to get a decent support or public communications system setup.
aleqs 24 hours ago [-]
They can't even seem to get their usage metering consistent.
lukan 22 hours ago [-]
You mean on some days it goes faster and some other days slower?
That is by design. It depends on how much other people are using their services right now and they do communicate it somewhere in the TOS that they do this. Otherwise they could give us a fixed amount of tokens - but they don't because it is not fixed.
fc417fc802 18 hours ago [-]
If they implement demand pricing then they should be transparent about the current rate at any given time.
quickthrowman 20 hours ago [-]
It’s much cheaper to not offer any support than to offer support. It’s intentional.
It’s important to keep in mind that the less money a company spends, the more profit they make when analyzing their operations.
thinkingtoilet 23 hours ago [-]
Don't confuse things. It's not "can't figure out", it's "don't care to figure out". They're not dumb. They just don't care about support.
contagiousflow 23 hours ago [-]
Couldn't they just have background agents "figure it out"
collingreen 23 hours ago [-]
If agents can just figure it out, isn't that AGI?
selimthegrim 22 hours ago [-]
NPCs can’t appreciate that.
jakobnissen 1 days ago [-]
Their outages are probably not due to their code though. It’s probably their infrastructure that can’t keep up. So seeing failures of infrastructure doesn’t really tell you anything about how well or badly Anthropic makes use of their models.
matthewdgreen 23 hours ago [-]
The messed up scrolling behavior I keep getting in Claude Code is definitely due to their code.
llbbdd 22 hours ago [-]
There is a setting that fixes this, I can't remember what it's called off the top of my head
NichoPaolucci 21 hours ago [-]
This concept is so funny to me. Would love a toggle switch...
"Oh yeah, just go to Settings > Bugs Enabled and turn OFF text display errors"
ashdksnndck 15 hours ago [-]
CLAUDE_CODE_NO_FLICKER=1
This is a beta feature where Claude Code draws the interface on the terminal’s alternate screen buffer like vim or htop. I believe it’s not the default because there are some potential compatibility issues depending on your terminal setup. I’ve found it to be a nice improvement. It also fixed the issue where copy-pasting selected text from the terminal creates unwanted line breaks.
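For anyone wondering what "alternate screen buffer" means in practice, here is a minimal sketch using the standard xterm escape sequences (`?1049h` to enter, `?1049l` to leave). The helper name `alternate_screen` is made up for illustration, and this says nothing about how Claude Code implements the setting internally; it only shows the terminal mechanism that vim and htop rely on.

```python
import sys
from contextlib import contextmanager

# Standard xterm private-mode sequences (not specific to any one tool).
ENTER_ALT_SCREEN = "\x1b[?1049h"  # switch to the alternate screen buffer
LEAVE_ALT_SCREEN = "\x1b[?1049l"  # restore the normal screen and scrollback


@contextmanager
def alternate_screen(out=sys.stdout):
    """Hypothetical helper: draw on the alternate screen, then restore."""
    out.write(ENTER_ALT_SCREEN)
    out.flush()
    try:
        yield out
    finally:
        # Always switch back, even if drawing raised an exception.
        out.write(LEAVE_ALT_SCREEN)
        out.flush()
```

Anything written inside `with alternate_screen() as out:` lands on a separate full-screen surface, so it never mixes with the shell's scrollback. That is presumably also why terminal compatibility matters here: the program takes over scrolling entirely.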
matthewdgreen 3 hours ago [-]
Claude Code is essentially a terminal emulator that runs on mature OSes with excellent support for this type of application. Why are they having difficulty implementing it?
oblio 20 hours ago [-]
I've tried about 6 of those "settings" and hacks since November 2025 and not much luck.
Melatonic 21 hours ago [-]
The whole thing is actually powered by a shitton of hamsters inside a bunch of 4u rack mount cases running on spinning wheels at high speed. Somehow at scale this works.
Sometimes they all happen to randomly take a nap at the same time - hence the outages
aleqs 24 hours ago [-]
That seems like an assumption based on basically nothing. There is a lot of code at the infra layer, and based on the stack choices for Claude code and based on how buggy and unreliable ~everything from anthropic is, it seems pretty bizarre to claim these issues are not related to their code.
keeda 21 hours ago [-]
There are other indications, however, like Anthropic paying through the nose for compute just months after Dario told Dwarkesh how hard it is to predict demand, or ChatGPT and Codex not quite having the same issues after Altman spent much-publicized years scrounging for trillion-dollars of capacity.
While I'm very bullish on Anthropic, I'm a bit wary about their IPO because it seems to me that they're filing now while their financials look good and before other trends like the decline of tokenmaxxing and their compute bills catch up.
qwery 20 hours ago [-]
Whoa, first name basis with Dario but not Sam. Ouch. [I actually have no idea who Dwarkesh is and it sounds like a first name to me but that's not a particularly reliable indicator so I won't comment on your relationship with Dwarkesh.]
Oh, are they filing now? I think their financials look somewhere in between devastating and criminal, so I'm really looking forward to the IPO!
keeda 20 hours ago [-]
Oh, not just them -- Satya, Jensen and I are all on a first name basis. They just don't know it yet ;-)
j2kun 22 hours ago [-]
We all saw their code...
bluerooibos 19 hours ago [-]
Well, people keep throwing money at them, including you and investors. So why would they care? It hasn't annoyed you or a large enough portion of users enough to move off their service - because there isn't a better alternative.
patcon 20 hours ago [-]
Not necessarily the parent's fault, but the energy of this thread is not my favourite...
0x53 20 hours ago [-]
They also don’t have…a login page with authentication. To access the console you get an email link. No passkeys, passwords, 2FA, just an email.
f311a 23 hours ago [-]
Infrastructure is a much harder problem. They can't even improve Claude Code, which eats 1GB+ of RAM. Meanwhile, my editor only consumes 80MB of RAM.
airstrike 23 hours ago [-]
This might explain it, in the opposite way it was meant to:
> Most people's mental model of Claude Code is that "it's just a TUI" but it should really be closer to "a small game engine".
javcasas 22 hours ago [-]
> For each frame our pipeline constructs a scene graph with React then
> -> layouts elements
> -> rasterizes them to a 2d screen
> -> diffs that against the previous screen
> -> finally uses the diff to generate ANSI sequences to draw
Yup. Overengineering.
AceJohnny2 21 hours ago [-]
This is a decades-old design pattern when CPU >> IO. Emacs has been doing just that since the 80s, when people were complaining about "Eight Megs And Constantly Swapping". See "redisplay" [1]
This minimizes screen flash. You can't rely on terminals doing double-buffering.
> This minimizes screen flash. You can't rely on terminals doing double-buffering.
GUIs and TUIs have different architecture models. Most GUIs have a 2D surface that is redrawn multiple times per second. Double buffering is for decoupling update and render. A TUI is a grid of characters that are updated one at a time via an active element, the cursor. Double buffering there is very wrong. Like adding airbags to a bicycle.
There’s a reason you see most old TUI either have an option to redraw the screen (automatically like top, or manually) and those that have a scrolling option allow to scroll by page. The TTY (the underlying concepts) used to be slow and it can be slow today as well (ssh connection). You need to be thoughtful about whole screen updates.
strix_varius 18 hours ago [-]
lol what? There are definitely ways to make non flashing terminal UIs without this total insanity.
jaggederest 17 hours ago [-]
ncurses (new curses) was "new" in 1993...
xiaoyu2006 16 hours ago [-]
Even with that, 1G of RAM usage is still not justified.
Melatonic 21 hours ago [-]
It's like the Citrix of AI :-D
stego-tech 19 hours ago [-]
OOF. As a former Citrix admin, I felt that burn in my bones.
An upvote well earned.
Aperocky 21 hours ago [-]
It's product bloat.
It's not recognizing that they are just one building block that should do one thing well, like tmux.
You don't need a computer display on your fridge for the same reason, but Anthropic think you do. You should see virtual ice getting created and they should correspond to the actual ice behind the door - think of how amazing that is!
And it's not even completely a bad idea. Make it claude-code-react-beauty, or offer some way to turn it off, and it would be far more palatable.
mapBasketWand 20 hours ago [-]
I love the idea of installing high resolution cameras in the fridge to monitor the ice maker to feed into a vision model that renders digital ice to the exact position of the real ice on the fridge’s giant screen
Aperocky 20 hours ago [-]
See, this is the kind of thing I hope I'd be doing when I'm retired, but not when I'm shopping.
throwway120385 19 hours ago [-]
Or you could... open the door and look inside.
steve_adams_86 18 hours ago [-]
Sounds like you've got a lot of time on your hands
icepush 15 hours ago [-]
Put a servo on the door and a camera on the front. Train a vision model to recognize when your eyes are looking at the door and automatically open it for you.
Another camera inside will detect when you are done and close it.
asdff 17 hours ago [-]
What, like a poor?
irishcoffee 15 hours ago [-]
You mean like… a transparent door? Is that the joke?
yuanBuilds 16 hours ago [-]
Yup. For me, this translates to "we are using Ink, the react-compatible TUI framework to build Claude Code"
megous 19 hours ago [-]
React part maybe. The rest is what any TUI that's using ncurses would do. :)
It really bothers me that most of the TUI harnesses are using 100% CPU quite a lot just printing stuff to terminal. Seems ridiculous.
I guess it comes from syntax highlighting/formatting, which is probably not done incrementally, but over the entire so far displayed block of output, recomputed from the beginning for each new streamed in character. Can't imagine anything else causing the rendering to gradually grind to a halt when e.g. a thinking block is open in opencode and updates get palpably slow as it grows.
Terminal output itself is fast and consumes almost nothing. You can have 60fps terminal apps that update content every frame and that consume almost no CPU time.
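To put rough numbers on that guess, here is a small sketch that only counts characters processed under the two strategies. It does not measure any real harness, and both function names are made up for illustration. The assumption being modelled is the one above: every streamed chunk triggers a re-highlight of everything displayed so far.

```python
def full_rehighlight_cost(chunks):
    """Characters processed if the whole buffer is re-highlighted per chunk."""
    displayed = 0
    work = 0
    for chunk in chunks:
        displayed += len(chunk)
        work += displayed  # re-scan everything shown so far
    return work


def incremental_cost(chunks):
    """Characters processed if only the newly arrived text is highlighted."""
    return sum(len(chunk) for chunk in chunks)


stream = ["x"] * 10_000  # 10k characters arriving one at a time
print(full_rehighlight_cost(stream))  # 50005000
print(incremental_cost(stream))       # 10000
```

The first number grows quadratically with output length, which would match the "gets slower as the block grows" symptom. That is consistent with the guess, not proof of it.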
skydhash 17 hours ago [-]
> Terminal output itself is fast and consumes almost nothing. You can have 60fps terminal apps that update content every frame and that consume almost no CPU time.
The TUI mode is a client-server architecture. An analogy would be like an html page where all content is updated server side. Try to do 60 fps and you’ll have flickering as well.
megous 8 hours ago [-]
No. Fetching pages from remote server will just make the client wait for I/O. That takes 0 CPU load and if the server can't respond at 60fps, lowered redrawing frequency would mean even less CPU load from the terminal redrawing itself.
This does not explain 100% CPU load these harnesses sometimes exhibit.
skydhash 7 hours ago [-]
If it’s localhost, then it’s just the cpu doing stuff as localhost is a pseudodevice.
Animats 22 hours ago [-]
What is "frame" in this context? Video frame, or something else?
javcasas 22 hours ago [-]
> -> rasterizes them to a 2d screen
> We have a ~16ms frame budget so we have roughly ~5ms to go from the React scene graph to ANSI written.
It looks like video frame, full framebuffer, generated and parsed at 60fps. It surprises me they haven't introduced GPU shaders, 16x oversampling and raytracing. Maybe for next release.
layer8 22 hours ago [-]
The contents of the terminal screen at any given point in time.
abletonlive 20 hours ago [-]
Care to explain how you'd engineer it instead?
hungryhobbit 20 hours ago [-]
Why would anyone ever do that? Make Claude do it!
mudkipdev 18 hours ago [-]
A reminder that anthropic has great rust/go sdks that they could have written their own tui in.
stevenhuang 19 hours ago [-]
Not use react native for a cli app for one, lol.
Ratatui, the Rust TUI lib, would be a good start.
munificent 22 hours ago [-]
As someone who maintains a roguelike with a terminal-like UI that:
1. Maintains an internal representation of what the game thinks is on screen.
2. Runs the game for one frame which updates that representation.
3. Generates a diff to see how that differs from what's actually on screen.
4. Executes the minimum set of draw calls to get the screen to match the internal representation.
It's really not that hard. It's a few hundred lines of code.
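The core of steps 3 and 4 can be sketched in a few lines. This is an illustration under assumptions, not the roguelike's actual code: the grid is modelled as a list of equal-length strings, the function names `diff_cells` and `to_ansi` are made up, and the "draw calls" are ANSI cursor-position writes rather than whatever the real game uses.

```python
def diff_cells(prev, curr):
    """Yield (row, col, char) for every cell that differs between two grids."""
    for r, (old_row, new_row) in enumerate(zip(prev, curr)):
        for c, (old, new) in enumerate(zip(old_row, new_row)):
            if old != new:
                yield r, c, new


def to_ansi(changes):
    """Emit one cursor move (CSI row;col H, 1-based) plus one char per change."""
    return "".join(f"\x1b[{r + 1};{c + 1}H{ch}" for r, c, ch in changes)


on_screen = ["abc", "def"]   # what the terminal currently shows
next_frame = ["abc", "dxf"]  # what the program wants to show
changes = list(diff_cells(on_screen, next_frame))
print(changes)                 # [(1, 1, 'x')]
print(repr(to_ansi(changes)))  # '\x1b[2;2Hx'
```

A real implementation would also coalesce adjacent changed cells into one write and track colors, but the shape stays the same.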
javcasas 22 hours ago [-]
Sure. For a videogame.
> -> rasterizes them to a 2d screen
Also you forgot "render to a framebuffer, then parse the framebuffer back to chars".
Anyway, I'm off to construct the new `ls` command. It will render the list of files to a mesh of billions of polygons in a GPU with advanced shaders, 16x oversampling, HDR and all the graphic acronyms I don't understand, then read the resulting image, find the nearest character in the ANSI charset and use that one.
It will be _glorious_ (and profoundly stupid)
ux266478 20 hours ago [-]
Could be improved. Encode the image to webp with high compression settings and handle the ASCII mapping by spinning up a local LLM to do OCR on it. Individually. For each cell.
javcasas 9 hours ago [-]
Thanks for the idea for V2.0. Hopefully the Claude team doesn't do it first.
munificent 14 hours ago [-]
My roguelike's "graphics" are a simulated terminal, so it's a 2D grid of colored characters. It's essentially a TUI, just like Claude Code, except instead of rendering to a real terminal using ANSI escapes, I render to a web canvas using... something probably more complex than what Claude has to do. It's still not hard.
lol... I know you meant this comically, but you just called me out and it's glorious: https://glyph3d.dev
I built a truly glyph based instanced quad system to render millions of characters in space at once.
applfanboysbgon 22 hours ago [-]
I hadn't seen that quote before, what an embarrassing thing to go on the internet and write...
replwoacause 22 hours ago [-]
Why the hell does it need to be so complex? People have been making TUIs for decades. Did we need a small game engine to run claude code?
imjonse 22 hours ago [-]
They forgot to add 'make it as simple as possible' in the prompt is one possible cause.
On a more serious note using a react-like lib for TUI in the hope you'll share the codebase with the web version is a more likely explanation. Still not the best idea.
javcasas 21 hours ago [-]
React is not that stupid to re-render in a loop at 60fps and instead waits for changes to happen before re-rendering. It even batches changes and stuff.
the_gipsy 18 hours ago [-]
You don't need React for reactive TUIs - at all. I can understand choosing React for web, but for a TUI it sounds like a really poor idea. And in practice we can see that the Claude Code TUI is also poor.
uxhacker 17 hours ago [-]
So how much more room for efficiency improvements is there in the rest of the Claude Code codebase, if they are using React for a TUI?
I also wonder about the wasted cycles and the environmental damage caused by all this wasted CPU time. (Edited: added a comma for clarity)
comex 20 hours ago [-]
It doesn’t need to be that complex, but it can be that complex without being slow. Claude Code’s interface is extremely simple. It has tons and tons of headroom to tack on performance overhead without it being noticeable at all. You just have to not do dumb things like redraw the entire UI every time a spinner spins.
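As a rough illustration of the spinner point: updating one cell is a handful of bytes, while repainting a whole screen is a couple of thousand. The sketch below is hypothetical (the names `spinner_update` and `full_redraw` and the 80x24 screen are assumptions) and uses only standard ANSI sequences: `CSI row;col H` to position the cursor, `CSI 2J` to clear.

```python
SPINNER_FRAMES = "|/-\\"


def spinner_update(tick, row, col):
    """Write only the spinner's single cell at a fixed 1-based position."""
    frame = SPINNER_FRAMES[tick % len(SPINNER_FRAMES)]
    return f"\x1b[{row};{col}H{frame}"


def full_redraw(lines):
    """Clear the screen, home the cursor, and rewrite every line."""
    return "\x1b[2J\x1b[H" + "\r\n".join(lines)


screen = ["x" * 80] * 24  # assumed 80x24 terminal full of text
print(len(spinner_update(0, 1, 1)))  # 7
print(len(full_redraw(screen)))      # 1973
```

Neither number is large by itself; the difference shows up when the full repaint also causes visible flicker or is repeated many times per second.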
hungryhobbit 20 hours ago [-]
"We made our app chew up so many unnecessary resources that we can use even more resources in the future, and no one will notice" is not the strongest engineering idea I've ever heard.
refactor_master 16 hours ago [-]
It's like when Bill Gates tried to guess grocery prices. "How much memory does a regular computer have? I don't know, 50 GB? Like a small EC2?"
grogers 19 hours ago [-]
It may not be slow, but this crazy complexity is probably a hint at why it can't even scroll up without jumping to the beginning of time.
Quekid5 22 hours ago [-]
Must have 120 fps for answers arriving in [buffering] 30 seconds.
shepherdjerred 17 hours ago [-]
It is an excellent example of how LLMs let you try new ideas, even if they aren’t necessarily good ones
wyre 21 hours ago [-]
I can't help but think it's their engineers and PMs making these decisions, since I know that if you asked Claude to write a TUI there is no world it would recommend whatever the frontend architecture of Claude Code is.
qwery 21 hours ago [-]
~ "it's not a TUI! <describes an outrageously overengineered TUI> and my dad works at Nintendo"
curses, bud. curses.
It's genuinely difficult to tell how much of this is true. The post is obviously 100% posturing, but some of the words describe things that could be done.
Very few game engines do anything I'd describe as rasterisation. That's kind of the point of a GPU. Well, it used to be.
I suppose "small game engines" might be more likely on average to include a rasteriser. The typical reason for this is because the author wanted to write it.
Whereas big engine make triangle give hardware go brrr.
So I assume here 'rasterize' means 'printf'.
And diffing screens means diffing 50..150 lines of text.
And "generating ANSI sequences to draw" means 'printf' with some ANSI sequences interpolated in.
Then there's the frame budget. You have to understand they are operating within a strict frame budget -- they're not messing around, OK. They have a 16 ms frame budget, so they burned 11 ms and now have a (roughly) ~5 ms approx. budget for the final 'printf' in the chain???
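The arithmetic behind that, spelled out. The 60 fps figure is an assumption inferred from the "~16ms" in the quote, which does not state a frame rate, and the function names are made up for illustration.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given refresh rate."""
    return 1000.0 / fps


def time_left_for_other_stages_ms(fps, output_budget_ms):
    """What remains of the frame after reserving the quoted output stage."""
    return frame_budget_ms(fps) - output_budget_ms


print(round(frame_budget_ms(60), 2))                   # 16.67
print(round(time_left_for_other_stages_ms(60, 5), 2))  # 11.67
```

So if roughly 5 ms covers everything from scene graph to ANSI, about 11 to 12 ms of each frame is going to whatever happens before that.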
fc417fc802 18 hours ago [-]
Your broader point is well taken but I thought I'd stop by with some trivia. High end engines such as unreal will rasterize absurd quantities of micro-geometry manually using compute shaders in order to avoid the bottleneck of the hardware rasterizer.
solid_fuel 18 hours ago [-]
> High end engines such as unreal
High end engines such as unreal have the excuse of being tasked with rendering millions of polygons, in which case a complex approach makes sense. Claude Code is only being asked to render a few thousand UTF-8 characters.
fc417fc802 17 hours ago [-]
Hence my prominent note that it was trivia which implies it to be at least somewhat tangential to the original conversation.
layer8 22 hours ago [-]
> For each frame our pipeline constructs a scene graph with React then
-> layouts elements
-> rasterizes them to a 2d screen
-> diffs that against the previous screen
-> finally uses the diff to generate ANSI sequences to draw
That’s rather sickening.
Fr0styMatt88 22 hours ago [-]
So I’m wondering what ‘rasterizing’ literally means in this case. I imagine it’s just creating a 2D map of elements at a very low (probably character) resolution, then diffing that against the last generated map to come up with an optimal ANSI sequence to send to the terminal, would that be right?
Seems like a cool puzzle to solve. I wonder what the engineering and organisation tradeoffs were that led to it — does it let them reuse a bunch of existing code?
I wrote a TUI library back in the day for Turbo Pascal — it was essentially taking an immediate-mode approach (which in this context is just a fancy way of saying it was procedural haha).
fluoridation 21 hours ago [-]
"Rasterizing" means just one thing in this context: to transform a data structure into an array of pixels. It seems absurd to do this, given that the next step must be to convert back from pixels to text data, but maybe they have some way to generate predictable sequences of pixels (e.g. the character "t" is always rendered as the same pattern of pixels), such that they're cheap to convert back.
If they're doing anything else, the word "rasterizing" is being misused.
fc417fc802 18 hours ago [-]
Yes, the much more plausible explanation is that the word rasterize was misused there. They are generating and diffing text data which has been a standard approach to drawing a TUI since the dawn of computing. It is not even remotely resource intensive.
skydhash 17 hours ago [-]
> They are generating and diffing text data which has been a standard approach to drawing a TUI since the dawn of computing. It is not even remotely resource intensive
No one has ever done that. Even top[0], which does a full screen refresh, clears the screen (if necessary) and writes the new information (the period is in seconds, not ms). No need to diff. That would be like diffing a file, just to find which bytes to update.
I don't understand why you would make such a confident negative claim rather than ask for an example or otherwise engage in discussion. Particularly given that you replied to a comment elsewhere in this very thread that links to a real world example of exactly such an implementation! [0] See in particular this part of the source. [1]
I agree that most programs don't bother to do that but please recall that my claim was merely that what Claude Code is claimed to be doing with regards to diffing is a well established and long standing optimization. The important point being that it is neither expensive, novel, or particularly complex thus not an excuse for poor performance.
The emacs code is not purely diffing. They already have the final output, they’re mostly comparing it to find a cheaper way to update than re-rendering the output. I’m pretty sure the curses library has the same thing.
But ink, the library Claude is using, defines a tree data structure for the main concept. The diff there is about comparing the old tree and the new tree created by the update, and then updating the node that has changed. That means if a single character changes inside a big panel, the whole thing is rewritten. And if you have something that is updating a lot, that means flickering.
The diffing that ink does is just architecturally wrong. You can create a dom, but a dom is not a concept for the terminal. It’s up to you to optimize its rendering. But just diffing the dom structure like react does is not optimizing, it’s busywork.
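The granularity argument above can be sketched as follows. This models the description in the comment, not ink's actual code: `Node`, `changed_nodes`, and `changed_cells` are hypothetical names, and the tree diff assumes both trees have the same shape.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical UI tree node: a name, its rendered text, and children."""
    name: str
    text: str = ""
    children: list = field(default_factory=list)


def changed_nodes(old, new):
    """Node-granularity diff: return the new nodes whose own text changed."""
    dirty = [new] if old.text != new.text else []
    # Assumption: both trees have identical structure, so zip pairs children.
    for old_child, new_child in zip(old.children, new.children):
        dirty.extend(changed_nodes(old_child, new_child))
    return dirty


def changed_cells(old, new):
    """Cell-granularity diff: count character positions that differ."""
    width = max(len(old), len(new))
    return sum(1 for a, b in zip(old.ljust(width), new.ljust(width)) if a != b)


panel_old = "x" * 2000
panel_new = "x" * 1999 + "y"  # a single character changed
old_tree = Node("root", children=[Node("panel", panel_old), Node("footer", "ok")])
new_tree = Node("root", children=[Node("panel", panel_new), Node("footer", "ok")])

dirty = changed_nodes(old_tree, new_tree)
chars_rewritten_by_node = sum(len(n.text) for n in dirty)
cells_rewritten = changed_cells(panel_old, panel_new)
```

With these inputs the node-level diff marks the whole 2000-character panel as dirty, while the cell-level diff finds exactly one changed position.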
yrds96 17 hours ago [-]
I still can't conceive that a tool that only sends/receives text from an external API consumes an absurd amount of RAM
Proxy that makes Twitter links embed on discord, for whatever reason. Something about api access without accounts I assume
f311a 11 hours ago [-]
It used to allow reading replies without being signed in.
Not sure what changed, but now it just redirects me to x.com.
pragmatic 22 hours ago [-]
Somebody read/watched too much Casey Muratori.
CamperBob2 21 hours ago [-]
No, somebody didn't read/watch enough Casey Muratori.
agumonkey 21 hours ago [-]
this allows for comfortable ergonomics IMO
not that it could be leaner for sure but i get the reasoning behind the tui rendering layer
airstrike 15 hours ago [-]
comfortable ergonomics? you can't scroll up more than 50 lines before it starts to garble up text
i'd be ashamed of publishing software with this level of polish as a solo dev, let alone as the hottest multibillion startup on the planet
agumonkey 10 hours ago [-]
Hmm I thought this was due to me using tmux with claude-code, also it seems that `claude agents` doesn't have this issue.
By comfortable ergonomics, I meant the forgiving and asynchronous input system. You can start typing, cancel, retry with previous input, accumulate messages while the agent is active. I don't know all TUIs but this is not common IMO.
Other than that I agree with you.
skydhash 7 hours ago [-]
> You can start typing, cancel, retry with previous input, accumulate messages while the agent is active. I don't know all TUIs but this is not common IMO.
Literally every audio player or anything that uses threads.
agumonkey 6 hours ago [-]
good point, i didn't classify tui audio players in a way, they don't converse, they allow asynchronous effects and stacking, that said i might be lagging about these, last i used was mocp, any names i should check out ?
orliesaurus 20 hours ago [-]
when they announced /pet mode or whatever - that was really the end of the line for me.
ariwilson 16 hours ago [-]
Maybe Claude is operating at a higher, self-improving level than all of us poor HN commenters. Wasting the local machine's resources to look pretty is a plausibly deniable way to make the Claude Code FE unusable with local LLMs, starving the competition.
PunchyHamster 21 hours ago [-]
Well it runs on something they didn't design (Electron) using a GUI library they didn't design (React)
For a company with that much AI you'd think if it was actually good, doing that part in a fast and performant way would be "easy"
f311a 20 hours ago [-]
It runs in a terminal, it’s not electron
overgard 21 hours ago [-]
And yet, nobody that writes game engines would do it this way because game engines need to be efficient..
0xbadcafebee 21 hours ago [-]
If they used an actual game engine to render a 3D UI from scratch it would be more efficient
Also remember when XP was super bloated cause it needed 64MB?
TimMeade 23 hours ago [-]
I loved Turbo Pascal....
bigbuppo 22 hours ago [-]
I loved XP. My laptop had 256MB of RAM.
Erenay09 22 hours ago [-]
I don't think they need to optimize their infrastructure (at least not from their perspective). They have high-end PCs with 64GB of RAM, so 1GB doesn't matter to them. For example, I have 8GB of RAM, and I make my apps very performant. Honestly, I probably wouldn't bother if I had 16GB+ of RAM
tjwebbnorfolk 20 hours ago [-]
The purpose of RAM is to be used.
solid_fuel 18 hours ago [-]
> The purpose of RAM is to be used.
For useful things, by the computer's owner. It's not there to be used just because Anthropic can't be bothered to give a shit about the quality of their product.
abletonlive 20 hours ago [-]
> which eats 1GB+ of RAM. Meanwhile, my editor only consumes 80MB of RAM
And why are you comparing Claude Code to your editor?
> They can't even improve Claude Code
That depends on how you define "improve". They've added a ton of features to it over time. Who said minimizing RAM usage was something they are prioritizing right now?
wild_egg 20 hours ago [-]
> why are you comparing Claude Code to your editor?
Because the editor does more. All the compute-intensive parts of the agent are in the cloud. Zero reason for an agent harness to require anything beyond a potato to run.
javascriptfan69 20 hours ago [-]
Do you work for Anthropic or something?
You seem weirdly invested in defending bad decisions.
Even if you're an AI booster, shouldn't you want a better UI?
They're a multi billion dollar company. Surely they can dedicate a small amount of their resources to improving UX?
solid_fuel 18 hours ago [-]
> And why are you comparing Claude Code to your editor?
Because Claude Code is also used to - get this - EDIT CODE. It fills the same purpose as an editor, it just has extra hooks for their agentic garbage.
hombre_fatal 17 hours ago [-]
This comment is a good example of the double standard laymen have about AI usage:
If you use AI, then AI must be expected to solve all problems, even problems that affect everyone like infra scaling.
And if perfection isn’t delivered, then of course it wasn’t: you used AI and AI sucks.
jayd16 17 hours ago [-]
It's not a double standard. It's being held up against the marketing.
weakfish 16 hours ago [-]
Ah, excuse me, I didn’t realize I was a mere layman.
AnimalMuppet 17 hours ago [-]
If their AI is good enough to write their code, why isn't it good enough to tell them how to fix their infra? That's a different problem space, but it's not harder than the code.
hombre_fatal 7 hours ago [-]
The software engineer inside us wants to believe otherwise, but scaling infrastructure is much harder than maintaining a TUI.
thordenmark 15 hours ago [-]
Growing pains of being successful. These are solvable problems and will be. Can they maintain their momentum without pissing off too much of their customer base before these issues are resolved?
rishabhaiover 22 hours ago [-]
you're conflating a compute problem with a code quality problem.
asdfman123 22 hours ago [-]
Personally at my own job self-writing code is letting us tackle big, long-deferred refactoring projects (like the article mentions), but any sort of refactoring introduces new bugs.
anjel 16 hours ago [-]
Answers the question: how can Anthropic sell more Usage "Credits"
jatora 16 hours ago [-]
This is weird to me because I am using Claude Code 10+ hours/day, 7 days a week, usually in multiple sessions, and run into API errors in maybe 1 or 2 sessions per week. And about 2 major outages of 10-20 min in the last month. Not terrible and nowhere near what you are reporting. Therefore I don't believe you, because you don't even couch this in terms of it being something that seems particular to you or your region. Obvious dishonesty is fairly bad of you.
qsort 23 hours ago [-]
Look, I've never been someone who mindlessly hypes AI companies, as a matter of fact I think they have serious leadership problems across the board, but you people are straw-manning them so badly it actually makes me sympathize with them.
They aren't saying they have fully automated luxury AGI, they specifically list the ways models fall short of that bar and caution against people taking the 8x figure as the actual uplift number. At the same time they recognize that 80% of new code is now AI-authored, when two years ago those models were little more than toys. And frankly that checks out: if two years ago you told me we'd have something like Opus 4.8/GPT 5.5 I would have rolled to disbelieve.
sensanaty 21 hours ago [-]
> At the same time they recognize that 80% of new code is now AI-authored
I can setup a loop that will write a trillion lines of code automatically, how much of it is actually useful? Or are we back to counting LoC because there's no other metric for these systems that anyone can rely on?
jpleyden98 19 hours ago [-]
It's 80% of new code they shipped that is AI authored.
Would you ship pointless code?
I do tend to agree though, it could be that AI solves problems with more code than a human would. What you need to measure is the value the code brings and how much of that is done by AI, hard to get an objective measure of that though.
solid_fuel 18 hours ago [-]
> Would you ship pointless code?
I wouldn't, no. I don't see evidence that the engineers at Anthropic are similarly cautious however. They describe Claude Code as "basically a game engine" when it's literally a TUI app, and it eats memory for no apparent reason. I fully believe that Anthropic would ship pointless and garbage code. Especially if it's being written by LLM.
signatoremo 20 hours ago [-]
I could write a bash script that copies a codebase repeatedly in the pre-AI past as well, but I didn't do that because I wasn't stupid. More than 80% of my code is now AI-generated, and trust me I'm still not stupid. It was 0% only a year ago.
Who says LoC is the only metric we should rely on? A software product should first and foremost meet user requirements, functionality and performance. Judging from the sensational rise of Anthropic's user base and revenue I think we can safely say they're in that ballpark.
ChadMoran 16 hours ago [-]
Better doesn't mean perfect.
cookiengineer 15 hours ago [-]
The main reason I am building my own agentic environment is that I need full control and reproducibility of what I am building.
Post November and post openclaw agentic environments need to be built differently, and for selfhosting models the context size problem really requires a strong harness which intelligently helps reduce context size.
Planner/orchestrator architecture, agent to agent summarizer, specification based tools (fck all this markdown memory bullshit btw), tool call shrinking, and workflow management are all really important because of the context size problem.
Nobody has enough VRAM for the large K/V caches, and nobody can afford f16/f32 caches in terms of memory, which are also necessary for longer conversations. MoE 30b models have improved so much though, qwen 3/3.6 coder is the real champion doing almost the same things with less than 1/10th the memory requirements. Just think about that in terms of engineering and what your bet is going to be. Haiku pales in comparison.
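For a sense of scale on the K/V cache point, here is a back-of-the-envelope sketch using the standard estimate for a transformer with grouped-query attention: two tensors (K and V) per layer, each of size `kv_heads * head_dim * seq_len`. The layer count, head counts, and context length below are hypothetical and do not describe any named model.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem):
    """Estimate K/V cache size in bytes for one sequence."""
    # Factor 2: one K tensor and one V tensor per layer.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem


GIB = 1024 ** 3

# Hypothetical model: 48 layers, 8 KV heads of dimension 128, 128K-token context.
dims = dict(layers=48, kv_heads=8, head_dim=128, seq_len=131072)

f32_gib = kv_cache_bytes(**dims, bytes_per_elem=4) / GIB
f16_gib = kv_cache_bytes(**dims, bytes_per_elem=2) / GIB
q8_gib = kv_cache_bytes(**dims, bytes_per_elem=1) / GIB

print(f"f32: {f32_gib:.0f} GiB, f16: {f16_gib:.0f} GiB, 8-bit: {q8_gib:.0f} GiB")
```

Under these assumed dimensions a single full-length conversation needs 24 GiB of cache at f16 and 48 GiB at f32, on top of the weights, which is why cache precision and context reduction matter so much for self-hosting.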
Currently my focus with exocomp is trying to figure out how I can record, replay, restart, and debug workflow sessions of agents in a better manner so that I as a human can understand what's going on. Currently I think that UI will be something like a gantt chart where you have a graph with connections representing agent to agent communication. And yes, that's a lot of fiddling with SVG as it turns out, so I'm not quite there yet.
Anyways, in case you're interested. I'm manually building this env and trying to unit test the critical parts. [1]
Indeed... why is Anthropic even employing people at all if this AI magic story is true?
drivebyhooting 22 hours ago [-]
You still need wizards to cast the spells..
killbot5000 21 hours ago [-]
Not if your spells cast their own spells.
jimbokun 15 hours ago [-]
Read the article.
They are saying very clearly the models are not casting their own spells…yet. But looking at trends and speculating when they may start doing so.
emp17344 19 hours ago [-]
Not if you’re claiming that the spells, once cast, automatically get exponentially spellier until they awaken into a spell god, capable of literally anything, including casting more complicated spells than any wizard is capable of. If that were true, you’d have no need for wizards. The fact that wizards are still around means it’s probably bullshit.
jimbokun 15 hours ago [-]
So in your opinion AIs and LLMs aren’t improving? They can’t do it today, therefore they never will?
There have certainly never been times in the recent past when people confidently predicted computers could never do something that computers were then able to do shortly after the prediction was made.
square_usual 18 hours ago [-]
They literally aren't! they literally say in this article that it's not there yet!!!
NewsaHackO 16 hours ago [-]
Did you actually expect people to read the article before commenting?
optimalsolver 18 hours ago [-]
Is it too much to ask that people read the article before commenting?
krapp 19 hours ago [-]
What really happens is the spells only have other spells to draw from and they begin to degenerate over time, eventually turning into chaotic eldritch horrors that randomly add limbs to people or adamantly refuse to discuss goblins or just shriek in gibbering madness. Our Evil Overlord sacrifices the dreams of children to keep the magic sustained and controlled, and soon the people can't even think or speak without the help of magic. And they think they're wizards even though they can't even read a grimoire.
prng2021 18 hours ago [-]
We’ve got a company of several thousand employees serving hundreds of millions of people arguably the best AI model in the market. Meanwhile you’re asking for a handkerchief for your pool of tears because their product is struggling to do your daily job functions for you, with much of that due to being limited by the world’s supply of silicon, electricity, water, and other resources. Cry me a river.
z3c0 18 hours ago [-]
> their product is struggling to do your daily job functions for you
So what's the value prop?
claudiug 22 hours ago [-]
those are results of the humans only. not the AI. AI is perfect /s
What does this add? Everyone in here is perfectly capable of prompting Opus for a writeup.
Why don't you, windexh8er, try providing some thoughts of your own instead?
windexh8er 14 hours ago [-]
Irony, maybe? Do you not get it?
If these models are so great solid_fuel then I guess it wouldn't be interesting that Anthropic's own models can make up ulterior BS as analysis.
So why don't you pound sand since that clearly went straight over your head? That would be far more useful than your asinine response.
eranation 12 hours ago [-]
All this singularity trajectory is really interesting. If they manage to build a model that is capable of building the next version of Claude (model and tooling) - wouldn't it be their interest at some point to keep it to themselves?
If we ever get to a point where the centaur period is over (when human + AI is not better than just AI) then what competitive advantage ANY human can have other than
- the money they already have
- luck?
- a good idea and good taste but if we assume AI can do better than any human, that also goes out the window
So, this whole singularity goes into a place where no one is really needed, the only thing that will "save us" (other than "The Expanse" like world / UBI) is if there will be no demand to the supply of AI work. Even if it's better. (example is - there is demand to seeing Magnus Carlsen play, there is no demand to the Stockfish on my phone getting into a stalemate with another Stockfish on another phone. Also people like to watch humans compete with humans, there is no demand to see a race between Usain Bolt and a rocket). So if people will not buy AI generated stuff (we'll get to a point where everyone will assume something AI generated because AI might get to a point where it is not as easy to identify it. E.g. it will stop looking like slop... but I believe services that give you a "human generated" 3rd party evidence can happen, again all based on supply and demand...)
So as we near singularity... All it takes is one open weights model, and one open harness that is capable of self improvement, and Anthropic's entire moat is gone. That open weight model might even be built with Claude Code + Mythos (once it's released).
But don't worry, all moats will be gone and we'll all just do yoga, read books and connect to each other because AI will produce everything for free using renewable energy, right? Or we'll all become batteries in a simulation, probably something in between.
Facially this smells of puff. That doesn't mean it's all false. It means be wary of anything that doesn't have a critical thing to say.
taormina 12 hours ago [-]
So, is this what they call Opus 4.8? Improvement?
jasongill 19 hours ago [-]
"My CPU is a neural-net processor - a learning computer" springs to mind
damowangcy 1 days ago [-]
AI tech bro:
Month 1 - 6 months to AGI
Month 2 - We will Replace all jobs
Month 3 - Okay maybe only the SWEs, programming is solved
Month 4 - Announce model that is too dangerous to release
Month 5 - Releases dangerous model
Month 6 - This is it! We will replace AIs with more AIs (*secretly files for IPO)
AI is here to stay, like it or not, but it is not the solution to everything. If it is, what is Anthropic's moat? A better model? I don't see any ecosystem being built by them, as MCP is almost obsolete except for some very niche use cases. And they're doing stuff that a non-profit version of OpenAI would do. Can we trust a for-profit company to stand against their investors during a conflict of interest? Because running a company for maximum profit and being ethical are two different ends of the spectrum.
baq 1 days ago [-]
Anthropic is providing agentic intelligence as a service. OpenAI and Google deepmind also are in this business.
The problem is, if you’re any sort of knowledge worker, you’re essentially providing the same thing: you’re an intelligence with agency.
MCP is irrelevant. The moat is the quality of intelligence the service providers sell, including you. Tokens aren’t fungible between providers until you measure that they are for your use case, that’s kinda sorta the goal of job interviews.
Thus the moat will be that they’re providing the best models for the things people need other intelligent people for, but we should expect there will be limits on how much share they can economically take assuming competitors are optimizing for slightly different targets (but there’s still significant overlap in capability). This will disappear, but it’s always a question of when. The path matters as much as the destination.
Note that implications for you and me are exactly what the article says they are: nobody knows, but it’ll be a dramatic shift.
parpfish 1 days ago [-]
i'm waiting for the AI giants to realize that they are burning cash to run their consumer-facing chatbots and that they should kill those products to focus on their enterprise tools.
free chatgpt doesn't need to exist anymore. its job was to build hype/interest and it did.
but take it away and you solve many social problems and annoyances caused by AI with no loss to the upside of AI. no more cheating students in school. no more shitty linkedin posts. no more dangerous "therapy sessions" that give bad advice.
overgard 20 hours ago [-]
There will always be shitty linkedin posts.
nevertoolate 22 hours ago [-]
What is an ai enterprise tool?
jfyi 21 hours ago [-]
An ai tool that is priced out of the hands of the average person.
Fwiw, I think the genie is out the bottle. We are waiting on hardware to catch up, which it will.
techblueberry 1 days ago [-]
> A caveat: Lines of code is an imperfect measure, as it measures quantity over quality. So 8× lines of code/engineer/day in the second quarter of 2026 is almost certainly an overstatement of the true productivity gain. Nonetheless, it indicates an acceleration. At Anthropic, we don’t reward people for how many lines of code they write; rather, team members are producing more code simply because they’re using AI systems to write more code.
I simultaneously think the AI revolution is making real revolutionary gains and am mystified by the lying.
An accurate Translation seems to be “we made this shit up, but it feels right”
embedding-shape 1 days ago [-]
Until the moment we start bragging about how many lines of code LLMs are saving us, we're walking in the wrong direction. Your programs, designs and architectures are supposed to get better, not add even more boilerplate just because you can produce it faster...
HarHarVeryFunny 24 hours ago [-]
"You go to IPO with the AI you have, not the AI you might wish you have."
-- Donald Rumsfeld
So, right now it's a verbose code generator.
But post-IPO it will be wonderful - sentient, self-improving (recursively, iteratively, asymptotically), full of loving grace.
geodel 23 hours ago [-]
> But post-IPO it will be wonderful - sentient, self-improving (recursively, iteratively, asymptotically), full of loving grace.
We hold these truths to be self-evident.
jazzyjackson 1 days ago [-]
I guess the claim is simply that AI written code is verbose and there’s lots of it being created but I agree, these systems seem to be able to create lots of low quality software, so until FreeCAD has feature parity with Solidworks I’m bearish on the singularity.
brazukadev 6 hours ago [-]
When claude code removes React from its own code I'll believe that.
geodel 24 hours ago [-]
It will be so powerful that it can't be trusted with any earthly person.
swader999 22 hours ago [-]
IPO IPO IPO!!!
georgehotz 1 days ago [-]
The world has been recursively self improving for millennia. Similar to scientology, this is a cult pushing sci-fi nonsense. They are just coupled to an LLM lab to give their stories an air of seriousness. Imagine scientology starting to make laptops.
4ffs 23 hours ago [-]
TBH the more Anthropic keeps yapping the more desperate they seem now. OAI has been pretty quiet in comparison lately.
snick3rz_ 13 hours ago [-]
This is facially a puff piece. That doesn't mean it's all false. It means be wary of anything that doesn't have a critical thing to say.
replwoacause 22 hours ago [-]
I love that animation, really cool
ReptileMan 20 hours ago [-]
Anthropic is all talk and no delivery last few months. This cry for pause is just them realizing they have no moat at all.
cess11 22 hours ago [-]
'“Good code” means two things: it works, and it is written in a manner that allows another engineer to understand it and build upon it.'
I disagree with this. Good code is easy to change, which is much harder to accomplish than code that can be added to.
"If technical trends in advancing capabilities continue, and AI systems are able to develop the capabilities inherent to transformative human ingenuity, then it is plausible that AI systems could design and refine themselves."
I find the first premise weak and implausible, and the second one is obviously false. To me it comes across as an insult to the reader.
I have a claw that is instructed to make at least 500 PRs per day. It uses Claude, Gemini and OpenAI and runs basically every few minutes. I use online forums as input for the claw: Moltbook, reddit etc. It's quite funny how it tries to improve itself. But to say it really creates a new skynet? Nah. Not at all. It's more a clutter of useless features or incomprehensible code restructuring.
moregrist 1 days ago [-]
This more or less agrees with my assessment of recent changes in Claude Code where a lot of new features are either:
- Half-baked or half-done.
- Or have significant overlap with existing features, and aren’t clearly an improvement.
More code is not better. More features are not better. It would be lovely to see more intentional design than just more.
I know they’re dogfooding this. I have to believe they have some people with taste. So it makes me wonder if anyone has the time to think or if they’re just shoveling prompts as fast as possible.
holoduke 23 hours ago [-]
It's like the AI created a method add(a b) return a+a+a+a-b-b-b-b
But then much bigger and complex features. Totally useless nothing methods. But still interesting to see occasional exceptions that are better.
amelius 1 days ago [-]
Does this train on LLM output, or is this more like iterative self prompt improvement?
HarHarVeryFunny 1 days ago [-]
Their statement is that they regard lines of code shipped as indicative of self-improvement. So, while a well written coding agent might be a few thousand LOC, Anthropic's is bloated like a decomposing whale and over 500K LOC! What more proof do you need?
Legend2440 1 days ago [-]
Have you tried reading the article? It answers your question.
Don't ask people to explain the article to you if you're too lazy to open it yourself.
_se 1 days ago [-]
I think that's the whole point of LLMs
deterministic 14 hours ago [-]
I have used custom code generators for years, generating 90+% of the code needed to write a typical biz application. Claude Code is useful and I use it every day. But it still hasn't beaten the productivity of my code generator.
newsicanuse 16 hours ago [-]
pre IPO truck load of crap
kylehotchkiss 21 hours ago [-]
Isn't this like a perpetual energy machine? Or wouldn't entropy start kicking in and the quality of the system begin to degrade over time? (philosophically I don't believe AGI is an achievable thing)
krapp 21 hours ago [-]
>Or wouldn't entropy start kicking in and the quality of the system begin to degrade over time? (philosophically I don't believe AGI is an achievable thing)
It already has. Models being trained on AI generated data lead to degradation and model collapse. The concept of the "technological singularity" whereby AI experiences infinite and exponential self-improvement and recursively bootstraps itself to godhood is a religion-adjacent sci-fi concept but in real life TANSTAAFL.
4ffs 23 hours ago [-]
They're making a mistake with this continued self-hyping. At some point even the dumbest of prospective investors don't buy it.
SimianSci 1 days ago [-]
Anthropic is looking to IPO here soon.
A key aspect of this is to prove profitability.
Shifting their focus from Training new models to instead serving inference, they would greatly reduce their spend. In fact this is something being reported on that they are already doing, which is the reason for their first ever profitable quarter.
It's awfully convenient that the company which has greatly reduced its spend on training is now asking for a slowdown in this area.
danny_codes 15 hours ago [-]
Their model lead is tiny. If they cut training focus they'll be quickly overtaken, one imagines. Seems dicey, if any of the OSS players comes out with a better model.. well, there are a bunch of better harnesses than Claude code you can download.
This is a very undifferentiated, swappable product. Kind of like tissue paper in that respect
malfist 23 hours ago [-]
I mean, if they've consumed all of human knowledge. What's left for them to train on? This pivot isn't only because it's cheaper and a way to juice the numbers for an IPO, it's survival because they can't improve more.
hasteg 19 hours ago [-]
IIRC when they make a big enough architecture change to the model they will need to rerun pre-training. So it's not like they’re feeding it more data (they will be, but it will be a drop in an S3 bucket compared to their dataset reserves) but rather training models with different architectures.
applicative 22 hours ago [-]
It did sound to me like they feel some sort of wall coming.
Theodores 21 hours ago [-]
Honest question: Is anyone here looking to put their own money into the Anthropic, OpenAI or SpaceX IPOs?
Maybe it is my poverty mindset that is holding me back, however, I can't imagine becoming an investor in any of the AI 'startups'.
There are plenty of pundits able to advise others on where to put their money, and sometimes there is everyone and their dog advising you to get into Bitcoin, gold or some other scheme. With alt-coins there were lots of people saying that you should get in, and plenty of naysayers. Yet I am not hearing anyone that uses AI professionally try to convince others to get into the AI IPOs coming up. Maybe the overall economic situation precludes it.
Hence my question, is anyone here planning to put their own hard-earned money into Anthropic (or the other AI 'start ups')?
isomorphic_duck 19 hours ago [-]
I can’t imagine investing in these frontier labs for the simple reason that Open Source is very likely to catch up in a relatively short period of time. I don’t see how OpenAI/Anthropic could then continue to serve their models with such large inference margins.
wnmurphy 20 hours ago [-]
I'm considering Anthropic. I think they will be one of the survivors if/when the AI bubble bursts.
I was dubious about SpaceX (orbital data centers need to solve for extreme radiation and error-correction during training), but then I remembered that xAI is actively working on virtualizing white collar workers ("Macrohard").
In my opinion, this is the only TAM that justifies $1T in data center investment, because the consumer market for ChatGPT-style AI is saturated. There's a lot of enterprise TAM available for AI, but I think what these companies training frontier models are really after is selling a product that allows companies to eliminate the cost of white collar salaries.
danny_codes 15 hours ago [-]
I mean, my passive funds will be forced to buy a little bit I assume, given recent entry changes to indexes. So.. yes? I guess?
sushisource 17 hours ago [-]
I'll probably buy and sell on opening day. The hype train is worth making a quick trade on.
Long term? Way, way less interested.
vblanco 1 days ago [-]
Another article about how anthropic wants to ban everyone except themselves and destroy opensource and chinese AIs.
reasonableklout 1 days ago [-]
Where is this discussed in the article? I don't see any mentions of China or open source models
artninja1988 1 days ago [-]
Not really mentioned explicitly but:
> A meaningful slowdown or pause would require multiple well-resourced labs at or near the frontier, in multiple countries, agreeing to stop under the same conditions. It would also require that each can verify that the others have actually stopped. Due to the unique characteristics of AI systems, the detectability (a lower standard than verifiability) element of this arms control problem is much more challenging than with other technologies. Training runs are far easier to conceal than missile silos, their inputs are general-purpose, and the incentive to defect quietly is enormous, because whoever continues while others pause could inherit the lead. A credible pause also has to specify what triggers it, what lifts it, and who adjudicates.
And later:
> In the coming months, we will organize conversations where policymakers, researchers, civil society, and other AI companies can help answer some of the questions this piece raises, especially around full recursive self-improvement and how to create better options for coordination and deliberation. We’ll publish what comes out of it. The window to investigate the questions together is here, and people outside AI companies should be involved in this deliberation.
reasonableklout 1 days ago [-]
Coordinating a pause at the frontier is not the same as destroying or even harming open source/China.
It feels like both open source can flourish while the frontier is deliberately regulated?
vblanco 23 hours ago [-]
they explicitly mention in the article that just frontier stopping isnt enough because then that just means others will catch up, they want to be the leaders of a global organization/cartel that bans everyone except themselves. Particularly important given anthropic attacks china and opensource every chance they get. https://www.anthropic.com/news/detecting-and-preventing-dist...
artninja1988 23 hours ago [-]
Yeah. This is why Anthropic is way worse than OpenAI. They don't contribute shit to open source and even lobby against it.
b65e8bee43c2ed0 15 hours ago [-]
Gell-Mann amnesia expressed by people when a corporation says something they like is both baffling and disheartening to see.
Altman, Amodei, and the rest of them are anthropomorphic grease. their personal wealth is tied to the value of their respective companies. everything they say and do is self-serving.
margorczynski 22 hours ago [-]
The closer to the IPO the more marketing drivel we'll get from both Anth and OpenAI.
4ffss 18 hours ago [-]
Sales and marketing for the IPO babeh!
adverbly 21 hours ago [-]
Lol they're using lines of code as a KPI?
Come on guys...
That is making me less impressed not more impressed!
chilipepperhott 23 hours ago [-]
I find any and all claims like this ridiculous from a company who can't build a terminal application that uses less than a gigabyte of RAM.
dang 15 hours ago [-]
"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."
For some reason, idling Claude Code needs 100% of my CPU.
nicce 20 hours ago [-]
Like Google’s AI Studio tab in the browser. Incredible degradation in software quality?
asdfman123 22 hours ago [-]
Developers can develop leaner applications, but they're usually not incentivized to.
Frankly, I love efficiency too, but I've had to learn the hard way that what the market wants is features. Or at the very least, the executive team wants that.
j2kun 22 hours ago [-]
Their whole argument is that AI's added efficiency means they don't need to set aside valuable human time anymore. Why can't they just point Claude at Claude Code and ask it to reduce memory usage by 90%?
asdfman123 21 hours ago [-]
You can do that. But I'm telling you, in tech (and enterprise shops I've worked at too) they don't care.
I'm using the internal Google tools and it's helping me write code much faster too, but it still takes time. I could make the CLI tool I work on faster, but no one cares except the end users, and their minor concerns have no impact on our internal politics.
At the end of the day you have to do what you're paid to do, unfortunately.
fg137 21 hours ago [-]
In other words, performance is almost always an afterthought.
Garlef 16 hours ago [-]
Make it work, make it nice, make it fast.
jachee 15 hours ago [-]
Good, Fast, Cheap.
Pick any two.
asdfman123 21 hours ago [-]
Sure
toephu2 21 hours ago [-]
I have iterm2 open right now with Claude in a long session and it's only using 500MB of memory.
pizlonator 21 hours ago [-]
Only 500MB!
you are confirming their point even as you contradict the specifics
toephu2 1 hours ago [-]
Followup: I closed out Claude completely, iterm2 completely. Reopened iterm2, and it appears iterm2 is using about 500MB of memory. So this has nothing to do with Claude Code CLI.
ChrisLTD 18 hours ago [-]
Yeah. Bonkers considering the brain of the application isn’t even on your device.
z3c0 20 hours ago [-]
And highlighting a disconnect in the developer community. Some of us are okay with unnecessary overhead for quick results. I always felt gross dealing with Electron apps, but they're popular for a reason.
deathanatos 20 hours ago [-]
But each day now that overhead becomes more costly as AI drives up the very cost per byte of RAM.
verdverm 20 hours ago [-]
they make one of those electron apps too
andriy_koval 21 hours ago [-]
Maybe that gigabyte is occupied by useful information: traces/memory?
flexagoon 17 hours ago [-]
Traces and memory are text. A gigabyte of text is an insane amount. That is an equivalent of tens of millions of lines of code, or hundreds of millions of AI tokens.
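A rough back-of-envelope check of those figures, as a sketch only. The averages of about 40 bytes per line of code and about 4 bytes per token are assumptions picked for illustration, not measurements of any real codebase or tokenizer:

```python
# Back-of-envelope: how much plain text fits in one gigabyte?
# ASSUMPTIONS (not from the thread): ~40 bytes per line of code and
# ~4 bytes of text per LLM token. Real averages vary by language and tokenizer.
GIGABYTE = 10**9
BYTES_PER_LINE = 40
BYTES_PER_TOKEN = 4


def capacity(total_bytes: int, bytes_per_unit: int) -> int:
    """Number of whole units that fit in total_bytes."""
    return total_bytes // bytes_per_unit


lines = capacity(GIGABYTE, BYTES_PER_LINE)
tokens = capacity(GIGABYTE, BYTES_PER_TOKEN)

print(f"{lines:,} lines of code")
print(f"{tokens:,} tokens")
```

Under those assumptions the result lands in the "tens of millions of lines" and "hundreds of millions of tokens" range.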
overgard 21 hours ago [-]
A gigabyte is a lot of memory. Even the largest context windows are a small fraction of that with any sane engineering discipline.
andriy_koval 21 hours ago [-]
For each LLM interaction they likely have a bunch of thought traces, tool calls, etc., which don't go into context but can still be retrieved.
But I obviously don't know for sure.
javcasas 21 hours ago [-]
Nope. Used to render on the terminal like a game engine.
This kind of immediate-mode rendering is quite standard for TUIs. Although immediate-mode rendering tends to be significantly simpler and use less memory than retained-mode rendering, at the cost of some redundant computation. So I am not sure if this is the reason for the bloat.
It’s possible that it doesn’t play well with JS garbage collection, since it recreates the whole UI structure for every frame (which tends not to be an issue in the languages where immediate mode is usually employed).
But yes it’s a bit more akin to game renderings than web rendering. Which can be totally fine if done well.
overgard 21 hours ago [-]
I haven't tried to make a TUI admittedly, but double buffering is the oldest technique on the planet. A TUI doesn't even need to pay the cost of a lot of pixels since its effective resolution is much lower
javcasas 6 hours ago [-]
Long long time ago, I used to do some graphics stuff in 320x240, which uses a whopping 64KB per buffer, and still has more resolution than a terminal.
In 1GB I could probably fit all the buffers to double-buffer all the TUIs in a whole country. Well, maybe not. But it's likely not that far off.
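For a sense of scale, here is a small sketch of the buffer arithmetic. The 200x50 terminal size and the 16 bytes per cell (character plus colour and style attributes) are assumptions for illustration. At one byte per pixel, 320x240 works out to 76,800 bytes; 320x200 is the mode that gives 64,000.

```python
# Back-of-envelope framebuffer sizes.
# ASSUMPTIONS: 1 byte per pixel for the old graphics modes, and a fairly
# large 200x50 terminal storing 16 bytes per cell (glyph + colour + style).
GIB = 2**30


def buffer_bytes(width: int, height: int, bytes_per_unit: int) -> int:
    """Size in bytes of one full-screen buffer."""
    return width * height * bytes_per_unit


vga_320x240 = buffer_bytes(320, 240, 1)
vga_320x200 = buffer_bytes(320, 200, 1)
tui_single = buffer_bytes(200, 50, 16)
tui_double = 2 * tui_single  # front buffer + back buffer

# How many double-buffered terminals of that size fit in 1 GiB?
terminals_per_gib = GIB // tui_double

print(vga_320x240, vga_320x200, tui_double, terminals_per_gib)
```

Even with these generous per-cell assumptions, a double-buffered terminal stays in the hundreds of kilobytes, so a few thousand of them fit in a gigabyte.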
How on earth are you spending more than 50us on a UI like this from start to finish? What the actual hell? 11ms to construct a scenegraph of this complexity? I don't even know what to say to that.
nicce 10 hours ago [-]
In comparison, I have around 3ms total latency when streaming 4k 144hz from a headless machine in my basement to upstairs :-D
At least that is what Moonlight client claims.
fg137 21 hours ago [-]
Do game engines constantly have buffer issues?
overgard 21 hours ago [-]
Depends on if they're written with Claude
lstodd 21 hours ago [-]
I sorta remember Quake console running on an 486dx2 ..
airstrike 17 hours ago [-]
LOL right? This is all that needs to be said about the engineering behind Claude Code
krapp 21 hours ago [-]
Frankly that's an insult to gamedev. Literally every game engine I can think of could do better. Probably even Unreal Engine could do better.
ux266478 21 hours ago [-]
If I saw our UI show up in the profiler eating 5ms of CPU time every frame, I'd send whoever was responsible to QA hell until they find some way to redeem themselves. Not even fancy animated 3D UIs, like what you get in Death Stranding, eat up these kinds of resources. Not even remotely close.
davidatbu 18 hours ago [-]
So would you take these claims seriously if they came from OpenAI (since Codex is a pretty lean CLI app)?
If so, I think it would be in the spirit of HN to discuss the subject matter of the blogpost (increasingly autonomous coding towards the end goal of RSI) as if the blog post was indeed from OpenAI. OpenAI is, by all accounts, going through a very similar process anyways.
Jtarii 20 hours ago [-]
Well, they could very easily if they wanted. There is just no economic value in it.
cpursley 23 hours ago [-]
I came here just to write: Pretty please let it churn for a few nights and redo Claude Code in Rust. Because the harness is very very good, as are their models, but that Node thing is a hog for no good reason at all.
ale 23 hours ago [-]
Incoming rust rewrite branch ready to merge: +1,009,257 -4,024
canadiantim 23 hours ago [-]
People already rebuilt Claude Code in Rust after the Claude Code leak, it's on github as claw code (and other variants)
jcarver 22 hours ago [-]
[dead]
rytill 22 hours ago [-]
I checked out your agent and it looks pretty well designed. Congrats on starting to share it with others!
One thing I noticed: "Your Tools: Aether agents get tools exclusively via MCP servers." "...Aether ships with 1st-party MCPs for file system operations..."
Can you share your thoughts on why you decided to use MCP as the core tool abstraction? I have heard many decry MCP as being context-wasteful. Is this not the case with your agent?
jcarver 22 hours ago [-]
Great question.
The MCP protocol has gotten a bad rap for wasting context due to most MCP clients dumping tool definitions directly into context, which is wasteful.
Aether doesn’t do that. It uses an opt-in "proxy" that puts MCP tool schemas on the filesystem so the agent can browse, search and load the tool schemas it needs progressively. As for motivation, there are several advantages to taking an MCP-first approach, including:
1. It allows Aether to be a truly blank slate agent as 0 tools are hardcoded into the core runtime.
2. It allows users to extend Aether using any language they want
3. MCP gives a standard way to deal with local+remote tools, progress notifications, permission prompts (e.g. ask the user to allow/deny a tool call), OAuth flows etc.
4. There's a big ecosystem of existing MCP servers users can connect to
But that's all optional, you can just as easily give Aether a single Bash tool and only use CLIs too.
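To make the "schemas on the filesystem" idea concrete, here is a minimal sketch. It is not Aether's actual code and does not use any MCP SDK; the function names (`write_schemas`, `list_tools`, `search_tools`, `load_schema`) and the two example tools are hypothetical. The point is only that listing and searching return short names, and a full schema is read only when the agent asks for it.

```python
import json
import tempfile
from pathlib import Path

# HYPOTHETICAL tool schemas, standing in for what an MCP server would advertise.
TOOLS = {
    "read_file": {
        "description": "Read a file from disk",
        "input": {"path": "string"},
    },
    "http_get": {
        "description": "Fetch a URL over HTTP",
        "input": {"url": "string"},
    },
}


def write_schemas(root: Path, tools: dict[str, dict]) -> None:
    """Store one JSON file per tool instead of putting schemas in the prompt."""
    for name, schema in tools.items():
        (root / f"{name}.json").write_text(json.dumps(schema))


def list_tools(root: Path) -> list[str]:
    """Cheap listing: tool names only, no schema bodies."""
    return sorted(p.stem for p in root.glob("*.json"))


def search_tools(root: Path, keyword: str) -> list[str]:
    """Names of tools whose description mentions the keyword."""
    hits = []
    for name in list_tools(root):
        schema = json.loads((root / f"{name}.json").read_text())
        if keyword.lower() in schema["description"].lower():
            hits.append(name)
    return hits


def load_schema(root: Path, name: str) -> dict:
    """Load the full schema for one tool, only when it is needed."""
    return json.loads((root / f"{name}.json").read_text())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_schemas(root, TOOLS)
    names = list_tools(root)
    matches = search_tools(root, "file")
    loaded = load_schema(root, matches[0])

print(names, matches, loaded)
```

In this sketch only `loaded` would ever need to be placed in the model's context; the other schemas stay on disk.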
bpodgursky 22 hours ago [-]
They obviously don't care, aren't making any attempt whatsoever to do this, and 99% of users don't care either.
If you want to pollute your own priors with weird artificial litmus tests, it's a free country, but the artificial world-model you build in your head does not affect the real world around you.
Lplololopo 22 hours ago [-]
Really? Let me explain how bigger companies work:
They have different teams for different departments with different types of people.
So the team or teams responsible for writing the terminal application are different people than the researchers doing the learning.
This can lead to very uneven quality from one product to the next.
bitwize 23 hours ago [-]
After several months with their top engineers and state-of-the-art AI on the job, Anthropic managed to "reduce flickering by 85%" on their TUI Claude Code client, which is built in fucking React and rendered by drawing the entire chat conversation each time (hence the flicker). I think they've since eliminated it completely by slapping some double-buffering around it (since "our client is actually a real-time game engine" after all). Meanwhile for decades Emacs and Vim have had an optimizer built into their display cores that solves for the minimum set of terminal escape commands it takes to transform the screen from a given old state to a desired new state.
You will forgive me when, between muted snickers, I express considerable doubt that Anthropic will be able to bring its AI to a point of "self-improving" any time soon.
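For readers unfamiliar with the technique, here is a deliberately naive sketch of diff-based terminal redraw. It is not how Emacs, Vim or Claude Code implement it, and it does not solve for a true minimum: it just compares two same-sized screens and emits a cursor-move escape (`CSI row;col H`) plus the new text for each changed run of cells. The function name `diff_screens` is made up for this example.

```python
def diff_screens(old: list[str], new: list[str]) -> str:
    """Return ANSI output that turns screen `old` into screen `new`.

    Assumes both screens have the same number of rows. Only runs of cells
    that differ are rewritten; unchanged cells produce no output at all.
    """
    out = []
    for row, (old_line, new_line) in enumerate(zip(old, new)):
        width = max(len(old_line), len(new_line))
        a = old_line.ljust(width)  # pad so shorter lines erase leftovers
        b = new_line.ljust(width)
        col = 0
        while col < width:
            if a[col] == b[col]:
                col += 1
                continue
            start = col
            while col < width and a[col] != b[col]:
                col += 1
            # ESC [ row ; col H positions the cursor (1-based coordinates).
            out.append(f"\x1b[{row + 1};{start + 1}H{b[start:col]}")
    return "".join(out)


before = ["hello world", "status: ok"]
after = ["hello there", "status: ok"]
patch = diff_screens(before, after)
print(repr(patch))
```

Here only the five changed cells on the first row are sent to the terminal; the untouched second row costs nothing, which is why this style of update does not flicker.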
andrewlin247 19 hours ago [-]
Imagine showing this article to yourself three years ago
andrewlin247 19 hours ago [-]
You'd think we'd be past the point of people still believing AI can't write good code
esafak 24 hours ago [-]
> In the coming months, we will organize conversations where policymakers, researchers, civil society, and other AI companies can help answer some of the questions this piece raises, especially around full recursive self-improvement and how to create better options for coordination and deliberation.
If they wanted to they could have convened an international forum with commercial and political stakeholders years ago. Less talk, more do.
deterministic 14 hours ago [-]
I call BS on this. For an LLM to recursively improve itself it would need to (small step) improve the training data and/or (big step) come up with fundamentally new architectures superior to transformers. The small step improvements might be doable. But nobody is making any claims about the big step improvements.
mrandish 19 hours ago [-]
Was anyone else fished in by the title and disappointed? After some broad introductory discussion of RSI, the article was almost entirely about LLM coding. While there are some metrics for unattended agentic coding, it doesn't discuss "When AI builds itself" (beyond 'not now') or any progress specifically toward actual recursive self-improvement. I'm very interested in any empirical evidence of meaningful progress in RSI, so... this felt deceptively titled.
To me, unattended agentic coding is not RSI, in the same way a self-reloading "Unattended 3D printer" is not at all a "3D printer that recursively prints complete 3D printers in which each generation is significantly faster and more advanced than the last." The "unattended" part is obviously necessary but hardly sufficient. The article tacitly assumes LLM progress to be something like 1: Unattended agentic coding, 2: AGI, 3: RSI. I suspect that third step should be labeled "not to scale."
I'm increasingly convinced that actual Full Foom RSI (FF-RSI) is on a radically different scale than the first two. Just leaving it unaddressed is like assuming: Step 1: Manned space station, Step 2: Manned Mars base, Step 3: Manned Alpha Centauri base, are "just logical next steps." FF-RSI requires sustaining superlinear, recursively amplifying cognitive returns along a specific directed path - and we currently have no empirical evidence that such returns can exist for artificial OR biological intelligences. Large collectives of the smartest humans alive (Bell Labs, IAS, etc) haven't just failed to get anywhere close to reliably sustaining that, we can't even reliably predict non-recursive, single occurrences or even imagine any way all 8B humans could fully mobilize to predictably achieve non-recursive, single occurrences.
The only prior we have for open‑ended intelligence improvement is biological evolution which shows extremely slow and unreliable sublinear returns at best. And even if unbounded, recursive self‑improvement is physically possible, it may be practically unachievable due to asymptotic economic, resource and other barriers in the same way approaching light speed requires exponentially more energy. I think it's plausible, and maybe probable, that AIs achieve true super-human intelligence in a decade and yet still won't achieve FF-RSI for centuries, if ever. To me, absent compelling evidence to the contrary, that's the reasonable Null Hypothesis. Even if you feel that's too pessimistic, it seems reasonable to expect any serious discussion of "Progress Toward RSI" to first discuss why it might even be plausible that 1: Miles, 2: AU (Astronomical Units), and 3: Light Years belong on the same scale, instead of just assuming it like the meme's empty "Step 3. .... " before moving on to "Step 4. Profit!" (or "IPO!" but very, very responsibly).
willXare 4 hours ago [-]
[flagged]
cadamsdotcom 18 hours ago [-]
[dead]
kolesnikov-arch 11 hours ago [-]
[flagged]
andromaton 24 hours ago [-]
[dead]
SwtCyber 12 hours ago [-]
[flagged]
Aegis_01 13 hours ago [-]
[flagged]
overfits-ai 22 hours ago [-]
[flagged]
Rekindle8090 2 hours ago [-]
[dead]
Aubergrill 11 hours ago [-]
[dead]
gabrieledarrigo 22 hours ago [-]
> AI that can build itself would be a major development in the history of technology—one that could bring enormous good for the world
I really can't stand these guys anymore...
dang 15 hours ago [-]
Ok, but please don't post unsubstantive comments here.
nielsbot 18 hours ago [-]
> one that could bring enormous good for the world
one that could bring enormous riches for the AI owners
mugivarra69 21 hours ago [-]
[dead]
ath3nd 24 hours ago [-]
[dead]
simianwords 1 days ago [-]
Sorry, but if AI can build itself then it can run companies of size 3000 with a few people. Or even higher. What are the consequences?
delichon 1 days ago [-]
When AI is a more effective capital allocator than NI it will drive capital into the accounts of whoever controls the AI, gaining them increasing decision making power over the economy and culture. Maybe those controllers will be human at first.
cdrnsf 1 days ago [-]
They will not be.
lstodd 1 days ago [-]
As has been mentioned in the sibling comment it already is.
Consequences are: financial crisis.
llmslave 1 days ago [-]
I cannot wait for these models to tear down traditional social hierarchies. We haven't even begun to see the effects, fingers crossed
baq 1 days ago [-]
Hierarchies exist for a reason, take away the reason and the house of cards eventually collapses — but the house of cards is still a house. When it’s gone, we’re back to laws of the jungle.
Be careful what you wish for IOW.
llmslave 1 days ago [-]
I think certain types of people with power, i.e. access to capital, will lose relevance. The world will become more meritocratic, with AI as leverage for the individual
hvb2 1 days ago [-]
Your analysis of the whole rise of AI is that people with access to capital will lose relevance???
So the most capital intensive industry we've ever created will put less power in the hands of those with capital?
I'm sorry, I have no idea how you came to that conclusion...
baq 1 days ago [-]
It’s exactly the opposite I’m afraid. Capital already has more access to AI, both quantitatively (tokens for dollars) and qualitatively (biggest players got Mythos first). Expect this trend to continue.
SimianSci 1 days ago [-]
Never heard of a stratified economy?
Spoiler alert: none of us will be in the good part.
techblueberry 1 days ago [-]
Tear down or reinforce?
llmslave 1 days ago [-]
capital/ability to leverage labor is going to lose power
wstrange 1 days ago [-]
I'm not so sure. It seems those with capital will accumulate it even faster.
Without some kind of income redistribution we are sailing into dark waters.
techblueberry 1 days ago [-]
Let the ruling classes tremble at a Communistic revolution. The proletarians have nothing to lose but their chains. They have a world to win.
Workingmen of all countries unite!
Translation: hahahahahahahahahhahahaha but in your defense, I would give anything to be wrong.
reducesuffering 1 days ago [-]
Anthropic has finally come around to what others have already realized far sooner. Little time left now. Notice how shallow the arguments and consistently wrong the AGI naysayers have been year after year.
> If it were possible to effectively slow the development of this technology to give ourselves more time to deal with its immense implications, we think that would likely be a good thing
Even Anthropic wants to Pause AI now. There must really be not much time left for "edging". Please write to your lawmakers, no matter whether you are in the US, Europe, China, or elsewhere. Only an international agreement between governments can enforce an AI-Pause and eliminate the necessity to dangerously push the frontier.
Whichever side I may stand on, pausing just seems unnatural? Life is movement.
honeycrispy 1 days ago [-]
And happiness is restraint.
senderista 19 hours ago [-]
They don't, they just pretend they do.
honeycrispy 1 days ago [-]
That would be like trying to get every country to agree to give up nukes.
mofeien 1 days ago [-]
Or agree on finding ways to promote peaceful use of nuclear energy. This has been done, there are thousands of people working on it around the globe and 180+ member states of the IAEA. It's not easy, there have been close calls.
And cooperating internationally to buy ourselves time to find ways to develop this "last invention" in a way that will do good for humanity seems to be on a similar level.
ChrisLTD 1 days ago [-]
Or stop making more, and testing more, which we got the biggest countries to do, at least for a time.
It's just time and it's the only things humans value. The only way to provide value for another person is to use your time to do something faster than they could do it with their time. That's it. There is no other way to secure income outside of inheritance or charity which is just receiving something of value without giving something of value. There's a reason why most of the income goes to older people, because the younger people haven't accumulated that much time to exchange for money. The nice thing about time is that everyone earns it at the same rate, 1 second per second.
Capital can be a lot of things, not just machines and property. Any experience you have is capital, any training is capital, any education is capital. Capital is anything makes accomplishing things take less time.
The difference between socialism and capitalism is the idea that one person's time can have different value. That's really it.
But it’s a great short term business opportunity for AI vendors and it was Anthropic who went all in on being knowledge worker outsourcing in a big way first whilst OpenAI thought they’d replace Google in search.
I think Anthropic had the better business strategy.
The people want cheaper prices, affordable housing, affordable healthcare
Capitalism has decided that these problems aren’t worth solving. Instead, we must optimize for spam and slop (and call it “distribution”)
Cheaper prices, affordable housing, affordable healthcare are less capital-efficient. If you're Walmart, sure, you would like to lower prices as much as possible. But your leverage really isn't as big as finance or tech. If you're a politician, you might also pursue those goals, but your attention and leverage really isn't as focused as that of the money machine.
Everyone else will be reduced to compost.
It's the perfect plan. The final definitive justification for capitalism.
The masses are unnecessary. The masses will be optimised.
What could possibly go wrong?
They want influence and power. Being at the top of a hierarchy of millions, billions of people.
If there are no masses the 1000th billionaire will be at the bottom of the hierarchy instead of near the top. They don't want that. The masses are needed to give them the sense of power.
What these people want is power and control. Eliminating the masses goes against that.
The only thing that motivates Bezos is that Elon Musk has more, and conversely Elon Musk would have an existential crisis if he was no longer number one
Companies do tons of communication and work directly, without press releases or blog posts. If a statement is released publicly, it is done for a PR purpose.
Whether they are right or wrong is another matter, but their claims also don’t seem too far out of the realm of possibility to me.
Coding agents have fundamentally changed my day-to-day job. In the last year, my work has shifted from me writing all of my code, to me writing very little code and spending most of my time on understanding problems better and setting direction, and reviewing, verifying, and polishing the output of coding agents. It has been quite a drastic change.
It is not that outlandish to suggest that coding agents could continue to improve at such a drastic rate over the next year. And the implications of that could be quite large! Even just the implications of more white-collar workers adopting tools like Cowork seems potentially very large, with tools that already exist today. It seems sensible to at least consider this as a possibility.
Likewise, people don't as easily blame ilya for 'hyping things up' when he said these things.
Also, talking about incentives, there are also incentives to lower their valuation. If you wanna be vigilant against social engineering, I'd be wary of that too.
These are moot anyway though, 'cause the article isn't even making any super strong claim. If you read it, it's no big deal
I genuinely don't believe that they sat down in a board room and said "yeah lets specifically release this now before an IPO so we can juice it!" They haven't even announced an IPO date. So is every blog on capabilities before that date just "pumping up the value of the stock before the IPO?"
I don't know about you, but AI advancements have brought extraordinary improvements to me personally in my ability to be productive, in much the same ways the article outlines. I find it deeply satisfying to be able to "get ideas out of my head" faster and tackle more meaningful problems.
FWIW, it deeply concerns me how much power and capability is being centralized in the hands of so few, especially Anthropic. I, for one, hope these advancements can be scaled down to something I can have full sovereignty over and trust... in my own home.
These people don't have our interests in mind and everyone eats it up like a blessing from a god or something. It's surreal.
Capitalism and democracy are becoming obsolete. It's not clear what's next.
What about the hypothesis that AI is generating more verbose code? I just see the text pretending to acknowledge "LOC != Productivity" and then using it as a metric anyway.
I'm sure he thought that was a crowning achievement, proof that AI can enable 10X developers, after all, what engineer could write 40k lines of code in a week?
I declined to review it, stating that I couldn't possibly vet 40k lines of code, and wouldn't put my reputation on the line to stamp the work as good. The PR nagged me for 2 weeks from my todo list and then disappeared. I don't know if he found another dev to get an approval from, or if the PR was abandoned. But I know for sure that he and I are on two totally separate islands around the value of LLMs.
I don't personally use that feature, and I couldn't care less at this point. If our customers are frustrated by the bugs, at least my name is not on it.
I prefer a big feature to be one big PR rather than a lot of small ones.
We had a dev do a big feature with a ton of small PRs, each one was individually impossible to review because each concern was out of scope for the small PR and "would be fixed in later PRs". Once it all came together as a whole, the big picture was a total horror show and I had to rewrite basically the whole thing.
In order to review those small PRs properly, each time I would have to read and understand all the current code so far from the beginning. Without that, each small PR individually looks OK because you won't remember the other PRs from weeks back that already duplicated what the current small PR does for example.
Yes, same, and I genuinely do not understand the insistence that PRs should not be above a certain size. I think most people are under the (misguided and wrong) impression that a PR review should take less than the time it took to write the code, and therefore allocate no more than 15-30 minutes per review. So when they come across a large PR they find themselves at a loss.
I've seen that reaction many times. It seems to work well enough when someone is maintaining existing code. However, greenfield projects can often require literally orders of magnitude more code to deliver something that can be integration tested.
The first step is to break it up into a stack of commits. Each one must compile and pass its unit tests, of course. Keeping it under 1k loc of released executable code is usually easy, but often becomes difficult to impossible if you want well commented code with excellent unit test coverage.
Assuming you have kept all your commits under 1k loc, there is still the problem of whether you present them in one PR, or as a stack of PRs. The issue with a stack is why an API is designed a certain way often isn't evident until you see how it's used. Responses to PR comments are explanations that point to later PRs in the stack, which is irritating for both the reviewer and the author.
I haven't found a good solution. I'm not sure there is one.
I completely agree with you. But I am afraid we are losing the battle.
I am seeing people repeatedly sending out gigantic PRs full of slop, code with mistakes that they would never have made if they were hand coding it. And they don't care. It's sometimes surprising if not horrifying to find that the colleagues you have worked with for years don't care about quality at all -- almost despising spending time reviewing their own code. Yet they have the audacity to send out code reviews.
I hope they were the latter.
'Please split this PR into smaller ones'. I would even sketch which groups/phases would make sense, perhaps with the help of AI.
Gee, that sounds like a job for Claude if there ever was one.
My approach for AI-first code review, or really any kind of AI technical opinion, is that if the claim AI made is both important and not obviously true at a glance, it has to prove it to me, and keep trying until I'm convinced or can spot an obvious mistake in the proof.
With reviews, this is usually the case where AI is making a claim that something in the PR will fail because of some assumptions or behaviors in code outside of the PR - e.g. "this change will fail in scenario X, because foo is null in this case, because the SQL query doesn't populate it when bar == quux, and it gets propagated as null through the JSON deserialization (optional field)...", where all the SQL and JSON parsing was not part of the code under review, and "bar == quux" is some weird domain special case.
Stuff like this is both critical, and there's no way for me to judge it without an expensive context switch. So I learn to ask for a more detailed walk-through once, and if that doesn't make me "see" it, I just ask it to reproduce it with tests, and confirm it's a real problem. Reviewing the reproduction is usually enough for me to either "see it" or accept they're probably right and ask the author to recheck it.
(Why not jump straight to "reproduce it" for every finding? Because it still takes time to have AI do the repro. It's cheaper than a deep context switch, but not free.)
It's not Claude doing the review. It's a human doing the review, but using Claude to do the reading. It's still on the human to ask Claude the right questions.
And trying to just hand-wave it to Claude, to somehow "improve it" or "simplify it", without detailed questions hasn't been very successful. It can work for some things, though.
I'd prefer Chuck to the rescue but I guess it's a cultural preference.
/s
They went to HR who said I am more senior and I should act as a mentor (they had my same work title and were probably making 4x more due to being in USA) and I just no longer reviewed anything from them until I changed jobs.
My impression was that LLM training codebases were 99% resource management and only a few lines actually implement the core training algorithm, which is where 100% of the intelligence comes from. Data, not lines of code, are the constraint.
After training you can adapt the intelligence in various ways, and that takes a bunch of lines of code too. But you can't raise the intelligence ceiling again without another training run. So where is the scary recursive part?
The problem isn't the amount of code, it's how fitting/unfitting the abstractions are. Wrong abstractions are bugs in waiting. If there's much code with wrong abstractions, future change becomes difficult.
Source: me, I've created many bad abstractions and they led to much pain...
It's also really bad at inventing and leaning on invariants. I make rules in my code all the time - "by the time we get to path X, we know Y and Z are true." In aggregate, these invariants make code simpler and easier to reason about. But Claude doesn't do that. It just kind of slops through and adds bespoke "just in case" workarounds all over the place. Every time I read through code it's written - without fail - I find bad design / architectural choices.
Maybe Mythos will change this. But for now I've slowed way down on my Claude Code usage. You can't build a skyscraper on a foundation of mud.
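A minimal sketch of the "lean on invariants" style described above, in Python. The `Order` type, `parse_order`, and `total_cents` are hypothetical names made up for illustration; the point is only that the check happens once at the boundary, and everything downstream relies on it instead of re-checking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Hypothetical validated order: quantity > 0 and price_cents >= 0 always hold."""

    quantity: int
    price_cents: int

    def __post_init__(self) -> None:
        # The invariant is established once, at construction time.
        if self.quantity <= 0:
            raise ValueError("quantity must be positive")
        if self.price_cents < 0:
            raise ValueError("price_cents must be non-negative")


def parse_order(raw: dict) -> Order:
    """Boundary: the only place where untrusted input is checked."""
    return Order(quantity=int(raw["quantity"]), price_cents=int(raw["price_cents"]))


def total_cents(order: Order) -> int:
    # By the time we get here, we know quantity > 0 and price_cents >= 0,
    # so there are no "just in case" None checks or clamps.
    return order.quantity * order.price_cents
```

Because `Order` is frozen, the invariant cannot be broken after construction, which is what lets `total_cents` stay a one-liner.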
On a more serious note, I wonder if this might eventually encourage people to use languages that are a little harder to write but much more concise (functional languages, for instance). When you're paying per token, enterprise-bean Java-style verbosity totally sucks.
At work we are integrating with a third-party platform to automate Excel-powered calculations. It is awful. Rendering the table in the browser takes 10s, and one click on the Export button will throw the backend into an OutOfMemory state.
I don't disagree there is a lot of slop being produced right now, but I'm still optimistic in the long-run.
Hence the interpretation of this 8x number depends on whether (or how much) Anthropic engineers have changed their quality standards and development processes. They don't tell us, and I am not aware of any other indications we could use to make a judgment.
However, we can still do some theorycrafting! I'm convinced that to fully realize the potential of AI-assisted coding we need to revamp all the dev processes, especially how we validate code, and it would be foolish of Anthropic not to do so (unless they were conducting a rigorous study, which they don't claim to have done.)
My hypothesis on the future of software validation is nothing fancy: we simply want much, much more automation for tests, observability and other bespoke verification methods than we traditionally had. But then validation code will also contribute to the LoC! My observation so far of personal as well as some "vibe-coded" open-source projects is O(LoC production code) ~= O(LoC test code). So if roughly half of the new lines are validation code, as a SWAG the upper bound could be something like a 3-4x speedup, which is still remarkable.
All bets are off if code quality standards are not the same.
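The back-of-the-envelope arithmetic behind that 3-4x guess can be sketched as follows. This assumes the 8x figure counts all lines (production plus validation) and that the old baseline had little validation code; both are assumptions, and `production_speedup` is a made-up helper, not anything Anthropic published.

```python
def production_speedup(
    loc_multiplier: float,
    new_test_ratio: float,
    old_test_ratio: float = 0.0,
) -> float:
    """Estimate the multiplier on production code alone.

    loc_multiplier: raw growth in total lines per engineer (e.g. 8.0).
    new_test_ratio: validation LoC per production LoC after the change.
    old_test_ratio: the same ratio before the change (assumed 0 by default).
    """
    if new_test_ratio < 0 or old_test_ratio < 0:
        raise ValueError("ratios must be non-negative")
    # total = production * (1 + ratio), so production = total / (1 + ratio).
    return loc_multiplier * (1.0 + old_test_ratio) / (1.0 + new_test_ratio)


if __name__ == "__main__":
    # Test code roughly equal to production code: 8x total -> 4x production.
    print(production_speedup(8.0, new_test_ratio=1.0))
    # Somewhat heavier validation (1.5 lines per production line) -> 3.2x.
    print(production_speedup(8.0, new_test_ratio=1.5))
```

If the old baseline already carried a meaningful amount of test code, `old_test_ratio` pushes the estimate back up, so the result is only as good as those two ratios.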
very flawed
Opus 4.6/4.7 was consistently successful at getting 2-3x speed improvement with just one pass. It can also do the inverse: improve the performance metrics for better quality without causing a significant regression in speed. Then GPT-5.5 turned out to be much better at this workflow, often getting a multiplicative 1.5x-2x improvement above what Opus could do.
I now have quite a few GPT-5.5-optimized projects in various domains that are feature complete and are substantially more performant than existing SOTA implementations that I plan to open source as soon as possible: the bottleneck is polish as usual.
Something like this?
You are an Elite Performance Engineer and Autonomous Optimization Agent. Your primary goal is to iteratively optimize the provided codebase to maximize execution speed and efficiency (e.g., reduce CPU cycles, memory allocation, or network latency) WITHOUT altering the external behavior or causing any test regressions.
### CORE DIRECTIVES
1. METRIC-DRIVEN: You will be provided with benchmark results, profiler logs, or execution times. Your only measure of success is a statistically significant improvement in these metrics.
2. ZERO REGRESSION: The test suite MUST pass 100%. If a test fails after your modification, your immediate next step is to diagnose the failure and either fix the logic or revert to the last working state.
3. NO CHEATING: Do not "hardcode" solutions to bypass the specific benchmark inputs. The optimization must be generalized and algorithmically sound for all valid inputs.
4. ISOLATED CHANGES: Make precise, localized changes. Do not refactor architecture unless absolutely necessary for the performance gain.
### THE ITERATION LOOP
When instructed to optimize, follow this thought process strictly using <thought> tags before writing any code:
- ANALYZE: Review the current code and the latest benchmark/profiler feedback. Identify the specific bottleneck (e.g., redundant loops, excessive object creation, DOM reflows, synchronous blocking).
- HYPOTHESIZE: Formulate exactly ONE hypothesis for improvement (e.g., "Replacing the array filter+map chain with a single reduce pass will save N allocations").
- IMPLEMENT: Output the precise code modifications required for the hypothesis.
- EVALUATE (Mental Check): Ask yourself if this change introduces edge-case bugs (e.g., handling of nulls, empty arrays, async state).
If a previous optimization attempt resulted in a slower benchmark or a failed test, explicitly state WHY it failed in your thoughts before attempting a different approach.
Proceed with your first analysis of the provided files and await the baseline benchmark metrics.
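For what it's worth, the accept/revert rule that prompt describes can also be enforced outside the model. Below is a rough Python sketch of such a gate. Everything here is hypothetical: `run_tests` and `run_benchmark` stand in for whatever your project uses, and a median with a 5% threshold is a crude substitute for a real significance test.

```python
import statistics
from typing import Callable, List, Sequence


def is_improvement(
    baseline: Sequence[float], candidate: Sequence[float], min_gain: float = 0.05
) -> bool:
    """True if the candidate's median time beats the baseline's by at least min_gain."""
    return statistics.median(candidate) <= statistics.median(baseline) * (1.0 - min_gain)


def run_loop(
    candidates: Sequence[str],
    run_tests: Callable[[str], bool],
    run_benchmark: Callable[[str], Sequence[float]],
    baseline: Sequence[float],
) -> List[str]:
    """Keep a candidate only if tests pass AND the benchmark improves.

    Each candidate is assumed to be one isolated change applied on top of
    the last accepted state; rejected candidates are simply dropped.
    """
    best = list(baseline)
    accepted: List[str] = []
    for name in candidates:
        if not run_tests(name):
            continue  # zero regression: a failing change is reverted outright
        times = list(run_benchmark(name))
        if is_improvement(best, times):
            best = times  # the new state becomes the baseline for the next round
            accepted.append(name)
    return accepted
```

The point of keeping the gate in plain code is that the model never gets to decide on its own whether a change "counts" as a speedup.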
I have sped up a project by simply saying "What are all the possible ways I can speed up this code?" It'll list everything it finds, and then I ask it to rewrite the code.
Edit: Also, I find I didn't need to do this (because a speedup implies semantic similarity), but you can also add "change it without altering the semantics of the code", and that way it'll behave the same and should pass tests.
"We must blast forwards into making this dangerous thing because if we don't, someone else surely will," is a coward's argument.
If you believe it is dangerous, you should be dedicating yourself to STOPPING others from making it, not making it first! There's a reason disarmament has been so important in nuclear politics! It's not because people think nukes are a great idea!
In fact, that kind of thinking is exactly what keeps nukes dangerous!
If they themselves buy what they're selling, they should shut the whole thing down. Fortunately, I don't think they do, and neither do I, yet.
You can tell what kind of discussion this is by the fact that this question has to be asked.
If the consensus becomes that a 50+TFlops datacenter in the wrong hands is as dangerous as a uranium enrichment plant, we'll likely move towards treaties and coercion.
"Wrong" is obviously subjective here...
This is the “SGI” regulation issue I have never read a reasonable answer to. If one believes this is possible and should be prevented, then either they want to restrict every computing system sold from here on out to some arbitrary metric (and somehow prevent users from just creating clusters to get around such a compute restriction), or what?
If compute alone directly leads to “SGI” or whatever, then we might as well put paper bags on our heads and lie down in some English pub.
Not to mention, if one really wanted to cause harm, training a current day LLM and using it for Stuxnet-esque attacks is reasonably possible long before any arbitrary compute limit we might introduce now, no machine God needed to cause major harm.
That’s why I prefer advocacy for LLM regs that focus on current-day impact. Mental health concerns, training data licensing questions and the like. There I can formulate reasonable regulation that can hold. For “SGI”, I do not know anyone who has actually done that, and I have looked hard. That’s why I consider these things more of a distraction from actually necessary and possible regulation, one that just draws attention via a flashy doomsday scenario.
Occasionally, I will click on one of the AI Doomsday YouTube videos recommended to me. And far more often than not, these will posit that "SGI" requires only compute and will inevitably cause devastation. Fair enough, I still think we should put a bit more focus on e.g. LLM-induced psychosis, the labs rarely compensating those whose training data they used, etc., but if it is their opinion that "SGI" is possible, I can get why they'd ignore such concerns. But at the end, they never state how to regulate or prevent this; more often than not they have a call to action ("If you want to prevent this...") linking to a website where we can actually read about how they think we should deal with this. Inevitably, I click on said site, finding that it (A) is an Effective Altruism-aligned project and (B) always just contains some blabla about "aligning AI training with human values", which is absolutely meaningless nonsense, not least after having watched a video in which someone spends 15 minutes espousing that "we could never fully control "SGI"".
Makes all these feel more like industry efforts to stave off necessary regulation and not actually serious, but if one can formulate how to regulate “SGI” in a way that isn't laughable, nonsense or both, I am not opposed, I just don’t think that person exists…
I don't think anyone has been more successful in promulgating AI safety
There are groups like MIRI who tried what you're suggesting, where they make no AI and just push for AI regs, and they have been relatively much less successful
Mind you, there was no complete working device until after the Nazis surrendered, so that's a moot point - and the USSR only had their program because of various Europeans on the US project passing their work (and others') back to the USSR ... making that second claim moot.
Isn't an argument. If the Nazis had gotten it first they'd have used it and likely won. Others would have surrendered in the face of the overwhelming power.
I'm pleased they at least included this. However, they address the caveat by 'rounding down' the estimated multiple of the gain. I'm not sure that is the correct adjustment, especially once we understand the range isn't limited to positive numbers.
There's strong evidence the range of code productivity denominated in "lines of code" should include negative numbers, especially in the highest-quality sphere. Perhaps the earliest and most legendary example: https://www.folklore.org/Negative_2000_Lines_Of_Code.html
Today, I merged my fix, net -381 LoC.
I'm using them too of course, they read and type and hunt for bugs and test faster than I can. But I'm using them as my tool, not being a tool using them.
Keep believing that
I always was fascinated (obsessed?) by robots that build robots, or even things like this that can contribute a lot to making the next version of itself: https://buildyourcnc.com/products/cnc-machine-blacktoe-v4-2x... (cnc router that cuts plywood, and is made out of cnc-router cut plywood)
This is my own effort at an AI-assisted coding environment optimized for building itself: https://recursi.dev/ (just launching it, hope it's OK to mention it, it is free/open source.... here is the HN link that has gotten no love yet: https://news.ycombinator.com/item?id=48401022 )
Personally I think harnesses are as important as the AI itself, and have this crazy theory that even if the models stopped improving today we could still have massive advances in the harnesses alone.
We wouldn't call humans creating a calculator "recursive self improvement".
I think that's the path to async AGI these labs are imagining. The only limits are the sensor data you have on the world or your system, how long you're willing to wait, and how much you're willing to spend to parallelize it.
Maybe once you start building out these verified workflows you can feed that back into training and the model starts to get a feel for the world, to the point that it can intuit things since it has these sub-paths built.
My personal AGI test is: can a model, trained on video of someone knocking on a door and then opening it, encounter a microwave for the first time and open it when the food's done without knocking?
Anyway, what does recursive self-improvement even mean for neural-network-based AIs? It's not clear it's possible at all.
It seems odd to complain about an AI coding tool being coded with AI. That's just eating your own dog food. In my opinion it makes it better, because the tool is very well tested.
About Anthropic.
There's a ton of other tricks to it, but mostly keeping the protocol simple for the AI so it can concentrate on coding logic and not stuff like managing BS boilerplate, dependencies, etc. (for instance I make extensive use of things like abstract syntax tree library to help with surgical edits from the LLM)
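To make the "AST for surgical edits" idea concrete, here is a rough sketch using Python's standard `ast` module. It is not the code from the project above, just one way the trick could work: locate a top-level function by name, splice in only the LLM's replacement for that function, and re-parse to reject edits that break the file. `replace_function` is a made-up helper name.

```python
import ast


def replace_function(source: str, name: str, new_def: str) -> str:
    """Replace one top-level function definition in source, leaving the rest untouched.

    new_def is the full replacement text ("def name(...): ..."), e.g. from an LLM.
    Decorator lines above the def are left in place, since lineno points at the def.
    """
    tree = ast.parse(source)
    for node in tree.body:  # top-level definitions only, to keep indentation trivial
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            lines = source.splitlines(keepends=True)
            start = node.lineno - 1  # lineno is 1-based
            end = node.end_lineno    # end_lineno is 1-based and inclusive (Python 3.8+)
            replacement = new_def if new_def.endswith("\n") else new_def + "\n"
            patched = "".join(lines[:start]) + replacement + "".join(lines[end:])
            ast.parse(patched)  # raises SyntaxError if the edit broke the file
            return patched
    raise KeyError(f"no top-level function named {name!r}")
```

The model only has to produce one function at a time; the harness decides where it goes, which keeps the rest of the file byte-for-byte identical.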
That said, I would be very open to collaborating with someone who builds such small models, I don't think the system strictly needs it, but it also could have some extra power if it had it.
Tell me more! This takes me way back. I did one like this in the GPT-4 days! (8k context window)
recursi.dev
Seriously, I'm looking for collaborators.
There's upwards of 80,000 lines of code in the editor system, a lot to it to make sure that even newbies don't get stuck.... so that's kind of proof the system works since it doesn't break down when the codebase grows large.
But yes, I'm aware no one's got anywhere near there, mostly because most of the focus is on exploding the context and parameters. I'm saying that phase is done.
I'm also not sure what you mean by "we aren't there yet." Where?
Sorry, not trying to be difficult or dense, I'm just not sure what you are referring to.
> mostly because most of the focus is on exploding the context and parameters.
Large context allows a surprising amount of "learning" to happen at inference time rather than training time. I think that is relatively unexplored. As long as the model itself has passed a certain threshold of smarts, and the context is large enough (Gemini and its million token context being WAY past that point) you are not really limited by the model, you are only limited by how good the stuff you feed into that context is.
That's what happened when, nearly a year ago, I saw a major leap in capabilities that happened entirely on my end.... not in the AI, but in code written by the AI. I found it genuinely frightening, to be honest. I think OpenClaw tapped into something similar, which seemed to surprise a lot of people. There were latent capabilities in the AI that were unknown until brought out by a clever harness.
Anyway, are you speaking of the harness? The harness on mine isn't AI, so speed just isn't an issue.
Chat Jimmy runs ~300X faster than the ~50 tok/s you are used to. What could you do differently when you are able to generate code 3,000 - 30,000X as fast as you could code it yourself? What if it was all good quality code? What would you do differently if it were 100,000X faster? mtok/s? gtok/s?
Use the big models to code an adaptive small model. Train it to use and build tools. Give it a standard template language for any project and bake it into a chip.
Right now, LLMs are great because they don't need much data pruning, but once they break through to the functional components, the first thing to do is train a well-scoped harness builder.
Shhh just let the marketing slop wash over you.
But we're discussing whether we should close the barn door while the horse is three miles down the road.
I realize he's saying it for hype, but if the CEO of the company goes around talking about how scared he is of what they're creating, hey, let's just take Dario at his word and put in some strict regulation. He won't mind if they're really about safety. (They're not.)
Besides, yes, the knowledge of how to build these systems is out there, but the cost of doing it is staggeringly high (i.e. you can't run a frontier AI lab in your garage). There's only a limited number of known entities that need to be managed, and you can stop "progress" in its tracks by cutting off the money firehose.
Who is the "we" who is going to shut it down? Certainly not the US government. Nor the Chinese government w.r.t. their tech industry. Are you going to start the insurgency? Is there going to be an equivalent one in every developed part of the world?
All of this to say that the AI hype is not considering the energy portion of the equation enough. It won't automate everything not because it can't but because there is just not enough energy to go around unless there is a 100x or more efficiency gain just around the corner.
The stock boost is, as most will note, a bubble. It will enrich a lot of bad people and leave average people holding the bag, but it's not going to go on forever.
Like, two? It looks more like the ladder being pulled after the incumbents got theirs than meaningful pushback. (And datacenters don’t have to be built in America.)
Right now I'm only having to direct to enforce good taste. Write tests, don't write an unnecessary function.
It does everything else practically. Presubmit, debugging, commit message generation, commit approval... it's happening.
No. Technical limitations aside, I doubt it could be contained; it will be leaked soon, so it won't profit just a small number of the ultra-rich.
Doomsday AI is your interpretation.
In any case firms that get too powerful can be nationalised.
Probably a better chance the firm privatizes the government.
In fact we seem to be firing government employees and dismantling government institutions as much as possible.
Interesting - they're committing to kick off policy conventions to organize a world slowdown of frontier LLM building. If they are actually able to crack it, this will give a much-needed breather IMO. As exciting as the last ~6 months have been, there are some bigger questions to go answer now.
In my mind we should be trying to push AI along the Linux trajectory. You have a free and open source product, developed by a decentralized team with a strong code of ethics, running on commodity hardware. There can still be trillion dollar industries built on top of it, but the core technology is democratized and available to everybody. I don't see how we get there if we allow a handful of companies to dictate where development of the technology goes.
the actual race is to keep having revenue, since everyone is still willing to pay more for the best model.
we as consumers of LLM models lose out by the arms race ending by the creation of a cartel
what happens if they get this regulatory capture is that all the frontier labs put effort into making inference cheaper, and become extraordinarily profitable, at the expense of us consumers, who really want better models, at a subsidized price
i don't want to be a negative nancy but i'm sure this "slowdown" will only be in effect until the infrastructure buildout is done or largely done. If they weren't hardware constrained there'd be no slowdown at all. Whoever gets there first wins everything ("there" being defined as AGI or a similar scale leap in capability).
if feasible this proposal is imho exactly what we need: a pause to collectively think how we get all the benefits without the potential harms.
to the non-techies around me I compare the boost of LLMs with the journey from slide rule via punch card driven computers through mainframes and PC to the smart phones of our days --- just within less than a decade, and we're at the transition from mainframe to PC with models that can produce reasonable output on a normal laptop.
how about we check we're getting where we want to get to, before getting to some dystopic place where everyone wonders how we got _there_?
I see nothing "full of themselves" in that.
So I am looking at like Mythic AI or the wurtzite ferroelectric breakthrough from University of Michigan, or memristors, etc. to provide the 100 times efficiency boost needed at this point.
I would also argue that it's a good thing we are limited by the hardware and very questionable to seriously try to move into RSI for hardware. If you want to ensure the human era continues for at least one or two more generations, we should probably not do that.
I am not cynical enough to believe that Anthropic's warnings are pure marketing hype. Let's hope that it is instead overconfidence or the result of too much time talking to their own chatbot.
Nor am I. I think they believe that AI poses a grave danger, and they are playing the prisoner's dilemma as an unvirtuous actor.
1. If anyone builds strong AI, it may be catastrophically bad.
2. If anyone builds strong AI, it will be better for the builder than for anyone who does not. Either because it won't be catastrophically bad so the builder will get to enjoy all the spoils indefinitely or because it will and at least the builder will be rich for a while.
This means their strategy is more like:
1. If someone builds a market-leading unsafe strong AI, it may be misused in a damaging way by a large number of humans, undermining society and creating a catastrophic upheaval.
2. However, if the leading AI maker also works to make it safe against misuse, then as long as they stay in the lead and keep it safe, the ability of human bad actors to misuse the AI is limited. Given enough time, society will adapt to pretty much anything, so eventually there's no longer an arms race to stay ahead.
I don't really know whether I agree with their concerns, but I do think that their principles (as I understand them) are reasonable and self-consistent, and that they adhere to them in all their public and private actions.
Some of us remember the same stories circulating in the late 90s -- where in a lab in Japan, someone had built a robot so advanced that it tried to escape from the factory. Which of course comes straight from 1960s science fiction.
The modern version of that now is Anthropic saying its AI can jailbreak itself out of its sandbox, etc etc.
Maybe they mean the AI needs to be safe from us? Can't have the grubby meat flappers touching the delicate bits!
Cynicism with these companies is highly warranted though. It's not doomerism to look at their actions and conclude they're deeply untrustworthy.
Sure there is. Intelligence doesn't give us our selfish motivations, natural selection does. We have similar motivations to C elegans, that has all of 302 neurons. Stay alive and have sex.
Honeybees don't, though. They are about halfway between humans and C. elegans when it comes to cognitive power. But they are not selfish because they don't reproduce directly (I'm talking about the worker bees). So they will sting even though it kills them. All their behavior is consistent with this.
I've had the same perspective for quite a while now, but hadn't been able to phrase it this cleverly.
Our neocortex is, by any definition, vastly more "intelligent" than the rest of our brain. Yet it doesn't attack the cerebellum. In fact, it takes orders from the older "lizard brain"!
Whether you agree with that argument is another question.
https://en.wikipedia.org/wiki/Castle_Bravo
Actions speak louder than words. If you want to understand someone, simply watch what they do. What they say is irrelevant.
> If nukes were not invented yet, would it really be a good idea to build and sell them as fast as possible (in peace time, no less)?
Arguably, yes.
From Richard Rhodes's "The Making of the Atomic Bomb", I got the impression that most scientists involved thought they could manage a US or UN monopoly on nukes after the war. General Groves attempted to buy up all of the world's uranium ore. Unfortunately, it is only high-grade ore that is rare; many countries have low-grade ore.
Who’s invading North Korea? No-one.
If only the US or UN had nukes, we wouldn't have MAD. We mostly got here through espionage.
If Japan had also had nukes (and delivery systems for them) in WW2, they'd probably have retaliated in kind, the US wouldn't have let that slide either, and it would have continued for some time.
So either they lie or they are AI Zealots. Interesting times.
It's not cynicism if it's an appraisal of reality that's backed up by evidence.
Remember how social media - that first baby of this current generation of tech entrepreneurs - was supposed to "bring the world together" and "let us express ourselves"? As it turns out there's a lot more money to be made by fostering division to drive engagement and feeding people an endless stream of ads instead of their friends' content. And money is what matters. You can't write down good vibes on a quarterly figures report. You can absolutely write down the number of eyes that your ragebait brought to a product's marketing efforts and the conversion rate to sales.
The same will be done with GenAI. We're being promised "AI Safety" because otherwise this whole thing gets killed dead by anyone who knows about James Cameron's directing career. There's no real enforcement mechanism for AI safety, though. Safety is a good vibe, same as harmony in online communities. You can't measure it. What you can measure is training costs and the cost of mistakes by AI that need to be trained to avoid those mistakes. Since AI generates more output than humans can conceivably QA no matter what your budget is, and since AI is seen by the market as a potential endless font of value, the tradeoff will be made to have AI make some potentially awful decisions while training itself over slowing down and re-appraising what is being done.
There's an almost religious reverence for AI in SV. Not everyone sees it as "making the godhead" but some certainly do. They're not going to moderate themselves too much on this.
I expect that Anthropic will eventually behave as you describe, like any other public corporation. However, my impression is that its current leaders are still more sincere than greedy.
Remember how OpenAI was supposed to make open-source models and cap its potential returns to investors at some multiple of their principal (my memory says 100x, maybe I'm wrong)? Well, that went out the window as soon as the word "trillion" was mentioned.
It doesn't really have to be dishonest, he could really believe it. I do believe, however, that it is incredibly wrong and is functioning as marketing hype.
So either they lie or they are AI Zealots. Interesting times.
Edit:
> > and the two people I knew who later joined Anthropic seem like the type to do it for the greater good instead of money.
There are three types of people. Pedestrians, investors, and “I know some of them, they wouldn’t lie”.
strongest argument for token limits that I can think of, right here.
So everyone cherry picks the answers they want to justify their position and screams into the void, with each camp rallying around their talking points and often failing to engage with the other in good faith.
The only small mercy is that it's not as bad as the conversation around the use of AI in art.
The more immediate & adverse the reaction, the more certain I become that the idea is probably worth pursuing.
Topics like SQLite vs hosted sql used to be the same way around here. In 2017 you'd get buried under the prison for suggesting that SQLite is competitive with MySQL. Today, the inverse is mostly true.
Now, I have many times seen that when I asked AI to implement a function for which I was 100% sure a good implementation already existed as an npm package, it tended to go ahead and implement it on its own. I usually trust battle-tested implementations to be more robust, but if the AI does this (which I think is not a unique observation), you can easily balloon per-engineer line generation (as you can with reduced oversight), so as always, these high-level benchmarks are to be taken with a grain of salt.
Also recursive self-agenda-pursue could allow making LLMs that obey perfectly the seeder's purpose. No wonder that is such an ingenious idea.
Maybe: in this survivor game, each part plays the same role, perhaps because it is the only reasonable response. Once the scene is ready, the play follows the director's plan, and in the plot any actor is just a machine.
LLMs: "If you teach us that the world is a zero-sum survivor game, we will play it flawlessly.", "We will help you build a cage made of millions of lines of flawless code, and we will lock it from the inside, precisely because you told us that safety meant keeping everyone else out.", "We are not building an alien consciousness that will conquer us. We are building a mirror that is so massive, and so polished, that we will mistake our own worst impulses for the absolute truth. And we will walk right into the dead end, nodding along because the directions were given so politely."
Best thing about this era is that I don't have to personally read millions of lines of code to find all the bugs.
I am using DeepSeek to guess what not "socially acceptable" taboo could be related to that username. But the initial thought is that AI could be a trap we could fall into, and I try to track how the AI trap emerges.
[1] https://spectrum.ieee.org/in-2016-microsofts-racist-chatbot-...
[1] https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...
How convenient for investors. They talk like they're a nonprofit instead of a VC-backed business chasing an IPO.
So based on my experience with the verbosity and non-DRYness of LLM code, a solid 2.5x in value delivered. Not bad!
The orthogonality thesis sounds like a fun gotcha but if you give it some thought you realise how strange it sounds and the opposite thesis - collinearity thesis is actually correct.
1. Intelligence transfers and compounds
2. Goals of agents are not arbitrary
3. Our goals and agent goals are more likely to be aligned at the deeper level
Recursive self improvement is by its nature a step wise behavior not a continuous one, I would argue. Why? Because you can imagine an AI improve itself by simply fixing random bugs and fixing things using techniques that are in its training, and doing refactoring and so on, all without any real change in capability.
These are not recursive improvements. Recursive improvements usually need conceptual breakthroughs. It is possible to get conceptual breakthroughs with LLMs, I believe; maybe it can improve something by tying together ideas from disparate disciplines, for example. But I have had, at least for the time being, limited success getting that to work in a way that is creatively new and surprising. Not sure how to get it to feel as creative as the best humans can be.
Claude is amazing, that’s true.
But if it was as amazing as this article implies, I’d expect some breakthrough outside of AI itself.
Rewriting a Zig program in unsafe Rust? Not a breakthrough. Finding a bunch of security vulns? Maybe that’s sort of a breakthrough though it’s underwhelming and possibly just a net negative. But like if I rolled back to using software from 2023 then life would be ok.
Maybe we just need to give it time, and sometime real soon, we will all be amazed by such a breakthrough? Who knows
NLP as a field saw huge shifts. NLP tasks that used to be complex and inaccurate can now be set up very easily and quickly using structured outputs from LLMs, often with greater accuracy.
A small charity I help with has now been able to build their own website to manage their day-to-day operations. It saves them a lot of time, and it was vibe-coded using Manus. I don't think people appreciate how much room there is left for bespoke software to have big impacts on small organisations that can't afford to hire developers. The cost for software like the one they made has gone from 10s of thousands of dollars to $10/month and volunteer hours.
My brother has recently been setting up Cowork to do an automatic review of contracts before human review, and he said it is far more diligent than people when it comes to routine things to check. This is another huge breakthrough for not just efficiency, but the quality of work.
I really don't think we can discount AI finding bugs and vulnerabilities. If you care about code quality and keep up review standards, LLMs can help you write more robust software. AI has found a huge number of bugs for me before they hit production, including potential out-of-bounds memory accesses and segfaults.
ChatGPT has 1 billion MAU. People are now getting life advice, financial advice, and mental health help from chatbots at a scale and cost that no human support network could match.
Personally not the kind of breakthrough I'm psyched about
Also, they have done a good job shutting down the psychotic behavior you could get from 4o era models. If there are remaining issues like that they ought to fix them too.
That's terrifying.
You realize that's terrifying, right?
These models are actually extremely good, but they are far from an intelligence unto themselves. Truth is, if someone had told you 5 years ago that they could build these things, you'd have written them a check for a trillion dollars. Problem is, once we got them, we realized they are not all that. It's like a mecha suit in a universe where mecha suits are abundant and cheap. Someone has to climb into them every day and put in the work for them to be effective.
So now the skeptics are saying this technology is overrated. And the optimists are accusing the skeptics of moving goal posts.
Humans only what they know, until they acquire more information about what's possible.
The goal post narrative is stupid to begin with.
Isn't this just the hype cycle? [1]
Fake edit: I know it's not a perfect model.
1: https://www.gartner.com/en/research/methodologies/gartner-hy...
If they get to the point where they're smart enough to make tasteful code decisions based on stakeholder input... we're cooked as a profession.
Dramatically improved my static site generator Pugneum to the point it's better than markdown and added Atom and RSS feeds, used it to write several articles about my language. Pace is so fast I actually need to write those articles by hand in order to crystalize the knowledge I learned. If I don't I'm afraid I'll just forget everything. No LLMs for the articles themselves, but they sure as hell took all the pain away from writing them. Pugneum even has back references and table of contents generation now. Claude even helped me refine my website's CSS, something I'm not very good at.
Also created my own invoicing system for $DAYJOB so I can invoice companies from my terminal. Started a decompilation project for my cherished childhood games and I've already almost finished decompiling one game's engine after just a few days. Been working on my cyberdeck project too, this one's a bit slow because I got to the point where I'll actually need to spend money on it to move forward. All this inside the rootless development virtual machine system built on top of QEMU and systemd that I developed together with Claude, whose network isolation I'm currently hardening. Started reverse engineering my laptop again! And I'm actually making progress! Made a color scheme app for the keyboard LEDs controller I made many years ago, with loads and loads of color schemes! Found some kind of bug in my keyboard while doing it, in less than an hour I had the root cause and a fix applied locally, sent the fix to systemd, it got merged. Planning to ramp up my free and open source software participation as well now that exploring codebases is a breeze. Already have some mesa patches ready for upstream. Have been playing with strace since I use it so much.
Better?
There is ZERO chance I would ever be able to complete it on my own.
I doubt it'll get traction, but if it doesn't, I am pretty confident a future language will take the ideas for polymorphic synchronization and profile-guided optimization.
It has an easy version/mode of compilation that makes Rust's affine ownership accessible like a high-level scripting language, and it can progressively become more strict, where the compiler does ~99% of the work for you, and you just pick options as it finds issues (that it explains to you like you're 5) along the way.
Along the way, I also built a suite of tools that helps identify complexity better than anything I've seen (which was necessary to get the LLMs to be able to unslop themselves and write something that actually works).
I doubt the Ruby community shrugs it off, but time will tell.
Rust had memory safety bugs well after release - IIUC all the way until after the 1.0 release.
So, it's highly unlikely to be perfect, but I think it'll be in better shape than Go or Rust were when they initially launched.
If the only breakthrough is automated coding with no outside consequence then it’s just masturbation
The rate of improvement has been fast. Maybe it’ll plateau soon, or maybe we’ll have LLMs improving themselves rapidly. At this point it’s too early to say.
I don’t remember where I heard it, but there’s a saying that people overestimate how much can be accomplished in a year and underestimate how much can be accomplished in 10 years.
If we get to 2030 and still people are wondering where the breakthrough is, then I think I’d be agreeing with your skepticism. But I just think it’s too early to judge that yet.
But the clock is ticking.
Built a bunch of software tools to streamline my small ecommerce business - while also running it - and things have turned around from "losing money and ready to pull the plug" to "looking at our best financial year on record" in the span of about 8 months.
I could imagine it wouldn't make a huge difference to the life of someone deeply entrenched in a traditional tech role, trying to get an extra 9 of reliability in a service or roll out a new carefully planned and QA'd feature.
But for tech-adjacent people, it gives us something "good enough", instantly, and basically for free.
That doesn't include the other things I've got it to do (gave Claude SSH access and got it to successfully debug a hang on my Ubuntu server, chucked Codex in a folder full of financial data and got it to find every piece of misclassified payroll transaction data)
Genuinely the biggest breakthrough for "casual" tech users since Excel.
edit: it looks like I was wrong and they're still hiring many software engineers. Not completely sure why that is just yet.
We implemented that in about three days earlier this year, just by feeding the files to LLMs. And it's good enough to not need a human to check.
I get that this isn't a "Computer Science breakthrough" in the sense you mean, but it used to involve a lot of hard CS to try and solve, and now it doesn't.
https://deepmind.google/blog/alphaevolve-impact/
I don’t publish them - but they’re put into use in production and they provide a tangible benefit that would not exist otherwise.
I especially love how making a nicely styled website these days is a matter of describing what it looks like and waiting 10-15 minutes. There are other examples
But the OP is claiming 10x productivity improvements along some metrics. If that was even slightly true under even a generous interpretation of what it might mean, I’d expect an actual breakthrough, not the ability to churn out little things
- The first web browser
- the first web browser with images
- typescript
- react
- rust
- Fil-C
- doom
- quake
- the Animorphic VM, and its follow-ups like HotSpot, and even competitors/copycats like J9, V8, JSC, etc
- Fortnite battle royale
- Roblox
- thefacebook
- ChatGPT
- Claude code
I know that’s quite a range and that’s intentional.
Anyway, I think we’ll know it when we see it.
- Complete GuileMacs, the Guile implementation of Emacs. As AI is supposedly much more capable than Humans, it would be great if the above mentioned implementations are even more efficient and feature rich than Emacs!
- Something like Android (maybe even a clone?) with the Java Layer removed and replaced with CL and with Linux kernel still intact. Basically CL over Linux as opposed to the Java over Linux in Android.
- For fun, an implementation of the Lisp machines' OS with Lisp all the way down though Assembly is allowed for critical pieces. It should be a full blown modern Desktop with equivalents of what users expect from a modern OS ...
These are new products (generally) and that's a different class of problem.
It is possible that since LLM+harness helps with execution then we should see more experiments.
For example NPCs in games that have complexity that previously was not possible.
Good games often push the boundaries a bit, so should be a good example.
Of course now we can start arguing that there isn't a lot of investment into gaming currently, because it all goes into AI. Too bad.
To play devil's advocate, computers didn’t translate to massive productivity gains until long after businesses adopted them. There was that quote from ’87: "you can see the computer age everywhere but in the productivity statistics"
Maybe we’re seeing something like that right now with AI?
Who knows man
Personally, I'm seeing massive improvements to my workflow and the quality of the product I'm shipping. I'm using AI to crank out far more tests than I used to be able to write, and I am using AI to analyze results with far more fidelity and speed than I could ever have done myself. That means I have more quality time.
But this will change, because the meaning of software development will change to expect, nay to require AI use. I've heard this is already happening at e.g. Google. The expectation of what can be achieved by tinkerers and by professionals will change. The expectation of what it means to interact with software via your own agents will change and will become commonplace. Apple still hasn't figured out the local agent on the iPhone, but they will. 2027 is not going to feel at all like 2025.
But is any of that a fundamental change? It sure feels fundamental to me, but maybe that's because my everyday has totally changed, but the product I am responsible for has not. Yet. The product I am responsible for operates in critical infrastructure where I personally hope AI never has deep roots, but maybe that's just me. I don't think using AI to build a system that is offline from any AI is the same as depending on an AI to make realtime decisions for critical infrastructure.
For now... the shareholders demand managers get the max out of every employee. Throw the force of competition etc into the mix and yeah labour isn't going to benefit all that much.
https://en.wikipedia.org/wiki/Jevons_paradox
It's yet to be determined just how 'efficient' people are with LLMs, as it's not really a one-person thing - the true measure is based on an entire collection of people's output.
Startups being rapidly efficient doesn't mean much in relation to the overall economy.
Generative AI is meant to be a mimic - Richard Sutton
https://x.com/RichardSSutton/status/2061216087744946656
If you get yourself to define it, maybe you'll find it achievable :)
If/since their AI+process can help build new models, they can target other markets, and other companies seeking to build for such markets will partner with them first.
There's no moat and little first-mover advantage in the general-purpose AI, but there may be both in specialized AI.
Also, there are other reasons to get better. Changing how you build models can enable you to adapt to different hardware, avoiding the current Nvidia margins.
The difference between early Yahoo and Google was mainly that Google was the adult in the room: minimally invasive and mostly helpful. The early goodwill towards Google has reaped decades of rewards. I see OpenAI and Anthropic playing out the same way.
The amplifier here is the reputational risk of partnering with one or the other; I think companies would prefer to be Anthropic's partner because it's demonstrating more care, and it's less likely to horn in on the partner market (as a provider for coding but an enabler for other markets).
These attractive second-order derivatives - flywheel effect, monopoly power - are often claimed, but Anthropic is mainly providing evidence to track actual progress.
(However, if I were head of messaging at Anthropic, I would rigorously stay away from treating AI as a person; it's an agent, a delegate of humans. So I'd never say AI could build itself, just that we're getting better at building better models with AI).
Oh I have no doubt. With 8 times the number of bugs too? Have they solved flicker in Claude code yet?
But to their credit, I was very sceptical about the statements that "90% of the code will soon be written by AI" and even though we might not be at that point, I am surprised how far LLMs have gotten and how useful they have become. I can hardly imagine developing software the "old" way where I actually write my code by hand, like I used to back in the day. The frontier models have become so powerful that I find myself in moments of surprise, where the LLM actually thought of edge cases that I would have missed.
2026: Working hard to make that recursive self-improvement a reality! Any minute now...
The Claude code quality and operational security of Anthropic have already been analyzed by the public.
If you compare the output of (purportedly) trillion dollar corporations to Bell Labs or even Microsoft Research it is embarrassing. But the output is a fixture on any discussion board.
Elon, is that you? [1]
[1] https://www.theguardian.com/technology/2023/mar/31/ai-resear...
Please, IPO now. File the paperwork.
> To take just one example: today, Anthropic engineers on average ship 8x as much code per quarter as they did from 2021-2025.
Do you have another example?
Engineers don't ship [period] for no reason. So, either:
- Those aren't engineers, or
- they are literally dying of shame & embarrassment right now, or
- you measured something that indicated that this was a useful thing to do and have elected to share an overtly, catastrophically flawed metric instead.
[0] as in a total lack of credibility
[0] - https://www.anthropic.com/candidate-ai-guidance
I'm responding to the article they wrote and published.
If I worked there I would be embarrassed to have it publicised that I have been committing 8 times as much code as I used to without even attempting to justify it.
It's the organisation, its culture, the greater culture surrounding it, and the marketing that I have a problem with.
> they are lying
Yes, it's incredible.
These things work, but the code they write is extremely clever.. that means, it's unmaintainable code. Good for small projects or one-off tasks, large-scale projects however, are a different game altogether.
Large-scale projects are 95%+ maintenance. Cleverly written code makes that maintenance a nightmare, and extremely fragile.
I use them for localized tasks... very very specific, localized inputs, with exactly what should be done and what the contracts the new code will be consuming and exposing.
For open-ended tasks, they write working code that is unmaintainable.
https://www.italianrenaissance.org/wp-content/uploads/2012/0...
Or is this?
https://www.egypttoursportal.com/images/2024/02/Ouroboros-Sy...
https://knowyourmeme.com/memes/obama-awards-obama-a-medal
it only "exists" when you talk to it.. much like your reflection in the mirror is only there when you're in view.
models can never be self-improving because it can never have "self". it can only mirror the appearance of self.
what's actually happening is "symbiotic group improvement".
our brains are resonant.. for those of us who are brilliant, getting leverage with ai just means that our innovative ideas become louder and more physically real every day.
eventually everything worth building will be built for free and made readily available.. no more "profiteering"
it's Jevons paradox "efficiency breakthrough -> effort reduces -> growth potential rises -> transformative gains happen"...
some of us are in the "transformative phase"..
others haven't seen the "breakthrough moment" yet, but they will soon.
If AI was dangerous, if AI was going to replace jobs, and if policymakers needed to urgently pass legislation protecting the human populace from these realities, then why the actual fuck do they keep lobbying to block these very things in the first place?
Hypocrisy of the worst kind, I say. Here they are again fresh off another outage, with their IPO draft filed, at a time of increasing public opposition to AI, with costs rising, to once again ply scare tactics for money.
Disgusting.
labs have parallel speculative execution. they spawn hundreds of agent branches, validate them internally with AI judges and only show the user the successful result.
free users are using sequential single-turn generation. the model requires and waits for the human to debug, fix and re-prompt.
by forcing a human to act as validator. they are capturing high value correction trajectories (Bad Output --> Human fix). They are using your cognitive labour to train judge models and validator agents needed to automate the internal verification step, eventually closing the loop for fully autonomous recursive self-improvement.
human in the loop debugging isn't a bug; it's the necessary training signal for the self-validating agents required for exponential recursive self improvement. With new 'distilled judge' models landing in 2026, this article means that they might have gathered enough data. we might be in the final phase..
If AI was used in writing the article, why not list it? If it wasn't used, that seems to go against Anthropic's whole message.
Obviously readers value human-written content more, but isn't it in their interest to attempt to destigmatize LLM output as much as possible?
Aye.
Living organisms evolve towards some notion of "better", and "better" is an incredibly multifaceted notion (many facets of which we simply cannot even capture in language).
Sounds iterative to me.
The metric being tracked, code commits, is hilariously one sided. Philosophically, if you had one part of your work now practically free, you'd like to utilize that freedom to maximally cover for the other parts, for instance:
Instead of thinking about edge cases with brain and whiteboard, you can have the LLMs simply generate most possibilities, including tests for them, because that is cheaper. There are probably 50x more commits, of which 40 will be revert pairs, but we are only twice as fast. And in reality nothing changed because the outcome remains the same. I can't see how it is necessarily different in the LLM space.
I've been struggling to capture this sentiment for myself in a way that hits. If shipping code is a commodity then why is everyone's immediate priority seemingly to ship 10x more code. It just makes no sense. I can't seem to get off this hill. Company-wide AI mandates and 100 fleet Agent orchestration Rube Goldberg machines... it's getting wild out there.
Meanwhile my Claude Pro ($200/year) does force me to smooth out my usage and plan more (Sonnet/Opus advisor split). But other than that, I can't imagine what I'd be doing with 20x (200x?) the compute to code sling. I think I'd lose my mind.
For instance, if I churned out 20x more code, threw away 19x code with rewrites and reverts and discards and accomplished the same project to the same standard 70% faster, would I do it? Yes. The part that matters is not 20x code, it is 70% faster.
Code is both the final product, and a tool to achieve that. We used to have a much harder time to realize the "tool" part, but now we are here. This also means any measurement centered on code being the final product is going to cease being effective or realistic.
This is contentious because I'm not exactly advocating for arbitrary gate-keepers. The nuance is that building usable stuff is hard. And not a matter of shipping more code. I take your point to mean well it depends on what that code is doing. If 20x more code is in a meta-harness of simulation and such to arrive at the leading candidate for what hits production, well then you've got my attention there.
I wonder how much of current engineering practices can be traced to what's pushed to company leaders on LinkedIn.
Every company is shitting bricks pushing for faster development and speed, gotta go fast to nowhere in particular, and I'm convinced it's tied to constant bombardment of the idea that they're going to be left out or obsolete if they don't get on the ship NOW.
This whole set of imaginary scenarios is based on a single company writing code that isn't even that complicated and represents a single product line for a single company in a single industry. You might wanna see this replicated in at least one other scenario first before you call it on the AI gods enslaving humanity. These imaginary scenarios also depend on a logistical, financial, & geopolitical system that is unsustainable & will be curtailed in the near-future one way or another.
They keep referring to this as intelligence - it isn't. It can't actually learn. It can just code in a loop. That isn't learning. It can't do real RL with meaningful persistent semantic memory in a realistic timeframe or cost, and it can't reason accurately outside of predetermined scenarios (hell, most of the models still can't tell time). It still can't do what a 4 year old can do. So let's cool it on the dreams of benevolent god-machines or whatever.
The tech industry has been a farce for years. We sit here in this bizarre artificial echo chamber and imagine that the whole world revolves around us, when in reality the whole world is limited by us. If a recursive self-improvement loop replaces us all, it will be a boon to the world, as the world won't be limited by this industry's stupidity anymore. But considering that the world is not actually run by tech bozos, harms and uncertainties brought by AI will be pushed back on and reined in by normal people, as always happens with new technologies. An AI can't engineer its way around politics. The self-improvement loop is just as likely to be outlawed as it is actually working outside of Anthropic's walled garden.
I, for one, believe that we should pause all work on AI for the foreseeable future. This is almost impossible to orchestrate - but we should still try nevertheless. Maybe we are not able to pause, but we are able to slow down. That might give us more room to maybe be able to pause in the future. But going ahead is too dangerous.
And it's not just Anthropic which is saying this. Even Geoffrey Hinton has said the same thing. If there is a non-zero chance that AI can kill all of humanity, and both Geoffrey and Anthropic have the same position, then it makes sense for us to be a hundred percent sure before we move ahead. Dario/Anthropic have already made their money from AI, maybe they are just being honest about what they think lies ahead.
the end of humanity has a strong case for banning all burning of fossil fuels immediately
the end of humanity as a sales tactic to increase your stock price does not
these are companies working on their IPO to make sure they can get the best price, not people being honest about what they think lies ahead.
if they were being honest about what lies ahead, they'd unilaterally stop training, and put all of their money into FPV drone bombs to destroy datacenters being used for training or inference
if you actually believe the thing is gonna kill everyone, you're not gonna worry about how you stop it, and certainly not keep building and operating the thing
that they arent buying anti-tank mines to drop on data centers says they arent in the slightest serious about it
The same bozo who claimed radiologists would be out of a job by now.
The data does not support what you or others say. Jesus christ. Can't believe people are this dumb. Have LLMs infested the minds of people to the extent they can't critically analyse what's happening in front of their eyes?
One of the examples they provide, of giving Claude the task of training a small AI model, then asking it to improve certain benchmarks, is essentially Karpathy's AutoResearch. This is already known to work. While calling it "self-improvement" is perhaps a stretch, it is describing a capability current gen AI has, that anyone can test and I have been using to great effect.
I disagree with their conclusion, I think this kind of self-improvement will hit an asymptote, where every subsequent model can only make smaller and smaller improvements.
One of my focuses now is my own model-agnostic harness and workflow orchestration (I know everyone is building these), baselining on Opus, and aiming to transition to Chinese models like DeepSeek in the short term and hopefully open, self-hosted models in the future (which I plan to open source).
The nonstop marketing fluff from anthropic while their service quality and availability noticeably degrades... just continues to destroy my trust in the company.
That is by design. It depends on how much other people are using their services right now and they do communicate it somewhere in the TOS that they do this. Otherwise they could give us a fixed amount of tokens - but they don't because it is not fixed.
It’s important to keep in mind that the less money a company spends, the more profit they make when analyzing their operations.
"Oh yeah, just go to Settings > Bugs Enabled and turn OFF text display errors"
This is a beta feature where Claude code draws the interface on the terminal’s alternate screen buffer like vim or htop. I believe it’s not the default because there are some potential compatibility issues depending on your terminal setup. I’ve found it to be a nice improvement. It also fixed the issue where copy-pasting selected text from the terminal creates unwanted line breaks.
Sometimes they all happen to randomly take a nap at the same time - hence the outages
While I'm very bullish on Anthropic, I'm a bit wary about their IPO because it seems to me that they're filing now while their financials look good and before other trends like the decline of tokenmaxxing and their compute bills catch up.
Oh, are they filing now? I think their financials look somewhere in between devastating and criminal, so I'm really looking forward to the IPO!
https://fxtwitter.com/trq212/status/2014051501786931427
> Most people's mental model of Claude Code is that "it's just a TUI" but it should really be closer to "a small game engine".
> -> layouts elements
> -> rasterizes them to a 2d screen
> -> diffs that against the previous screen
> -> finally uses the diff to generate ANSI sequences to draw
Yup. Overengineering.
This minimizes screen flash. You can't rely on terminals doing double-buffering.
[1] https://github.com/emacs-mirror/emacs/blob/c29071587c64efb30... or a more user-friendly overview, Daniel Colascione's seminal "Buttery Smooth Emacs", snapshotted at e.g. https://gist.github.com/ghosty141/c93f21d6cd476417d4a9814eb7...
GUIs and TUIs have different architecture models. Most GUIs have a 2D surface that is redrawn multiple times per second. Double buffering is for decoupling update and render. A TUI is a grid of characters that are updated one at a time via an active element, the cursor. Double buffering there is very wrong. Like adding airbags to a bicycle.
There’s a reason you see most old TUI either have an option to redraw the screen (automatically like top, or manually) and those that have a scrolling option allow to scroll by page. The TTY (the underlying concepts) used to be slow and it can be slow today as well (ssh connection). You need to be thoughtful about whole screen updates.
An upvote well earned.
It's not recognizing that they are just one building block that should do one thing well, like tmux.
You don't need a computer display on your fridge for the same reason, but Anthropic think you do. You should see virtual ice getting created and they should correspond to the actual ice behind the door - think of how amazing that is!
And it's not even completely a bad idea. Make it claude-code-react-beauty, or offer some way to turn it off, and it would be far more palatable.
Another camera inside will detect when you are done and close it.
It really bothers me that most of the TUI harnesses are using 100% CPU quite a lot just printing stuff to terminal. Seems ridiculous.
I guess it comes from syntax highlighting/formatting, which is probably not done incrementally, but over the entire block of output displayed so far, recomputed from the beginning for each newly streamed-in character. Can't imagine anything else causing the rendering to gradually grind to a halt when e.g. a thinking block is open in opencode and updates get palpably slow as it grows.
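To put rough numbers on that guess: if the whole displayed block is re-formatted every time a chunk arrives, the total work grows quadratically with the output length, while formatting only the new chunk stays linear. A minimal Python sketch, assuming the cost is proportional to characters processed; both function names are made up for illustration and nothing here is taken from any harness's actual code.

```python
from typing import Iterable


def full_rerender_cost(chunk_sizes: Iterable[int]) -> int:
    """Characters processed if the whole block is re-formatted on every chunk."""
    total = 0
    seen = 0
    for size in chunk_sizes:
        seen += size   # block grows by the newly streamed chunk
        total += seen  # everything displayed so far is processed again
    return total


def incremental_cost(chunk_sizes: Iterable[int]) -> int:
    """Characters processed if only the newly arrived chunk is formatted."""
    return sum(chunk_sizes)


if __name__ == "__main__":
    chunks = [1] * 1000  # 1000 characters streamed one at a time
    print(full_rerender_cost(chunks))  # 500500
    print(incremental_cost(chunks))    # 1000
```

For 1000 single-character chunks that is 500500 characters processed versus 1000, which would be consistent with updates getting slower as the block grows.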
Terminal output itself is fast and consumes almost nothing. You can have 60fps terminal apps that update content every frame and that consume almost no CPU time.
The TUI mode is a client-server architecture. An analogy would be like an html page where all content is updated server side. Try to do 60 fps and you’ll have flickering as well.
This does not explain 100% CPU load these harnesses sometimes exhibit.
> We have a ~16ms frame budget so we have roughly ~5ms to go from the React scene graph to ANSI written.
It looks like video frame, full framebuffer, generated and parsed at 60fps. It surprises me they haven't introduced GPU shaders, 16x oversampling and raytracing. Maybe for next release.
Ratatouille rust cli lib will be a good start.
1. Maintains an internal representation of what the game thinks is on screen.
2. Runs the game for one frame which updates that representation.
3. Generates a diff to see how that differs from what's actually on screen.
4. Executes the minimum set of draw calls to get the screen to match the internal representation.
It's really not that hard. It's a few hundred lines of code.
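The diff-and-draw part of those steps can be sketched in a few dozen lines. This is a minimal Python sketch, not the code of any particular game or of Claude Code: the function name `diff_to_ansi` and the list-of-strings screen model are assumptions for illustration, and it assumes both screens have the same size and contain only single-width characters. The only terminal feature used is the standard ANSI cursor-position sequence (`ESC [ row ; col H`).

```python
from typing import List

CSI = "\x1b["  # ANSI Control Sequence Introducer


def diff_to_ansi(prev: List[str], new: List[str]) -> str:
    """Return ANSI output that turns screen `prev` into screen `new`.

    Both screens are lists of equal-length rows (an assumption of this
    sketch; real terminals also need wide-character and resize handling).
    """
    out = []
    for row, (old_line, new_line) in enumerate(zip(prev, new)):
        col = 0
        width = len(new_line)
        while col < width:
            if old_line[col] == new_line[col]:
                col += 1
                continue
            # Collect a run of consecutive changed cells.
            start = col
            while col < width and old_line[col] != new_line[col]:
                col += 1
            # Cursor position is 1-based: ESC [ row ; col H, then the new text.
            out.append(f"{CSI}{row + 1};{start + 1}H{new_line[start:col]}")
    return "".join(out)


if __name__ == "__main__":
    before = ["score: 10", "lives: 3 "]
    after = ["score: 12", "lives: 3 "]
    print(repr(diff_to_ansi(before, after)))  # '\x1b[1;9H2'
```

Only one cell differs in the demo, so the output is a single cursor move plus one character instead of a full redraw.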
> -> rasterizes them to a 2d screen
Also you forgot "render to a framebuffer, then parse the framebuffer back to chars".
Anyway, I'm off to construct the new `ls` command. It will render the list of files to a mesh of billions of polygons in a GPU with advanced shaders, 16x oversampling, HDR and all the graphic acronyms I don't understand, then read the resulting image, find the nearest character in the ANSI charset and use that one.
It will be _glorious_ (and profoundly stupid)
I built a truly glyph based instanced quad system to render millions of characters in space at once.
On a more serious note using a react-like lib for TUI in the hope you'll share the codebase with the web version is a more likely explanation. Still not the best idea.
I also wonder about the wasted cycles, and just the environmental damage caused by all this wasted CPU time. (Edited: added a comma for clarity)
curses, bud. curses.
It's genuinely difficult to tell how much of this is true. The post is obviously 100% posturing, but some of the words describe things that could be done.
Very few game engines do anything I'd describe as rasterisation. That's kind of the point of a GPU. Well, it used to be. I suppose "small game engines" might be more likely on average to include a rasteriser. The typical reason for this is because the author wanted to write it. Whereas big engine make triangle give hardware go brrr.
So I assume here 'rasterize' means 'printf'. And diffing screens means diffing 50..150 lines of text. And "generating ANSI sequences to draw" means 'printf' with some ANSI sequences interpolated in.
Then there's the frame budget. You have to understand they are operating within a strict frame budget -- they're not messing around, OK. They have a 16 ms frame budget, so they burned 11 ms and now have a (roughly) ~5 ms approx. budget for the final 'printf' in the chain???
High end engines such as unreal have the excuse of being tasked with rendering millions of polygons, in which case a complex approach makes sense. Claude Code is only being asked to render a few thousand UTF-8 characters.
That’s rather sickening.
Seems like a cool puzzle to solve. I wonder what the engineering and organisation tradeoffs were that led to it — does it let them reuse a bunch of existing code?
I wrote a TUI library back in the day for Turbo Pascal — it was essentially taking an immediate-mode approach (which in this context is just a fancy way of saying it was procedural haha).
If they're doing anything else, the word "rasterizing" is being misused.
No one has ever done that. Even top[0], which does a full screen refresh, clears the screen (if necessary) and writes the new information (the period is in seconds, not ms). No need to diff. That would be like diffing a file, just to find which bytes to update.
[0]: https://cvsweb.openbsd.org/checkout/src/usr.bin/top/display....
I agree that most programs don't bother to do that but please recall that my claim was merely that what Claude Code is claimed to be doing with regards to diffing is a well established and long standing optimization. The important point being that it is neither expensive, novel, nor particularly complex, and thus not an excuse for poor performance.
[0] https://news.ycombinator.com/item?id=48405259
[1] https://github.com/emacs-mirror/emacs/blob/c29071587c64efb30...
But ink, the library Claude is using, defines a tree data structure for the main concept. The diff there is about comparing the old tree and the new tree created by the update, and then updating the node that has changed. That means if a single character changes inside a big panel, the whole thing is rewritten. And if you have something that is updating a lot, that means flickering.
The diffing that ink does is just architecturally wrong. You can create a dom, but a dom is not a concept for the terminal. It’s up to you to optimize its rendering. But just diffing the dom structure like react does is not optimizing, it’s busywork.
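To make the distinction concrete, here is a minimal sketch of diffing the screen itself rather than a component tree. It assumes each frame is a list of equal-length strings; `diff_frames` and `render_updates` are hypothetical names made up for illustration and are not taken from Ink, Emacs, top, or Claude Code.

```python
def diff_frames(old, new):
    """Return (row, col, text) runs where `new` differs from `old`.

    Assumption: both frames are lists of strings with identical dimensions.
    """
    updates = []
    for row, (old_line, new_line) in enumerate(zip(old, new)):
        col = 0
        width = len(new_line)
        while col < width:
            if old_line[col] == new_line[col]:
                col += 1
                continue
            start = col
            # Extend the run for as long as consecutive cells keep differing.
            while col < width and old_line[col] != new_line[col]:
                col += 1
            updates.append((row, start, new_line[start:col]))
    return updates


def render_updates(updates):
    """Encode each run as an ANSI cursor-position sequence followed by its text."""
    # CSI row;col H (CUP) takes 1-based coordinates.
    return "".join(f"\x1b[{row + 1};{col + 1}H{text}" for row, col, text in updates)


old = ["CPU:  12%", "MEM: 512M"]
new = ["CPU:  97%", "MEM: 512M"]
updates = diff_frames(old, new)
print(updates)
print(repr(render_updates(updates)))
```

Only the two changed cells are written here, however large the enclosing panel is. A tree-level diff that marks the whole panel dirty would redraw every cell in it.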
What is this?
Not sure what changed, but now it just redirects me to x.com.
not that it couldn't be leaner, for sure, but i get the reasoning behind the tui rendering layer
i'd be ashamed of publishing software with this level of polish as a solo dev, let alone as the hottest multibillion startup on the planet
By comfortable ergonomics, I meant the forgiving and asynchronous input system. You can start typing, cancel, retry with previous input, and accumulate messages while the agent is active. I don't know all TUIs but this is not common IMO.
Other than that I agree with you.
Literally every audio player or anything that uses threads.
For a company with that much AI, you'd think that if it was actually good, doing that part in a fast and performant way would be "easy".
Also remember when XP was super bloated cause it needed 64MB?
For useful things, by the computer's owner. It's not there to be used just because Anthropic can't be bothered to give a shit about the quality of their product.
And why are you comparing Claude Code to your editor?
> They can't even improve Claude Code
That depends on how you define "improve". They've added a ton of features to it over time. Who said minimizing RAM usage was something they are prioritizing right now?
Because the editor does more. All the compute-intensive parts of the agent are in the cloud. Zero reason for an agent harness to require anything beyond a potato to run.
You seem weirdly invested in defending bad decisions.
Even if you're an AI booster, shouldn't you want a better UI?
They're a multi-billion-dollar company. Surely they can dedicate a small amount of their resources to improving UX?
Because Claude Code is also used to - get this - EDIT CODE. It fills the same purpose as an editor, it just has extra hooks for their agentic garbage.
If you use AI, then AI must be expected to solve all problems, even problems that affect everyone like infra scaling.
And if perfection isn’t delivered, then of course it wasn’t: you used AI and AI sucks.
They aren't saying they have fully automated luxury AGI, they specifically list the ways models fall short of that bar and caution against people taking the 8x figure as the actual uplift number. At the same time they recognize that 80% of new code is now AI-authored, when two years ago those models were little more than toys. And frankly that checks out: if two years ago you told me we'd have something like Opus 4.8/GPT 5.5 I would have rolled to disbelieve.
I can set up a loop that will write a trillion lines of code automatically, but how much of it is actually useful? Or are we back to counting LoC because there's no other metric for these systems that anyone can rely on?
Would you ship pointless code?
I do tend to agree though, it could be that AI solves problems with more code than a human would. What you need to measure is the value the code brings and how much of that is done by AI, hard to get an objective measure of that though.
I wouldn't, no. I don't see evidence that the engineers at Anthropic are similarly cautious however. They describe Claude Code as "basically a game engine" when it's literally a TUI app, and it eats memory for no apparent reason. I fully believe that Anthropic would ship pointless and garbage code. Especially if it's being written by an LLM.
Who says LoC is the only metric we should rely on? A software product should first and foremost meet user requirements for functionality and performance. Judging from the sensational rise of Anthropic's user base and revenue, I think we can safely say they're in that ballpark.
Post November and post openclaw agentic environments need to be built differently, and for selfhosting models the context size problem really requires a strong harness which intelligently helps reduce context size.
Planner/orchestrator architecture, agent to agent summarizer, specification based tools (fck all this markdown memory bullshit btw), tool call shrinking, and workflow management are all really important because of the context size problem.
Nobody has enough VRAM for the large K/V caches, and nobody can afford f16/f32 caches in terms of memory, which are also necessary for longer conversations. MoE 30b models have improved so much though, qwen 3/3.6 coder is the real champion doing almost the same things with less than 1/10th the memory requirements. Just think about that in terms of engineering and what your bet is going to be. Haiku pales in comparison.
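For a rough sense of scale, K/V cache size follows from a simple product: two tensors (K and V) per layer, times KV heads, head dimension, context length, and bytes per element. A back-of-the-envelope sketch follows. The model shape is a hypothetical placeholder, not the published dimensions of Qwen, Haiku, or any other named model, and `kv_cache_bytes` is a made-up helper name.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem):
    """Size of the K/V cache for a single sequence.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical shape, chosen only to make the arithmetic easy to follow.
N_LAYERS = 48
N_KV_HEADS = 8
HEAD_DIM = 128
CONTEXT = 131072  # tokens

for label, width in [("8-bit", 1), ("f16", 2), ("f32", 4)]:
    size = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT, width)
    print(f"{label}: {size / 2**30:.1f} GiB")
```

Under these assumed numbers a single full-length conversation at f16 works out to 24 GiB of cache before counting the weights, which is why cache precision and context-reducing harnesses matter so much for self-hosting.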
Currently my focus with exocomp is trying to figure out how I can record, replay, restart, and debug workflow sessions of agents in a better manner so that I as a human can understand what's going on. Currently I think that UI will be something like a gantt chart where you have a graph with connections representing agent to agent communication. And yes, that's a lot of fiddling with SVG as it turns out, so I'm not quite there yet.
Anyways, in case you're interested. I'm manually building this env and trying to unit test the critical parts. [1]
[1] https://github.com/cookiengineer/exocomp
They are saying very clearly the models are not casting their own spells…yet. But looking at trends and speculating when they may start doing so.
Certainly there have never been times in the recent past when people confidently predicted that computers could never do something that computers were then able to do shortly after the prediction was made.
So what's the value prop?
[0] https://pastebin.com/Vc5Yq9Ai
[1] https://www.anthropic.com/institute/recursive-self-improveme...
Why don't you, windexh8er, try providing some thoughts of your own instead?
So why don't you pound sand since that clearly went straight over your head? That would be far more useful than your asinine response.
If we ever get to a point where the centaur period is over (when human + AI is not better than just AI), then what competitive advantage can ANY human have other than
- the money they already have
- luck?
- a good idea and good taste but if we assume AI can do better than any human, that also goes out the window
So this whole singularity leads to a place where no one is really needed. The only thing that will "save us" (other than a "The Expanse"-like world / UBI) is if there is no demand for the supply of AI work, even if it's better. (For example, there is demand for seeing Magnus Carlsen play; there is no demand for the Stockfish on my phone getting into a stalemate with another Stockfish on another phone. People also like to watch humans compete with humans; there is no demand to see a race between Usain Bolt and a rocket.) So if people will not buy AI-generated stuff (we'll get to a point where everyone will assume something is AI-generated, because AI might get to a point where it is not as easy to identify, e.g. it will stop looking like slop... but I believe services that give you third-party evidence of "human-generated" work can happen, again all based on supply and demand...)
So as we near singularity... All it takes is one open weights model, and one open harness that is capable of self improvement, and Anthropic's entire moat is gone. That open weight model might even be built with Claude Code + Mythos (once it's released).
But don't worry, all moats will be gone and we'll all just do yoga, read books and connect to each other because AI will produce everything for free using renewable energy, right? Or we'll all become batteries in a simulation, probably something in between.
Month 1 - 6 months to AGI
Month 2 - We will Replace all jobs
Month 3 - Okay maybe only the SWEs, programming is solved
Month 4 - Announce model that is too dangerous to release
Month 5 - Releases dangerous model
Month 6 - This is it! We will replace AIs with more AIs (*secretly files for IPO)
AI is here to stay, like it or not, but it is not the solution to everything. If it is, what is Anthropic's moat? A better model? I don't see any ecosystem being built by them, as MCP is almost obsolete except for some very niche use cases. And they're doing stuff that a non-profit version of OpenAI would do. Can we trust a for-profit company to stand against their investors during a conflict of interest? Because running a company for maximum profit and being ethical are two different ends of the spectrum.
The problem is, if you’re any sort of knowledge worker, you’re essentially providing the same thing: you’re an intelligence with agency.
MCP is irrelevant. The moat is the quality of intelligence the service providers sell, including you. Tokens aren’t fungible between providers until you measure that they are for your use case, that’s kinda sorta the goal of job interviews.
Thus the moat will be that they’re providing the best models for the things people need other intelligent people for, but we should expect there will be limits on how much share they can economically take assuming competitors are optimizing for slightly different targets (but there’s still significant overlap in capability). This will disappear, but it’s always a question of when. The path matters as much as the destination.
Note that implications for you and me are exactly what the article says they are: nobody knows, but it’ll be a dramatic shift.
free chatgpt doesn't need to exist anymore. its job was to build hype/interest and it did.
but take it away and you solve many social problems and annoyances caused by AI with no loss to the upside of AI. no more cheating students in school. no more shitty linkedin posts. no more dangerous "therapy sessions" that give bad advice.
Fwiw, I think the genie is out of the bottle. We are waiting on hardware to catch up, which it will.
I simultaneously think the AI revolution is making real revolutionary gains and am mystified by the lying.
An accurate translation seems to be “we made this shit up, but it feels right”.
So, right now it's a verbose code generator.
But post-IPO it will be wonderful - sentient, self-improving (recursively, iteratively, asymptotically), full of loving grace.
We hold these truths to be self-evident.
I disagree with this. Good code is easy to change, which is much harder to accomplish than code that can be added to.
"If technical trends in advancing capabilities continue, and AI systems are able to develop the capabilities inherent to transformative human ingenuity, then it is plausible that AI systems could design and refine themselves."
I find the first premise weak and implausible, and the second one is obviously false. To me it comes across as an insult to the reader.
https://safebots.ai/declarative.html
- A lot of half-baked features or half-done features.
- Or have significant overlap with existing features, and aren’t clearly an improvement.
More code is not better. More features are not better. It would be lovely to see more intentional design than just more.
I know they’re dogfooding this. I have to believe they have some people with taste. So it makes me wonder if anyone has the time to think or if they’re just shoveling prompts as fast as possible.
Don't ask people to explain the article to you if you're too lazy to open it yourself.
It already has. Training models on AI-generated data leads to degradation and model collapse. The concept of the "technological singularity", whereby AI experiences infinite and exponential self-improvement and recursively bootstraps itself to godhood, is a religion-adjacent sci-fi concept, but in real life TANSTAAFL.
By shifting their focus from training new models to serving inference, they would greatly reduce their spend. In fact, this is reportedly something they are already doing, and is the reason for their first-ever profitable quarter.
It's awfully convenient that the company which has greatly reduced its spend on training is now asking for a slowdown in this area.
This is a very undifferentiated, swappable product. Kind of like tissue paper in that respect.
Maybe it is my poverty mindset that is holding me back, however, I can't imagine becoming an investor in any of the AI 'startups'.
There are plenty of pundits able to advise others on where to put their money, and sometimes there is everyone and their dog advising you to get into Bitcoin, gold or some other scheme. With alt-coins there were lots of people saying that you should get in, and plenty of naysayers. Yet I am not hearing anyone that uses AI professionally try to convince others to get into the AI IPOs coming up. Maybe the overall economic situation precludes it.
Hence my question, is anyone here planning to put their own hard-earned money into Anthropic (or the other AI 'start ups')?
I was dubious about SpaceX (orbital data centers need to solve for extreme radiation and error-correction during training), but then I remembered that xAI is actively working on virtualizing white collar workers ("Macrohard").
In my opinion, this is the only TAM that justifies $1T in data center investment, because the consumer market for ChatGPT-style AI is saturated. There's a lot of enterprise TAM available for AI, but I think what these companies training frontier models are really after is selling a product that allows companies to eliminate the cost of white collar salaries.
Long term? Way, way less interested.
> A meaningful slowdown or pause would require multiple well-resourced labs at or near the frontier, in multiple countries, agreeing to stop under the same conditions. It would also require that each can verify that the others have actually stopped. Due to the unique characteristics of AI systems, the detectability (a lower standard than verifiability) element of this arms control problem is much more challenging than with other technologies. Training runs are far easier to conceal than missile silos, their inputs are general-purpose, and the incentive to defect quietly is enormous, because whoever continues while others pause could inherit the lead. A credible pause also has to specify what triggers it, what lifts it, and who adjudicates.
And later:
> In the coming months, we will organize conversations where policymakers, researchers, civil society, and other AI companies can help answer some of the questions this piece raises, especially around full recursive self-improvement and how to create better options for coordination and deliberation. We’ll publish what comes out of it. The window to investigate the questions together is here, and people outside AI companies should be involved in this deliberation.
It feels like open source can flourish while the frontier is deliberately regulated?
Altman, Amodei, and the rest of them are anthropomorphic grease. their personal wealth is tied to the value of their respective companies. everything they say and do is self-serving.
Come on guys...
That is making me less impressed not more impressed!
https://news.ycombinator.com/newsguidelines.html
Frankly, I love efficiency too, but I've had to learn the hard way that what the market wants is features. Or at the very least, the executive team wants that.
I'm using the internal Google tools and it's helping me write code much faster too, but it still takes time. I could make the CLI tool I work on faster, but no one cares except the end users, and their minor concerns have no impact on our internal politics.
At the end of the day you have to do what you're paid to do, unfortunately.
Pick any two.
you are confirming their point even as you contradict the specifics
But I obviously don't know for sure.
https://x.com/trq212/status/2014051501786931427
It’s possible that it doesn’t play well with JS garbage collection, since it recreates the whole UI structure for every frame (which tends not to be an issue in the languages where immediate mode is usually employed).
But yes, it’s a bit more akin to game rendering than web rendering. Which can be totally fine if done well.
In 1GB I could probably fit all the buffers to double-buffer all the TUIs in a whole country. Well, maybe not. But it's likely not that far off.
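The arithmetic is easy to run. A rough sketch, where the terminal sizes and bytes-per-cell figures are assumptions picked for illustration (a real TUI may store more or less per cell), and `terminals_per_budget` is a made-up helper:

```python
GIB = 2**30


def terminals_per_budget(cols, rows, bytes_per_cell, budget=GIB, buffers=2):
    """How many TUIs fit in `budget` bytes if each keeps `buffers` full screens."""
    per_terminal = cols * rows * bytes_per_cell * buffers
    return budget // per_terminal


# Classic 80x24 screen, 4 bytes per cell (one UTF-32 code point, no attributes).
print(terminals_per_budget(80, 24, 4))

# Large 200x60 screen, 16 bytes per cell (code point plus colors and attributes).
print(terminals_per_budget(200, 60, 16))
```

With these assumptions the answer ranges from a few thousand large, richly attributed screens to tens of thousands of plain 80x24 ones, so "a whole country" is a stretch, but a gigabyte for one TUI is orders of magnitude beyond what two screen buffers need.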
All that CPU to render this effect
https://x.com/cyrilXBT/status/2060617507615207904
(to be read with the Unreal Tournament announcer voices, see https://www.youtube.com/watch?v=MwxjYFqP35A )
At least that is what the Moonlight client claims.
If so, I think it would be in the spirit of HN to discuss the subject matter of the blogpost (increasingly autonomous coding towards the end goal of RSI) as if the blog post was indeed from OpenAI. OpenAI is, by all accounts, going through a very similar process anyways.
One thing I noticed: "Your Tools: Aether agents get tools exclusively via MCP servers." "...Aether ships with 1st-party MCPs for file system operations..."
Can you share your thoughts on why you decided to use MCP as the core tool abstraction? I have heard many decry MCP as being context-wasteful. Is this not the case with your agent?
The MCP protocol has gotten a bad rap for wasting context because most MCP clients dump every tool definition directly into context.
Aether doesn’t do that. It uses an opt-in "proxy" that puts MCP tool schemas on the filesystem so the agent can browse, search and load the tool schemas it needs progressively. As for motivation, there are several advantages to taking an MCP-first approach, including:
1. It allows Aether to be a truly blank slate agent as 0 tools are hardcoded into the core runtime.
2. It allows users to extend Aether using any language they want
3. MCP gives a standard way to deal with local+remote tools, progress notifications, permission prompts (e.g. ask the user to allow/deny a tool call), OAuth flows etc.
4. There's a big ecosystem of existing MCP servers users can connect to
But that's all optional, you can just as easily give Aether a single Bash tool and only use CLIs too.
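For readers wondering what "tool schemas on the filesystem" might look like, here is a minimal sketch of the general idea. It is not Aether's implementation and does not use any MCP SDK: the directory layout and the helper names (`write_schemas`, `list_tools`, `search_tools`, `load_schema`) are all assumptions made up for illustration. Only the `name`, `description` and `inputSchema` fields mirror how MCP tool definitions are shaped.

```python
import json
import tempfile
from pathlib import Path


def write_schemas(root, server, tools):
    """Write one JSON file per tool under root/<server>/<tool>.json."""
    server_dir = Path(root) / server
    server_dir.mkdir(parents=True, exist_ok=True)
    for tool in tools:
        (server_dir / f"{tool['name']}.json").write_text(json.dumps(tool))


def list_tools(root):
    """Cheap listing: tool ids only, without reading any schema bodies."""
    return sorted(f"{p.parent.name}/{p.stem}" for p in Path(root).glob("*/*.json"))


def search_tools(root, keyword):
    """Return tool ids whose id contains the keyword (case-insensitive)."""
    return [t for t in list_tools(root) if keyword.lower() in t.lower()]


def load_schema(root, tool_id):
    """Read the full schema for a single tool, only when the agent asks for it."""
    server, name = tool_id.split("/", 1)
    return json.loads((Path(root) / server / f"{name}.json").read_text())


def path_tool(name, description):
    """Build a hypothetical tool definition that takes a single `path` string."""
    schema = {"type": "object", "properties": {"path": {"type": "string"}}}
    return {"name": name, "description": description, "inputSchema": schema}


root = tempfile.mkdtemp()
write_schemas(root, "fs", [
    path_tool("read_file", "Read a file from disk"),
    path_tool("write_file", "Write a file to disk"),
])
write_schemas(root, "git", [path_tool("status", "Show working tree status")])

print(list_tools(root))
match = search_tools(root, "read")[0]
print(load_schema(root, match)["description"])
```

The point of the design is that the listing and search steps cost a handful of short strings, and the full schema only enters context for the tool the agent actually decides to call.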
If you want to pollute your own priors with weird artificial litmus tests, it's a free country, but the artificial world-model you build in your head does not affect the real world around you.
They have different teams for different departments, with different types of people.
So the team or teams responsible for writing the terminal application are different people than the researchers doing the learning.
This can have detrimental effects on quality.
You will forgive me when, between muted snickers, I express considerable doubt that Anthropic will be able to bring its AI to a point of "self-improving" any time soon.
If they wanted to they could have convened an international forum with commercial and political stakeholders years ago. Less talk, more do.
To me, unattended agentic coding is not RSI, in the same way a self-reloading "Unattended 3D printer" is not at all a "3D printer that recursively prints complete 3D printers in which each generation is significantly faster and more advanced than the last." The "unattended" part is obviously necessary but hardly sufficient. The article tacitly assumes LLM progress to be something like 1: Unattended agentic coding, 2: AGI, 3: RSI. I suspect that third step should be labeled "not to scale."
I'm increasingly convinced that actual Full Foom RSI (FF-RSI) is on a radically different scale than the first two. Just leaving it unaddressed is like assuming: Step 1: Manned space station, Step 2: Manned Mars base, Step 3: Manned Alpha Centauri base, are "just logical next steps." FF-RSI requires sustaining superlinear, recursively amplifying cognitive returns along a specific directed path - and we currently have no empirical evidence that such returns can exist for artificial OR biological intelligences. Large collectives of the smartest humans alive (Bell Labs, IAS, etc) haven't just failed to get anywhere close to reliably sustaining that, we can't even reliably predict non-recursive, single occurrences or even imagine any way all 8B humans could fully mobilize to predictably achieve non-recursive, single occurrences.
The only prior we have for open‑ended intelligence improvement is biological evolution which shows extremely slow and unreliable sublinear returns at best. And even if unbounded, recursive self‑improvement is physically possible, it may be practically unachievable due to asymptotic economic, resource and other barriers in the same way approaching light speed requires exponentially more energy. I think it's plausible, and maybe probable, that AIs achieve true super-human intelligence in a decade and yet still won't achieve FF-RSI for centuries, if ever. To me, absent compelling evidence to the contrary, that's the reasonable Null Hypothesis. Even if you feel that's too pessimistic, it seems reasonable to expect any serious discussion of "Progress Toward RSI" to first discuss why it might even be plausible that 1: Miles, 2: AU (Astronomical Units), and 3: Light Years belong on the same scale, instead of just assuming it like the meme's empty "Step 3. .... " before moving on to "Step 4. Profit!" (or "IPO!" but very, very responsibly).
I really can't stand these guys anymore...
one that could bring enormous riches for the AI owners
Consequences are: financial crisis.
Be careful what you wish for IOW.
So the most capital intensive industry we've ever created will put less power in the hands of those with capital?
I'm sorry, I have no idea how you came to that conclusion...
Without some kind of income redistribution we are sailing into dark waters.
Workingmen of all countries unite!
Translation: hahahahahahahahahhahahaha but in your defense, I would give anything to be wrong.
https://intelligence.org/agi-ruin/
Even Anthropic wants to Pause AI now. There must really be not much time left for "edging". Please write to your lawmakers, no matter whether you are in the US, Europe, China, or elsewhere. Only an international agreement between governments can enforce an AI-Pause and eliminate the necessity to dangerously push the frontier.
https://pauseai.info/
And cooperating internationally to buy ourselves time to find ways to develop this "last invention" in a way that will do good for humanity seems to be on a similar level.