It’s clear LLMs are one of the biggest changes in technology ever. The rate of progress is astounding: recently due to a configuration mistake I accidentally used Claude Sonnet 3.5 (released ~2 years ago) instead of Opus 4.6 for a task and looked at the output and thought “what is this garbage”?
But daily now: Opus 4.6 is able to generate reasonable PoC level Rust code for complex tasks for me. It’s not perfect – it’s a combination of exhausting and exhilarating to find the 10% absolutely bonkers/broken code that still makes it past subagents.
So yes I use LLMs every day, but I will be clear: if I could push a button to “un-invent” them I absolutely would because I think the long term issues in larger society (not being able to trust any media, and many of the things from Dario’s recent blog etc.) will outweigh the benefits.
But since we can’t un-invent them: here’s my opinion on how they should be used. As a baseline, I agree with a lot from this doc from Oxide about LLMs. What I want to talk about is especially around some of the norms/tools that I see as important for LLM use, following principles similar to those.
On framing: there’s “core” software vs “bespoke”. An entirely new capability of course is for e.g. a nontechnical restaurant owner to use an LLM to generate (“vibe code”) a website (excepting hopefully online orderings and payments!). I’m not overly concerned about this.
Whereas “core” software is what organizations/businesses provide/maintain for others. I work for a company (Red Hat) that produces a lot of this. I am sure no one would want to run for real an operating system, cluster filesystem, web browser, monitoring system etc. that was primarily “vibe coded”.
And while I respect people and groups that are trying to entirely ban LLM use, I don’t think that’s viable for at least my space.
Hence the subject of this blog is my perspective on how LLMs should be used for “core” software: not vibe coding, but using LLMs responsibly and intelligently – and always under human control and review.
Agents should amplify and be controlled by humansI think most of the industry would agree we can’t give responsibility to LLMs. That means they must be overseen by humans. If they’re overseen by a human, then I think they should be amplifying what that human thinks/does as a baseline – intersected with the constraints of the task of course.
On “amplification”: Everyone using a LLM to generate content should inject their own system prompt (e.g. AGENTS.md) or equivalent. Here’s mine – notice I turn off all the emoji etc. and try hard to tune down bulleted lists because that’s not my style. This is a truly baseline thing to do.
Now most LLM generated content targeted for core software is still going to need review, but just ensuring that the baseline matches what the human does helps ensure alignment.
Pull request reviewsLet’s focus on a very classic problem: pull request reviews. Many projects have wired up a flow such that when a PR comes in, it gets reviewed by a model automatically. Many projects and tools pitch this. We use one on some of my projects.
But I want to get away from this because in my experience these reviews are a combination of:
In practice, we just want the first of course.
How I think it should work:
I wrote this agent skill to try to make this work well, and if you search you can see it in action in a few places, though I haven’t truly tried to scale this up.
I think the above matches the vision of LLMs amplifying humans.
Code GenerationThere’s no doubt that LLMs can be amazing code generators, and I use them every day for that. But for any “core” software I work on, I absolutely review all of the output – not just superficially, and changes to core algorithms very closely.
At least in my experience the reality is still there’s that percentage of the time when the agent decided to reimplement base64 encoding for no reason, or disable the tests claiming “the environment didn’t support it” etc.
And to me it’s still a baseline for “core” software to require another human review to merge (per above!) with their own customized LLM assisting them (ideally a different model, etc).
FOSS vs closedOf course, my position here is biased a bit by working on FOSS – I still very much believe in that, and working in a FOSS context can be quite different than working in a “closed environment” where a company/organization may reasonably want to (and be able to) apply uniform rules across a codebase.
While for sure LLMs allow organizations to create their own Linux kernel filesystems or bespoke Kubernetes forks or virtual machine runtime or whatever – it’s not clear to me that it is a good idea for most to do so. I think shared (FOSS) infrastructure that is productized by various companies, provided as a service and maintained by human experts in that problem domain still makes sense. And how we develop that matters a lot.
Read more of this story at Slashdot.
Read more of this story at Slashdot.
Read more of this story at Slashdot.
Read more of this story at Slashdot.
Read more of this story at Slashdot.
Read more of this story at Slashdot.