June 18, 2026

Agentic engineering: speed is cheap, correctness is not

Opinion · Technical · Agents

🧨

Speed without quality is just higher-frequency failure. The correct move is not to ship faster but to make correctness cheaper by moving constraints into automation, reducing handoffs, and using autonomy only inside well-defined bounds.

AI makes code cheap. The work that stays expensive is correctness under real constraints: tests, review load, security policy, rollout risk, incident recovery.

Code is going the way of assembly. It does not disappear. It becomes an implementation detail for most work, pulled into focus only when performance, reliability, or platform edges demand it.

The literature is converging on a boring conclusion. Tool use turns models into control loops, and control loops fail in familiar ways: they get stuck, they retry, they stop early, they “pass” for the wrong reason, they do the right local thing and break the system. Scoring only the final output sees none of this. Tool use and API integration close the loop between generation and validation, and feedback loops add retry and stopping conditions. Work on code-generation agents is explicitly about workflow autonomy across the SDLC, with evaluation that includes execution trajectories and tool-usage accuracy. The louder perspective pieces are directionally correct but operationally underspecified. They also quantify the gap: performance on isolated tasks stays well ahead of performance under continuous evolution unless context, memory, and verification improve.
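To make the control-loop framing concrete, here is a minimal sketch of a generate, validate, retry loop with explicit stop conditions. The names `run_bounded_loop`, `propose`, `validate`, and `LoopResult` are hypothetical and do not come from any of the cited papers; the point is only that "stuck" and "out of budget" become named outcomes rather than silent behaviour.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LoopResult:
    status: str  # "converged", "stuck", or "budget_exhausted"
    attempts: int
    history: List[str] = field(default_factory=list)


def run_bounded_loop(
    propose: Callable[[Optional[str]], str],
    validate: Callable[[str], Optional[str]],
    max_attempts: int = 5,
) -> LoopResult:
    """Generate, validate, and retry until a stop condition fires.

    propose(feedback) returns a candidate change as a string.
    validate(candidate) returns None on success, or an error message
    that is fed back into the next proposal.
    """
    history: List[str] = []
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = propose(feedback)
        # Stop condition: a repeated candidate means the loop is stuck.
        if candidate in history:
            history.append(candidate)
            return LoopResult("stuck", attempt, history)
        history.append(candidate)
        feedback = validate(candidate)
        # Stop condition: validation passed.
        if feedback is None:
            return LoopResult("converged", attempt, history)
    # Stop condition: retry budget used up without a passing candidate.
    return LoopResult("budget_exhausted", max_attempts, history)


def demo_propose(feedback: Optional[str]) -> str:
    # Stand-in for a model call: produce a second version once there is feedback.
    return "fix-v2" if feedback else "fix-v1"


def demo_validate(candidate: str) -> Optional[str]:
    # Stand-in for a test run.
    return None if candidate == "fix-v2" else "test_login failed"


result = run_bounded_loop(demo_propose, demo_validate)
print(result.status, result.attempts)  # converged 2
```

The returned `history` is what a trajectory-level evaluation would inspect. The final status alone cannot distinguish a loop that converged on the second attempt from one that converged on the fifth.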

This is the shift-left that matters. Product intent, user impact, and operational constraints move earlier because generation is cheap. The loop only stays stable if constraints are explicit, executable, and checked continuously.

When code moves fast, engineers bottleneck on everything around code. Requirements remain ambiguous. Reviews queue. Security becomes the shadow queue. Incidents interrupt flow. Faster coding does not fix a slow value stream. Autonomy across the value stream does.

A modern AI-enabled engineering team looks like the operators of an execution system:

Intent ownership shifts up-stack. Product-minded engineers define slices with explicit stop conditions, not just tickets. They own rollout and rollback, not just implementation.
Review turns into verification, not authorship. Humans do not need to type every line. Humans decide whether behaviour is correct under system constraints.
Constraints move into automation. Tests, lint, policy checks, secret scanning, dependency rules, and release gates block unsafe changes without debate.
Tool permissions become architecture. Agents need least-privilege tool access and hard boundaries. Repo write access is not a default.
Evaluation becomes trajectory-based. We care about loops: plan, edit, run, fail, retry, converge. If it cannot be replayed, it cannot be trusted.
Observability shifts to change provenance. We need to answer: what changed, who authorised it, which guardrails ran, and what passed.
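Three of the points above (least-privilege tool access, trajectory-based evaluation, and change provenance) can share one data structure. The following is a rough sketch under stated assumptions: `BoundedToolbox`, `Trajectory`, `Step`, and `tool_usage_accuracy` are hypothetical names, the tools are stand-in lambdas, and the accuracy metric is a simple positional match chosen for illustration rather than a metric defined in the cited surveys.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Dict, FrozenSet, List


class ToolDenied(Exception):
    """Raised when an agent calls a tool outside its allowlist."""


@dataclass
class Step:
    tool: str
    args: Dict[str, Any]
    allowed: bool
    output: str


@dataclass
class Trajectory:
    authorised_by: str
    steps: List[Step] = field(default_factory=list)

    def digest(self) -> str:
        # Stable hash of the whole record, so a replay can be compared to it.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


class BoundedToolbox:
    def __init__(self, tools: Dict[str, Callable[..., str]],
                 allowed: FrozenSet[str], trajectory: Trajectory) -> None:
        self._tools = tools
        self._allowed = allowed
        self._trajectory = trajectory

    def call(self, name: str, /, **args: Any) -> str:
        if name not in self._allowed or name not in self._tools:
            # Denied calls are recorded too: they are part of the trajectory.
            self._trajectory.steps.append(Step(name, args, False, "denied"))
            raise ToolDenied(name)
        output = self._tools[name](**args)
        self._trajectory.steps.append(Step(name, args, True, output))
        return output


def tool_usage_accuracy(trajectory: Trajectory, expected: List[str]) -> float:
    """Share of positions where the called tool matches the expected one."""
    actual = [step.tool for step in trajectory.steps]
    length = max(len(actual), len(expected))
    if length == 0:
        return 1.0
    matches = sum(1 for a, e in zip(actual, expected) if a == e)
    return matches / length


trajectory = Trajectory(authorised_by="alice")
toolbox = BoundedToolbox(
    tools={
        "read_file": lambda path: f"contents of {path}",
        "write_file": lambda path, text: "written",
    },
    allowed=frozenset({"read_file"}),  # write access is not granted by default
    trajectory=trajectory,
)
toolbox.call("read_file", path="README.md")
try:
    toolbox.call("write_file", path="README.md", text="new text")
except ToolDenied:
    pass

print([(s.tool, s.allowed) for s in trajectory.steps])
# [('read_file', True), ('write_file', False)]
print(tool_usage_accuracy(trajectory, ["read_file", "run_tests"]))  # 0.5
```

The trajectory answers the provenance questions directly: who authorised the run, which calls were attempted, and which were blocked. The digest gives a cheap way to check that a replay produced the same record.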

Superpositional is being built for this mode of work. Not to make agents “smarter”, but to make their execution bounded, inspectable, and cheap to verify.

References

The End of Software Engineering: How AI Agents Are Fundamentally Restructuring the Software Paradigm — https://arxiv.org/html/2606.05608v1
AI Agentic Programming: A Survey of Techniques, Challenges, and Opportunities — https://arxiv.org/html/2508.11126v1
A Survey on Code Generation with LLM-based Agents — https://arxiv.org/html/2508.00083v1
Evals grade prose. Score the trajectory. — Score an agent’s tool-use trajectory, not the last message