Correct by Default — Haplab Blog

At the end of The Backlog Loop, I mentioned that software is becoming a commodity and that the bottleneck isn't writing code anymore. That felt like something worth sitting with, so here's more of what I mean.

When I type "plan a new feature" to my OpenClaw agent, I don't explain what planning means. The agent runs the plan CLI, which decomposes the feature into tasks with priorities and dependencies, surfaces open questions, and creates tickets ready to work. When I say "start the loop," it creates an epic, kicks off the iteration script, and monitoring starts automatically. The correctness comes from what's already in place, not from the prompt itself.

That's what I mean by correct by default.

Figure: The correct by default cycle. Conversation flows through the foundation (skills, tools, tests, review chain) to produce correct software, which enriches the foundation. Conversations produce correct software when the foundation knows what things mean, and the output feeds back into the foundation.

The scaffolding

The foundation is a few things working together. Skills files encode the design conventions, architecture patterns, and workflow expectations for the project. td handles task management. The plan tool handles decomposition. The loop script handles execution. Test Squad does QA by using features the way a user would and reports both human UX issues and agent experience problems. The codebase is organized so agents can navigate it the same way a new developer would: predictable directory structure, consistent naming, and architecture notes that explain why things are set up the way they are.

When that foundation exists, agents don't invent an architecture. They work within yours.

The Backlog Loop describes Designer, a Figma-style design tool built almost entirely by the loop over a weekend: about 50,000 lines of Svelte and TypeScript across 78 components. The output uses Linear's design patterns and Svelte 5 runes consistently throughout because those conventions were encoded upfront in skills files that every session reads. The conversation was about what to build. The implementation followed from what was already in place.

Here's what the actual flow looked like for one specific feature. I wanted to add real-time monitoring to the Designer canvas so users could see when an agent was working on their document. I described this to my OpenClaw agent on Discord. She created a ticket, I bumped the priority, and the next loop iteration picked it up. The agent ran the plan tool, which produced a short spec and flagged three open questions about animation timing and how to handle concurrent agent operations. I answered those in the Discord thread. The plan became tasks. The loop worked them. By the time I came back to my desk, the feature was done, the tests were passing, and there were screenshots of it working.

No part of that required me to explain what a plan is, what the loop does, or what "done" means. Those things are encoded in the scaffolding.
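To make "the plan became tasks" a bit more concrete, here is a minimal sketch of how tasks with priorities and dependencies could be turned into a work order. It is not the plan tool's actual output or algorithm: the `PlannedTask` shape, the `workOrder` function, the "lower number means more urgent" convention, and the example task titles are all assumptions made up for illustration.

```typescript
// Hypothetical task shape; the real plan tool's format is not shown in this post.
type Priority = 0 | 1 | 2 | 3; // assumed convention: 0 is most urgent

interface PlannedTask {
  id: string;
  title: string;
  priority: Priority;
  dependsOn: string[]; // ids of tasks that must finish first
}

// Return tasks in an order where every task comes after its dependencies.
// Among tasks that are ready, the most urgent one goes first (ties broken by id).
function workOrder(tasks: PlannedTask[]): PlannedTask[] {
  const knownIds = new Set(tasks.map((t) => t.id));
  for (const t of tasks) {
    for (const dep of t.dependsOn) {
      if (!knownIds.has(dep)) {
        throw new Error(`Task ${t.id} depends on unknown task ${dep}`);
      }
    }
  }

  const done = new Set<string>();
  const ordered: PlannedTask[] = [];
  while (ordered.length < tasks.length) {
    const ready = tasks
      .filter((t) => !done.has(t.id) && t.dependsOn.every((d) => done.has(d)))
      .sort((a, b) => a.priority - b.priority || a.id.localeCompare(b.id));
    if (ready.length === 0) {
      throw new Error("No task is ready: dependency cycle or duplicate ids");
    }
    const next = ready[0];
    done.add(next.id);
    ordered.push(next);
  }
  return ordered;
}

// Illustrative tasks for the canvas monitoring feature (titles are invented).
const plan: PlannedTask[] = [
  { id: "t3", title: "Show agent activity indicator on canvas", priority: 1, dependsOn: ["t1", "t2"] },
  { id: "t1", title: "Emit agent activity events", priority: 0, dependsOn: [] },
  { id: "t2", title: "Subscribe canvas to activity events", priority: 1, dependsOn: ["t1"] },
  { id: "t4", title: "Document the monitoring feature", priority: 3, dependsOn: [] },
];

console.log(workOrder(plan).map((t) => t.id).join(" -> "));
// prints "t1 -> t2 -> t3 -> t4"
```

The point of the sketch is that ordering is a property of the data, not of the conversation: once dependencies and priorities are recorded, nobody has to explain what to work on next.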

What makes it compound

One piece of the system that I underestimated at first: each loop iteration reviews the previous task before picking up the next one, and the reviewing session starts completely clean, with no memory of what was built or why. It reads the code cold and flags anything that looks off. An agent reviewing its own work tends to check that things match its own intentions. A separate session catches different categories of mistakes. Before a task closes, there's a proof requirement too: passing tests and screenshots of the feature working. Correctness gets verified before it accumulates.
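The review and proof requirements can be pictured as a gate that a task has to pass before it closes. The sketch below is one possible shape for that check, not the loop's real implementation: the `TaskEvidence` fields, the session identifiers, and the `closeBlockers` function are hypothetical names chosen for illustration.

```typescript
// Hypothetical record of the evidence attached to a task.
// Field names are assumptions, not the loop's real data model.
interface TaskEvidence {
  builtBySession: string;
  reviewedBySession: string | null; // null until a review has happened
  reviewFlags: string[];            // unresolved issues raised by the reviewer
  testsPassed: boolean;
  screenshots: string[];            // paths or URLs of screenshots
}

// Return the reasons a task cannot close yet; an empty list means it can.
function closeBlockers(e: TaskEvidence): string[] {
  const blockers: string[] = [];
  if (e.reviewedBySession === null) {
    blockers.push("no review yet");
  } else if (e.reviewedBySession === e.builtBySession) {
    blockers.push("review must come from a separate session");
  }
  if (e.reviewFlags.length > 0) {
    blockers.push(`${e.reviewFlags.length} unresolved review flag(s)`);
  }
  if (!e.testsPassed) blockers.push("tests are not passing");
  if (e.screenshots.length === 0) {
    blockers.push("no screenshots of the feature working");
  }
  return blockers;
}

// A task reviewed by the session that built it, with no screenshots attached.
const selfReviewed: TaskEvidence = {
  builtBySession: "session-a",
  reviewedBySession: "session-a",
  reviewFlags: [],
  testsPassed: true,
  screenshots: [],
};

console.log(closeBlockers(selfReviewed));
// prints the two blockers: separate-session review and missing screenshots
```

Returning a list of blockers rather than a single boolean is a deliberate choice in this sketch: it tells the next session exactly what is still missing instead of just refusing to close the task.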

The foundation gets richer with each project. When the loop built Designer's collaboration system, it already knew about the component patterns, the API conventions, and the design token system, because those had been built by earlier iterations and folded back into the project context. New features start from that accumulated foundation. Each project benefits from the last.

This differs from vibe coding in a specific way. The gap between the two shows up when you start building on top of what you made: when a new feature needs to fit correctly with everything else, or when a bug fix can't introduce three others. With the right foundation in place, agents extend the system correctly. Without it, they extend it plausibly.

Where this is going

The loop built Designer, a design tool that would have been a team-sized project not long ago, over a weekend. The constraint wasn't writing code. It was deciding what to build and writing tickets that described it clearly enough for the loop to work them.

Figure: The Test Squad plan in Perch's planner, created through a conversation on Discord, with areas for concept, agent design, execution engine, and reporting.

As the scaffolding gets better, that constraint gets easier too. More of the "how" gets encoded. Less has to be specified each time. The gap between describing something and having working software shrinks.

The fact that one developer can build a Figma-style design tool over a weekend, mostly by writing good tickets, seems like it's pointing somewhere interesting.