
1,380 Lines of Production Code Per Hour

Matt Kott · March 28, 2026 · 4 min read
Tags: engineering, ai-agents, build-velocity, spec-driven-development, agent-compute, modern-ai, production-code

How AI Agent Orchestration Produces 1,380 Lines of Production Code Per Hour

That number comes from a real build session, not a demo or proof of concept. A production deployment: database migrations, API endpoints, test suites, and security hardening. 62 tests passing. Deployed from a single Markdown specification.

We track this internally because it tells us something most teams have not figured out yet: the constraint on build speed is no longer engineering capacity. It is specification quality.

---

What the AI Build Session Actually Produced

A customer operations pipeline. Database schemas with migrations. Three scripts handling intake, quality assurance, and delivery workflows. A test suite covering edge cases, schema mismatches, and end-to-end flows. Security hardening across the stack. Eight templates. A 387-line standard operating procedure.

None of this was scaffolding or boilerplate. Every line runs in production. Every test validates real behavior.

Most teams would estimate this as a multi-week sprint. The spec-to-deployment window was a single build session.

---

How Spec-Driven AI Agent Orchestration Works

The methodology is straightforward, even if the execution requires careful architecture.

Start with a spec, not a ticket. Every build session begins with a Markdown document that defines what needs to exist. Not user stories or Jira descriptions. A precise specification that names the inputs, the outputs, the dependencies, and the acceptance criteria for every component.

The difference matters. A ticket says "build user intake flow." A spec says: this script accepts these fields, validates against this schema, writes to this table, returns this response shape, and fails in these specific ways. One produces conversations. The other produces code.
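To make the ticket-versus-spec distinction concrete, here is a minimal sketch of a spec component expressed as data, with a check for the sections a ticket typically leaves empty. The `ComponentSpec` class, its field names, and the sample values are all hypothetical illustrations, not the format used in the build session.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentSpec:
    """One buildable unit from a spec (all field names are illustrative)."""
    name: str
    inputs: list[str]
    outputs: list[str]
    depends_on: list[str] = field(default_factory=list)
    acceptance_criteria: list[str] = field(default_factory=list)
    failure_modes: list[str] = field(default_factory=list)


def spec_gaps(spec: ComponentSpec) -> list[str]:
    """Return the names of required sections that are still empty."""
    required = {
        "inputs": spec.inputs,
        "outputs": spec.outputs,
        "acceptance_criteria": spec.acceptance_criteria,
        "failure_modes": spec.failure_modes,
    }
    return [section for section, value in required.items() if not value]


# A ticket-style description: a name and nothing else.
ticket_like = ComponentSpec(name="user intake flow", inputs=[], outputs=[])

# A spec-style description: every section is filled in (values are made up).
precise = ComponentSpec(
    name="intake_script",
    inputs=["email", "company", "request_type"],
    outputs=["intake_id", "status"],
    acceptance_criteria=["valid payload writes one row to the intake table"],
    failure_modes=["missing email returns a validation error"],
)

print(spec_gaps(ticket_like))  # every required section is missing
print(spec_gaps(precise))      # no gaps
```

Each gap reported for the ticket-style entry is a question someone would otherwise have to answer mid-build.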

Decompose into parallel units. Once the spec is written, the system breaks it into independent units of work. Independent means: this unit can be built without waiting for another unit to finish. Database migrations do not depend on API endpoint logic. Test fixtures do not depend on template formatting.

The units that can run in parallel go to specialized agents simultaneously. While one agent writes migrations, another builds API endpoints, and a third generates test fixtures.

Handle dependencies automatically. Some components need others to exist first. The orchestration layer sequences those, and agents working on downstream components read the outputs of upstream agents as they land, without sync meetings or handoff documents.

Self-correct in real time. When something breaks mid-build, the system does not stop and file a bug. It fixes the issue, runs the test, and updates the specification with what it learned. During the session that produced these numbers, three schema mismatches surfaced. All three were detected, corrected, and tested without human intervention.
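The decomposition and sequencing steps described above can be sketched with Python's standard-library `graphlib.TopologicalSorter`. This is an illustration under stated assumptions, not the orchestration layer itself: the unit names, the dependency graph, and the `build_unit` stand-in are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical units: each key maps to the set of units it must wait for.
DEPENDENCIES = {
    "migrations": set(),
    "api_endpoints": set(),
    "test_fixtures": set(),
    "templates": set(),
    "test_suite": {"api_endpoints", "test_fixtures"},
    "deploy": {"migrations", "test_suite"},
}


def build_unit(name: str) -> str:
    """Stand-in for dispatching one unit of work to an agent."""
    return f"{name}: built"


def run_build(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Build units in waves; every unit in a wave runs in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the dependencies are circular
    waves = []
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())
            list(pool.map(build_unit, ready))  # block until the wave finishes
            sorter.done(*ready)
            waves.append(ready)
    return waves


waves = run_build(DEPENDENCIES)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

With this graph the four independent units run together in the first wave, the test suite follows, and deployment comes last. Waiting for a whole wave is a simplification; calling `done()` as each unit finishes would release downstream units as soon as their inputs land.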

---

Agent Compute Time vs. Decision Lag: The Two Metrics That Matter

We track two numbers internally: agent compute time and decision lag.

Agent compute time is straightforward. How long did the machines spend building? This is the number that produces the 1,380 lines per hour figure. We measure it in agent compute hours, not wall-clock time, because parallel agents compress the calendar but the work still happens.
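The difference between agent compute hours and wall-clock time is easiest to see with numbers. The sketch below uses made-up figures for three overlapping agent runs; the `AgentRun` record and every value in it are assumptions for illustration, not data from the session.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    agent: str
    start_min: float    # minutes since the session started
    end_min: float
    lines_written: int


def compute_hours(runs: list[AgentRun]) -> float:
    """Sum of each agent's busy time, regardless of overlap."""
    return sum(r.end_min - r.start_min for r in runs) / 60


def wall_clock_hours(runs: list[AgentRun]) -> float:
    """Elapsed calendar time from the first start to the last finish."""
    return (max(r.end_min for r in runs) - min(r.start_min for r in runs)) / 60


def lines_per_compute_hour(runs: list[AgentRun]) -> float:
    return sum(r.lines_written for r in runs) / compute_hours(runs)


# Illustrative numbers only.
runs = [
    AgentRun("migrations", start_min=0, end_min=30, lines_written=300),
    AgentRun("api_endpoints", start_min=0, end_min=60, lines_written=900),
    AgentRun("test_fixtures", start_min=30, end_min=60, lines_written=600),
]

print(compute_hours(runs))           # 2.0 agent compute hours
print(wall_clock_hours(runs))        # 1.0 hour on the calendar
print(lines_per_compute_hour(runs))  # 900.0 lines per compute hour
```

Dividing by compute hours rather than wall-clock hours keeps the rate from doubling just because two agents happened to overlap.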

Decision lag is the time between when the system needs a human decision and when that decision arrives. Should this field be a UUID or an integer? Is this the right schema for the use case? Does this SOP match how the team actually works?

The ratio between these two numbers has pointed the same way in every session we have tracked so far: human decision delay, not AI execution speed or model capability, is the primary bottleneck.
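One way to track that ratio is to log when each decision was requested and when it was answered, then compare the total against agent compute time. The timestamps and the one-hour compute figure below are hypothetical; only the arithmetic is the point.

```python
from datetime import datetime, timedelta


def total_decision_lag(decisions: list[tuple[datetime, datetime]]) -> timedelta:
    """Sum the wait between each (asked, answered) pair."""
    return sum((answered - asked for asked, answered in decisions), timedelta())


# Illustrative log: (when the system asked, when a human answered).
decisions = [
    (datetime(2026, 3, 2, 10, 0), datetime(2026, 3, 2, 10, 45)),  # UUID or integer?
    (datetime(2026, 3, 2, 13, 0), datetime(2026, 3, 2, 14, 15)),  # schema sign-off
]
agent_compute = timedelta(hours=1)  # assumed figure for the same session

lag = total_decision_lag(decisions)
ratio = lag / agent_compute  # dividing two timedeltas yields a float

print(lag)    # 2:00:00
print(ratio)  # 2.0
```

A ratio above 1 means the session spent more time waiting on people than the agents spent building.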

A precise spec eliminates most decision lag before the build starts. That is the real insight. The 1,380 number is a consequence of writing better specifications, not a consequence of faster machines.

---

Why Enterprise Teams Should Measure Specification Quality

Story points do not measure anything useful when AI agents are doing the building. Build velocity is now constrained by the quality of your specifications. A precise spec in a Markdown file produces production code in hours. A vague spec produces nothing useful regardless of how many engineers or agents you assign to it.

We built [Modern Discovery](https://modernai.io/discovery), our AI competitive intelligence platform, using this same methodology. It is how we approach every production system.

Enterprise environments add review gates, compliance checks, and approval workflows that extend wall-clock time. The core insight still holds: spec quality determines build quality regardless of scale. The organizations that learn to write precise specifications will build faster than those that write better tickets.

The tooling will keep improving. The models will get faster. But the spec discipline is the part most teams skip, and it is the part that determines whether any of this actually works.

Is your engineering team still estimating in sprints? What is actually constraining your build speed: engineering capacity or specification quality?

If you are thinking about spec-driven build methodology for your own team, reach out at info@modernai.io.

---