DIY AI Product Security: Buy vs Build - A Task Is Not a System

Bryan Taylor

June 13, 2026

A Task Is Not a System - an agent writes one PR; a program runs the whole loop, on every repo

The best question I get on a demo call now is a challenge, and a fair one. A security leader shares their screen, points an AI coding agent at one of their repos, and a minute later there's a clean remediation PR sitting in GitHub. It builds, it reads well, it closes a real finding. Then they look back at me and ask the obvious thing: "We already do this ourselves. Our scanners flag the issue, we prompt our agent, it opens the PR, we merge it. Why would we need you?"

It's the smartest objection in the category right now, and the demo they just ran is completely real. So I leave the demo alone. The word worth questioning is "this."

Here's what that demo actually proves. The moment you reach for an AI agent to triage findings and write fixes, you've already settled the biggest question in AppSec: AI agents are how product security gets done now. We agree completely. Nullify runs on the same frontier models you're prompting. The model was never the disagreement.

So the question was never "AI or not." It's buy vs. build. Do you want to build and operate an AI security system yourself, or run one that's been engineered for exactly that, every day, for three years?

MIT's 2025 State of AI in Business study found that 95% of enterprise GenAI pilots deliver no measurable impact to the bottom line, and that tools bought from external vendors succeed roughly twice as often as the ones built internally (MIT / Fortune, 2025). Wiring it up yourself is, statistically, the path most likely to stall.

A Copilot Is Not a Workforce

What you built is a fantastic task-doer. Point it at a repo, hand it a finding, get a PR. That task works. But a task is not a system, and a copilot is not a workforce.

The distance between "the agent can write a remediation PR" and "an AI security organization that runs unattended across a thousand repos, deciding what's exploitable, validating the fix, ticketing it, chasing it to merge, all at a forecastable cost" is the entire product. It's the difference between a power tool and a crew that shows up every morning.

Let me walk the exact pipeline a prospect described, because the gaps don't live in the steps. They live in the spaces between them.

Walking the DIY Pipeline

The workflow: scanners surface findings, you prompt the agent, it opens a PR, a human reviews and merges. Here's what it misses, step by step.

Before the PR: what to fix. Your scanners hand over raw findings, and the agent writes a PR for whatever it's pointed at: the unreachable finding, the build-time-only one, the CVE that's critical on paper but unexploitable in your cloud. Nothing in the pipeline asks "does this actually matter here?" So you pay tokens to generate the fix, then your engineers' afternoons to review the noise. In a recent enterprise proof-of-value (POV), a library with four critical and ten high CVEs triaged out as negligible because it never ran in production. A DIY pipeline writes four PRs for it anyway.
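
One way to picture the missing gate is a filter that runs before any fix is generated. The sketch below is a minimal illustration, not a description of how Nullify or any scanner works: the `Finding` fields (`reachable`, `runs_in_production`) and the `worth_fixing` rule are hypothetical, the IDs are placeholders, and a real triage step would also weigh cloud exposure and exploitability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    finding_id: str           # placeholder ID, not a real CVE
    severity: str             # label as reported by the scanner
    reachable: bool           # assumed signal: is the vulnerable code path called?
    runs_in_production: bool  # assumed signal: False for build-time-only code


def worth_fixing(finding: Finding) -> bool:
    """Hypothetical gate: spend tokens and review time only where it can matter."""
    return finding.reachable and finding.runs_in_production


def triage(findings):
    """Split findings into (fix_queue, deprioritized) without dropping any."""
    fix_queue = [f for f in findings if worth_fixing(f)]
    deprioritized = [f for f in findings if not worth_fixing(f)]
    return fix_queue, deprioritized


# A library with four critical and ten high findings that never runs in production.
library_findings = (
    [Finding(f"EXAMPLE-CRIT-{i}", "critical", True, False) for i in range(4)]
    + [Finding(f"EXAMPLE-HIGH-{i}", "high", True, False) for i in range(10)]
)
fix_queue, deprioritized = triage(library_findings)
print(len(fix_queue), len(deprioritized))
```

With this rule the fix queue stays empty for that library, and all fourteen findings are kept in the deprioritized list rather than silently discarded.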

The PR itself: is the fix real? Does it build? Pass tests? Quietly break an API three services depend on? The validation that should happen before a PR is surfaced instead lands on your developer, after the fact, on every single one.
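
A validation gate can be sketched as an ordered list of checks that must all pass before a PR is shown to anyone. This is an assumption-laden illustration: `validate_fix` and the check names are hypothetical, and the commands are stand-ins that run anywhere Python does, where a real pipeline would call its own build, test, and API-contract commands.

```python
import subprocess
import sys


def validate_fix(checks, workdir="."):
    """Run (name, command) checks in order; return (passed, first_failed_name)."""
    for name, command in checks:
        result = subprocess.run(command, cwd=workdir, capture_output=True, text=True)
        if result.returncode != 0:
            return False, name  # stop early: later checks are moot
    return True, None


# Stand-in commands so the sketch is runnable; the second one simulates
# a failing test suite by exiting with a non-zero status.
example_checks = [
    ("build", [sys.executable, "-c", "pass"]),
    ("tests", [sys.executable, "-c", "raise SystemExit(1)"]),
    ("api-contract", [sys.executable, "-c", "pass"]),
]
passed, failed_check = validate_fix(example_checks)
print(passed, failed_check)
```

Here the gate reports the failing `tests` check and never reaches the contract check, so the PR would be held back instead of landing in a reviewer's queue.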

Across findings: what first? The same vulnerable dependency in forty repos is forty prompts and forty PRs, with no sense of which actually sit on an internet-facing service. No correlation, no dedup, no prioritization by risk. The pipeline fixes things in whatever order you prompt them.
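
Dedup and risk ordering can be sketched as a group-then-sort step. The sketch below assumes each finding already carries a package name, an advisory ID, and an `internet_facing` flag; those field names, `RepoFinding`, and `group_and_rank` are hypothetical, and in practice the exposure signal is the hard part to obtain.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoFinding:
    repo: str
    package: str
    advisory_id: str       # placeholder ID for illustration
    internet_facing: bool  # assumed to come from cloud or asset context


def group_and_rank(findings):
    """Collapse repeats of one advisory into a group; exposed repos go first."""
    groups = defaultdict(list)
    for f in findings:
        groups[(f.package, f.advisory_id)].append(f)

    plan = []
    for (package, advisory_id), members in groups.items():
        # False sorts before True, so internet-facing repos lead each group.
        ordered = sorted(members, key=lambda f: (not f.internet_facing, f.repo))
        plan.append({
            "package": package,
            "advisory_id": advisory_id,
            "repos": [f.repo for f in ordered],
            "exposed_count": sum(f.internet_facing for f in members),
        })
    # Groups touching the most exposed services are handled first.
    plan.sort(key=lambda g: (-g["exposed_count"], g["package"], g["advisory_id"]))
    return plan


# The same dependency in forty repos, three of them internet-facing.
findings = [
    RepoFinding(f"repo-{i:02d}", "example-lib", "EXAMPLE-ADV-1", i in (7, 19, 33))
    for i in range(40)
]
plan = group_and_rank(findings)
print(len(plan), plan[0]["repos"][:3])
```

Forty findings become one work item, with the three exposed repos at the front of its list.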

After the PR: the last mile. The agent opens the PR and stops. The ticket, the Slack nudge, the follow-up next week, the chase to merge: all hand-work. That last mile is most of the job of running AppSec, and the pipeline doesn't touch it.
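
The chase itself is mostly a policy over PR age. As a rough sketch, with the seven-day nudge taken from "the follow-up next week" and the 21-day escalation threshold purely an assumption, a hypothetical `next_action` function could decide what happens to each open remediation PR:

```python
from datetime import date, timedelta

NUDGE_AFTER = timedelta(days=7)      # "the follow-up next week"
ESCALATE_AFTER = timedelta(days=21)  # assumed threshold, tune to taste


def next_action(opened_on, merged, today):
    """Decide the follow-up for one remediation PR."""
    if merged:
        return "close-ticket"
    age = today - opened_on
    if age >= ESCALATE_AFTER:
        return "escalate"
    if age >= NUDGE_AFTER:
        return "nudge"
    return "wait"


opened = date(2026, 6, 1)
for today in (date(2026, 6, 3), date(2026, 6, 8), date(2026, 6, 22)):
    print(today, next_action(opened, merged=False, today=today))
```

The function only decides; sending the Slack message or updating the ticket would sit behind each returned action, and those integrations are where the ongoing maintenance lives.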

The ceiling: what never even becomes a finding. You can only remediate what your scanners detect. App-contextual secrets, missing-authorization logic flaws, language-toolchain vulns: if your scanners miss them, the agent never gets them to fix, and they sit in production silently. The agent's ceiling is bolted to the floor of whatever you already had.

It's Never Just the Agent

The clean demo also leaves something out: almost nobody runs the agent raw. It's wired into the tools you already own: the SAST scanner, dependency and software-composition scanning, cloud-security and runtime tooling, secrets scanning, maybe an ASPM aggregator on top. The instinct is right: an agent that sees a code finding, a reachability graph, and cloud exposure at once writes a better fix than one staring at a single scanner.

But every tool you stitch in is an integration you own, with its own API, data model, and severity scale, drifting on the vendor's schedule. The same vulnerability arrives three times with three IDs and three severities, and until you build the correlation layer the agent guesses which signal to believe. Feed it a scanner where five of every six findings are noise, the measured rate in a recent ten-repo engagement, and it writes confident fixes for noise.
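
To make the correlation layer concrete, here is a minimal sketch of the two jobs it has to do: put every tool's severity on one scale, then merge reports that describe the same issue. Everything named here is hypothetical: `tool_a`, `tool_b`, and `tool_c` stand in for three scanners, the numeric bands for `tool_c` follow the CVSS v3 qualitative ranges, and the sketch assumes all three tools share a `vuln_id`, which real stacks often don't.

```python
# Hypothetical tools with three different severity vocabularies.
LABEL_SCALES = {
    "tool_a": {"LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4},
    "tool_b": {"P4": 1, "P3": 2, "P2": 3, "P1": 4},
}


def normalize_severity(tool, raw):
    """Map a tool-native severity onto a shared 0-4 scale."""
    if tool == "tool_c":  # assumed to report a 0-10 numeric score
        score = float(raw)
        if score >= 9.0:
            return 4
        if score >= 7.0:
            return 3
        if score >= 4.0:
            return 2
        return 1 if score > 0 else 0
    return LABEL_SCALES[tool][raw]


def correlate(reports):
    """Merge reports that describe the same vulnerability in the same place."""
    merged = {}
    for r in reports:
        key = (r["repo"], r["package"], r["vuln_id"])
        entry = merged.setdefault(key, {"severity": 0, "source_ids": []})
        # Design choice for this sketch: trust the most severe signal.
        entry["severity"] = max(
            entry["severity"], normalize_severity(r["tool"], r["severity"])
        )
        entry["source_ids"].append(f'{r["tool"]}:{r["native_id"]}')
    return merged


# One vulnerability arriving three times with three IDs and three severities.
reports = [
    {"tool": "tool_a", "native_id": "A-101", "repo": "payments",
     "package": "example-lib", "vuln_id": "EXAMPLE-ADV-1", "severity": "HIGH"},
    {"tool": "tool_b", "native_id": "B-7", "repo": "payments",
     "package": "example-lib", "vuln_id": "EXAMPLE-ADV-1", "severity": "P1"},
    {"tool": "tool_c", "native_id": "C-9f3", "repo": "payments",
     "package": "example-lib", "vuln_id": "EXAMPLE-ADV-1", "severity": 6.5},
]
merged = correlate(reports)
print(merged)
```

Taking the maximum is only one policy. Weighting by each tool's measured noise rate is another, and choosing between them is exactly the kind of decision you own once you build this layer yourself.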

And the bill was never just the token bill. It's scanner licenses, compute, and the engineers who keep the integrations alive, with tokens as the volatile line on top. Uber reportedly burned its entire 2026 AI-coding budget in four months (Forbes, 2026), and one startup's annual AI bill jumped from $400K to $1.4M overnight on a vendor pricing threshold (Pylon, via LinkedIn). That's buy vs. build multiplied by every tool in your stack. A bought system collapses it: Nullify runs detection (SAST, SCA, secrets, pentesting), reachability, cloud context, and the asset graph on one data model, giving you one source of truth instead of six consoles disagreeing.

Where Do-It-Yourself Genuinely Works

Let me be straight, because credibility is the point. For a single repo, a skilled engineer, and an afternoon, a DIY agent writes an excellent fix. That demo is genuinely impressive, and that's exactly why it's seductive. For a one-off or a small surface, the build path is perfectly reasonable.

The agent can write the fix; we use the same models it does. The harder undertaking is operating the whole system, across the estate, every day, reliably, at a cost you can put in a spreadsheet. The demo is the easy 20%. The system is the 80% that doesn't fit on a screen.

The Part You Can't See Is the Part You're Buying

That 80% has a name. It's the harness: the routing, validation, self-healing, dedup, prioritization, and connective tissue around the model, kept alive as the models and the attacks change every month. It's also where most of the cost and most of the failure live. I'll cover that in Part 2.

For now, one request: if you're already running agents against your findings, make buy vs. build a deliberate decision, because right now the demo is making it for you. The POV is free and runs alongside what you've built. If a harness your team maintains still beats one we've spent three years on, across your whole estate at a forecastable cost, no harm done, and you'll have learned something useful.

Next, in Part 2, The Harness Is the Product: the routing, validation, and self-healing around the model is the hard, volatile 80%, and it's the part that quietly turns your security team into a security-tooling company.

Sources