AI-ing an App
Writing code is effectively free now.
I know it’s been written a thousand times, but I’m just having too much fun not to jump on the bandwagon.
In the last month, I wrote an entire app. Well, technically AI did, and it also wrote the continuous integration, the development environment, the workflow, the documentation, and the release pipeline. And it’s on the Play Store.[1]
What’s even more magical is waking up with an open PR for a bug your agent fixed overnight, clicking “squash and merge”, and then seeing it on an alpha build on your own watch the next day. All this without writing a single line of code! All in, it took just 2 weeks from day one to an entire app on my own device.
Another word for magical is disruptive. Not the startup-pitch kind, but the kind that actually reshuffles who wins and who loses. Calling this “faster coding” is like calling the internal combustion engine a “faster horse”. It’s a step change in the fundamental economics of software development, and some of what it changes is already clear.
Writing feature code got free. The other parts didn’t.
While a good professional developer could cost $200k/year all-in, an agent subscription is ~$1200/year. This means the cost of a feature trends toward merely the cost of describing it.
(This really only applies to simple SaaS — complex infrastructure and cutting-edge technology still require human intervention. A perfect example is Anthropic’s own failure to rebuild GCC with Claude despite using it as a reference implementation.)
The initial version of my app was 80 minutes of AI chugging through dependency chains, build errors, environment setup, and so on, based on the simple request to “build me an Android Wear app that shows NOAA METAR temperature data on a watch face complication”. It downloaded, installed, and configured the JDK, Gradle, Android SDK, NWS API definition and more, totally independently.
Could I have done this? Eventually, yes. But I’ve installed the JDK many times, and forgotten how to do so equally many times. Thankfully for idiots like me, execution is no longer a moat, whether for you as an employee against other candidates or for your business against others.
It also extends well beyond code. Security audits, privacy reviews, release notes, documentation — the “non-building” work that used to be a PM’s unglamorous half automates about as well as features do. But if you’re trusting it to do all this itself, you’re going to end up like this idiot.
Specifying, testing, and configuring didn’t get easier
If you want a security audit, you need to ask for one. The same goes for privacy audits, legal audits, open-source license disclosures, performance optimization, logging, flagging, accessibility, localization, and dozens of other things I would call part of “launch readiness”. These are often entirely neglected in PRDs, and left to the privacy, security, legal, and other experts to fill in. AI can do a privacy audit, but it won’t do one unprompted, and it’s not clear it won’t later violate the privacy policy it added to the repo. The fact that it’s so willing to delete failing tests does not give me the warm-fuzzies about that!
Describing what we want to build has always been the work of the entire village. It is a specialization entirely separate from engineering, and it likely will be forever. Everyone who has worked in a large company knows there are 10 different ways to interpret a single 10-word product requirement.
So although AI is making this work easier to complete, I’m not convinced yet it’s making it easier to specify.
Writing the spec right the first time
AI seems tuned for the kind of “wow” moment that effortless coding provides. AI’s answers are full of assumptions and shortcuts. So now your spec has to be right on the first try.
The best PMs used to work in a tight loop with engineering, design, and other specialists — refining decisions, getting pushback, changing their minds. Agentic development has none of that: no multiple opinions, no debate, no consensus, no one pushing back in the hallway. If the spec is ambiguous, the agent picks a lane; you find out on the PR. It has little common sense, it doesn’t understand your user, and it will do some strange things if you let it.
Something like “add the density altitude” is fine for a human engineer, but it’s not great for AI. You need to say instead something like “compute density altitude exclusively from provided METAR data and algorithms documented by FAA and NOAA, display it in feet (rounded up to the next thousand), and don’t use a ranged value complication”. Think denser, more precise, and more careful.
Among the more obvious differences between those prompts, note that the new prompt says “compute”, not “add”, lest the model decide to sum density altitude and some other value and build a feature around that meaningless quantity. AI is, after all, only artificially intelligent, and that’s strictly different from truly intelligent.
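To make that denser spec concrete, here is a minimal sketch in Python of what it asks for. It is an illustration, not the app's actual code. It assumes the common pilot rule-of-thumb approximation (about 1,000 ft per inHg of altimeter deviation, an ISA lapse rate of about 2 °C per 1,000 ft, and about 120 ft of density altitude per °C above ISA) rather than any specific FAA or NOAA algorithm. The function and parameter names are hypothetical, and the station elevation is an extra input, since a METAR doesn't carry it.

```python
import math

# Rule-of-thumb constants (assumptions for this sketch, not an official algorithm).
STANDARD_ALTIMETER_INHG = 29.92   # standard sea-level pressure
FT_PER_INHG = 1000.0              # approx. altitude change per inHg
ISA_SEA_LEVEL_TEMP_C = 15.0       # standard sea-level temperature
ISA_LAPSE_C_PER_1000_FT = 2.0     # approx. standard lapse rate
FT_PER_DEG_C = 120.0              # approx. density altitude change per deg C off ISA


def pressure_altitude_ft(field_elevation_ft, altimeter_inhg):
    """Approximate pressure altitude from field elevation and altimeter setting."""
    return field_elevation_ft + (STANDARD_ALTIMETER_INHG - altimeter_inhg) * FT_PER_INHG


def density_altitude_ft(field_elevation_ft, altimeter_inhg, temperature_c):
    """Approximate density altitude from METAR temperature and altimeter setting."""
    pa = pressure_altitude_ft(field_elevation_ft, altimeter_inhg)
    isa_temp_c = ISA_SEA_LEVEL_TEMP_C - ISA_LAPSE_C_PER_1000_FT * (pa / 1000.0)
    return pa + FT_PER_DEG_C * (temperature_c - isa_temp_c)


def round_up_to_next_thousand(feet):
    """Round up to a multiple of 1,000 ft; exact multiples are left unchanged."""
    return math.ceil(feet / 1000.0) * 1000
```

With a 5,000 ft field, an altimeter setting of 29.92, and 25 °C, this gives 7,400 ft, displayed as 8,000. Note that even the denser prompt leaves a decision open: whether a value of exactly 7,000 stays at 7,000 (as in this sketch) or jumps to 8,000 is something the spec writer still has to settle.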
This makes PM’ing harder — not easier — and makes the development process more product-constrained than it’s ever been. New APM programs like Clay Bavor’s APX still treat engineering and product as separate rotations, but they’re the same job now.
Write your tests yourself
Since implementation can be delegated but specification can’t, you can’t trust AI to write your tests. AI is a terrible tester.
It doesn’t want to write tests. It doesn’t want to add test seams. It doesn’t want to use testing-based design methodologies (it once did TDD by writing a test that wouldn’t build, and then adding the method header and declaring it “passed”). It considers changing the test just a minor speed bump in feature development, not an indicator it broke something. When a test fails, it will confidently suggest deleting the test. When pressed, it will suggest deleting the test and the testing library it was implemented with.
I find this behavior endearing in the same way Spotify is endearing when it tells me I’m offline with five bars of 5G.
So you are the PM. The tester. The dogfooder and acceptance criteria. PMs need to stop doing and start deciding, and since every yahoo with a good idea can now ship in a month it’s even more important to be right.
Think one level of abstraction higher
The single biggest workflow shift: your job is no longer to do the thing; it’s to get the agent team to do the thing.
In one case, the agent finished work on my containerized development environment and said “Done! Try running the container again and tell me if you get any errors.” I admit that the first three times I did, in fact, run the container. I only realized my idiocy when a friend pointed it out. The wrong response is to mindlessly do as you’re told; the right one is to tell the agent “do it yourself”. You can always, always think one level higher.
This is a familiar pattern to any manager, but not so much to an IC. But now that every IC has subagents, we’re all managers! You should approach every issue like a manager: who does this, using what tools, interfacing with what other people? You’re building and maintaining an agent team — the tools, the prompts, the tests, the guardrails, the templates, the processes, the approvals. That’s the work; feature code is just the output.
Living in that mindset for a month makes one thing obvious: the tools matter more than any individual feature.
Every tool needs a model interface
Any tool you rely on needs a first-class way for a model to drive it — MCP server, CLI, scriptable API, whatever.
That means something text-in, text-out (read: cheap for models). The GUI-automation-via-screenshots path is a long-tail compatibility shim, not a future. It only barely works in demos. “Model sees pixels, model clicks pixels” is the equivalent of searching for “google” from your URL bar, clicking the top search result (google.com), and typing your real query into the search box. Do I desperately want the GUI driver? Yes, but like I said, only for compatibility.
AI is currently subsidized by VC dollars, and that won’t last. Practical implication for anyone building or buying tools right now: if it doesn’t have a documented, repeatable text-based surface, it’s on borrowed time.
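As a rough illustration of what a text-based surface can look like, here is a minimal sketch of a text-in, text-out command in Python: raw METAR text goes in as an argument, one line of JSON comes out. The tool name `metar-json`, the function names, and the output fields are all hypothetical, and the parser only handles the temperature group and the US-style altimeter group.

```python
import argparse
import json
import re

# Temperature/dewpoint group, e.g. "25/M03"; a leading "M" marks a negative value.
TEMP_RE = re.compile(r"(?<!\S)(M?\d{2})/(M?\d{2})?(?!\S)")
# US altimeter group, e.g. "A2992" means 29.92 inHg.
ALTIMETER_RE = re.compile(r"(?<!\S)A(\d{4})(?!\S)")


def _celsius(group):
    """Convert a METAR temperature token such as "25" or "M03" to an int."""
    return -int(group[1:]) if group.startswith("M") else int(group)


def parse_metar(raw):
    """Extract temperature and altimeter from raw METAR text (None if absent)."""
    temp = TEMP_RE.search(raw)
    altimeter = ALTIMETER_RE.search(raw)
    return {
        "temperature_c": _celsius(temp.group(1)) if temp else None,
        "altimeter_inhg": int(altimeter.group(1)) / 100 if altimeter else None,
    }


def main(argv=None):
    """Hypothetical entry point: one METAR string in, one JSON line out."""
    parser = argparse.ArgumentParser(
        prog="metar-json",
        description="Print selected METAR fields as JSON.",
    )
    parser.add_argument("metar", help="raw METAR report, quoted as one argument")
    args = parser.parse_args(argv)
    print(json.dumps(parse_metar(args.metar)))
    return 0
```

Calling `main(["KDEN 121753Z 27010KT 10SM FEW080 25/M03 A2992 RMK AO2"])` with that made-up report prints `{"temperature_c": 25, "altimeter_inhg": 29.92}`. An agent can call this, read the result, and retry in a handful of tokens, with no screenshots involved. Wiring `main` to a real entry point is left out of the sketch.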
Human roles compress, too
Job specialization between engineer and PM was natural when engineering was expensive and complex. Google hires technical PMs because they can read a design doc and predict subtle user impacts that engineers can’t. But now that building is cheap, “what do we build” and “how do we build it” are best compressed into one role.
It’s a different job: running the workshop, writing the specifications, picking the architecture tradeoffs, owning the tests. You could call it a “product engineer” or perhaps “technical product manager”. The point is it’s one job, not two.
Luckily, this isn’t terribly new: the best engineers already had product sense, and vice versa. So AI won’t overturn existing leadership. But it will trim off the hyper-specialists (one-trick ponies?) on the edges.
Code is effectively free, and so the new bottleneck is everything other than engineering. Job specialization is dying: the winners will be the product-focused engineers and technical PMs who can define what to build, own the quality, and run the workshop — rather than those who can only do one side. Maybe that’s why it’s so fun — engineering is the only black box that remains in my role.
It’s been magical to build something real just as a side hobby, something I haven’t had energy to do since college. Who’d have thought that just the thing coding needed was to be erased from existence entirely?
[1] An Android Wear app that puts METAR data on your watch face; it looks as good as native.