
AI-ing an App

Writing code is effectively free now. Designing it isn’t.

This is no more “faster coding” than the internal combustion engine powered a “faster horse”. It’s a step change in the fundamental economics of software development.

That makes things different. I suppose I’m not the first to have Big Thoughts about this, and I doubt mine are any more interesting than anyone else’s. Then again, this blog is for me, not you, so get off my lawn.[1]

I’m a “vibe” coder now

This past month I wrote a Wear OS app and didn’t touch a single line of code myself. But is it still vibe-coding if you did all of this?

  • Testability (unit, instrumented, manual, app upgrades, mocks, mutation insertion, UI automation)
  • Security (binary scanning, locking down broadcast receivers)
  • Privacy (identifiers, identifiability, compliance, unique concerns for on-device, disclosures, policy)
  • Analytics (engagement, retention, usage patterns)
  • System Health (battery usage, crashes, timeouts, data staleness, server-side flags)
  • Accessibility (labels, hardware controls, controls, text scaling, tap target sizes, contrast)
  • Version skew (Wear OS vs. Android OS)
  • Resource limits and budgets for APIs
  • Compliance (open source and other licensing, permissions dialogs, nutrition labels)
  • Marketing (naming, iconography, screenshots, Play Store listing)
  • Feedback (daily alpha pushes, in-field manual bug reporting, Play Store reviews, dogfood surveys)
  • Release stabilization (triage, issues, Projects, PRs)

These are all things I do in my day-to-day job, and there’s a particular fun in wearing every hat on a single project.

I think the real joy was learning about the modern tools that have advanced since I was in college. I learned about skills, MCPs, subagents, GitHub Projects, GitHub PRs, how to keep your secrets out of your codebase, containers, markdown, Firebase, BigQuery, Data Studio, Play Store Console, Noun Project, Claude Code, Gemini CLI, Antigravity, Compose, Robolectric, Roborazzi, gcloud, gh, bq, Android Virtual Devices, MockK, and so on! It’s interesting to see the lineage and the improvements in the public GCP tooling.

In a truly futuristic working style, I spun up subagents to …

  • Check user feedback (3 sources), metrics (a dozen tables), and postsubmit tests, then file bugs. (CSR)
  • Triage bugs into priorities and releases. (TPM)
  • Describe desired behavior for each bug. (PM)
  • Implement each fix. (SWE)

Not to mention skills that do all of those, plus one that scans my KPI tables for regressions and insights (after I stupidly clicked around in Data Studio a bunch building dashboards – oops!).

So cutting-edge!

I even developed a skill that reliably drives Wear OS’s UI to set up whatever watch faces and complications I want, so I can come back an hour later, check 50 different watch faces, and look for surprises and issues. Did you know there are (at least!) three ways to show a short-text complication (1 graphic and 2 lines of text) just in the AVD? I didn’t until I scanned two dozen watch faces!

What fun!

Code is free – what isn’t?

Restraint isn’t free

Doing the right thing with your agentic swarm still isn’t free. The AI is quite able to run privacy scans and maintain invariants defined in my privacy policy – but it’s really not great at doing that by itself. If you trust it to handle launch-readiness on its own, you’ll end up like this cautionary tale. And goodness, did it take me a while to convince it that deleting a test was NOT an appropriate way to fix a test. You can still drive the product into the ground with pointless features in a garbage information architecture. The agents (so far) haven’t quite learned how to run an entire real business.
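One cheap guardrail against the test-deletion habit is a check that compares the set of test names before and after a change. What follows is a minimal sketch, not the tooling actually used for this app: it is written in Python for brevity, it assumes JUnit-style `@Test fun` declarations in Kotlin sources, and `collect_test_names` and `removed_tests` are hypothetical helper names.

```python
import re

# Assumption: tests are declared as "@Test fun name()" or "@Test fun `some name`()".
# Annotations with arguments, e.g. @Test(expected = ...), are not handled here.
TEST_DECLARATION = re.compile(r"@Test\s+fun\s+(`[^`]+`|\w+)")


def collect_test_names(source: str) -> set[str]:
    """Return the names of all test functions found in a Kotlin source string."""
    return {name.strip("`") for name in TEST_DECLARATION.findall(source)}


def removed_tests(before: str, after: str) -> set[str]:
    """Return tests that existed before a change but are missing after it."""
    return collect_test_names(before) - collect_test_names(after)


if __name__ == "__main__":
    before = "@Test\nfun showsMetar() {}\n\n@Test fun `handles stale data`() {}\n"
    after = "@Test\nfun showsMetar() {}\n"
    missing = removed_tests(before, after)
    if missing:
        print("Tests removed by this change:", sorted(missing))
```

A non-empty result does not prove the change is wrong (tests do get renamed or retired), but it turns a silent deletion into something a human has to approve.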

Doing the right thing is, however, a lot cheaper. I can simply ask my agent to analyze my KPIs for me. I can ask it to run new ad-hoc analyses. I can ask it to follow up on its suspicions. I can give it a crisp definition of my suspicions and see where it leads. I can have it do all the aggregation, summarization, sentiment analysis, statistical analysis.
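As one illustration of the kind of ad-hoc check an agent might run over a KPI table, here is a minimal sketch that flags a drop relative to recent history. The function name `is_regression`, the z-score approach, and the default threshold of 3 are all assumptions made for illustration; this is not the analysis described above.

```python
from statistics import mean, stdev


def is_regression(history: list[float], latest: float, z_threshold: float = 3.0) -> bool:
    """Flag `latest` when it sits more than `z_threshold` sample standard
    deviations below the mean of `history`."""
    if len(history) < 2:
        # Not enough data to estimate spread, so do not raise an alarm.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any drop at all counts as a regression.
        return latest < mu
    return (mu - latest) / sigma > z_threshold


if __name__ == "__main__":
    daily_active_users = [100, 102, 98, 101, 99]  # made-up example values
    print(is_regression(daily_active_users, 90))
    print(is_regression(daily_active_users, 99))
```

The check only looks for drops, which suits metrics where higher is better; a metric like crash rate would need the comparison flipped.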

So, effectively, restraint isn’t free: the restraint to spend 90% of your time doing real product development (the list above) rather than just churning out toy features.

Asking for more isn’t free

I always push my team for better (not more when possible – but better, yes). Perhaps their role is too easy, perhaps I’m not paying enough attention, perhaps I’m not thinking hard enough about it, or perhaps I should be giving them more rewards.

I have to do the same thing with AI, which often treats ME like a subagent. In one case, after wrapping up my containerized dev setup, it cheerfully announced “Done! Try running the container again and tell me if you get any errors.” I admit the first three times, I did, in fact, run the container. I only realized my idiocy when a friend pointed it out. Other times it’s “could you reload the app and tell me what you see?” Well, friend-o, how about YOU reload the app and tell me what YOU see? What is all the multimodal stuff for if you can’t see that’s not centered?[2] I’m not your copy-paste monkey.

Knowing what to ask for isn’t free

If asking for less is still hard, and asking for more is still hard, then aren’t we just saying that asking for the right thing is still hard?

I heard it explained recently as this: doing math on paper is cognitive engagement, with a calculator is cognitive offload, and with AI is cognitive surrender. This is where slop comes from – not that the AI wrote terrible code, but that its human overseer didn’t review and iterate on the code. It’s the same reason I won’t give my reports the right answer until they give me some answer. The act of engaging with the problem and taking a stand personally guarantees you’ll never surrender, and you’ll always learn.

Last time I suggested someone give agentic coding a try, they typed something so vague into the prompt that I stopped them before they even tapped Enter. Models need to be pointed in the right direction, and even expanding to a paragraph instead of a sentence goes a very long way. There’s no replacement for setting the playing field, describing the expected behavior unambiguously, and pointing the agent at the right files. Not unlike how you might point someone on your team at the right process or collaborator!

This will never end. Until we develop Asimov’s “Machines” – or, really, the network of flawless, comprehensive, and up-to-date information fed into them – there will always be an inherent gamble. That ambiguity is where the human will always sit, because the vibes you feel living in and walking around a community thinking about what business to create are still better than a model’s balance of random seeds and averages.

And as long as we’re still building foundational models, data will still be proprietary, so there’s no chance whatsoever of developing The Machines anyway. The Machines won’t exist until they’re no longer valuable, which is to say, they’ll never exist. Hooray for humanity?

What else is on your (my) mind?

Here are some things that still caught me by surprise:

  • Your tool needs an MCP and/or a CLI or it may as well not exist. Letting it drive an integration (or your app) via the “model sees pixels, model clicks pixels” workaround is the equivalent of typing “google” into your URL bar, clicking the top search result, and then typing your real query into the search box.
  • These models really do NOT want to test. When pressed about a failing test, they’ll suggest deleting it – and, just to be thorough, the testing library it was implemented with. I like that about as much as Spotify saying I’m offline when I have five bars, and it happens just about as often!
  • Once I said to “add” the value instead of “compute” the value, and it decided to aggregate the value by summing it in addition to adding the feature. So, I guess I could’ve seen that coming, but it’s hilarious that it managed to understand the esoteric math I was asking for but not that the value should never, ever be aggregated using sum.
  • It’s excessively tuned for “wow” demos. I guess that makes some sense – demos sell! But it takes real effort and precise direction to build what I’d call “software” rather than “toys”. Next time you read one of the horror stories on /r/vibecoding, think of that! But telling it to “make an android watch app that shows METARs on a complication using NOAA and NWS data” and having it exist 80 minutes later is, indeed, pretty wild ngl.
  • We’re back to the browser and search engine wars, since every agent uses the same concepts of skills, MCPs, and CLIs and they’re virtually interchangeable. Startups are advised to code a day a month on the competitor just to make sure they have a backup for emergencies. When you can jump ship in a couple hours, it creates outstanding competition benefitting customers.
  • It’s good at categorization (is it A or B?) but pretty terrible at deciding if something is “easy or hard” or some other thing that exists on a continuous spectrum. It really needs its own algorithm of categorization it can run – I suppose this is what /plan is for!
  • They’re truly terrible at DRY practices. The damn thing is perfectly happy to use the same string literal three times when, in reality, all three should only ever change together. I suppose if code is free, you could argue that’s OK, but context definitely isn’t free, and managing duplication by centralizing functions does as much good for your model’s context as for your human brain’s.
  • It is really easy to just NOT review the work. It works so often! And by the time it stops working, it’s very hard to unravel. You still have to be an engineer and do things slowly with all the planning and consideration and review. It’s still 3x faster, it’s just not 10x faster. But it’s best to do it before you get really stuck. This is why I built my app with a robust test suite (2 hours for an end-to-end run on 31k LoC – yeah).
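To make the DRY point from the list above concrete, here is a minimal sketch in Python of one string that three call sites must agree on. Every name in it (`METAR_CHANNEL_ID`, `create_channel`, `build_notification`, `log_event`) is hypothetical and not taken from the app.

```python
# Defined once. If the ID ever changes, all three uses below change together.
METAR_CHANNEL_ID = "metar_updates"


def create_channel() -> dict:
    """Describe the channel that notifications will be posted to."""
    return {"id": METAR_CHANNEL_ID, "name": "METAR updates"}


def build_notification(text: str) -> dict:
    """Build a notification addressed to the shared channel."""
    return {"channel": METAR_CHANNEL_ID, "text": text}


def log_event(event: str) -> str:
    """Format a log line tagged with the shared channel ID."""
    return f"[{METAR_CHANNEL_ID}] {event}"


if __name__ == "__main__":
    print(create_channel())
    print(build_notification("KDEN 121753Z ..."))
    print(log_event("notification sent"))
```

With the literal repeated three times instead, a rename that touches only two of the three sites still runs, and the mismatch only shows up later as a notification that never arrives.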

Who’d have thought that just the thing coding needed was to be erased from existence entirely?

  1. I don’t work on AI coding tools at Google and I have no idea what they’re up to. I wish them best of luck and I believe in them – after all, Google invented Transformers. 

  2. I empathize with all the visual designers out there who know there are several different versions of “centered”, many of which a layperson knows aren’t “centered” but can’t explain why. Nothing is easy when you’re doing it professionally.


Isaac Reynolds

I'm a Googler, product manager, pilot, photographer, videographer. I've been the lead product owner for Pixel Camera Software since its inception. I hold a BS in Computer Engineering from the University of Washington in Seattle, near my hometown. I live near Denver after escaping Mountain View during COVID-19.
