Recently I was working on a quick Android demo, and decided to try agentic codegen to kickstart the development.
I gave it a huge task with minimal instructions, something like: "Add 2 blank screens for master-detail flow, both with ViewModel. Use compose navigation. Prepare for network request."
This omits a huge amount of detail (one plausible interpretation is sketched after the list):
- where to define those screens
- what should be on the screens
- how to provide the ViewModel
- is dependency injection used, and if so, which framework
- should UI state survive process death
- how to link the screens with their ViewModels
- which observable type should the ViewModel expose
- is it MVVM or MVI or something else
- where is navigation defined, is it a separate graph
- are navigation arguments supported
- which networking libraries
- does it need domain and data modules
- how is code structured
- will someone get angry if you don’t finish on time
- should test classes be prepared
- naming and other conventions?
- how should exceptions be handled
- how robust should the solution be
- etc.
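To make that concrete, here is a minimal sketch of one way the prompt could be read. This is my own illustration, not the AI's output, and it assumes a single-module app, Compose Navigation with string routes, plain ViewModels obtained via `viewModel()`, no dependency injection, no navigation arguments, and no saved state. Each of those is a decision the prompt never made.

```kotlin
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewmodel.compose.viewModel
import androidx.navigation.compose.NavHost
import androidx.navigation.compose.composable
import androidx.navigation.compose.rememberNavController

// Blank ViewModels for now; state, networking and injection are all still open questions.
class MasterViewModel : ViewModel()
class DetailViewModel : ViewModel()

@Composable
fun MasterScreen(
    onItemClick: () -> Unit, // would be wired to a list item click later
    viewModel: MasterViewModel = viewModel(),
) {
    Text("Master") // placeholder UI
}

@Composable
fun DetailScreen(viewModel: DetailViewModel = viewModel()) {
    Text("Detail") // placeholder UI
}

@Composable
fun AppNavHost() {
    val navController = rememberNavController()
    // Two destinations linked by string routes; no arguments are passed yet.
    NavHost(navController = navController, startDestination = "master") {
        composable("master") {
            MasterScreen(onItemClick = { navController.navigate("detail") })
        }
        composable("detail") { DetailScreen() }
    }
}
```

Even this tiny skeleton already commits to string routes, to `viewModel()` over a DI framework, and to having no state or arguments at all; a team with a different "theory" might reject every one of those choices.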
Obviously I received a huge changeset. I did not enjoy reviewing it. And it got me thinking about something else.
Peter Naur: Programming as Theory Building
The 1985 essay argues that even though you "get" your program, that knowledge cannot be fully written down. There is a theory that only you possess because you built the system: you know the domain, the real world, the limitations, the why behind every design decision. You know how to modify the codebase quickly without breaking it.
A new team member does not have this theory and cannot do that. Acquiring it is painful because, per Naur, you can't just describe it in a document. You can only nurture it by working together with the original developers.
Does AI know the “Theory”?
Coming back to my task of reviewing the AI's code: do we share the same "theory"? I don't think so. It shows whenever the AI makes design decisions I would not approve of. Other decisions I can approve, but they still seem strange to me.
Overall this results in a changeset that is mildly alien; to fully verify it, you need to scrutinize every symbol. In a well-functioning team this is not the case: all members converge on the team's "theory" and tend to arrive at similar decisions. That makes reviews fast, because there are rarely surprises.
Domain Theory?
The diff for this kickstart is still relatively okay, because it doesn't do anything domain- or team-specific. How to set up an Android app should be represented well enough in the data available to the model.
However, I assume that the more domain-specific the task gets, the less useful public knowledge becomes. The AI needs to follow the team's decisions and context. Furthermore, per Naur, there is something about the team's mental model that cannot be written down, making it even harder(?) to teach to an AI.
More Questions
Will AI be able to grasp the "theory" and fit the team's expectations? Will teams instead converge to whatever the current AI model prefers? If AI can't acquire the "theory", can it be an independent replacement? Can there be codebases without a "theory" at all?
Anyway, in this case I ditched the diff and wrote the thing myself.