I have been skeptical of code generators. Then again, I used to be skeptical of code formatters, and I ended up embracing them wholeheartedly. Typically, we are not a first-adopter team - we let others go through the process of finding pain points and reap the benefits later on. Now that AI coding agents are becoming much more common, I decided it was time to try one out.
For my tooling, I've been using Augment Code as a context-aware chat solution in VS Code, so I decided to try out Augment Agent.
I found that Augment Agent delivered a significant productivity boost by generating functional code structures quickly, but it was limited in pattern recognition, code organization, and testing. That reinforced my view that these tools serve better as accelerators for human developers than as replacements.
Building a Weighted Calculation Comparison
The platform is a calculator, and the user's profile provides the inputs. Depending on the profile, the calculator produces results for a number of items, which are combined into a single result. The items are weighted, so the output can be adjusted by changing the weights. My client wanted a way to compare the results of different weightings side by side.
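To make that concrete, here is a minimal sketch of the kind of weighted combination involved. The item names, scores, and weights are made up for illustration; the platform's real models and scoring logic are more involved.

```python
# A rough sketch of the calculation being compared, with made-up names.
def combined_result(item_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-item scores into a single weighted result."""
    return sum(item_scores[name] * weights.get(name, 0.0) for name in item_scores)

scores = {"item_a": 7.0, "item_b": 4.0, "item_c": 9.0}
current_weights = {"item_a": 0.5, "item_b": 0.3, "item_c": 0.2}
proposed_weights = {"item_a": 0.3, "item_b": 0.3, "item_c": 0.4}

# The feature puts these two numbers side by side so the client can see
# how a different weighting shifts the final result.
print(combined_result(scores, current_weights))   # 6.5
print(combined_result(scores, proposed_weights))  # 6.9
```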
My initial prompt is below:
I need to add a new feature. We'll start with a grid report of users with their profile fields. Selecting a user will bring up a form to select two Parameter records by their label. The first Parameter record should default to the current is_primary record. Submitting the form will calculate non-live CalculationResult records, then go to a view to display the results side-by-side.
The agent spent a while "thinking" after the prompt, looking at different pieces of code context. This is in contrast to Augment Code's Chat mode, which seems to produce near-instantaneous results even when applying changes in multiple places in the code. That said, this was a bigger ask, since the agent had to generate more pieces on its own.
The immediate results were quite promising. With one prompt, I had the structure of a grid view for user profiles, a form for selecting calculation parameters, views to serve these with the needed permissions applied, and the Jinja templates to tie the UX together. It even added a navigation link for me.
Addressing Limitations
The results, though decent, didn’t work right away. I decided to dive into each piece in turn and evaluate what was happening.
For all of the criticisms, bear in mind that the agent essentially developed a 70% solution, which is a huge productivity boost.
- For the grid, I added some prompting to make the agent adjust the existing columns, add new columns, and correct some query conditions. This type of specific update was handled acceptably.
- Commonly-used mixins in the project were not applied to the form and views.
  I would have thought that, with all of the context-gathering that happened at the beginning, the agent would intuit the use of those mixins and base classes. I didn't take the time to prompt again to see whether it would adjust itself in this case; at a certain point, it's just easier to make the changes myself.
- No tests present.
  That's not too noteworthy in itself, as this was not something I requested in my prompt. But as a developer, it's informative to see the agent's assumptions about what we want generated. I wonder if, as agents get smarter, they will begin to build a profile of the developer they're working with and learn that testing needs to be a first-class citizen in each framework (even if we don't explicitly ask for it each time).
- Code duplication in the Jinja templates.
  Pieces of the template were the same markup, just rendered for different score results. I would want to see that wrapped up in a macro to deduplicate, but the agent is quite happy to generate the same code over and over again instead of factoring out the common pieces. When I prompted the agent to adjust the templates, it handled that duplication easily; it doesn't care whether I'm asking it to change one instance or a hundred.
  This type of shortcoming highlights how code would easily become needlessly complex if the agent were left to its own devices. In this instance, I manually adjusted the templates again, but I want to try a prompt at some point to see whether it will correct its own duplication. The kind of macro I have in mind is sketched below.
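For concreteness, here is a minimal sketch of the macro-based deduplication I have in mind, rendered through jinja2 from Python. The template structure, field names, and panel titles are hypothetical stand-ins for the project's real templates and data.

```python
# A self-contained example: the repeated result markup lives in a single
# Jinja macro that both comparison panels call. All names here are made up.
from jinja2 import Environment

TEMPLATE = """
{% macro score_panel(title, result) %}
<div class="score-panel">
  <h3>{{ title }}</h3>
  <ul>
    {% for item in result["items"] %}
    <li>{{ item["label"] }}: {{ item["score"] }} (weight {{ item["weight"] }})</li>
    {% endfor %}
  </ul>
  <p>Total: {{ result["total"] }}</p>
</div>
{% endmacro %}

{{ score_panel("Current weights", left) }}
{{ score_panel("Proposed weights", right) }}
"""

html = Environment().from_string(TEMPLATE).render(
    left={"items": [{"label": "Item A", "score": 7, "weight": 0.6}], "total": 4.2},
    right={"items": [{"label": "Item A", "score": 7, "weight": 0.4}], "total": 2.8},
)
print(html)
```

Both panels call the same score_panel macro, so a change to the markup only has to be made in one place.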
The bottom line is that the results were better than I anticipated for a tool of this kind, even though the output is certainly not production-ready (or, in some cases, even working).
In some ways, the agent is like having a junior-level developer who can code some things but doesn't care to test whether the code works. Then, the rest of the process is something like code review.
AI-Assisted Development
The productivity boost is undeniable. Even with its flaws, the agent delivered a substantial portion of the feature in a fraction of the usual time, condensing what would have been a much longer effort into minutes of prompting and refinement.
That said, these tools aren't replacing developers anytime soon. They're more like eager apprentices who need supervision and guidance. The agent lacks the judgment to make architectural decisions, fails to follow established patterns without explicit instruction, and seems to have no concept of maintainability or technical debt.
For now, I'll use AI agents as accelerators rather than replacements—letting them handle the 70% that they can while I focus on the nuanced 30% that makes code truly production-ready. As these tools evolve, I expect they'll become increasingly indispensable, even to skeptics like me.