I first started using AI coding assistants in early 2021, with an invite code from a friend who worked on the original GitHub Copilot team. Back then the workflow was just single-line tab completion, but you could also guide code generation with comments and it'd try its best to implement what you want.
Fast forward to 2025, and there's a wide range of coding assistants packed with features, the models have gotten substantially more powerful, and the way I develop with them has changed too. They're now an essential part of how I code in a variety of contexts.
However, many developers remain skeptical of the utility of AI coding assistants. Usually this is because they tried a vague task with a free AI model at some point and got back incorrect code, hallucinated API calls, or some other issue. Others (and their teams) have adopted these tools, but that adoption hasn't resulted in more working software. In some cases, the tools have lowered overall productivity because teams had to hunt down bugs whose root cause was a developer blindly trusting AI-generated code.
I've chosen to write this post specifically for those who are skeptical or have come away unimpressed. I can't promise that if you follow what I say, you'll fall in love with AI coding assistants. But I do believe that if you adopt some of the following tips, you'll come away substantially more impressed than you might be today.
AI assistants are tools, not magic, and like any tool, they take skill to use well. If there's one thing to take away from this post, it's that AI is a tool that requires developing skills to wield effectively. If you don't invest in those skills, you will be ineffective using AI for coding.
Before you stress about getting skilled at using AI coding tools, first you need to actually get a good tool.
Right now, the best AI model for coding is Claude. There are coding benchmarks that Claude scores well on, but you'll find that much of the AI community claims those benchmarks don't come close to doing it justice. Somehow, in some way, Claude has this je ne sais quoi: it doesn't veer off course much as you use it, the code it writes seems to fit the style of your codebase, and it just doesn't make up API calls that don't exist.
You can get by with ChatGPT or Gemini or other models, but you'll need to be more on your guard, because over time they tend to "drift" from the style and goals you set in the code they suggest.
Do not take this recommendation lightly. Claude is the only model I've found that I don't have to second-guess every time I use it. It can still make mistakes, but more often than not, those mistakes have come because I didn't provide enough (or accurate) context, such as forgetting to mention that I'd updated some other part of the codebase. And as I'll discuss later on, you can eliminate most of these problems as well with particular prompting techniques.
If you formed your opinion about AI coding assistants from using free ChatGPT a year ago, you need to radically update your priors. The gap between free AI tools and premium models like Claude has widened substantially, and the capabilities have advanced dramatically.
Maybe at some point, another model will dethrone Claude for coding. Claude is also not necessarily the best for non-coding workflows. But for now, it's the one to use.
Since Claude is not just a chatbot but a model, it can be used to power any number of coding assistants -- Copilot, Cursor, Windsurf, and more. I personally don't use these tools much and usually just copy/paste from the Claude web interface, but little of what I describe in this post prohibits the use of other tools.
AI coding assistants vary wildly in their effectiveness based on the kind of code you're writing. Generally speaking, I think about this in three ways.
The more common the kind of code you're trying to write, the more likely an AI model will do a good job. For example, if you're writing a Next.js app with Tailwind CSS, most modern models will not only be up to date on the exact patterns that work best, but will also handle older versions. But if you're writing code in a custom backend with significant domain-specific constraints, you'll probably find that the AI model struggles to generate code that fits your needs.
That said, this doesn't mean you can't be effective with AI under domain-specific constraints. It just means you'll need to be more explicit about what you want, and you'll need to carefully specify what those constraints are up front. If you don't know these constraints, or you can't explain them clearly to another person, you probably won't be able to get an AI model to work well with them either.
If the code you're writing for your domain has a lot of similar code available online, chances are an AI model will write good code for you without much work.
Web development, relatively boring backend development, mobile development, library development, and "CNCF-adjacent" cloud development (i.e., written in Go and deployed to Kubernetes) are some domains that I've found AI models do well with. This is mostly because of the vast amount of this sort of code available for modern AI models to pre-train on.
The easier it is to run and verify code, the more effective AI assistance becomes. This is why tasks with quick feedback loops (like frontend development or unit test writing) tend to work particularly well, but tasks with slower feedback loops (like non-kubernetes infrastructure code) can be more challenging.
One unexpected lesson I've learned is that because AI makes it very cheap to generate code, I now think differently about whether I want to use a library at all.
I view libraries as falling into one of two categories:
Libraries that solve genuinely hard problems for me
Libraries that save me from writing a bunch of code that I don't want to write
For the first category, I will obviously use a library. There are many libraries that are rigorously tested, have a large community, and are well-maintained. And in fact, AI models like Claude will often recommend using good libraries in the first place.
Sometimes, if a model recommends a library, I'll interrogate the suggestion to see whether it's really needed, or whether replicating it would require a lot of code.
Below is an example from a conversation where Claude recommended using react-router:
<me>
Why do you think I should use `react-router`?
</me>
<assistant>
React Router is worth considering for your web app for several key reasons:
1. Client-side routing - Instead of making full page requests to your server, React Router handles navigation smoothly on the client side, making your app feel more responsive and app-like rather than website-like.
2. URL-based state management - It gives users the ability to bookmark specific views and use the browser's back/forward buttons naturally. This improves user experience and makes your app's state more predictable.
3. Dynamic routing - You can easily handle dynamic route parameters like `/users/:id` or `/products/:category/:id`, which is essential for most modern web apps.
4. Built-in features that solve common problems:
- Nested routes for complex layouts
- Protected routes for authentication
- Route matching priorities
- Loading and error states
- Query parameter handling
For example, if you later decide to add user authentication, protecting certain routes becomes straightforward:
<Route
  path="/dashboard"
  element={
    <ProtectedRoute>
      <Dashboard />
    </ProtectedRoute>
  }
/>
Would you like me to show you some specific examples of how these features could benefit your particular use case? Or would you prefer to explore alternative routing solutions?
</assistant>
<me>
How much code would I have to write if I didn't use `react-router`?
</me>
<assistant>
Let me show you a basic custom routing implementation without React Router so you can compare:
(...generated code)
As you can see, even for a basic router you need to:
1. Create your own Link component that manages history updates
2. Build a Router component that tracks URL changes
3. Manage event listeners for browser navigation
4. Handle route matching manually
And this basic implementation is still missing many features that React Router provides out of the box:
- No nested routes
- No dynamic route parameters
- No route ranking/matching priorities
- No built-in handling for query parameters
- No loading states
- No error boundaries
You'd need to write significantly more code to implement these features yourself. Plus, React Router's code is battle-tested by thousands of developers and handles many edge cases you might not think of initially.
Would you like me to show you what the code would look like to add any of those missing features to our custom implementation?
</assistant>
<me>
lol okay
</me>
In general, it's a good idea to "question" an AI model's suggestions like this. The first suggestion you get isn't always the best one. But importantly, asking an AI model to examine its answer in a critical light often leads to a better suggestion.
All that said, AI has made me completely re-evaluate the second category -- where a library is brought in because it saves some code and adds convenience. Because code is so cheap to generate now, I will usually just generate the code and tests for it, package it up somewhere, and have one less hellish dependency to manage over time.
Dependency management is much harder than re-generating code.
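For example, instead of pulling in a small string-utility library just for something like slugifying titles, I'll generate the function and its tests and keep them in the repo. Here's a minimal sketch of what that can look like (the function is hypothetical, and the test assumes vitest, but any runner works):

```typescript
// slugify.ts -- a small utility I'd generate instead of adding a dependency.
// Converts e.g. "Hello, World!" into "hello-world".
export function slugify(input: string): string {
  return input
    .toLowerCase()
    .normalize("NFKD")                 // split accented characters into base + diacritic
    .replace(/[\u0300-\u036f]/g, "")   // strip the diacritics
    .replace(/[^a-z0-9]+/g, "-")       // collapse anything non-alphanumeric into a dash
    .replace(/(^-|-$)/g, "");          // trim leading/trailing dashes
}

// slugify.test.ts -- generated alongside the function, so it earns its keep.
import { describe, expect, it } from "vitest";
import { slugify } from "./slugify";

describe("slugify", () => {
  it("lowercases and dashes", () => {
    expect(slugify("Hello, World!")).toBe("hello-world");
  });
  it("strips accents", () => {
    expect(slugify("Crème Brûlée")).toBe("creme-brulee");
  });
});
```

The point isn't that this particular function is hard to write; it's that owning twenty lines like this is cheaper than owning another dependency.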
With AI assistance, my development loop is more dynamic and iterative than it used to be, and being effective requires several techniques I've learned over time.
Most people building with AI have learned that RAG (retrieval-augmented generation) is an essential pattern: the quality of an AI integration is a direct function of the relevance, amount, and quality of the contextual data you pass to the model.
This is especially true when using AI for real-world coding rather than tiny side projects.
I usually start by creating an instruction/rules file or "project" in the AI tool I'm using. In Claude or ChatGPT, this is literally called "projects", and these are a way to provide a bunch of files that make the model produce significantly better code. Cursor and Copilot support custom instructions in various ways too.
Note that depending on your tool, it's not just written text files that are supported, but also diagrams. I've found that diagrams are particularly useful for showing things like architecture or the general flow of a working app, to contextualize how the code makes something work. You can also provide custom documents that reference other documents, such as a file that explains the names of several screenshots and how they relate to the codebase. Things like Claude Projects will pick this up.
If you're not using a tool that supports this sort of thing, you can even get by with simply pasting in a single file -- preferably following the llms.txt standard -- and your single chat session will produce shockingly better code.
Just writing some custom rules or instructions, or pasting in an llms.txt file, will make a huge difference in the quality of the code you get back. However, you can take this a lot further by cross-referencing important context with the structure (and substructure) of your codebase. Although tools like Cursor and Copilot do support codebase indexing, I've found that this only helps with finding likely-relevant code for a task and increasing the likelihood that the generated code compiles -- it doesn't embed actual knowledge of the codebase. To make sure the model doesn't "go off the rails", you need to provide that context yourself.
I typically start by dumping the output of `tree` to a text file and including that with some basic instructions about areas of the code.
For larger projects, I spend some time writing up a proper llms.txt markdown file that describes the high-level goals of the system, key areas of the codebase with important knowledge, and then the full file structure. This is a bit of work, but it's a one-time cost that pays off in the long run.
For example, here's one that could be used for a production system, Refinery:
# Refinery
## Refinery overview
Refinery is a tail-based distributed tracing sampler that:
1. Collects and distributes trace data across the cluster it's run on
a. It defaults to using trace locality, meaning that all data in a trace is forwarded to be on the same node
b. If trace locality is turned off, only metadata is forwarded across a cluster
2. Once a trace is considered fully received (its root span is collected or a timeout is reached), it processes the trace by looking at the configured sampling rules, and decides to either sample (keep) or drop the trace.
3. If a trace is sampled, it is enqueued for transmission.
a. Sampled trace IDs are also saved to a cache, which allows any late-arriving spans to also be transmitted instead of dropped.
4. Operational metrics, trace data, and logs are created throughout Refinery's operation and transmitted as OpenTelemetry data by default.
a. In particular, Refinery uses a notion of "stress level" to determine if and when it should start backing off on processing and applying rudimentary (but compute-inexpensive) sampling rules. This stress level is often monitored to determine if a Refinery cluster has adequate resources to run.
In particular, it specializes in dynamic sampling, which is a technique that adjusts the sampling rate of traces based on the uniqueness of the data and the desired throughput or sample rates. By combining dynamic sampling with rules-based sampling, Refinery allows a high degree of flexibility and sophistication in how traces are sampled.
## Key refinery operational characteristics
Refinery is also itself a sophisticated distributed system. It gossips its internal state and information about data in its nodes across the rest of the cluster. It also supports distributing trace data across a cluster in different ways using Redis.
Refinery also defines a notion of a "stress level", which is a measure of how much load a Refinery node is under. This stress level is used to determine if and when Refinery should start backing off on processing and applying rudimentary deterministic sampling, which ensures that data is still sampled rather than always dropped, but comes at the cost of not being able to apply more sophisticated sampling rules.
## Key areas of the codebase
Refinery has many files and folders, but these in particular are most helpful to understand the system and know where to make tweaks.
1. collect/ and collect/collect.go, which contains much of the logic for collecting and distributing trace data
2. types/ and types/event.go, which contains the data structures for trace data (confusingly, the term "event" is used for any unit of a trace, not just log events)
3. pubsub/ and pubsub/pubsub.go, which contains the logic for gossiping information about Refinery nodes and the data resident within them across the cluster
4. sample/, which contains a variety of samplers. In particular, the dynamic samplers (such as ema_throughput.go) are heavily used in production because they automatically adjust sampling rates based on uniqueness of data and desired throughput or sample rates.
5. config/ and config/config.go, which contains the configuration loading and validation logic for Refinery. Refinery uses a TOML configuration file, but also supports YAML and JSON for configuration.
## Full file structure
├── .circleci
│ └── config.yml
├── .github
│ ├── CODEOWNERS
│ ├── ISSUE_TEMPLATE
│ │ ├── bug_report.md
│ │ ├── feature_request.md
│ │ ├── question-discussion.md
│ │ └── security-vulnerability-report.md
│ ├── PULL_REQUEST_TEMPLATE.md
│ ├── dependabot.yml
│ ├── release.yml
│ └── workflows
│ ├── add-to-project-v2.yml
│ ├── apply-labels.yml
│ ├── release.yml
│ ├── stale.yml
│ └── validate-pr-title.yml
├── .gitignore
├── .tool-versions
├── CHANGELOG.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── Makefile
├── NOTICE
├── OSSMETADATA
├── README.md
├── RELEASE_NOTES.md
├── RELEASING.md
├── SECURITY.md
├── SUPPORT.md
├── app
│ ├── app.go
│ └── app_test.go
├── build-docker.sh
├── build-pkg.sh
├── cmd
│ ├── refinery
│ │ └── main.go
│ └── test_redimem
│ └── main.go
├── collect
│ ├── cache
│ │ ├── cache.go
│ │ ├── cache_test.go
│ │ ├── cuckoo.go
│ │ ├── cuckooSentCache.go
│ │ ├── cuckoo_test.go
│ │ ├── kept_reason_cache_test.go
│ │ ├── kept_reasons_cache.go
│ │ └── traceSentCache.go
│ ├── collect.go
│ ├── collect_test.go
│ ├── mockCollector.go
│ ├── stressRelief.go
│ ├── stress_relief_test.go
│ ├── trace_decision.go
│ ├── trace_decision_test.go
│ ├── trace_redistributer.go
│ └── trace_redistributer_test.go
├── config.md
├── config.toml
├── config.yaml
├── config
│ ├── cmdenv.go
│ ├── cmdenv_test.go
│ ├── config.go
│ ├── configLoadHelpers.go
│ ├── configLoadHelpers_test.go
│ ├── config_test.go
│ ├── config_test_reload_error_test.go
│ ├── file_config.go
│ ├── file_config_test.go
│ ├── level.go
│ ├── memorysize.go
│ ├── memorysize_test.go
│ ├── metadata.go
│ ├── metadata
│ │ ├── configMeta.yaml
│ │ └── rulesMeta.yaml
│ ├── metadataFS.go
│ ├── mock.go
│ ├── sampler_config.go
│ ├── sampler_config_test.go
│ ├── validate.go
│ └── validate_test.go
├── config_complete.1.x.toml
├── config_complete.yaml
├── generics
│ ├── fanout.go
│ ├── fanout_test.go
│ ├── mapttl.go
│ ├── mapttl_test.go
│ ├── set.go
│ ├── set_test.go
│ ├── setttl.go
│ └── setttl_test.go
├── go.mod
├── go.sum
├── internal
│ ├── configwatcher
│ │ └── watcher.go
│ ├── health
│ │ ├── health.go
│ │ └── health_test.go
│ ├── otelutil
│ │ └── otel_tracing.go
│ ├── peer
│ │ ├── file.go
│ │ ├── file_test.go
│ │ ├── mock.go
│ │ ├── peers.go
│ │ ├── peers_test.go
│ │ ├── pubsub_redis.go
│ │ └── pubsub_test.go
│ └── redimem
│ └── redimem.go
├── logger
│ ├── honeycomb.go
│ ├── logger.go
│ ├── logger_test.go
│ ├── logrus.go
│ ├── mock.go
│ └── null.go
├── metrics.md
├── metrics
│ ├── legacy.go
│ ├── legacy_test.go
│ ├── metrics.go
│ ├── metrics_test.go
│ ├── metricsnamer.go
│ ├── mock.go
│ ├── multi_metrics.go
│ ├── multi_metrics_test.go
│ ├── nullmetrics.go
│ ├── otel_metrics.go
│ ├── prometheus.go
│ └── prometheus_test.go
├── preinstall
├── pubsub
│ ├── pubsub.go
│ ├── pubsub_goredis.go
│ ├── pubsub_local.go
│ └── pubsub_test.go
├── refinery.service
├── refinery.upstart
├── refinery_config.md
├── refinery_rules.md
├── route
│ ├── errors.go
│ ├── errors_test.go
│ ├── middleware.go
│ ├── middleware_test.go
│ ├── mock.go
│ ├── otlp_logs.go
│ ├── otlp_logs_test.go
│ ├── otlp_trace.go
│ ├── otlp_trace_test.go
│ ├── proxy.go
│ ├── route.go
│ └── route_test.go
├── rules.json
├── rules.md
├── rules.toml
├── rules.yaml
├── rules_complete.1.x.toml
├── rules_complete.yaml
├── rules_conditions.md
├── sample
│ ├── deterministic.go
│ ├── deterministic_test.go
│ ├── dynamic.go
│ ├── dynamic_ema.go
│ ├── dynamic_ema_test.go
│ ├── dynamic_test.go
│ ├── ema_throughput.go
│ ├── ema_throughput_test.go
│ ├── rules.go
│ ├── rules_test.go
│ ├── sample.go
│ ├── sample_test.go
│ ├── totalthroughput.go
│ ├── totalthroughput_test.go
│ ├── trace_key.go
│ ├── trace_key_test.go
│ ├── windowed_throughput.go
│ └── windowed_throughput_test.go
├── service
│ └── debug
│ └── debug_service.go
├── sharder
│ ├── deterministic.go
│ ├── deterministic_test.go
│ ├── mock.go
│ ├── sharder.go
│ ├── sharder_test.go
│ └── single.go
├── smoke-test
│ ├── .gitignore
│ ├── README.md
│ ├── config-tls.yaml
│ ├── config.yaml
│ ├── docker-compose-tls.yaml
│ ├── docker-compose.yaml
│ └── rules.yaml
├── test
│ ├── EMAThroughput_rules.yaml
│ ├── TotalThroughput_rules.yaml
│ ├── WindowedThroughput_rules.yaml
│ ├── config.yaml
│ └── rules.yaml
├── tools
│ ├── convert
│ │ ├── Makefile
│ │ ├── README.md
│ │ ├── configDataNames.txt
│ │ ├── helpers.go
│ │ ├── helpers_test.go
│ │ ├── main.go
│ │ ├── metrics.go
│ │ ├── metricsMeta.yaml
│ │ ├── minimal_config.yaml
│ │ ├── ruleconvert.go
│ │ ├── ruleconvert_test.go
│ │ └── templates
│ │ ├── cfg_docrepo.tmpl
│ │ ├── cfg_docsite.tmpl
│ │ ├── configV2.tmpl
│ │ ├── genfield.tmpl
│ │ ├── genfile.tmpl
│ │ ├── gengroup.tmpl
│ │ ├── genremoved.tmpl
│ │ ├── metrics.tmpl
│ │ ├── names.tmpl
│ │ ├── rules_docrepo.tmpl
│ │ ├── rules_docsite.tmpl
│ │ └── sample.tmpl
│ └── loadtest
│ └── .gitignore
├── transmit
│ ├── mock.go
│ ├── transmit.go
│ └── transmit_test.go
└── types
├── event.go
└── event_test.go
As you iterate, you'll find that the LLM needs more pushing to "get things right" with respect to the context you've provided. Usually it will make an assumption about a module or file that isn't correct. The solution here is very simple:
Program your development environment.
Think of the LLM as a wobbly and weird kind of computer that's bad (or at least inefficient) at tasks like adding numbers together, but extremely good at being directionally accurate about things. You can guide its force vector towards the particular outcome you want.
Given this, when you want it to have updated context, you can simply ask it to update its own context once you know it got something right. For example:
If you generated a new Next.js component that does something important and it works correctly, ask the LLM to update its context with the new file and a brief description of the purpose it serves.
If you added a new kind of protocol support for a library, ask the LLM to update internal documentation about protocol support.
If you shuffle data off to a new AWS service, ask the LLM to update its description of the data flow and AWS services used, and for which purpose.
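Concretely, these requests can be as short as the following (a hypothetical exchange; the file name and description are made up):
<me>
The new component at components/UsageChart.tsx works and is now used on the dashboard page. Please update the project context file: add it to the file structure section and note that it renders per-customer usage as a bar chart.
</me>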
You will still need to review things, but as you iterate and continually have better and more up-to-date context, you'll find that you don't need to correct it that often anymore.
The advice here is no different than the general advice you find online about software engineering regarding small diffs, small deployments, etc.
AI models are highly complex systems that increasingly get things wrong when you ask them to do too much at once. The solution is to simply have them do less. Some examples:
Don't generate an entire website at once. Generate a single component, then another, then another. Run your app and run tests each time.
Don't generate an entire API at once. Generate a single endpoint, work on the conventions that fit best, adjust your data model incrementally, etc.
Don't generate an internal library of utility functions at once.
Don't generate an entire new feature at once. Start with a task description, then iterate on a spec, then generate some code for the spec, then generate tests, run the tests, generate more code for the spec, etc.
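To give a sense of the scale I mean, a single increment on an API for me is roughly one endpoint handler plus its test, run before moving on to the next thing. A rough sketch, assuming a Next.js route handler and vitest (the route and response shape are made up):

```typescript
// app/api/status/route.ts -- one endpoint, generated and reviewed as a single step.
import { NextResponse } from "next/server";

export async function GET() {
  // Deliberately small: just enough to establish conventions (path, response shape).
  return NextResponse.json({ status: "ok", time: new Date().toISOString() });
}

// app/api/status/route.test.ts -- the matching test, generated and run in the same step.
import { expect, it } from "vitest";
import { GET } from "./route";

it("reports ok", async () => {
  const res = await GET();
  const body = await res.json();
  expect(body.status).toBe("ok");
});
```

Only once that passes do I move on to the next endpoint, adjusting conventions and the data model as I go.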
These guidelines are not hard and fast, but in my experience they also hold true with agents. While an agent can do a better job of defining an internal library of utility functions given a desired goal and constraints (or just a spec), I've found that agents simply don't work well with existing codebases and nontrivial tasks. Right now, they seem especially tuned for generating brand-new codebases from scratch, and simple ones at that.
For agents, my intuition so far is that you really need to establish firm guardrails around their scope. Think of an agent as helping implement a small ticket end-to-end, with you dictating how it should clarify requirements, test code, and perform the iterative loop. I've found that if I don't do this, the agent will wander off into the wilderness and eventually just generate nonsensical, often uncompilable code.
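As a rough illustration, the guardrails I set for an agent look something like this (a hypothetical prompt; the ticket, flag, and paths are invented):
<me>
Implement ticket #123: add a --dry-run flag to the CLI. Before writing any code, list the files you plan to touch and wait for my confirmation. Only modify files under cmd/ and internal/flags/. After each change, run the test suite and show me the output. If a test fails twice in a row, stop and ask me instead of guessing.
</me>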
I'm genuinely looking forward to revisiting this blog post in a year, now that LLMs have gotten faster and cheaper and agents are starting to pop up. Agents are still really bad at doing what they're advertised for, but they can be highly useful with some work.
The future I'm looking for is one where programming is lifted up to another level of abstraction. Instead of writing a ton of code, we focus a lot more on:
What, exactly, we want to build
How we define an experiment and measurement criteria for what we build
Exploring many different approaches to solving a problem all at once
Automatically embedding best practices into code like robust tests, good observability and internal documentation, etc.
Finding out what we don't know about a problem and how to solve it, and using these tools to facilitate learning about systems and the way they get used
I think we're still a long way off from this reality. Even if we assume the current technical hurdles are solved -- which is absolutely, positively not a given -- there are other systemic hurdles involving human beings and the real world that need addressing. To me, all signs point towards software engineering changing radically as a profession to be much more oriented around the what and why of software, and much less around the how. This will cause disruption at a massive scale in the long run. But in the short run, it's just a lot of fun to play with these tools and see what they can do.