Does AI Help Write Better Software, or Just… More Code?

As software teams race to integrate AI into their development workflows, we need to ask ourselves: are AI-powered tools actually making software better? The latest research from DORA confirms what many engineers have long suspected, and what we at Honeycomb have said for a long time: AI tools don’t magically lead to better software. In fact, without careful implementation, AI can introduce a whole slew of challenges, including decreased productivity and unreliable code.

So, how can teams harness AI effectively while maintaining high software quality? Let’s break down the common pitfalls of integrating AI into modern development workflows.

The most common AI development mistakes

AI can be a double-edged sword in software engineering. While it can speed up development, it can also create mountains of technical debt. Here are some of the most frequent issues teams can run into when using AI in development:

Code that misses business context

LLMs can generate code that is syntactically correct but introduces subtle antipatterns into your software, because they often lack high-level context on the constraints a codebase abides by. Even when the whole codebase fits in their context window (with clever agentic tooling like Windsurf and Claude Code), LLMs lack the intangible team context that human engineers bring to any project. Advanced tools may produce code that works on the first try in some areas of a codebase, but struggle in areas that need historical context to evolve in the right direction. That makes code review harder, especially if the purported author of the code doesn’t actually know what it does.

There are two key things to do here:

  1. Developers should treat their AI assistants as powerful but untrusted collaborators: review changes themselves, use AI to generate more tests of desired behavior as a guardrail, and apply prompting techniques that second-guess generated code before committing it.
  2. Teams should work towards encoding key principles and constraints as rule files (such as Cursor’s Rules for AI) so that AI assistants can use this knowledge. A sketch of what such a rule file might contain follows this list.
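
For illustration, a project rule file might look something like the sketch below. The path (say, .cursor/rules/billing.mdc), frontmatter fields, and constraints are all hypothetical, and the exact format varies by tool, so check your assistant’s documentation:

```markdown
---
description: Conventions for the billing service
globs: services/billing/**
alwaysApply: false
---

- Monetary amounts are integers in cents; never represent money as floats.
- Handlers never call the database directly; all access goes through the repository layer.
- Every new endpoint emits an OpenTelemetry span with a customer.id attribute.
- Prefer extending existing modules over pulling in new third-party dependencies.
```

The syntax matters less than the habit: hard-won team knowledge gets written down where the assistant can see it on every request.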

Unfortunately, generative AI isn’t yet at the point where it can figure out all the context behind things, as Phillip Carter, Principal Product Manager at Honeycomb, recently noted on his blog: “As you iterate, you’ll find that the LLM environment needs more pushing to get things right with respect to the context you’ve provided. Usually it will make an assumption about a module or file that isn’t correct.” Luckily, the solution is pretty simple: make the tool update its own context as you correct its outputs. This pattern applies mostly to agentic coding assistant tools.
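In practice, that can look like folding each correction back into the assistant’s persistent context rather than only fixing the output by hand. A hypothetical correction prompt (the file names here are made up) might be:

```text
You assumed retry logic lives in orders/service.py, but it actually lives in
lib/backoff.py. Fix the change accordingly, then append a note about this to
the project rules file so you don't make the same assumption next time.
```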

Another aspect of low-quality code is the use of libraries that can introduce security vulnerabilities. Some AI assistants will gladly recommend new packages, and the ones they suggest may be inappropriate for your use case. Be very careful with libraries: if you can avoid introducing a new dependency, do it. As a rule of thumb, use a library only when it solves a problem that would be too hard to solve yourself.
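
As a hypothetical example of that rule of thumb: suppose an assistant suggests installing a third-party package just to merge nested configuration dictionaries. A few lines of standard-library Python cover the need without a new dependency to vet, patch, and keep up to date:

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where values in `override` win, recursing into nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


defaults = {"retries": 3, "timeouts": {"connect": 2, "read": 10}}
overrides = {"timeouts": {"read": 30}}
print(deep_merge(defaults, overrides))
# {'retries': 3, 'timeouts': {'connect': 2, 'read': 30}}
```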


Get Phillip Carter’s book: Observability for Large Language Models.


Unreliable outputs

LLMs are non-deterministic black boxes. They don’t always produce the same output for the same input, and the output may be incorrect or nonsensical with respect to your use case. You can’t always understand why they produced the output they did either. If you’re not careful, you can end up spinning your wheels asking an AI assistant to generate code again and again, to the point where it would have been faster for you to just write the code yourself.
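
One practical guardrail, echoing the earlier point about AI-generated tests, is to pin the behavior you want before accepting generated code. Here’s a minimal sketch with pytest, assuming a hypothetical slugify helper the assistant produced (the module path and cases are illustrative):

```python
import pytest

from myproject.text import slugify  # hypothetical module written by the assistant


@pytest.mark.parametrize(
    ("raw", "expected"),
    [
        ("Hello, World!", "hello-world"),
        ("  spaces   everywhere ", "spaces-everywhere"),
        ("Crème Brûlée", "creme-brulee"),
    ],
)
def test_slugify_pins_expected_behavior(raw, expected):
    # If a regenerated version drifts (accents, whitespace, casing), these
    # cases fail instead of a reviewer having to spot the change by eye.
    assert slugify(raw) == expected
```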

Additionally, LLMs still struggle with complex edge cases that require deep reasoning or specialized knowledge, especially if you’re not able to use newer models that apply more deliberate reasoning steps when generating outputs. As mentioned earlier, without context about what needs to be done and the constraints the code must run under, you can’t expect an LLM to output the right code.

Misleading shortcuts

LLMs can give the illusion of rapid progress—after all, they are machines that can spit out a lot of code very quickly—but more code doesn’t necessarily mean more working software, or more happy users. Additionally, overreliance on automation like coding assistants can cause some people to fall into classic automation traps, where they trust the LLM more than themselves to create things.

People also tend to project their own preferences and limitations onto LLMs. If a task seems easy to a human, they may naturally assume it’s also easy for an LLM to get right, and so they’ll try to automate it. This is a subtle trap: LLMs can be shockingly good at tasks that humans find difficult, and downright awful at tasks that humans find easy. The result can be more overall time spent trying to get the LLM to do “easy” things than simply doing them yourself.

Fred Hebert, Staff Site Reliability Engineer at Honeycomb, also recently wrote: “It’s been known for decades that when automation handles standard challenges, the operators expected to take over when they reach their limits end up worse off and generally require more training to keep the overall system performant.” If we want to grow happy users, let’s make sure we gain understanding of our systems with AI, not lose it.

Join us for our webinar with DORA

AI has immense potential to improve software development, but only when implemented thoughtfully. 

We’d love to invite you to join us for our webinar, AI’s Unrealized Potential: Honeycomb and DORA on Smarter, More Reliable Development with LLMs, on March 20th, 2025, at 10:00 a.m. PT / 1:00 p.m. ET, with our guest, Nathen Harvey, Developer Advocate at DORA. He’ll be joined by Charity Majors, CTO and Co-founder, and Phillip Carter, Principal Product Manager.

We’ll talk about the challenges of AI usage in software engineering and how teams can avoid common pitfalls. We’ll also talk about how to actually improve the output from your favorite genAI agent by giving it more context. You can register for the webinar here.

Fahim Zaman

Principal Senior Product Marketing Manager

Fahim loves building scalable, always-on intelligence and enablement programs by bridging knowledge from sales, marketing, and product. He brings strategic product marketing experience from Acxiom, LiveRamp, and Optimizely.
