AI Coding Assistants Are Getting Worse...

Started by rcjordan, January 08, 2026, 04:38:59 PM


rcjordan

...Newer models are more prone to silent but deadly failure modes

AI Coding Degrades: Silent Failures Emerge - IEEE Spectrum

https://spectrum.ieee.org/ai-coding-degrades

ergophobe

To some extent, this is always a problem as you increase abstraction in a programming language, and AI-assisted coding is, effectively, the most abstract programming "language" available today.

The obvious place where this happens in abstract languages is with type enforcement.

In assembly language, C, or C++, if you try to add a string and an integer, you get a compile-time error.

In a more abstract language, it might be a run-time error.

In a language like PHP, you get executable code that *might* throw a runtime exception that *might* be hidden. In early versions of PHP it was hard to even reliably catch something like that. Coming from a programming background, it was perplexing to not declare types and to not enforce them. But for someone with no programming background, it lets them get up and running with minimal knowledge.

But then they write a script where they are unknowingly adding an integer and a string, and they don't understand why the output is wrong.

For example, if you have
$x = 1;
$y = "1";

PHP lets you say
$a = $x + $y;

And that might work as expected 1000 times until the .CSV file changes and $y = "Canada" and the script fails.
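A rough sketch of one way to make that failure loud instead of silent, assuming the value arrives as a raw string from a CSV field. The function name parseCount() and the sample values are made up for illustration; filter_var() with FILTER_VALIDATE_INT is a standard PHP function that returns false when the string is not an integer.

```php
<?php
// Hypothetical helper: turn a raw CSV field into an int, or fail loudly.
function parseCount(string $raw): int
{
    $value = filter_var($raw, FILTER_VALIDATE_INT);
    if ($value === false) {
        throw new InvalidArgumentException("Not an integer: '$raw'");
    }
    return $value;
}

$x = 1;

// Made-up sample fields standing in for the CSV column.
foreach (["1", "Canada"] as $y) {
    try {
        $a = $x + parseCount($y);
        echo "$x + '$y' = $a\n";
    } catch (InvalidArgumentException $e) {
        // The bad row is reported instead of flowing into the total.
        echo "Bad input: ", $e->getMessage(), "\n";
    }
}
```

The point is only that the check happens where the data enters the script, so a changed CSV shows up as an explicit error rather than as a wrong number somewhere downstream.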

As PHP has evolved, one thing that keeps happening is that they keep increasing the ability to throw fatal errors if there is a type mismatch.

https://www.php.net/manual/en/language.types.declarations.php

In theory this makes it harder to code. In practice, it makes it slightly harder to create code that runs, but significantly harder to create code that has execution errors that are hard to track down. Sometimes you can only figure those out with a debugger that steps through the code one line at a time and lets you inspect every variable in the current context.

I suspect that to be fully useful, AI coding assistants will need to have flags analogous to the strict_types declaration in PHP.
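For anyone who hasn't used it, here is a minimal sketch of what that declaration does. addCounts() is a made-up function for illustration. As far as I understand the manual page linked above, strict mode applies to scalar type declarations on function calls made from the file, not to the + operator itself.

```php
<?php
declare(strict_types=1); // must be the first statement in the file

// Made-up function for illustration: both arguments are declared as int.
function addCounts(int $x, int $y): int
{
    return $x + $y;
}

$x = 1;
$y = "1"; // a numeric string, as it might arrive from a CSV file

try {
    // In strict mode the string is rejected instead of being coerced to 1.
    echo addCounts($x, $y), "\n";
} catch (TypeError $e) {
    echo "TypeError: ", $e->getMessage(), "\n";
}

// An explicit cast states the intent at the call site.
// Note that (int) by itself does not validate the input.
echo addCounts($x, (int) $y), "\n";
```

Without the declare line, the same call would quietly coerce "1" to 1, which is the convenient-but-risky behavior described earlier.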

littleman

Puzzling that they would be getting worse, though. Maybe in the race to assimilate as much human knowledge as possible, the LLMs are not putting enough effort into quality control -- a basic signal-to-noise problem.

rcjordan

>Puzzling

Not necessarily for coding, but one explanation I see about worsening results is that they are hungrily scraping everything ...and that includes other LLMs' crappy output (with the 30%?? hallucination rate).

ergophobe

I guess my roundabout answer was meant to say that providing AN answer (like taking the spreadsheet line number and using that as the index value) was supposed to be better in the same way loose typing was supposed to be better in PHP.

That had to be corrected in PHP in order to make it a professional programming language that powers much of the web, and I'm guessing this will be corrected too.

In any case, there are definitely dissenting views. It depends a little on what your evaluation criteria are.

https://www.oneusefulthing.org/p/claude-code-and-what-comes-next
"This is Claude Code at work, one of a new generation of AI coding tools that represent a sudden capability leap in AI in the past month or so." Ethan Mollick, Jan 7, 2026


Or Nathan Lambert, Jan 9, 2026
"Having used coding agents extensively for the past 6-9 months, where it felt like sometimes OpenAI's Codex was the best and sometimes Claude, there was some meaningful jump over the last few weeks. "
https://www.interconnects.ai/p/claude-code-hits-different

It's not hard to find others from just the past few weeks. Most of what I read says they are getting better and better.

But as always with AI, it's a jagged frontier. They are way more capable, but they might try solutions that don't actually make sense. I think in general the fact that they have moved from making syntax errors to making errors that are not so obvious means that they have progressed from Programming 101 to Junior Developer. I've seen a lot of professional developers make the kinds of errors the author of the original article mentions (and I've made plenty of those errors too).