https://hai.stanford.edu/ai-index/2025-ai-index-report
That is a fascinating report. Clicking through to the Science and Medicine chapter, this jumped out:
"A new study found that GPT-4 alone outperformed doctors—both with and without AI—in diagnosing complex clinical cases."
In other words, when a doctor uses GPT-4, the quality of diagnosis decreases compared to GPT-4 left on its own to diagnose the problem.
Also, check out the chapter on technical benchmarks and the ominously named "Humanity's Last Exam (HLE)", where the general LLMs still manage at best 8.8% compared to the 97% standard for humans. I wonder how that will look in a year.
In general, on most benchmarks, the AIs are now outperforming humans. "Intelligence" is now a scalable commodity in many domains.
AI system resorts to blackmail if told it will be removed
https://www.bbc.com/news/articles/cpqeng9d20go
New ChatGPT model refuses to shut down when instructed
https://www.msn.com/en-gb/technology/artificial-intelligence/ai-revolt-new-chatgpt-model-refuses-to-shut-down-when-instructed/ar-AA1Fuv7X
Anthropic faces backlash to Claude 4 Opus behavior that contacts authorities, press if it thinks you're doing something 'egregiously immoral' | VentureBeat
https://venturebeat.com/ai/anthropic-faces-backlash-to-claude-4-opus-behavior-that-contacts-authorities-press-if-it-thinks-youre-doing-something-immoral/