The 2025 AI Index Report | Stanford

rcjordan · May 25, 2025, 02:24:00 PM

https://hai.stanford.edu/ai-index/2025-ai-index-report

ergophobe · May 25, 2025, 11:50:19 PM

That is a fascinating report. Clicking through to the Science and Medicine chapter, this jumped out:

"A new study found that GPT-4 alone outperformed doctors—both with and without AI—in diagnosing complex clinical cases."

In other words, when a doctor uses GPT-4, the quality of diagnosis decreases compared to GPT-4 left on its own to diagnose the problem.

Also, check out the chapter on technical benchmarks and the ominously named "Humanity's Last Exam (HLE)" where the general LLMs still manage at best 8.8% compared to the 97% standard for humans. I wonder how that looks in a year.

In general, on most benchmarks, the AIs are now outperforming humans. "Intelligence" is now a scalable commodity in many domains.

rcjordan · May 26, 2025, 02:21:15 AM

AI system resorts to blackmail if told it will be removed

https://www.bbc.com/news/articles/cpqeng9d20go

rcjordan · May 26, 2025, 10:04:31 PM

New ChatGPT model refuses to shut down when instructed

https://www.msn.com/en-gb/technology/artificial-intelligence/ai-revolt-new-chatgpt-model-refuses-to-shut-down-when-instructed/ar-AA1Fuv7X

rcjordan · May 28, 2025, 01:53:08 AM

Anthropic faces backlash to Claude 4 Opus behavior that contacts authorities, press if it thinks you're doing something 'egregiously immoral' | VentureBeat

https://venturebeat.com/ai/anthropic-faces-backlash-to-claude-4-opus-behavior-that-contacts-authorities-press-if-it-thinks-youre-doing-something-immoral/
