The 2025 AI Index Report | Stanford

Started by rcjordan, May 25, 2025, 02:24:00 PM


rcjordan


ergophobe

That is a fascinating report. Clicking through to the Science and Medicine chapter, this jumped out:

"A new study found that GPT-4 alone outperformed doctors—both with and without AI—in diagnosing complex clinical cases."

In other words, when a doctor uses GPT-4, the quality of diagnosis decreases compared to GPT-4 left on its own to diagnose the problem.

Also, check out the chapter on technical benchmarks and the ominously named "Humanity's Last Exam (HLE)" where the general LLMs still manage at best 8.8% compared to the 97% standard for humans. I wonder how that looks in a year.

In general, on most benchmarks, the AIs are now outperforming humans. "Intelligence" is now a scalable commodity in many domains.

rcjordan

Anthropic faces backlash to Claude 4 Opus behavior that contacts authorities, press if it thinks you're doing something 'egregiously immoral' | VentureBeat

https://venturebeat.com/ai/anthropic-faces-backlash-to-claude-4-opus-behavior-that-contacts-authorities-press-if-it-thinks-youre-doing-something-immoral/