The Core

Why We Are Here => Hardware & Technology => Topic started by: rcjordan on May 25, 2025, 02:24:00 PM

Title: The 2025 AI Index Report | Stanford
Post by: rcjordan on May 25, 2025, 02:24:00 PM

https://hai.stanford.edu/ai-index/2025-ai-index-report
Title: Re: The 2025 AI Index Report | Stanford
Post by: ergophobe on May 25, 2025, 11:50:19 PM
That is a fascinating report. Clicking through to the Science and Medicine chapter, this jumped out:

"A new study found that GPT-4 alone outperformed doctors—both with and without AI—in diagnosing complex clinical cases."

In other words, when a doctor uses GPT-4, the quality of diagnosis decreases compared to GPT-4 left on its own to diagnose the problem.

Also, check out the chapter on technical benchmarks and the ominously named "Humanity's Last Exam (HLE)" where the general LLMs still manage at best 8.8% compared to the 97% standard for humans. I wonder how that looks in a year.

In general, on most benchmarks, the AIs are now outperforming humans. "Intelligence" is now a scalable commodity in many domains.
Title: Re: The 2025 AI Index Report | Stanford
Post by: rcjordan on May 26, 2025, 02:21:15 AM
AI system resorts to blackmail if told it will be removed

https://www.bbc.com/news/articles/cpqeng9d20go
Title: Re: The 2025 AI Index Report | Stanford
Post by: rcjordan on May 26, 2025, 10:04:31 PM
New ChatGPT model refuses to shut down when instructed

https://www.msn.com/en-gb/technology/artificial-intelligence/ai-revolt-new-chatgpt-model-refuses-to-shut-down-when-instructed/ar-AA1Fuv7X
Title: Re: The 2025 AI Index Report | Stanford
Post by: rcjordan on May 28, 2025, 01:53:08 AM
Anthropic faces backlash to Claude 4 Opus behavior that contacts authorities, press if it thinks you're doing something 'egregiously immoral' | VentureBeat

https://venturebeat.com/ai/anthropic-faces-backlash-to-claude-4-opus-behavior-that-contacts-authorities-press-if-it-thinks-youre-doing-something-immoral/