Grok 4 Claimed to Be “Smartest AI,” But Leaderboard Scores Say Otherwise

World Digest Media
Published: August 4, 2025

San Francisco – When Elon Musk declared xAI’s Grok 4 the smartest AI on the planet, it was yet another bold claim in a long line of Muskian pronouncements. According to him, Grok 4 wasn’t just capable; it was smarter than most graduate students across disciplines.

But now that Grok 4’s scores have appeared on the LMArena leaderboard, which grew out of UC Berkeley research, the results tell a different story.

Grok 4 did perform impressively, landing in third place both overall and in text generation. It even tied with GPT-4.5, one of OpenAI’s latest models. Yet it still fell short of dethroning Google’s Gemini 2.5, which topped the chart, and OpenAI’s o3 and 4o reasoning models, which jointly claimed the second spot.

The LMArena platform relies on a community-driven scoring system: users compare anonymized model responses head to head and vote for the better answer, and those votes are aggregated into ratings across creative writing, vision, mathematics, and code. While not a perfect science, it’s one of the more transparent public benchmarks available.
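To make the voting mechanism concrete, here is a minimal sketch of how pairwise votes can be turned into a ranking using a simple Elo-style update. LMArena’s actual methodology is more sophisticated (a Bradley–Terry-style statistical model over many battles), and the model names, starting ratings, and vote log below are purely illustrative, not real leaderboard data.

```python
# Toy Elo-style rating from pairwise community votes.
# Illustrative only; LMArena's real pipeline is more involved.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under an Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Update both ratings after one head-to-head vote (k is a chosen step size)."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return r_a + k * (s_a - e_a), r_b + k * ((1.0 - s_a) - (1.0 - e_a))

# Hypothetical battle log: (model_a, model_b, did_a_win) -- made-up examples.
votes = [("grok-4", "gemini-2.5", False), ("grok-4", "gpt-4.5", True)]
ratings = {"grok-4": 1500.0, "gemini-2.5": 1500.0, "gpt-4.5": 1500.0}

for a, b, a_won in votes:
    ratings[a], ratings[b] = update(ratings[a], ratings[b], a_won)

# Rank models by rating, highest first.
print(sorted(ratings.items(), key=lambda kv: -kv[1]))
```

Over thousands of such votes, small per-battle adjustments converge toward a stable ordering, which is why the leaderboard’s rankings carry weight despite relying on crowd judgment.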

Still, the leaderboard itself has faced scrutiny. A recent paper led by researchers from Cohere criticized LMArena for methodological flaws, including alleged undisclosed private testing and score manipulation. The controversy deepened after it was revealed that Meta had used a non-public, experimental version of LLaMA 4 during evaluations, raising questions about fairness and transparency.

Though Meta issued an apology, the incident dented the platform’s reputation. As for Grok, it remains a top-tier model, but its “world’s smartest” title now feels more like savvy marketing than a scientific conclusion.

Until a definitive global benchmark emerges, we’ll be left sorting hype from hardware—one leaderboard at a time.