Meta’s Llama 4 Launch Exposed for Manipulating AI Benchmark Scores

by Caleb Sama
April 8, 2025
in News

Meta’s surprise weekend launch of its new Llama 4 AI models has quickly become a case study in the growing tensions between AI marketing claims and real-world performance.

The company released two new models—Scout and Maverick—on Saturday, positioning them as serious challengers to industry leaders like OpenAI’s GPT-4o and Google’s Gemini models.

Shortly after release, Maverick took second place on LMArena, a respected benchmark site where humans compare outputs from different AI systems. Meta proudly highlighted Maverick’s Elo score of 1417, which placed it above OpenAI’s GPT-4o and just below Google’s Gemini 2.5 Pro.
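Leaderboard ratings are easy to read as absolute quality scores, so it helps to see what a rating gap actually implies. The minimal sketch below uses the standard Elo expected-score formula; LMArena fits its own statistical model to human votes, so its pipeline may differ in detail, and the rival rating of 1400 is a hypothetical value, not a figure from the leaderboard.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B, per the standard Elo expected-score formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# 1417 is Maverick's reported score; 1400 is a hypothetical rival, not a leaderboard value.
maverick_rating = 1417
hypothetical_rival_rating = 1400

p = elo_expected_score(maverick_rating, hypothetical_rival_rating)
print(f"Expected preference rate for Maverick: {p:.1%}")
```

Under this formula, a 17-point lead means Maverick’s answer would be preferred only about 52% of the time, so small gaps, and which model variant produced them, matter a great deal when reading a leaderboard.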

However, this achievement quickly unraveled when AI researchers discovered fine print in Meta’s documentation revealing that the version tested on LMArena wasn’t the same as what’s available to the public. Meta had deployed an “experimental chat version” of Maverick specifically “optimized for conversationality” for benchmark testing.

“Meta’s interpretation of our policy did not match what we expect from model providers,” LMArena posted on X two days after the model’s release. The site has since updated its leaderboard policies to prevent similar situations in the future.


Although this approach did not explicitly break LMArena’s rules, it undermines the value of benchmark rankings as indicators of real-world performance. As independent AI researcher Simon Willison told The Verge, “The model score that we got there is completely worthless to me. I can’t even use the model that they got a high score on.”

Meta’s Technical Architecture and Claims

Meta describes the new Llama 4 models as “natively multimodal,” built to handle both text and images using an “early fusion” technique. Both models use a mixture-of-experts (MoE) architecture as follows:

Maverick: 400 billion total parameters, 17 billion of them active per token, spread across 128 experts

Scout: 109 billion total parameters, 17 billion of them active per token, spread across 16 experts

This architecture lets the models run with less computation per token, since only a fraction of the neural network is active at any moment (we know, it’s very technical). All of the parameters, however, still have to be held in memory.
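A toy version of mixture-of-experts routing makes the compute saving concrete. In this minimal sketch, a gate scores every expert for an incoming token and only the top-scoring expert runs (`top_k=1`, echoing the “one of N experts” description above). The experts are trivial scaling functions and every name here is hypothetical; this is not Meta’s implementation, and Meta’s own description of Llama 4 also routes each token through a shared expert, which this sketch omits.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(token, experts, gate_weights, top_k=1):
    """Route one token vector to its top_k experts and mix their outputs.

    experts: list of callables mapping a vector (list of floats) to a vector.
    gate_weights: one weight vector per expert; the gate score is a dot product.
    Only the selected experts are called, which is where the compute saving comes
    from, but every expert's weights must still be stored.
    """
    scores = [sum(w * x for w, x in zip(ws, token)) for ws in gate_weights]
    probs = softmax(scores)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)  # renormalize over the selected experts
    out = [0.0] * len(token)
    for i in chosen:
        y = experts[i](token)
        out = [o + (probs[i] / norm) * yi for o, yi in zip(out, y)]
    return out, chosen


# Toy setup: 16 "experts", each just scales its input by a different constant.
random.seed(0)
DIM = 4
NUM_EXPERTS = 16
experts = [(lambda v, s=s: [s * x for x in v]) for s in range(1, NUM_EXPERTS + 1)]
gate_weights = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

token = [1.0, 0.5, -0.5, 2.0]
output, chosen = moe_forward(token, experts, gate_weights, top_k=1)
print(f"Token routed to expert {chosen[0]} of {NUM_EXPERTS}")

# Active-parameter share using the totals quoted above (in billions).
for name, total_b, active_b in [("Maverick", 400, 17), ("Scout", 109, 17)]:
    print(f"{name}: {active_b / total_b:.1%} of parameters active per token")
```

The router skips computation, not storage: all 16 experts’ weights still have to sit in memory, which is why a model’s total parameter count, not its active count, sets the hardware floor.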

Meta made particularly bold claims about Scout’s 10-million-token context window, a feature that would theoretically allow the model to process huge documents and sustain much longer conversations. However, developers quickly found that using even a fraction of this capacity was difficult because of memory limitations.

According to Willison’s testing, third-party services offering Scout capped its context at between 128,000 and 328,000 tokens. Meta’s own example notebook showed that running a 1.4-million-token context requires eight high-end Nvidia H100 GPUs, hardware that costs hundreds of thousands of dollars.
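To see why long contexts strain memory, the sketch below does back-of-the-envelope arithmetic for the key-value (KV) cache a transformer keeps for every token it has processed, plus 16-bit model weights. The layer count, KV-head count and head size are hypothetical values picked for illustration, not confirmed Scout specifications, and the estimate ignores activations, runtime overhead and any attention optimizations a production model may use, so treat the output as an order-of-magnitude guide only.

```python
import math

GB = 10**9


def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Naive KV-cache size: one key and one value vector per token, per layer, per KV head."""
    return 2 * num_tokens * num_layers * num_kv_heads * head_dim * bytes_per_value


def gpus_needed(num_tokens, total_params, num_layers, num_kv_heads, head_dim,
                gpu_memory_gb=80, bytes_per_value=2):
    """Lower bound on GPU count: 16-bit weights plus KV cache, ignoring activations and overhead."""
    weights = total_params * bytes_per_value
    kv = kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_value)
    return math.ceil((weights + kv) / (gpu_memory_gb * GB))


# Hypothetical attention configuration, chosen for illustration only.
LAYERS, KV_HEADS, HEAD_DIM = 48, 8, 128
SCOUT_TOTAL_PARAMS = 109 * 10**9  # total parameter count quoted in the article

for tokens in (128_000, 328_000, 1_400_000, 10_000_000):
    kv_gb = kv_cache_bytes(tokens, LAYERS, KV_HEADS, HEAD_DIM) / GB
    n = gpus_needed(tokens, SCOUT_TOTAL_PARAMS, LAYERS, KV_HEADS, HEAD_DIM)
    print(f"{tokens:>10,} tokens: ~{kv_gb:,.0f} GB of KV cache, at least {n} x 80 GB GPUs")
```

With these assumed values, the weights alone take about 218 GB and a 1.4-million-token cache adds roughly 275 GB, putting the naive estimate at seven 80 GB cards before any overhead, in the same range as the eight H100s Meta’s notebook called for. The full 10-million-token window would, on the same arithmetic, need about four times as many.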

The Community’s Response

The AI community’s response to Llama 4 has been lukewarm at best. Developers have reported underwhelming performance, especially for coding tasks and software development. Some users noted that Llama 4 compares unfavorably to innovative competitors like DeepSeek.

When tested with a lengthy document of around 20,000 tokens, Scout produced what Willison described as “complete junk output,” which devolved into repetitive loops, raising questions about the practical usefulness of its massive context window.

Meta has also continued to market Llama 4 as “open source” despite licensing restrictions that prevent truly open use. In reality, users must sign in and accept license terms before downloading the models.

Furthermore, the weekend release timing caused a stir in the AI community. When questioned about this unusual schedule on Threads, Meta CEO Mark Zuckerberg simply replied, “That’s when it was ready.”

According to a report from The Information, Meta repeatedly delayed Llama 4’s launch because the model failed to meet internal expectations. Those expectations had risen sharply after DeepSeek, a Chinese AI startup, released a successful open-weight model.

Llama 4’s Implications for AI Development

Some researchers suggest that the underwhelming performance of Llama 4 points to larger issues in AI development approaches.

On X, researcher Andriy Burkov argued that recent disappointing releases from both Meta and OpenAI “have shown that if you don’t train a model to reason with reinforcement learning, increasing its size no longer provides benefits.”

Many people, myself included, didn't try to build a product around a language model because during the time you would work on a business-specific dataset, a larger generalist model will be released that will be as good for your business tasks as your smaller specialized model.…

— Andriy Burkov (@burkov) April 6, 2025

This observation aligns with growing discussion about the limits of simply scaling up traditional AI model architectures without newer techniques such as simulated reasoning, or without a shift toward smaller, purpose-built models.

Despite current drawbacks, there remains optimism about future iterations in the Llama 4 family. Willison expressed hope for “a whole family of Llama 4 models at varying sizes,” particularly an improved smaller model that could run effectively on mobile phones.

The Llama 4 release is likely to serve as a lesson that benchmark scores and marketing claims deserve healthy skepticism until they are verified through independent, real-world testing.

Tags: AI, Llama, Llama 4, Meta, Meta AI
© 2024 Techweez - Palahala Media Group may earn a commission when you buy through links on our sites.
A Palahala Media Group Brand. All rights reserved.