Techweez | Tech News, Reviews, Deals, Tips and How To

Meta’s Llama 4 Launch Exposed for Manipulating AI Benchmark Scores

By Caleb Sama
April 8, 2025
in News

Meta’s surprise weekend launch of its new Llama 4 AI models has quickly become a case study in the growing tensions between AI marketing claims and real-world performance.

The company released two new models—Scout and Maverick—on Saturday, positioning them as serious challengers to industry leaders like OpenAI’s GPT-4o and Google’s Gemini models.

Shortly after release, Maverick took second place on LMArena, a respected benchmark site where humans compare outputs from different AI systems. Meta proudly highlighted Maverick’s Elo score of 1417, which placed it above OpenAI’s GPT-4o and just below Google’s Gemini 2.5 Pro.
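To put a score like 1417 in perspective, here is a minimal sketch of the textbook Elo expected-score formula. It is an illustration only: LMArena's exact rating method is not described in this article, and the rival rating of 1400 is a made-up number, not a real leaderboard entry.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Textbook Elo expected score for A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


maverick = 1417            # score quoted in the article
hypothetical_rival = 1400  # assumed value, for illustration only

p = expected_win_rate(maverick, hypothetical_rival)
print(f"Expected head-to-head preference rate: {p:.1%}")
```

Under this formula, a gap of a few points translates into a preference rate only slightly above a coin flip, which is why small tuning differences between model variants can shift leaderboard positions.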

However, this achievement quickly unraveled when AI researchers discovered fine print in Meta’s documentation revealing that the version tested on LMArena wasn’t the same as what’s available to the public. Meta had deployed an “experimental chat version” of Maverick specifically “optimized for conversationality” for benchmark testing.

“Meta’s interpretation of our policy did not match what we expect from model providers,” LMArena posted on X two days after the model’s release. The site has since updated its leaderboard policies to prevent similar situations in the future.

While not explicitly against LMArena’s rules, this approach undermines the value of benchmark rankings as indicators of real-world performance. As independent AI researcher Simon Willison told The Verge, “The model score that we got there is completely worthless to me. I can’t even use the model that they got a high score on.”

Meta’s Technical Architecture and Claims

Meta describes the new Llama 4 models as “natively multimodal,” built to handle both text and images using an “early fusion” technique. Both models use a mixture-of-experts (MoE) architecture:

Maverick: 400 billion total parameters across 128 experts, with only 17 billion active at once

Scout: 109 billion total parameters across 16 experts, with only 17 billion active at once

This architecture allows the models to run with fewer computational resources per token, since only a portion of the neural network is active for any given input (we know, it’s very technical).
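For the curious, the arithmetic behind those figures can be sketched in a few lines. The parameter counts come from the article; the `route_top1` function is a toy stand-in for a router that picks a single expert, not Meta's actual routing code, and the random scores are purely illustrative.

```python
import random

# Figures quoted in the article (in billions of parameters).
MODELS = {
    "Maverick": {"total_b": 400, "active_b": 17, "experts": 128},
    "Scout": {"total_b": 109, "active_b": 17, "experts": 16},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active_b / total_b


def route_top1(router_scores: list[float]) -> int:
    """Toy top-1 routing: return the index of the highest-scoring expert."""
    return max(range(len(router_scores)), key=router_scores.__getitem__)


for name, cfg in MODELS.items():
    frac = active_fraction(cfg["total_b"], cfg["active_b"])
    print(f"{name}: {frac:.1%} of parameters active per token")

# Hypothetical router scores for one token, one score per expert.
rng = random.Random(0)
scores = [rng.random() for _ in range(MODELS["Scout"]["experts"])]
chosen = route_top1(scores)
print(f"Toy router picked expert {chosen} of {len(scores)}")
```

The point of the sketch is that compute per token tracks the active parameters, while memory still has to hold the full parameter count.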

Meta made particularly bold claims about Scout’s 10-million-token context window—a feature that would theoretically allow the model to process huge documents and maintain longer conversations. However, developers quickly found that using even a fraction of this capacity proved challenging due to memory limitations.

According to Willison’s testing, third-party services providing access to Scout limited its context to between 128,000 and 328,000 tokens. Meta’s own example notebook revealed that running a 1.4 million token context requires eight high-end Nvidia H100 GPUs—hardware that costs hundreds of thousands of dollars.
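A rough way to see why long contexts are so memory-hungry is to estimate the size of the key-value (KV) cache a transformer keeps for every token. The sketch below uses the standard back-of-the-envelope formula; the layer count, head count, head size and 2-byte precision are hypothetical placeholders, not Scout's published configuration, and the estimate ignores model weights and any memory-saving tricks.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_value: int) -> int:
    """Approximate KV-cache size: keys and values stored for every layer and token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Hypothetical model shape, chosen only for illustration.
ASSUMED = {"layers": 48, "kv_heads": 8, "head_dim": 128, "bytes_per_value": 2}

for tokens in (128_000, 1_400_000, 10_000_000):
    size_gb = kv_cache_bytes(tokens, **ASSUMED) / 1e9
    print(f"{tokens:>10,} tokens -> about {size_gb:,.0f} GB of KV cache")
```

Because the cache grows linearly with the number of tokens, a context window that is ten times longer needs roughly ten times the cache memory, on top of the memory needed for the model itself.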

The Community’s Response

The AI community’s response to Llama 4 has been lukewarm at best. Developers have reported underwhelming performance, especially for coding tasks and software development. Some users noted that Llama 4 compares unfavorably to innovative competitors like DeepSeek.

When tested with a lengthy document of around 20,000 tokens, Scout produced what Willison described as “complete junk output,” which devolved into repetitive loops, raising questions about the practical usefulness of its massive context window.

Meta has also continued to market Llama 4 as “open source” despite licensing restrictions that prevent truly open use. In reality, users must sign in and accept license terms before downloading the models.

Furthermore, the weekend release timing caused a stir in the AI community. When questioned about this unusual schedule on Threads, Meta CEO Mark Zuckerberg simply replied, “That’s when it was ready.”

According to a report from The Information, Meta repeatedly delayed Llama 4’s launch because the model failed to meet internal expectations. Those expectations were especially high following the successful release of an open-weight model from DeepSeek, a Chinese AI startup.

Llama 4’s Implications for AI Development

Some researchers suggest that the underwhelming performance of Llama 4 points to larger issues in AI development approaches.

On X, researcher Andriy Burkov argued that recent disappointing releases from both Meta and OpenAI “have shown that if you don’t train a model to reason with reinforcement learning, increasing its size no longer provides benefits.”

Many people, myself included, didn't try to build a product around a language model because during the time you would work on a business-specific dataset, a larger generalist model will be released that will be as good for your business tasks as your smaller specialized model.…

— Andriy Burkov (@burkov) April 6, 2025

This observation aligns with growing discussions about potential limitations in scaling up traditional AI model architectures without incorporating newer techniques, such as simulated reasoning or developing smaller, purpose-built models.

Despite current drawbacks, there remains optimism about future iterations in the Llama 4 family. Willison expressed hope for “a whole family of Llama 4 models at varying sizes,” particularly an improved smaller model that could run effectively on mobile phones.

The Llama 4 release will no doubt serve as a lesson that benchmark scores and marketing claims should be approached with healthy skepticism until verified through independent, real-world testing.

Tags: AI, Llama, Llama 4, Meta, Meta AI
Caleb Sama

Chief Editor. Pineapple on Pizza is absolutely great and let no one convince you otherwise. Pop in at: [email protected] to get in touch with me.

© 2024 Techweez - Palahala Media Group may earn a commission when you buy through links on our sites.
A Palahala Media Group Brand. All rights reserved.