Techweez | Tech News, Reviews, Deals, Tips and How To
Meta’s Llama 4 Launch Exposed for Manipulating AI Benchmark Scores

By Caleb Sama
April 8, 2025
in News

Meta’s surprise weekend launch of its new Llama 4 AI models has quickly become a case study in the growing tensions between AI marketing claims and real-world performance.

The company released two new models—Scout and Maverick—on Saturday, positioning them as serious challengers to industry leaders like OpenAI’s GPT-4o and Google’s Gemini models.

Shortly after release, Maverick secured second place on LMArena, a respected benchmark site where humans compare outputs from different AI systems. Meta proudly highlighted Maverick’s Elo score of 1417, which placed it above OpenAI’s GPT-4o and just below Google’s Gemini 2.5 Pro.

However, this achievement quickly unraveled when AI researchers discovered fine print in Meta’s documentation revealing that the version tested on LMArena wasn’t the same as what’s available to the public. Meta had deployed an “experimental chat version” of Maverick specifically “optimized for conversationality” for benchmark testing.

“Meta’s interpretation of our policy did not match what we expect from model providers,” LMArena posted on X two days after the model’s release. The site has since updated its leaderboard policies to prevent similar situations in the future.

While not explicitly against LMArena’s rules, this approach undermines the value of benchmark rankings as indicators of real-world performance. As independent AI researcher Simon Willison told The Verge, “The model score that we got there is completely worthless to me. I can’t even use the model that they got a high score on.”
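To put a leaderboard number like 1417 in perspective, here is a minimal sketch of how an Elo-style rating gap translates into an expected head-to-head win rate. It assumes the textbook Elo formula with a 400-point scale, which may not match LMArena's exact scoring method, and the rival's rating of 1400 is a made-up value used only for illustration.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of head-to-head wins for A over B under the textbook Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# 1417 is the score quoted in the article; the rival rating is hypothetical.
maverick_rating = 1417
hypothetical_rival_rating = 1400

win_rate = elo_expected_score(maverick_rating, hypothetical_rival_rating)
print(f"Expected win rate against the rival: {win_rate:.3f}")
```

Under these assumptions, a 17-point lead works out to winning only slightly more than half of the comparisons, which is why small gaps at the top of a leaderboard say little on their own.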

Meta’s Technical Architecture and Claims

Meta describes the new Llama 4 models as “natively multimodal,” built to handle both text and images using an “early fusion” technique. Both models use a mixture-of-experts (MoE) architecture as follows:

Maverick: 400 billion total parameters spread across 128 experts, with only 17 billion active at a time

Scout: 109 billion total parameters spread across 16 experts, with only 17 billion active at a time

This architecture lets the models run with less compute per token, since only a portion of the neural network is active for any given input (we know, it’s very technical).
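For readers who want to see the idea in code, the toy below sketches top-1 expert routing: a router scores every expert for a token, and only the highest-scoring expert does any work. This is an illustration under simplifying assumptions, not Meta's implementation. The class name, sizes, and random weights are all hypothetical, and real models also contain shared layers that every token passes through.

```python
import random


def random_matrix(rows: int, cols: int, seed: int):
    """Build a rows x cols matrix of reproducible random weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ToyMoELayer:
    """Hypothetical mixture-of-experts layer with top-1 routing."""

    def __init__(self, num_experts: int, dim: int):
        self.dim = dim
        # Each expert is a small dim x dim weight matrix.
        self.experts = [random_matrix(dim, dim, seed=i) for i in range(num_experts)]
        # The router holds one scoring vector per expert.
        self.router = random_matrix(num_experts, dim, seed=1234)

    def forward(self, token):
        scores = matvec(self.router, token)
        chosen = max(range(len(scores)), key=scores.__getitem__)
        # Only the chosen expert's weights are used for this token.
        return chosen, matvec(self.experts[chosen], token)

    def total_expert_params(self) -> int:
        return len(self.experts) * self.dim * self.dim

    def active_expert_params(self) -> int:
        return self.dim * self.dim


layer = ToyMoELayer(num_experts=16, dim=4)
chosen, output = layer.forward([1.0, 0.5, -0.5, 2.0])
print(f"Token routed to expert {chosen}")
print(f"Expert parameters used: {layer.active_expert_params()} of {layer.total_expert_params()}")
```

In this toy, one token touches one sixteenth of the expert weights. Note that all experts still have to sit in memory, so the saving is in computation per token rather than in storage.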

Meta made particularly bold claims about Scout’s 10-million-token context window—a feature that would theoretically allow the model to process huge documents and maintain longer conversations. However, developers quickly found that using even a fraction of this capacity proved challenging due to memory limitations.

According to Willison’s testing, third-party services providing access to Scout capped its context at between 128,000 and 328,000 tokens. Meta’s own example notebook revealed that running a 1.4-million-token context requires eight high-end Nvidia H100 GPUs, hardware that costs hundreds of thousands of dollars.
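A rough back-of-the-envelope sketch shows why long contexts are so memory-hungry. Transformer models typically cache a key and a value vector per layer for every token they have seen, so the cache grows linearly with context length. The layer count, head count, and head size below are placeholder assumptions rather than confirmed Scout specifications, and the estimate ignores the memory needed for the model weights themselves.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int, head_dim: int,
                   bytes_per_value: int = 2) -> int:
    """Estimate key/value cache size for a given context length."""
    # Keys and values are both cached, hence the leading factor of 2.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Placeholder model shape, assumed for illustration only.
HYPOTHETICAL_CONFIG = dict(layers=48, kv_heads=8, head_dim=128)
H100_MEMORY_GB = 80  # memory on a single 80 GB H100 card

for tokens in (128_000, 1_400_000, 10_000_000):
    gigabytes = kv_cache_bytes(tokens, **HYPOTHETICAL_CONFIG) / 1e9
    cards = gigabytes / H100_MEMORY_GB
    print(f"{tokens:>10,} tokens: {gigabytes:,.0f} GB of cache, about {cards:.1f} H100s' worth")
```

Whatever the real figures are, the shape of the result is the same: going from a 128,000-token context to a 10-million-token one multiplies the cache by roughly 78, which is consistent with providers capping the context well below the advertised maximum.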

The Community’s Response

The AI community’s response to Llama 4 has been lukewarm at best. Developers have reported underwhelming performance, especially for coding tasks and software development. Some users noted that Llama 4 compares unfavorably to innovative competitors like DeepSeek.

When tested with a lengthy document of around 20,000 tokens, Scout produced what Willison described as “complete junk output,” which devolved into repetitive loops, raising questions about the practical usefulness of its massive context window.

Meta has also continued to market Llama 4 as “open source” despite licensing restrictions that prevent truly open use. In reality, users must sign in and accept license terms before downloading the models.

Furthermore, the weekend release timing caused a stir in the AI community. When questioned about this unusual schedule on Threads, Meta CEO Mark Zuckerberg simply replied, “That’s when it was ready.”

According to a report from The Information, Meta repeatedly delayed Llama 4’s launch because the model failed to meet internal expectations. Those expectations were especially high following the successful release of an open-weight model from DeepSeek, a Chinese AI startup.

Llama 4’s Implications for AI Development

Some researchers suggest that the underwhelming performance of Llama 4 points to larger issues in AI development approaches.

On X, researcher Andriy Burkov argued that recent disappointing releases from both Meta and OpenAI “have shown that if you don’t train a model to reason with reinforcement learning, increasing its size no longer provides benefits.”

Many people, myself included, didn't try to build a product around a language model because during the time you would work on a business-specific dataset, a larger generalist model will be released that will be as good for your business tasks as your smaller specialized model.…

— Andriy Burkov (@burkov) April 6, 2025

This observation aligns with growing discussions about potential limitations in scaling up traditional AI model architectures without incorporating newer techniques, such as simulated reasoning or developing smaller, purpose-built models.

Despite current drawbacks, there remains optimism about future iterations in the Llama 4 family. Willison expressed hope for “a whole family of Llama 4 models at varying sizes,” particularly an improved smaller model that could run effectively on mobile phones.

The Llama 4 release will no doubt serve as a lesson that benchmark scores and marketing claims should be approached with healthy skepticism until verified through independent, real-world testing.

Tags: AI, Llama, Llama 4, Meta, Meta AI
© 2024 Techweez - Palahala Media Group may earn a commission when you buy through links on our sites.
A Palahala Media Group Brand. All rights reserved.