Techweez | Tech News, Reviews, Deals, Tips and How To
Meta’s Llama 4 Launch Exposed for Manipulating AI Benchmark Scores

By Caleb Sama
April 8, 2025
in News

Meta’s surprise weekend launch of its new Llama 4 AI models has quickly become a case study in the growing tensions between AI marketing claims and real-world performance.

The company released two new models—Scout and Maverick—on Saturday, positioning them as serious challengers to industry leaders like OpenAI’s GPT-4o and Google’s Gemini models.

Shortly after release, Maverick secured the second-place position on LMArena, a respected benchmark site where humans compare outputs from different AI systems. Meta proudly highlighted Maverick’s Elo score of 1417, which placed it above OpenAI’s GPT-4o and just below Google’s Gemini 2.5 Pro.
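For readers unfamiliar with Elo-style ratings, here is a minimal sketch of how a rating gap maps to an expected head-to-head preference. It assumes the classic Elo formula with a 400-point scale; LMArena's exact rating method may differ, and the rival rating below is a hypothetical number chosen purely for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the classic Elo formula.

    A value of 0.5 means the two models are expected to be preferred
    equally often; values above 0.5 favor model A.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


maverick_rating = 1417  # score quoted in the article
rival_rating = 1400     # hypothetical rating, for illustration only

preference = expected_score(maverick_rating, rival_rating)
print(f"Expected preference for the higher-rated model: {preference:.3f}")
```

Under this formula, a gap of a few dozen points corresponds to an expected preference only slightly above 0.5, which is one reason small leaderboard differences deserve caution.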

However, this achievement quickly unraveled when AI researchers discovered fine print in Meta’s documentation revealing that the version tested on LMArena wasn’t the same as what’s available to the public. Meta had deployed an “experimental chat version” of Maverick specifically “optimized for conversationality” for benchmark testing.

“Meta’s interpretation of our policy did not match what we expect from model providers,” LMArena posted on X two days after the model’s release. The site has since updated its leaderboard policies to prevent similar situations in the future.

While not explicitly against LMArena’s rules, this approach undermines the value of benchmark rankings as indicators of real-world performance. As independent AI researcher Simon Willison told The Verge, “The model score that we got there is completely worthless to me. I can’t even use the model that they got a high score on.”

Meta’s Technical Architecture and Claims

Meta describes the new Llama 4 models as “natively multimodal,” built to handle both text and images using an “early fusion” technique. Both models use a mixture-of-experts (MoE) architecture as follows:

Maverick: 400 billion total parameters and 128 experts, with only 17 billion parameters active at a time

Scout: 109 billion total parameters and 16 experts, with only 17 billion parameters active at a time

This architecture lets the models run with less computation per token, since only a fraction of the network’s parameters is active for any given input.
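The mixture-of-experts idea can be illustrated with a toy sketch. The router below simply picks the highest-scoring expert for a token, and the helper computes what share of each model's parameters is active, using the figures quoted above. The function names and the random router scores are assumptions made for illustration; this is not Meta's implementation.

```python
import math
import random


def softmax(scores):
    """Convert raw router scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k=1):
    """Return the indices of the k experts with the highest probability."""
    probs = softmax(router_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return ranked[:k]


def active_fraction(active_params_b, total_params_b):
    """Share of parameters used per token (both values in billions)."""
    return active_params_b / total_params_b


# (active billions, total billions, number of experts) as quoted in the article
MODELS = {"Maverick": (17, 400, 128), "Scout": (17, 109, 16)}

rng = random.Random(0)  # stand-in for a learned router; scores are random here
for name, (active, total, experts) in MODELS.items():
    scores = [rng.gauss(0.0, 1.0) for _ in range(experts)]
    chosen = route_top_k(scores, k=1)
    share = active_fraction(active, total)
    print(f"{name}: token routed to expert {chosen[0]} of {experts}, "
          f"{share:.1%} of parameters active")
```

Note that all parameters still have to be held in memory; the saving is in the computation performed per token.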

Meta made particularly bold claims about Scout’s 10-million-token context window—a feature that would theoretically allow the model to process huge documents and maintain longer conversations. However, developers quickly found that using even a fraction of this capacity proved challenging due to memory limitations.

According to Willison’s testing, third-party services providing access to Scout limited its context to between 128,000 and 328,000 tokens. Meta’s own example notebook revealed that running a 1.4 million token context requires eight high-end Nvidia H100 GPUs—hardware that costs hundreds of thousands of dollars.
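A rough back-of-the-envelope sketch shows why long contexts strain memory. A transformer typically stores a key and a value vector per layer for every token in the context (the KV cache), so memory grows linearly with context length. The layer count, head count, and head size below are hypothetical placeholders, not Scout's published configuration, and the 80 GB figure assumes the 80 GB H100 variant.

```python
H100_MEMORY_GB = 80  # assumes the 80 GB H100 variant


def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Estimate KV-cache size: one key and one value vector per layer per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Hypothetical model dimensions, for illustration only
HYPOTHETICAL_DIMS = dict(layers=48, kv_heads=8, head_dim=128)

for tokens in (128_000, 1_400_000, 10_000_000):
    gigabytes = kv_cache_bytes(tokens, **HYPOTHETICAL_DIMS) / 1e9
    gpus = gigabytes / H100_MEMORY_GB
    print(f"{tokens:>10,} tokens: ~{gigabytes:,.0f} GB of cache "
          f"(~{gpus:.1f} H100s for the cache alone)")
```

The estimate ignores the model weights and activations, which need memory of their own, so real requirements are higher than the cache figure alone.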

The Community’s Response

The AI community’s response to Llama 4 has been lukewarm at best. Developers have reported underwhelming performance, especially for coding tasks and software development. Some users noted that Llama 4 compares unfavorably to innovative competitors like DeepSeek.

When tested with a lengthy document of around 20,000 tokens, Scout produced what Willison described as “complete junk output,” which devolved into repetitive loops, raising questions about the practical usefulness of its massive context window.

Meta has also continued to market Llama 4 as “open source” despite licensing restrictions that prevent truly open use. In reality, users must sign in and accept license terms before downloading the models.

Furthermore, the weekend release timing caused a stir in the AI community. When questioned about this unusual schedule on Threads, Meta CEO Mark Zuckerberg simply replied, “That’s when it was ready.”

According to a report from The Information, Meta repeatedly delayed Llama 4’s launch because the model failed to meet internal expectations. Those expectations were especially high following the successful release of an open-weight model from DeepSeek, a Chinese AI startup.

Llama 4’s Implications for AI Development

Some researchers suggest that the underwhelming performance of Llama 4 points to larger issues in AI development approaches.

On X, researcher Andriy Burkov argued that recent disappointing releases from both Meta and OpenAI “have shown that if you don’t train a model to reason with reinforcement learning, increasing its size no longer provides benefits.”

Many people, myself included, didn't try to build a product around a language model because during the time you would work on a business-specific dataset, a larger generalist model will be released that will be as good for your business tasks as your smaller specialized model.…

— Andriy Burkov (@burkov) April 6, 2025

This observation aligns with growing discussions about potential limitations in scaling up traditional AI model architectures without incorporating newer techniques, such as simulated reasoning or developing smaller, purpose-built models.

Despite current drawbacks, there remains optimism about future iterations in the Llama 4 family. Willison expressed hope for “a whole family of Llama 4 models at varying sizes,” particularly an improved smaller model that could run effectively on mobile phones.

The Llama 4 release will likely serve as a lesson that benchmark scores and marketing claims should be approached with healthy skepticism until verified through independent, real-world testing.

Tags: AI, Llama, Llama 4, Meta, Meta AI
Caleb Sama

Friendly neighborhood films, games, and tech reviewer. Expect dad jokes - lots of dad jokes.

© 2024 Techweez - Palahala Media Group may earn a commission when you buy through links on our sites.
A Palahala Media Group Brand. All rights reserved.