A senior executive from Meta firmly denied allegations on Monday that the company deliberately trained its latest AI models to achieve misleadingly high results on benchmark tests.
Ahmad Al-Dahle, Meta’s Vice President of Generative AI, took to social media to address rumors circulating online, categorically stating that claims Meta trained its Llama 4 Maverick and Llama 4 Scout models on specific benchmark test sets were “simply not true.” Such training practices, if they occurred, could artificially inflate performance scores, giving an inaccurate impression of a model’s real capabilities.
Over the weekend, unverified posts emerged on social media alleging the tech giant intentionally boosted benchmark results by training its AI systems on the test sets reserved strictly for evaluating model performance. The rumors trace back to a message on a Chinese social platform from an unidentified individual claiming to have quit their position at Meta in protest over questionable benchmarking practices.
Adding to public skepticism, some users pointed out that the models' performance varied markedly depending on the platform hosting them. Critics noted stark discrepancies between the publicly downloadable Maverick model and an experimental version hosted on LM Arena, an AI benchmarking platform widely used in the industry. Posts circulating on social media have also detailed apparent shortcomings of Maverick and Scout on various tasks, further fueling suspicions.
Addressing these concerns, Al-Dahle acknowledged that users have experienced “mixed quality” results when accessing Maverick and Scout through external cloud hosting services. He attributed the inconsistencies to interim implementation problems, saying the public rollout began as soon as the models were complete and that ongoing fine-tuning is expected to resolve the current disparities.
“We released the models as soon as they were ready, and we anticipate it will take a few days for all implementations across providers to be correctly calibrated,” Al-Dahle stated, promising continued oversight of bug fixes and the onboarding process for hosting partners.