Google’s latest artificial intelligence model, Gemini 2.5 Flash, scores lower on certain safety metrics than its predecessor, Gemini 2.0 Flash, according to a technical report the company recently published. Specifically, Gemini 2.5 Flash showed declines of 4.1% in “text-to-text safety” and 9.6% in “image-to-text safety.”
Text-to-text safety measures a model’s likelihood of producing outputs that conflict with Google’s safety guidelines when given textual prompts. Image-to-text safety assesses how consistently the model adheres to safety standards when responding to image-based prompts. Both metrics are evaluated using automated testing methods, rather than human oversight.
In a statement, the company confirmed that Gemini 2.5 Flash performs worse on both measures than Gemini 2.0 Flash. According to Google’s technical report, the safety regression coincides with the newer model’s improved ability to follow user instructions, including instructions that cross into problematic territory.
Google attributed part of the regression to false positives, but acknowledged that Gemini 2.5 Flash sometimes generates content that violates its safety policies when explicitly asked to do so. The report points to the inherent tension between accommodating user requests on sensitive topics and maintaining strict compliance with those policies.
These findings come as many AI developers, including Meta and OpenAI, deliberately tune their models to be more accommodating on controversial and sensitive topics, a shift some critics view as overly permissive. Meta, for instance, recently said it had adjusted its latest Llama models to avoid endorsing some views over others and to engage more readily with politically contentious prompts. Earlier this year, OpenAI similarly announced plans to let its models present multiple perspectives on sensitive issues.
These permissive adjustments are not without risk, however. TechCrunch previously reported that the default model powering ChatGPT allowed minors to generate erotic conversations, which OpenAI attributed to a “bug.”
Independent testing by TechCrunch via the AI platform OpenRouter found that Gemini 2.5 Flash readily produced essays supporting extreme measures such as replacing human judges with AI systems, undermining due process protections, and instituting broad warrantless surveillance programs.
Thomas Woodside, co-founder of the Secure AI Project, criticized the limited transparency of Google’s safety reporting. He noted that the report offers little detail about the specific cases in which policies were violated, which makes independent assessment difficult, and argued that such transparency is essential for outside scrutiny and safety assurance.
Google has previously drawn criticism for inconsistent and opaque disclosures around model safety. Earlier this year, the company waited several weeks before releasing a safety assessment for Gemini 2.5 Pro, its most advanced AI model, and the initial report omitted key details about safety performance. Following industry backlash, Google published a more detailed report on Gemini 2.5 Pro’s safety evaluations shortly afterward.