Google’s AI research division, DeepMind, has unveiled a new artificial intelligence system named AlphaEvolve, designed specifically to solve math and science problems that have clearly defined, machine-gradeable solutions.
According to DeepMind, AlphaEvolve can help optimize internal tasks at Google, including the infrastructure that powers AI model training. The team is currently building an accessible user interface for AlphaEvolve and preparing an early access program for selected academic researchers; a broader public release may follow this initial phase.
AI models commonly generate inaccurate or hallucinated outputs because their operation is inherently probabilistic. This tendency has reportedly worsened in some recent models: OpenAI’s “o3”, for instance, hallucinates notably more often than earlier generations.
AlphaEvolve addresses the issue of hallucinations through an automated evaluation process with three distinct steps (generation, critical review, and evaluation) that iterates toward accurate solutions. The methodology isn’t entirely novel: similar techniques have been explored within DeepMind itself and by other research groups for tackling challenging mathematical problems. But DeepMind claims AlphaEvolve’s effectiveness surpasses earlier approaches, largely because it integrates Google’s latest Gemini-generation AI models.
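DeepMind hasn’t released AlphaEvolve’s internals, but the loop it describes maps naturally onto a simple evolutionary search. The sketch below is a minimal illustration under that reading; the function names (propose, critique, evaluate), the toy problem, and the mutation scheme are assumptions for this example, not AlphaEvolve’s actual interface.

```python
import random

def evolve(seed, propose, critique, evaluate, generations=50, population_size=16):
    """Toy generate-review-evaluate loop in the spirit of DeepMind's
    three-step description. All names and the evolutionary scheme are
    assumptions for illustration, not AlphaEvolve's actual design."""
    # Step 1: generation -- propose initial candidates from the seed.
    population = [propose(seed) for _ in range(population_size)]
    best = max(population, key=evaluate)
    for _ in range(generations):
        # Step 2: critical review -- discard candidates failing basic checks.
        survivors = [c for c in population if critique(c)] or [best]
        # Step 3: evaluation -- rank survivors by the machine-gradeable score.
        survivors.sort(key=evaluate, reverse=True)
        if evaluate(survivors[0]) > evaluate(best):
            best = survivors[0]
        # Back to generation: mutate the top candidates into a new population.
        parents = survivors[: max(1, population_size // 4)]
        population = [propose(random.choice(parents)) for _ in range(population_size)]
    return best

# Toy problem: evolve a vector of non-negative numbers whose sum is 100.
if __name__ == "__main__":
    propose = lambda xs: [x + random.uniform(-1.0, 1.0) for x in xs]
    critique = lambda xs: all(x >= 0 for x in xs)   # the "review" step
    evaluate = lambda xs: -abs(sum(xs) - 100.0)     # higher is better
    print(evolve([10.0] * 5, propose, critique, evaluate))
```

The key design point the sketch preserves is that the review and evaluation steps are mechanical, so a hallucinated candidate is simply filtered out or outscored rather than accepted on faith.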
Using AlphaEvolve starts with specifying a problem to the system, ideally accompanied by instructions, formulas, code snippets, or pertinent reference materials. Users must also provide a method or formula that lets AlphaEvolve assess its own results automatically.
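DeepMind hasn’t specified the exact form this self-assessment takes. As a hedged sketch, a user-supplied grader for a hypothetical bin-packing task might look like the following, returning a single machine-checkable score; the grade_packing name, its signature, and the problem itself are invented for this example.

```python
import random

def grade_packing(pack_fn, n_items=50, bin_capacity=1.0, trials=20):
    """Hypothetical self-assessment function a user might supply to a
    system like AlphaEvolve: it runs a candidate bin-packing heuristic
    on random instances, rejects invalid packings, and returns a single
    machine-gradeable score (higher is better). The name, signature,
    and problem are invented for illustration."""
    total_bins = 0
    for seed in range(trials):
        rng = random.Random(seed)
        sizes = [rng.uniform(0.05, 0.7) for _ in range(n_items)]
        bins = pack_fn(sizes, bin_capacity)  # list of lists of item indices
        placed = sorted(i for b in bins for i in b)
        if placed != list(range(n_items)):
            return float("-inf")  # invalid: items missing or duplicated
        if any(sum(sizes[i] for i in b) > bin_capacity for b in bins):
            return float("-inf")  # invalid: a bin overflows
        total_bins += len(bins)
    return -total_bins / trials   # fewer bins on average scores higher

# A simple first-fit baseline that candidate algorithms would try to beat:
def first_fit(sizes, capacity):
    bins, loads = [], []
    for i, size in enumerate(sizes):
        for j in range(len(bins)):
            if loads[j] + size <= capacity:
                bins[j].append(i)
                loads[j] += size
                break
        else:
            bins.append([i])
            loads.append(size)
    return bins

print(grade_packing(first_fit))  # prints the negated average bin count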
This requirement inherently limits AlphaEvolve to clearly defined, numerically gradeable problems of the kind found in computer science, optimization, and mathematics. And because AlphaEvolve expresses its solutions only as algorithms, it is unsuitable for problems demanding descriptive or qualitative answers.
DeepMind tested AlphaEvolve on approximately fifty diverse challenges spanning multiple mathematical fields, such as geometry and combinatorics. According to the published results, AlphaEvolve rediscovered the best known solutions about 75 percent of the time and found improved solutions in roughly 20 percent of cases.
Real-world applications were also part of DeepMind’s evaluation: AlphaEvolve produced an algorithm that improved the efficiency of Google’s global computing resources by approximately 0.7 percent, and it suggested changes that cut the training time of Google’s Gemini series of AI models by roughly 1 percent. The company acknowledged, however, that these enhancements weren’t groundbreaking or entirely unprecedented; an apparently novel design optimization AlphaEvolve recommended for Google’s TPU accelerator chip, for example, had already been identified through other techniques.
Even without profound breakthroughs, DeepMind argues, AlphaEvolve can streamline technical tasks, giving human experts more freedom to concentrate on higher-level problem-solving and innovation.