KEY HIGHLIGHTS
- DeepMind’s Milestone: DeepMind, Google’s AI research unit, achieves a milestone by pairing a large language model-based chatbot with an automated fact-checker to crack a long-standing open math problem.
- Filtering Reliable Outputs: The language model generates millions of candidate responses, but only those verified as accurate by a fact-checking layer are returned, a departure from DeepMind’s previous task-specific AI models.
- Risk of Hallucinations in LLMs: Large Language Models (LLMs), like OpenAI’s GPT-4 and Google’s Gemini, are susceptible to producing false outputs known as “hallucinations,” highlighting a challenge in their application.
- FunSearch Model and Fact-Checking Layer: DeepMind’s FunSearch couples a general-purpose LLM with a fact-checking layer aimed specifically at math and computer science problems, a narrow domain in which outputs can be verified rapidly.
- Efficient Problem Solving: FunSearch, despite susceptibility to hallucinations, efficiently solves complex problems in math and computer science by generating and filtering potential solutions through an evaluator layer.
- Real-world Implications: FunSearch’s capabilities extend beyond theoretical problem-solving; it outperforms existing algorithms in the bin-packing problem, hinting at applications in industries like transport and logistics.
DeepMind’s Innovative Solution to Long-Unsolved Math Problems
Google’s DeepMind AI research unit has announced a significant achievement, claiming to have solved a long-standing open math problem. It used a large language model-based chatbot named FunSearch, equipped with a fact-checking mechanism that sifts through generated responses and keeps only those verified as accurate.
This breakthrough is notable as prior achievements by DeepMind involved task-specific AI models tailored for activities like weather prediction or protein design. These models were trained on precise datasets, setting them apart from large language models (LLMs) such as OpenAI’s GPT-4 or Google’s Gemini.
LLMs like GPT-4 and Gemini are trained on extensive and diverse datasets, which makes them versatile enough to perform a wide variety of tasks and discuss numerous subjects. However, they are susceptible to “hallucinations,” in which they produce false outputs, as seen shortly after Gemini’s release when it answered basic questions, such as who won the most recent Oscars, incorrectly.
To address this issue, researchers have proposed adding a verification layer on top of the AI model to check the accuracy of outputs before they reach users. Building such a safety net is difficult, however, because LLMs are trained to discuss such a wide range of topics.
DeepMind’s approach was to build FunSearch around a generalized LLM based on Google’s PaLM 2. The researchers added a fact-checking layer, referred to as an “evaluator,” tailored specifically to math and computer science problems that can be solved by generating computer code. DeepMind claims that focusing on this narrow domain makes it possible to verify the outputs rapidly.
Although FunSearch remains prone to hallucinations and potentially generating inaccurate results, the evaluator effectively filters them out, delivering users reliable and verified information. This innovation marks a step forward in addressing the challenges associated with the use of LLMs and enhancing their reliability in problem-solving scenarios.
“We think that perhaps 90% of what the LLM outputs is not going to be useful,” said Alhussein Fawzi, a DeepMind researcher. “Given a candidate solution, it’s very easy for me to tell you whether this is actually a correct solution and to evaluate the solution, but actually coming up with a solution is really hard. And so mathematics and computer science fit particularly well.”
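To make this generate-then-verify pattern concrete, here is a minimal Python sketch of the idea, not DeepMind’s code: a hypothetical propose_candidates function stands in for the LLM, and evaluate plays the role of the evaluator layer, keeping only candidates that pass known test cases.

```python
def propose_candidates(prompt):
    """Stand-in for an LLM call (hypothetical): returns candidate programs as
    source strings. A real system would query a model such as PaLM 2."""
    return [
        "def solve(x):\n    return x * x",    # a plausible candidate
        "def solve(x):\n    return x + '!'",  # a hallucinated, broken candidate
    ]

def evaluate(candidate_src, test_cases):
    """Evaluator layer: execute the candidate program against known test cases.
    Candidates that crash or give wrong answers are discarded."""
    namespace = {}
    try:
        exec(candidate_src, namespace)            # define the proposed function
        fn = namespace["solve"]
        return all(fn(x) == y for x, y in test_cases)
    except Exception:
        return False                              # hallucinations are filtered out

test_cases = [(2, 4), (3, 9)]
verified = [c for c in propose_candidates("square a number")
            if evaluate(c, test_cases)]
print(f"{len(verified)} of 2 candidates survived verification")
```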
FunSearch: Advancing Large Language Models in Problem Solving
Fawzi says that FunSearch, the AI breakthrough from Google’s DeepMind, marks a milestone for large language models: unlike its predecessors, it is capable not only of solving complex math problems but also of generating new scientific knowledge and ideas, a notable advancement in LLM capabilities.
To assess FunSearch’s abilities, researchers presented it with problems along with basic source code solutions. The model then generated a database of new solutions, rigorously verified by the evaluator for accuracy. The most dependable solutions were reintroduced into FunSearch, along with prompts to enhance its ideas. Fawzi explains that this iterative process results in millions of potential solutions converging to yield the most efficient outcome.
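As an illustration of this iterative loop, the toy Python sketch below maintains a database of scored programs, builds the next round of candidates from the strongest entry, and only lets evaluator-scored candidates re-enter the pool. The mutate and evaluate callables are hypothetical stand-ins for the LLM and DeepMind’s evaluator, not their actual implementation.

```python
import random

def funsearch_style_loop(seed_program, evaluate, mutate, rounds=10, pool_size=20):
    """Toy sketch of the iterative generate-evaluate-reseed loop (not DeepMind's
    implementation): keep a database of (score, program) pairs, derive the next
    round of candidates from the best entry, and store only scored candidates."""
    database = [(evaluate(seed_program), seed_program)]
    for _ in range(rounds):
        best_score, best_program = max(database)          # strongest program so far
        candidates = [mutate(best_program) for _ in range(pool_size)]
        database.extend((evaluate(c), c) for c in candidates)
    return max(database)

# Toy usage: "programs" are just integers and the evaluator rewards larger values.
best = funsearch_style_loop(
    seed_program=1,
    evaluate=lambda p: p,                        # score is the value itself
    mutate=lambda p: p + random.randint(-1, 3),  # stand-in for an LLM edit
)
print("best (score, program):", best)
```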
In addressing mathematical challenges, FunSearch takes an indirect approach: rather than tackling a problem head-on, it writes computer code that finds solutions. One notable task assigned to FunSearch was the cap set problem, which asks how many points can be placed in a grid so that no three of them lie on a straight line. Although the problem becomes rapidly more complex as the number of points grows, FunSearch produced a solution of 512 points in eight dimensions, larger than any found by human mathematicians. The results were published in the journal Nature.
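The cap set property itself is cheap to verify even though large cap sets are hard to find: over the three-element field, three distinct points lie on a line exactly when they sum to the zero vector componentwise. The brute-force Python checker below illustrates that condition; it is a minimal sketch, not the evaluator DeepMind used.

```python
from itertools import combinations

def is_cap_set(points, n):
    """Check the cap-set condition in Z_3^n: no three distinct points may lie
    on a line, which over Z_3 means no three distinct points sum to zero."""
    for a, b, c in combinations(points, 3):
        if all((a[i] + b[i] + c[i]) % 3 == 0 for i in range(n)):
            return False
    return True

# Tiny example in two dimensions: these four points form a valid cap set.
print(is_cap_set([(0, 0), (0, 1), (1, 0), (1, 1)], n=2))  # True
```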
While most people will never encounter the cap set problem, it matters because it is an open question that has perplexed even the best human mathematicians. Terence Tao, a mathematician at the University of California, Los Angeles, hails FunSearch as a “promising paradigm” with potential applications across a range of math problems.
FunSearch’s versatility became evident when it was tasked with the bin-packing problem, the challenge of fitting objects of different sizes into as few containers as possible. Fawzi reports that FunSearch outperformed existing algorithms designed for this problem, suggesting potential applications in industries like transport and logistics.
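For context, a classic hand-written bin-packing heuristic is “first fit,” sketched below in Python; it is the kind of standard baseline such discovered heuristics are measured against, not the heuristic FunSearch itself produced.

```python
def first_fit(items, bin_capacity):
    """First-fit heuristic for bin packing: put each item into the first open
    bin with enough remaining space, opening a new bin when none fits."""
    bins = []  # remaining capacity of each open bin
    for item in items:
        for i, remaining in enumerate(bins):
            if item <= remaining:
                bins[i] -= item
                break
        else:
            bins.append(bin_capacity - item)  # no bin fits: open a new one
    return len(bins)

# Example: five items packed into bins of capacity 1.0 -> 3 bins are used.
print(first_fit([0.4, 0.7, 0.2, 0.5, 0.3], bin_capacity=1.0))
```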
Notably, FunSearch distinguishes itself through the transparency of its output-generation process. Unlike other LLMs, which function as “black boxes,” FunSearch lets users observe how its outputs are produced, offering a learning opportunity. This transparency sets FunSearch apart and opens new possibilities for understanding and leveraging the capabilities of large language models.
Source(s): Google DeepMind
The information above is curated from reliable sources, modified for clarity. Slash Insider is not responsible for its completeness or accuracy. Please refer to the original source for the full article. Views expressed are solely those of the original authors and not necessarily of Slash Insider. We strive to deliver reliable articles but encourage readers to verify details independently.