A model code-named “Strawberry”, part of OpenAI’s newly released O1 product, introduces a structured planning module designed to enhance how large language models (LLMs) solve problems.
Tailored for Chain-of-Thought (CoT) Reasoning
While the idea of using more computation to search for optimal solutions isn’t new (see, e.g., the paper “Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models”), OpenAI’s Strawberry model seems to have been tailored specifically for Chain-of-Thought (CoT) reasoning.
In this approach, models break down complex problems into intermediate steps, leading to significant performance improvements across various STEM and coding benchmarks—often achieving results comparable to those of PhD students.
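In its simplest form, this breakdown into intermediate steps is just prompting: ask the model to write out its reasoning before committing to an answer. Below is a minimal sketch using the OpenAI Python SDK; the model name and prompt wording are illustrative assumptions on our part, not tied to O1.

```python
# Minimal sketch: eliciting step-by-step (CoT) reasoning from a chat model.
# Assumes the OpenAI Python SDK (`pip install openai`) and an OPENAI_API_KEY
# in the environment; the model name below is an illustrative assumption.
from openai import OpenAI

client = OpenAI()

PROBLEM = "A train leaves at 9:40 and arrives at 12:05. How long is the trip?"

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any chat-capable model works here
    messages=[
        {
            "role": "user",
            "content": (
                f"{PROBLEM}\n\n"
                "Work through the problem in numbered intermediate steps, "
                "then state the final answer on its own line."
            ),
        }
    ],
)

print(response.choices[0].message.content)
```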
Currently, LLMs spend the same amount of computation on simple and complex problems alike. Earlier strategies, such as CoT prompting, guided the model to spend more time thinking by writing out each part of a solution. Strawberry takes this a step further by planning multiple solution paths and actively exploring alternatives to find the best possible answer. This shifts inference from a single pass that returns an answer of variable quality to an active search for the best solution.
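OpenAI has not published how O1’s planning works internally, but the shift from a single pass to an active search can be sketched with a simple best-of-n pattern: sample several candidate reasoning paths, have a model score them, and keep the best one. The model name, prompts, and self-scoring scheme below are assumptions for illustration only, not O1’s mechanism.

```python
# Illustrative sketch of test-time search: sample several reasoning paths,
# score each with a judge prompt, and return the highest-scoring one.
# This is NOT OpenAI's method; model name and prompts are assumptions.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # assumption: any chat-capable model


def solve_once(problem: str) -> str:
    """Generate one step-by-step candidate solution."""
    resp = client.chat.completions.create(
        model=MODEL,
        temperature=1.0,  # higher temperature -> more diverse candidates
        messages=[{"role": "user",
                   "content": f"Solve step by step:\n{problem}"}],
    )
    return resp.choices[0].message.content


def score(problem: str, candidate: str) -> float:
    """Ask the model to rate a candidate solution from 0 to 10."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user",
                   "content": (f"Problem:\n{problem}\n\n"
                               f"Candidate solution:\n{candidate}\n\n"
                               "Rate correctness 0-10. "
                               "Reply with the number only.")}],
    )
    try:
        return float(resp.choices[0].message.content.strip())
    except ValueError:
        return 0.0


def best_of_n(problem: str, n: int = 4) -> str:
    """Spend extra compute exploring n paths instead of one."""
    candidates = [solve_once(problem) for _ in range(n)]
    return max(candidates, key=lambda c: score(problem, c))


if __name__ == "__main__":
    print(best_of_n("A farmer has 17 sheep; all but 9 run away. How many are left?"))
```

In a production system, a trained verifier or reward model would replace the self-scoring prompt and a tree search would replace the flat sampling, but the trade-off is the same: more compute at inference time in exchange for better answers.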
Challenges in Decision-Making and the Future of AI
That said, knowing what is “best” requires judgment and reasoning. While Strawberry improves the exploration of potential solutions, models like GPT-4o and GPT-4o-mini likely still handle individual steps within the broader O1 product. These models continue to struggle with basic reasoning tasks, such as an oversimplified version of the “Farmer, Wolf, Goat and Cabbage” problem, where the optimal solution is often missed. In this example, we had to modify the story to keep the model from answering straight from its trained weights. Instead of choosing the obvious strategy of transporting everyone at once, the model gets trapped into solving the problem over several trips, because that is the pattern it was conditioned on.
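The exact wording of our modified story isn’t reproduced here, but the kind of probe is easy to sketch: restate the riddle so the boat can carry everything at once and check whether the model still insists on multiple trips. The model name and prompt wording below are illustrative assumptions, not the exact ones used in the experiment described above.

```python
# Illustrative probe: a river-crossing riddle rewritten so that a single trip
# suffices. A model answering from memorized puzzle patterns will often still
# propose multiple crossings. (Assumption: prompt wording and model name are
# ours, not the exact ones used in the experiment described above.)
from openai import OpenAI

client = OpenAI()

PUZZLE = (
    "A farmer needs to cross a river with a wolf, a goat, and a cabbage. "
    "The boat is large enough to carry the farmer and all three at the same "
    "time. What is the smallest number of crossings needed to get everything "
    "across safely?"
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption
    messages=[{"role": "user", "content": PUZZLE}],
)
print(resp.choices[0].message.content)  # the obvious answer is one crossing
```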
Though exploration capabilities are advancing, LLMs’ decision-making abilities still have room for growth, particularly when it comes to avoiding common pitfalls and dead ends in problem-solving. Additionally, these advances come with increased computational costs, which translate into usage limits on the O1 models: ChatGPT Plus subscribers get 30 messages per week with O1-Preview and 50 per week with O1-Mini, reflecting the compute-intensive nature of planning algorithms that assess multiple paths before settling on a solution.
This measured approach shows that OpenAI is playing the long game here—gradually enhancing the problem-solving ability of its models without pushing them into tasks they aren’t equipped to handle. By incorporating more complex planning and CoT processes, O1 and Strawberry are positioned to shoulder more of the problem-solving burden autonomously.
The introduction of these advanced models raises an important question for the AI industry: How much added value will business customers derive from these new capabilities, and how will this be reflected in their ability and willingness to pay?
Takeaways
While shortcomings remain, we are excited about the progress and look forward to applying these new capabilities in our internal projects and our client solutions. Each iteration brings us closer to AI models that not only generate human-like text but also solve complex problems with greater autonomy. As the AI industry continues to evolve, the value businesses derive from these advanced capabilities will be critical. Stay tuned as we explore how these advancements shape the future of AI-driven problem-solving.
Contact Us
Ready to transform your business processes with AI? Contact us today!