The distinction between conventional models and reasoning models resembles the two modes of thought described by Nobel laureate Daniel Kahneman in his 2011 book Thinking, Fast and Slow: fast, instinctive System-1 thinking and slower, more deliberative System-2 thinking.
The kind of model that made ChatGPT possible, known as a large language model (LLM), produces an immediate response to a prompt by querying a large neural network. These outputs can be strikingly clever and coherent, but the models often stumble on questions that require step-by-step reasoning, including simple arithmetic.
An LLM can be pushed to mimic deliberative reasoning if it is instructed to come up with a plan and then follow it. However, this trick isn’t always reliable, and models typically struggle with problems that require extensive, careful planning. OpenAI, Google, and now Anthropic are all using a machine learning technique called reinforcement learning to get their latest models to generate reasoning that leads to correct answers, which requires gathering additional training data from humans on specific problems.
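For readers who want to see the plan-then-follow trick concretely, here is a minimal Python sketch. It is an illustration under stated assumptions, not any company's actual method: the prompt wording, the `plan_then_solve` helper, and the `call_model` callable are all hypothetical, and `fake_model` is a canned stand-in so the example runs without a real LLM.

```python
from typing import Callable

# Hypothetical prompt templates; the exact wording is an assumption.
PLAN_PROMPT = (
    "Question: {question}\n"
    "Before answering, write a short numbered list of steps for solving this. "
    "Do not give the final answer yet."
)

SOLVE_PROMPT = (
    "Question: {question}\n"
    "Plan:\n{plan}\n"
    "Follow the plan step by step, then give the final answer "
    "on its own line, prefixed with 'Answer:'."
)


def plan_then_solve(question: str, call_model: Callable[[str], str]) -> str:
    """Make two model calls: one to draft a plan, one to carry it out.

    `call_model` is a hypothetical function that sends a prompt string to
    some LLM and returns its text reply.
    """
    plan = call_model(PLAN_PROMPT.format(question=question))
    return call_model(SOLVE_PROMPT.format(question=question, plan=plan))


def fake_model(prompt: str) -> str:
    """Canned stand-in for a real model, used only to make the sketch runnable."""
    if "Plan:\n" in prompt:
        # Second call: the prompt already contains a plan to follow.
        return "17 * 20 = 340\n17 * 4 = 68\n340 + 68 = 408\nAnswer: 408"
    # First call: return a plan only.
    return "1. Split 17 * 24 into 17 * 20 + 17 * 4.\n2. Add the partial products."


if __name__ == "__main__":
    print(plan_then_solve("What is 17 * 24?", fake_model))
```

With a real model in place of `fake_model`, nothing forces the second reply to actually stick to the plan, which is one reason the trick is unreliable.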
Penn says Claude’s reasoning mode was trained on additional data focused on business applications, including writing code, using computers, and answering complex legal questions. “The areas where we see improvements are primarily in … technical subjects or topics demanding lengthy reasoning,” Penn says. “We’ve received considerable interest from our clients in integrating our models into their operational tasks.”
Anthropic says Claude 3.7 is especially good at solving coding problems that require step-by-step reasoning, and that it outperforms OpenAI’s o1 on some benchmarks, including SWE-bench. The company is also launching a new tool, Claude Code, designed specifically for AI-assisted programming.
“The model is already proficient at coding,” Penn explains. However, “enhanced reasoning capabilities would be advantageous for scenarios that involve very complex planning, such as analyzing a substantially large codebase for a corporation.”