It has been a little over a week since DeepSeek upended the AI world. The arrival of its open-weight model, reportedly trained on a fraction of the specialized computing chips that power leading labs, set off shock waves inside OpenAI. Not only did employees claim to see hints that DeepSeek had "inappropriately distilled" OpenAI's models to create its own, but the startup's success had Wall Street questioning whether companies like OpenAI were wildly overspending on compute.
"DeepSeek R1 represents AI's Sputnik moment," wrote Marc Andreessen, one of Silicon Valley's most prominent and provocative innovators, on X.
In response, OpenAI plans to release a new model today, ahead of its originally scheduled launch. The model, called o3-mini, will debut in both API and chat form. Sources say it has o1-level reasoning with 4o-level speed. In other words, it's fast, cheap, smart, and designed to outpace DeepSeek.
The moment has galvanized OpenAI staff. Inside the company, there's a growing sense that, particularly as DeepSeek dominates the conversation, OpenAI must become more efficient or risk falling behind its newest competitor.
One problem dates back to OpenAI's origins as a nonprofit research lab before it became a profit-seeking powerhouse. Employees say an ongoing tug-of-war between the research and product groups has created a rift between the teams working on advanced reasoning and those working on chat. OpenAI spokesperson Niko Felix disputes this, saying that the leaders of these teams, chief product officer Kevin Weil and chief research officer Mark Chen, "meet every week and collaborate closely to align on product and research priorities."
Some within OpenAI want the company to build a unified chat product: one model that can discern on its own whether a question requires advanced reasoning. So far, that hasn't happened. Instead, a drop-down menu in ChatGPT prompts users to choose between GPT-4o ("suitable for most inquiries") and o1 ("utilizes advanced reasoning").
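The unified product described above implies some routing step that decides, per query, which model should answer. This is a purely hypothetical sketch, not anything OpenAI has described: a real system would presumably use a learned classifier, and the keyword heuristic, function names, and model labels below are all invented for illustration.

```python
def route_query(question: str) -> str:
    """Toy router: decide whether a question should go to an
    advanced-reasoning model or a fast general-purpose chat model.
    A production system would use a learned classifier; this keyword
    heuristic is purely illustrative."""
    reasoning_cues = ("prove", "step by step", "derive", "debug", "optimize")
    q = question.lower()
    if any(cue in q for cue in reasoning_cues):
        return "reasoning-model"  # hypothetical stand-in for a model like o1
    return "chat-model"           # hypothetical stand-in for a model like GPT-4o
```

The point of the sketch is only that routing hides the drop-down from the user: the choice between models becomes an internal decision rather than one the user has to make.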
Some employees claim that while chat brings in the bulk of OpenAI's revenue, o1 gets more attention, and more computing resources, from leadership. "Leadership is not interested in chat," said a former employee who worked on (you guessed it) chat. "Everyone is eager to contribute to o1 because it's exciting, but the code base wasn't designed for experimentation, so progress is lacking." The former employee asked to remain anonymous, citing a nondisclosure agreement.
OpenAI spent years experimenting with reinforcement learning to fine-tune the model that eventually became the advanced reasoning system called o1. (Reinforcement learning is a technique that trains AI models through a system of rewards and penalties.) DeepSeek built on the reinforcement learning work that OpenAI had pioneered to create its own advanced reasoning system, called R1. "They capitalized on understanding that reinforcement learning, applied to language models, is effective," said a former OpenAI researcher who is not authorized to speak publicly about the company.
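The parenthetical above can be made concrete with a toy example. This is an illustrative sketch only, a simple multi-armed bandit rather than anything resembling the training setups at OpenAI or DeepSeek, and every name and number in it is invented: an agent repeatedly tries actions, receives rewards (or penalties, as negative rewards), and gradually shifts toward the actions that paid off, without ever being told which action is best.

```python
import random

def train_bandit(true_rewards, steps=5000, epsilon=0.1, seed=0):
    """Minimal reinforcement learning loop (epsilon-greedy bandit).
    The agent picks an action, observes a noisy reward, and nudges
    its value estimate for that action toward what it observed."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_rewards)  # learned value of each action
    counts = [0] * len(true_rewards)       # times each action was tried
    for _ in range(steps):
        # Explore a random action occasionally; otherwise exploit
        # the action currently believed to be best.
        if rng.random() < epsilon:
            action = rng.randrange(len(true_rewards))
        else:
            action = max(range(len(true_rewards)), key=lambda a: estimates[a])
        # Noisy feedback: negative values act as penalties.
        reward = true_rewards[action] + rng.gauss(0, 0.1)
        counts[action] += 1
        # Incremental average: move the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates
```

Run on three actions with hidden payoffs of -1.0, 0.2, and 1.0, the agent's estimates converge so that the third action scores highest, purely from reward feedback. Applying the same reward-and-penalty idea to a language model's outputs is vastly more involved, but the feedback loop is the same in spirit.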
"The reinforcement learning that [DeepSeek] executed is akin to what we employed at OpenAI," said another former OpenAI researcher, "but they did it using superior data and a cleaner stack."
According to OpenAI employees, the research that went into o1 was done in a code base, called the "berry" stack, built for speed. "There were trade-offs—experimental thoroughness for speed," said a former employee with direct knowledge of the situation.
Those trade-offs made sense for o1, which was essentially one enormous experiment, code base limitations notwithstanding. They made less sense for chat, a product used by millions that was built on a different, more reliable stack. As o1 launched and became a commercial product, cracks began to show in OpenAI's internal processes. "It was like, 'why are we doing this in the experimental codebase, shouldn't this be in the main product research codebase?'" the employee said. "There was significant resistance to that thought internally."