The release of a new AI chatbot by a small Chinese company has sent stock market values plummeting and prompted extravagant claims. What sets it apart?
The launch of China’s new DeepSeek AI-powered chatbot app has shaken up the technology sector. It swiftly surpassed OpenAI’s ChatGPT to become the most downloaded free iOS app in the United States, leading to a staggering loss of nearly $600 billion (£483 billion) in market value for chip manufacturer Nvidia in just one day—a record for the US stock market.
What accounts for this upheaval? The app is driven by a “large language model” (LLM) that boasts reasoning abilities comparable to those of US models like OpenAI’s, yet it reportedly requires only a fraction of the cost to train and operate.
Dr. Andrew Duncan serves as the Director of Science and Innovation for Fundamental AI at the Alan Turing Institute in London, UK.
DeepSeek asserts that it accomplished this through a range of technical strategies that reduced both the computation time needed to train its model, known as R1, and the amount of memory needed to store it.
According to DeepSeek, these reductions in overhead translated into significant cost savings. R1 is built on a base model called V3, which reportedly required 2.788 million GPU hours to train, spread across many graphics processing units (GPUs) running in parallel, at an estimated cost of under $6 million (£4.8 million).
In contrast, OpenAI CEO Sam Altman has said that training GPT-4 cost more than $100 million (£80 million). Despite the hit to Nvidia's market value, a research paper released by DeepSeek states that its models were trained on approximately 2,000 Nvidia H800 GPUs.
These chips are a modified version of the widely used H100 chip, designed to comply with export regulations to China. It is likely that these chips were stockpiled before the Biden administration implemented stricter restrictions in October 2023, effectively prohibiting Nvidia from exporting H800s to China.
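Taking the reported figures at face value, a rough back-of-the-envelope calculation gives a sense of scale. The sketch below uses only the numbers cited above; the "implied" rates are my own framing rather than anything DeepSeek has published.

```python
# Back-of-the-envelope estimates derived from the figures reported above.
# Illustrative only, not official DeepSeek accounting.

gpu_hours = 2_788_000   # reported GPU hours to train the V3 base model
budget_usd = 6_000_000  # reported training cost ceiling (~$6m)
num_gpus = 2_000        # H800 GPUs cited in DeepSeek's research paper

# Implied rental rate per GPU hour if the $6m figure is taken at face value
usd_per_gpu_hour = budget_usd / gpu_hours
print(f"Implied cost per GPU hour: ${usd_per_gpu_hour:.2f}")  # ~$2.15

# Implied wall-clock time with all 2,000 GPUs running in parallel
wall_clock_hours = gpu_hours / num_gpus
print(f"Implied wall-clock time: {wall_clock_hours:.0f} hours "
      f"(~{wall_clock_hours / 24:.0f} days)")  # ~1,394 hours, ~58 days
```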
Given these constraints, DeepSeek has likely been compelled to discover innovative methods to maximize the effectiveness of its available resources. Lowering the computational costs associated with training and running models may also help address concerns regarding the environmental impact of AI.
The data centers that support these operations have substantial electricity and water requirements, primarily to keep the servers from overheating. While most tech companies do not disclose the carbon footprint of running their models, a recent estimate puts ChatGPT's monthly carbon dioxide emissions at over 260 tonnes, equivalent to 260 flights from London to New York. Making AI models more efficient would therefore be a welcome advance for the industry from an environmental perspective.
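For scale, the quoted comparison implies roughly one tonne of CO2 per London-New York flight and an annualized total in the low thousands of tonnes. The quick check below uses only the estimates above; both figures are approximate.

```python
# Sanity check of the emissions comparison, using only the estimates
# quoted above (both are rough approximations).
monthly_co2_tonnes = 260  # estimated monthly CO2 emissions for ChatGPT
equivalent_flights = 260  # London-New York flights quoted as equivalent

print(f"Implied CO2 per flight: {monthly_co2_tonnes / equivalent_flights:.1f} tonnes")  # 1.0
print(f"Annualized emissions:   {monthly_co2_tonnes * 12:,} tonnes of CO2")             # 3,120
```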
Of course, it remains to be seen whether DeepSeek's model actually delivers energy savings, or whether cheaper, more efficient AI simply encourages more people to use it, increasing overall energy consumption. At the very least, it may help push sustainable AI up the agenda at the upcoming Paris AI Action Summit, so that the AI tools we use in the future are also kinder to the planet.

Many were surprised by how quickly DeepSeek emerged with such a competitive large language model: the company was only founded in 2023 by Liang Wenfeng, who is now hailed in China as an "AI hero".
The newest DeepSeek model is notable for its openly released “weights”—the numerical parameters derived from the training process—along with a technical paper that outlines its development. This transparency allows other organizations to operate the model on their own systems and modify it for different applications.
Furthermore, this openness enables researchers worldwide to investigate the inner workings of the model, in contrast to OpenAI’s o1 and o3 models, which function as black boxes. However, some information is still lacking, including the datasets and code utilized for training the models, prompting groups of researchers to work on reconstructing these elements.
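As a concrete illustration of what open weights allow, the minimal sketch below downloads the published parameters and runs them locally with the Hugging Face transformers library. It uses one of the smaller distilled variants DeepSeek released alongside R1, since the full R1 model is far too large for an ordinary machine; the model identifier is the one published on Hugging Face.

```python
# Minimal sketch: load DeepSeek's openly released weights and run them
# locally, rather than querying a model through a closed API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # published open weights

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Why is the sky blue? Think step by step."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the weights sit on the user's own machine, researchers can inspect, modify, or fine-tune them, which is precisely what a black-box API does not permit.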
Not all of DeepSeek’s cost-cutting strategies are novel; some have been implemented in other large language models (LLMs). In 2023, Mistral AI publicly launched its Mixtral 8x7B model, which was comparable to the leading models of the time. Both Mixtral and DeepSeek’s models utilize the “mixture of experts” approach, where the overall model consists of a collection of smaller models, each specializing in different areas.
When presented with a task, a routing mechanism inside the mixture model selects the most suitable "experts" to handle it, leaving the rest idle. DeepSeek has also disclosed its unsuccessful efforts to enhance LLM reasoning using other technical methods, such as Monte Carlo Tree Search, an approach long considered a potential way to improve LLM reasoning processes. Researchers will use this information to investigate how the model's already remarkable problem-solving abilities can be refined further, advancements that are likely to find their way into the next generation of AI models.
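To make the routing idea concrete, here is a toy sketch in Python. It is illustrative only: the dimensions, random weights, and single-matrix "experts" are invented for brevity, and real systems such as Mixtral and DeepSeek's models route each token inside every transformer layer rather than whole tasks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mixture-of-experts layer: a shared "router" scores every expert for a
# given input, only the top-k experts are actually run, and their outputs
# are combined with softmax weights.
DIM, NUM_EXPERTS, TOP_K = 8, 4, 2

experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((DIM, NUM_EXPERTS))

def moe_forward(x: np.ndarray) -> np.ndarray:
    scores = x @ router                # one routing score per expert
    top = np.argsort(scores)[-TOP_K:]  # indices of the k best-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()           # softmax over the selected experts only
    # Only the selected experts do any work; the rest stay idle, which is
    # where the compute savings come from.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

output = moe_forward(rng.standard_normal(DIM))
print(output.shape)  # (8,)
```

The key point is visible even at this scale: only the selected experts perform any computation, so a model can carry a very large total parameter count while doing only a fraction of that work for any given input.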
What does all of this imply for the future of the AI industry? DeepSeek appears to be showing that building advanced AI models does not require extensive resources. I predict that we will begin to see highly capable AI models developed with increasingly fewer resources as companies discover ways to enhance the efficiency of model training and operation.
Until now, the AI landscape has largely been controlled by “Big Tech” firms in the US, with Donald Trump referring to DeepSeek’s emergence as “a wake-up call” for the US technology sector. However, this development may not necessarily spell trouble for companies like Nvidia in the long run: as the financial and time investments needed to create AI products decrease, both businesses and governments will find it easier to adopt this technology.
This, in turn, will stimulate demand for new products and the chips that power them, perpetuating the cycle. It seems probable that smaller companies like DeepSeek will increasingly contribute to the development of AI tools that could enhance our lives. Underestimating this potential would be a mistake.