The success of DeepSeek hints at an unexpected outcome of the tech cold war between the US and China. US export controls have severely curtailed the ability of Chinese tech companies to compete on AI in the Western way, that is, by scaling up massively: buying more chips and training for longer. As a result, most Chinese companies have concentrated on downstream applications rather than building their own models. But DeepSeek’s latest release shows that there is another way to win: by reworking the foundational structure of AI models and using limited resources more efficiently.
“Unlike many Chinese AI companies that rely heavily on access to advanced hardware, DeepSeek has focused on maximizing software-driven resource optimization,” explains Marina Zhang, an associate professor at the University of Technology Sydney, who studies Chinese innovation. “DeepSeek has embraced open source methods, pooling collective expertise and fostering collaborative innovation. This approach eases resource constraints and speeds up the development of cutting-edge technologies, setting DeepSeek apart from more insular competitors.”
So who is behind the AI startup? And why did they suddenly release an industry-leading model and give it away for free? WIRED spoke with experts on China’s AI industry and read in-depth interviews with DeepSeek founder Liang Wenfeng to piece together the story behind the company’s rapid rise. DeepSeek did not respond to several inquiries from WIRED.
A Star Hedge Fund in China
Even within the Chinese AI industry, DeepSeek is an unconventional player. It started as Fire-Flyer, a deep-learning research unit of High-Flyer, one of China’s best-performing quantitative hedge funds. Established in 2015, the fund became the first quant hedge fund in China to raise more than 100 billion RMB (roughly $15 billion). (Since 2021, the number has dipped to around $8 billion, though High-Flyer remains one of the most important quant hedge funds in the country.)
For years, High-Flyer had been stockpiling GPUs and building Fire-Flyer supercomputers to analyze financial data. Then, in 2023, Liang, who has a master’s degree in computer science, decided to pour the fund’s resources into a new company called DeepSeek that would build its own cutting-edge models and, eventually, develop artificial general intelligence. It was as if Jane Street had decided to become an AI startup and spend its money on scientific research.
Bold vision. But somehow, it worked. “DeepSeek represents a new generation of Chinese technology companies that prioritize long-term technological progress over quick commercialization,” says Zhang.
Liang said that the decision was driven more by scientific curiosity than by a desire to turn a profit. “I wouldn’t be able to find a commercial reason [for founding DeepSeek] even if you ask me to,” he explained. “Because it’s not worth it commercially. Basic scientific research has a very low return-on-investment ratio. When OpenAI’s early investors gave it money, they certainly weren’t thinking about how much return they would get. Rather, it was that they really wanted to do this thing.”
Today, DeepSeek is one of the only leading AI companies in China that doesn’t depend on funding from tech giants like Baidu, Alibaba, or ByteDance.
A Young Group of Geniuses Eager to Prove Themselves
According to Liang, when he put together DeepSeek’s research team, he was not looking for experienced engineers to build a consumer-facing product. Instead, he focused on PhD students from China’s top universities, including Peking University and Tsinghua University, who were eager to prove themselves. Some had published in prestigious journals and won awards at international academic conferences, but many lacked industry experience, according to the Chinese tech publication QBitAI.
“Our core technical positions are mostly filled by people who graduated this year or in the past one or two years,” Liang told 36Kr in 2023. The hiring strategy helped foster a work culture in which employees were free to use ample computing resources to pursue unorthodox research projects. It is a starkly different way of operating from established Chinese internet companies, where teams frequently compete for resources. (A recent example: ByteDance accused a former intern of sabotaging his colleagues’ work in order to divert more computing resources to his own team.)
Liang said that young people can be a better fit for high-investment, low-profit research. “Most people, when they are young, can devote themselves completely to a mission without utilitarian considerations,” he explained. His pitch to prospective hires is that DeepSeek was founded to “solve the hardest questions in the world.”
According to experts, the fact that these young researchers were almost entirely educated in China adds to their drive. This younger generation also embodies a sense of patriotism, Zhang explains, particularly as they navigate US restrictions and choke points in critical hardware and software technologies. Their determination to overcome these barriers reflects both personal ambition and a broader commitment to advancing China’s position as a global leader in innovation.
Innovation Born out of a Crisis
In October 2022, the US government began putting together export controls that severely restricted Chinese AI companies’ access to cutting-edge chips like Nvidia’s H100. The move presented a problem for DeepSeek. The company had a stockpile of 10,000 H100s when it first launched, but it needed more to compete with companies like OpenAI and Meta. “The problem we are facing has never been money, but the export controls on advanced chips,” Liang said in a second interview in 2024.
DeepSeek had to come up with more efficient methods for training its models. Peter Chang, a software engineer turned policy analyst at the Mercator Institute for China Studies, says the company optimized its model architecture using a battery of engineering tricks, including custom communication schemes between chips, reducing the size of fields to save memory, and innovative use of the mix-of-models approach. “Many of these approaches aren’t new ideas, but combining them successfully to produce a cutting-edge model is a remarkable feat,” Chang says.
DeepSeek has also made significant progress on Multi-head Latent Attention (MLA) and Mixture-of-Experts, two technical designs that make its models more cost-effective by requiring fewer computing resources to train. According to the research institute Epoch AI, DeepSeek’s latest model is so efficient that it needed one-tenth the computing power of Meta’s comparable Llama 3.1 model to train.
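For readers who want a concrete picture of why Mixture-of-Experts can reduce compute, here is a minimal sketch of top-k expert routing in plain Python. It is a toy illustration of the general idea, not DeepSeek’s implementation: the function names, the toy experts, and the gate weights are all hypothetical. The point it shows is that a gate scores every expert, but only the k best-scoring experts actually run for a given input.

```python
import math


def softmax(scores):
    """Convert raw gate scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores, k):
    """Pick the k highest-scoring experts; return (index, weight) pairs.

    Weights are renormalized so the chosen experts' weights sum to 1.
    """
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, gate_weights, k=2):
    """Run only the k selected experts and mix their outputs.

    x            : input vector (list of floats)
    experts      : list of callables, each mapping a vector to a vector
    gate_weights : one gate vector per expert (hypothetical toy gate)
    """
    # The gate scores each expert with a dot product against the input.
    gate_scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    chosen = route_top_k(gate_scores, k)

    out = [0.0] * len(x)
    for idx, weight in chosen:
        y = experts[idx](x)  # experts that were not chosen are never called
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, [idx for idx, _ in chosen]


def make_expert(scale):
    """Toy expert: scales its input. Real experts are small neural networks."""
    return lambda x: [scale * xi for xi in x]


if __name__ == "__main__":
    experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
    gate_weights = [[0.0, 0.0], [2.0, 0.0], [-1.0, 0.0], [1.5, 0.0]]
    output, active = moe_forward([1.0, 0.0], experts, gate_weights, k=2)
    print("active experts:", active)
    print("output:", output)
```

In this toy setup, four experts exist but only two run per input, so the per-input compute is roughly half that of running all four while the total parameter count stays the same.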
DeepSeek’s willingness to share these innovations with the public has earned it considerable goodwill in the global AI research community. For many Chinese AI companies, developing open source models is the only way to play catch-up with their Western counterparts, because it attracts more users and contributors, which in turn help the models grow. “They’ve now demonstrated that cutting-edge models can be built using less, though still a lot of, money and that the current norms of model-building leave plenty of room for optimization,” Chang says. “We are certain to see a lot more attempts in this direction going forward.”
The news could spell trouble for the current US export controls, which aim to create bottlenecks in computing resources. Existing estimates of how much AI computing power China has, and what it can achieve with it, could be upended, according to Chang.