DeepSeek-R1 is not Sputnik

2025-01-30 @ 9 AM - #starred, #ai

DeepSeek-R1’s lead is fundamentally different from Sputnik’s for many reasons, the primary one being the difference in access to powerful GPUs. DeepSeek did not design R1 to be trained on H800s just to see if it was possible; there were monetary and political incentives for them to create a powerful model on such limited hardware. In contrast, American AI companies have not felt any need to optimize model training, since they are much more focused on a different goal: fast, cheap inference. DeepSeek has been doing great work, but that work should not scare the American AI market, especially since R1 benchmarks extremely closely with o1.

As an analogy, I think of it as a student writing a compiler: it takes hard work for someone their age and foretells their ability to do much more complicated work as a future computer scientist. However, the same compiler could just as well have been created by a computer scientist who has specialized in compiler design for a decade. In the same way, DeepSeek is training impressive models on limited hardware, showing their architecture’s potential for training an even more powerful model if they had access to more powerful hardware. However, OpenAI already has access to the powerful hardware and is training their models using it, allowing them to easily train models with the same performance as R1, even with a worse model architecture. Therefore, even if DeepSeek is a student who, through a lot of hard work, created a compiler, OpenAI is an experienced researcher who creates a similar result with much less effort.

Since American AI companies have access to the supplier of powerful GPUs (Nvidia) and now know a more performant training architecture through DeepSeek’s open research, there’s nothing stopping them from easily creating more powerful reasoning models than DeepSeek-R1. That’s the main difference compared to Sputnik—there shouldn’t be any perceived technical gap because DeepSeek’s innovation is unnecessary in the eyes of American AI companies (but it will still benefit these companies immensely).

Additionally, it’s not as if DeepSeek is using Chinese-made GPUs; if they were, it would definitely be a scare to American AI companies. But right now, DeepSeek and other Chinese AI companies still rely heavily on Nvidia, allowing the United States to easily control the technological gap between it and China.