DeepSeek-R1

2025-01-20 @ 6 PM - #ai, #hacker-news

via Hacker News

DeepSeek’s models have been excellent given the company’s limited resources, and the same holds true for its first-generation reasoning model. DeepSeek-R1 benchmarks similarly to OpenAI’s o1 and, as is the norm with many new AI startups, the model weights are MIT licensed and openly available. The distilled models are also extremely interesting, as they bring reasoning capabilities to models as small as 1.5B parameters, meaning they can easily be run on-device rather than relying on DeepSeek’s API. R1 seems to be performing pretty well overall, but this Hacker News comment has an amazing transcript of the model contradicting itself on the number of ‘r’s in “strawberry” (as is tradition), so there’s clearly room for future improvement.