#ai

DeepSeek-R1


DeepSeek’s models have been excellent given their limited resources, and the same holds true for their first-generation reasoning model. DeepSeek-R1 benchmarks similarly to OpenAI’s o1, and, as is the norm with many new AI startups, the model weights are MIT licensed and openly available. The distilled models are also extremely interesting, as they enable reasoning capabilities at sizes as small as 1.5B parameters, meaning they can easily be run on-device rather than relying on DeepSeek’s API. R1 seems to perform pretty well overall, but this Hacker News comment has an amazing transcript of the model contradicting itself on the number of ’r’s in “strawberry” (as is tradition), so there’s clearly room for future improvements.
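For reference, the ground truth the model kept tripping over is a one-liner:

```python
# Count the occurrences of "r" in "strawberry".
print("strawberry".count("r"))  # → 3
```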

# 2025-01-20 - #ai, #hacker-news

Individual AI’s Environmental Impacts


While this article misses the recurring environmental costs incurred as different companies race to train ever-better LLMs, it provides a good analysis of the impact of individual ChatGPT queries. I’ve long noted that promoting individual behavior changes does little to prevent environmental damage, and found this quote especially effective at making that point:

Getting worried about whether you should use LLMs is as much of a distraction to the real issues involved with climate change as worrying about whether you should stop the YouTube video you’re watching 12 seconds early for the sake of the Earth.

While watching 12 seconds of a YouTube video and asking a ChatGPT question obviously aren’t directly comparable, it’s pretty clear that neither is especially worth worrying about in terms of impact. I do agree that the promotion of LLMs will drive further model training, and with it more environmental impact, but hopefully the need to train entirely new models will diminish quickly as the relative gains in model performance plateau.

# 2025-01-18 - #ai, #hacker-news

Ndea


A new AI startup cofounded by François Chollet, the creator of Keras. Ndea’s main focus seems to be developing AGI, a goal it obviously shares with all of the major AI companies.

As described on their website, Ndea is utilizing guided program synthesis to reach AGI:

Instead of interpolating between data points in a continuous embedding space, program synthesis searches for discrete programs, or models, that perfectly explain observed data.

While their description of program synthesis is a bit overcomplicated, it seems to boil down to searching for discrete programs that map observed inputs to the correct outputs. Since this program search can be guided by deep learning, Ndea claims that programs can be found which perfectly model observed data without necessitating compromises.
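As a minimal sketch of the idea (not Ndea’s actual method, and with made-up primitives), an unguided synthesizer just enumerates compositions of small building blocks until one perfectly explains every input/output example; the “guided” part would replace this brute-force enumeration with a learned model that prioritizes promising candidates:

```python
from itertools import product

# Toy primitives a synthesizer might compose (names are illustrative).
PRIMITIVES = {
    "double": lambda x: x * 2,
    "increment": lambda x: x + 1,
    "square": lambda x: x * x,
}

def synthesize(examples, max_depth=3):
    """Search compositions of primitives for a program that perfectly
    explains every (input, output) example, shortest programs first."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            def run(x, names=names):
                for name in names:
                    x = PRIMITIVES[name](x)
                return x
            if all(run(i) == o for i, o in examples):
                return list(names)  # a discrete program: a list of steps
    return None

# (x + 1) * 2 explains both examples exactly.
print(synthesize([(1, 4), (3, 8)]))  # → ['increment', 'double']
```

Unlike interpolation in an embedding space, the returned program either fits all the examples or is rejected outright, which is the “perfectly explain observed data” property the quote describes.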

It will be interesting to see whether this startup can make a large, independent impact on AI research, or whether it will be just another endeavor that ends in acquisition by a larger company.

# 2025-01-15 - #ai, #bluesky

o1 Thinking In Chinese


While I haven’t personally encountered it in a while, ChatGPT used to occasionally autogenerate chat titles in Spanish instead of English. I assume o1 thinking in Chinese happens for a similar set of reasons, since both seem related to the multilingual datasets the models are trained on. As an aside, I’m curious whether Chinese makes up a significantly higher share of the training data for Chinese LLMs like DeepSeek, given that both English and Chinese already make up a significant part of the corpora for English-focused models.

# 2025-01-15 - #ai, #openai, #slashdot