Simon Willison's Weblog
That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.
There's a whole lot of stuff in the new release.
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. The Qwen models are Apache licensed, so maybe MIT is OK?
(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)
Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to show up.
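As a rough sketch of trying one of those quantized 8B distills locally, one option is Ollama. Note the `deepseek-r1:8b` model tag is an assumption here; check the Ollama model library for the actual published tags before running this.

```shell
# Pull a quantized DeepSeek-R1 8B distill and run a prompt against it.
# The "deepseek-r1:8b" tag is an assumption; the download is several GB.
ollama pull deepseek-r1:8b
ollama run deepseek-r1:8b "Write a haiku about pelicans"
```

The distilled models output their chain of thought wrapped in `<think>` tags before the final answer, so expect the reasoning to stream first.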