The dust from the DeepSeek hype train has settled a bit, and the picture has become clearer.
- Training may have cost $6M, but that figure likely doesn't cover all expenses. Plus, they already had about 2,000 Nvidia H800s on hand.
- OpenAI accuses DeepSeek’s developers of using their models at some stage of training.
- Quality is comparable to the best models, and it’s the best open-source model.
- API pricing is about 2x lower than the freshly released o3-mini and roughly 25x lower than o1.
- At inference, the low cost is achieved by using cheap cards (8/12-bit weights + H800-specific optimizations) and a sparse Mixture-of-Experts architecture: only a small subset of experts is activated for each token (a minimal routing sketch follows the list).
- Even if they did use OpenAI's models at some stage, the same trick can be pulled off again in the future; it's just unclear how long that window stays open. Will it be possible to repeat it forever?
- Now even mid-sized companies can move part of their workflows to this model and cut costs, or run the model themselves and cut costs further (see the API example after the list). It's also an option for those who are currently on their own Llama due to various restrictions.
- Since the model is open, removing its restrictions won’t be hard. We can expect even smarter spam bots, scammers, and other bad actors.
- After a while, MoE and other techniques will be replicated by others, which will further reduce inference prices.
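On the sparse-MoE point above, here is a minimal, hypothetical sketch in plain PyTorch (not DeepSeek's actual implementation): a small router scores the experts for each token and only the top-scoring few run, so most of the model's parameters stay idle for any given token.

```python
# Minimal sparse Mixture-of-Experts sketch: top-k routing per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, dim=512, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                              # x: (tokens, dim)
        scores = self.router(x)                        # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1) # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if token_ids.numel() == 0:
                continue
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out

moe = SparseMoE()
tokens = torch.randn(16, 512)
print(moe(tokens).shape)  # torch.Size([16, 512]); each token touched only 2 of 8 experts
```

That per-token sparsity is why a huge total parameter count can still be cheap to serve: compute per token scales with the few active experts, not with the full model.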
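And on the "move part of your processes" point: DeepSeek's hosted API is advertised as OpenAI-compatible, so for many workloads the switch is roughly a base_url, API key, and model-name change. The endpoint and model names below are assumptions taken from DeepSeek's public docs (verify before use), and a self-hosted vLLM or llama.cpp server can expose the same interface.

```python
from openai import OpenAI

# Point the standard OpenAI client at DeepSeek's endpoint (or at your own
# self-hosted, OpenAI-compatible server) instead of api.openai.com.
client = OpenAI(
    base_url="https://api.deepseek.com",  # assumption: per DeepSeek's docs
    api_key="YOUR_DEEPSEEK_API_KEY",
)

resp = client.chat.completions.create(
    model="deepseek-chat",  # "deepseek-reasoner" for the R1 reasoning model
    messages=[{"role": "user", "content": "Summarize this support ticket: ..."}],
)
print(resp.choices[0].message.content)
```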