OpenAI rolled out o1 for complex tasks

  • The model got smarter because it first thinks and only then answers. We don’t see the reasoning itself, only a condensed summary of it. Under the hood there is Chain-of-Thought and, apparently, some self-critique and aggregation of results, plus fine-tuning for all of that.
  • o1 is great at math, physics, and code and follows instructions well, but it knows less about the world around it.
  • The model solves complex math and programming problems at the level of international olympiad medalists. In physics it performs at a graduate student level (around 75–80% correct answers).
  • o1 doesn’t need special prompting: it does all the step-by-step work under the hood on its own.
  • The model is available to ChatGPT Plus subscribers with limits: 30 messages per week for the larger o1-preview and 50 for the smaller o1-mini. Prices will bite, because costs grow on two fronts: each token takes more compute, and each answer uses many more tokens for the hidden reasoning.
  • OpenAI is already testing an improved model but isn’t releasing it yet. It looks like it will have a larger context and more time to think.

https://openai.com/index/introducing-openai-o1-preview/
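The pricing point above is a product of two factors, not a sum: a higher price per token and more tokens per answer. A minimal back-of-the-envelope sketch, assuming hidden reasoning tokens are billed at the output-token rate. The function `request_cost` is hypothetical, and every price and token count here is an illustrative assumption, not an actual rate.

```python
# Rough per-request cost estimate: reasoning model vs. regular model.
# All prices and token counts are illustrative assumptions, not actual rates.

def request_cost(input_tokens: int, output_tokens: int, reasoning_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the dollar cost of one request.

    Assumption: hidden reasoning tokens are billed at the output-token rate,
    even though the user never sees them.
    """
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * input_price_per_m
            + billed_output * output_price_per_m) / 1_000_000


# Same prompt and same visible answer length for both models;
# only the per-token prices and the hidden reasoning differ.
regular = request_cost(1_000, 500, 0, input_price_per_m=5.0, output_price_per_m=15.0)
reasoning = request_cost(1_000, 500, 4_000, input_price_per_m=15.0, output_price_per_m=60.0)

print(f"regular:   ${regular:.4f}")
print(f"reasoning: ${reasoning:.4f}")
print(f"ratio:     {reasoning / regular:.1f}x")
```

With these made-up numbers the reasoning request comes out roughly 23 times more expensive, even though the visible answer is the same length: the pricier tokens and the extra hidden tokens multiply.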

Conclusions:

  • Perhaps models will split into “controllable” and “all-in-one” kinds for different classes of tasks. Some tasks require thinking through something complex; others just need to get done, and there you want at least some control, as with support agents.
  • For now I don’t see a use for it beyond ChatGPT itself: it’s too expensive for my tasks. Still, I need to test it.
  • I’ll need to get back to my experiments with groups of agents. If nothing else, it’s fun 🙂