I wonder what will replace browsers. Most likely it will be some kind of multimodal assistant that “digests” content for us and gets the desired result without us having to open pages, click around, and deal with the interface quirks of each site.
Today’s models can already find something and place an order on a site, albeit slowly. But that alone adds little value. I want the assistant not just to order what I tell it to, but to help me choose (including trying things on), find the best and most cost-effective option, track the delivery, remind me about it, and contact support if it’s delayed. And if I’m not satisfied with the result, it should be able to arrange a return.
We’re still far from such a system, but that doesn’t mean we shouldn’t think about it. What will the outcome look like, what intermediate steps will appear on the way, and what will it all lead to?
Will it be a universal agent (unlikely) or a set of different agents/skills? Will there be a battle of agent ecosystems? Will third-party agents/skills be allowed into the ecosystem? What will SEO look like for such an agent? What could an API for such an agent look like?
How will it be structured internally? Probably as a constantly running service with more than one kind of memory, a task system, and multiple models for different tasks.
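To make that guess a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the `Agent` and `Task` names, the task kinds, and the “models”, which are plain functions standing in for real ones. It only shows how two memories, a task queue, and per-task model routing could fit together.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Task:
    kind: str      # hypothetical task kinds, e.g. "search" or "support"
    payload: str


@dataclass
class Agent:
    # One "model" per task kind; real models are replaced by plain functions.
    models: dict[str, Callable[[str], str]]
    # Long-term memory: stable facts and preferences about the user.
    long_term: dict[str, str] = field(default_factory=dict)
    # Short-term memory: the most recent results, capped in size.
    short_term: deque = field(default_factory=lambda: deque(maxlen=20))
    # Task system: a simple first-in, first-out queue.
    queue: deque = field(default_factory=deque)

    def submit(self, task: Task) -> None:
        self.queue.append(task)

    def step(self) -> Optional[str]:
        """Run one queued task; return None when the queue is empty."""
        if not self.queue:
            return None
        task = self.queue.popleft()
        model = self.models[task.kind]  # raises KeyError for an unknown kind
        prefs = ", ".join(f"{k}={v}" for k, v in sorted(self.long_term.items()))
        result = model(f"{task.payload} [prefs: {prefs}]")
        self.short_term.append((task.kind, result))
        return result


def run_demo() -> list[str]:
    agent = Agent(models={
        "search": lambda prompt: f"found: {prompt}",
        "support": lambda prompt: f"ticket opened: {prompt}",
    })
    agent.long_term["size"] = "M"
    agent.submit(Task("search", "blue jacket"))
    agent.submit(Task("support", "order is late"))
    results = []
    # A real service would loop forever and wait for new tasks instead.
    while (result := agent.step()) is not None:
        results.append(result)
    return results
```

Calling `run_demo()` drains the queue and returns one result per task. In a real system each piece (memory, queue, model) would be a separate service; here they are fields on a single object to keep the idea visible.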
How will user data be protected if the agent has access to everything? Authentication can be handled, but what about all the other data?
How do we ensure transparency of choice and fairness in agent behavior?
What will advertising look like in such a world?
How do we protect ourselves from these agents being used for destructive purposes in a world where CAPTCHAs no longer work? “Write a hundred posts convincing people that <…>, talk to everyone who commented on this post, and convince them.”