After Browsers

It’s fascinating to think about what might replace browsers. Most likely, it will be some kind of multimodal assistant that “digests” content for us and achieves the desired results without the need to open pages, click around, and deal with the quirks of specific sites and interfaces.

Today’s models can already find and order something on a website, albeit slowly. But that alone adds little value. I want an assistant that doesn’t just place an order I specify but helps me choose (including virtual try-ons), finds the best and most cost-effective option, tracks delivery, reminds me about it, and contacts support if there are delays. If the result doesn’t meet my expectations, it should also be able to process a return.

We’re still far from such a system, but that doesn’t mean we shouldn’t think about it. What will the end result look like? What intermediate steps will lead us there? And where will it all take us?

Will it be a universal agent (unlikely) or a set of different agents/skills? Will there be a battle of ecosystems between different agents? Will third-party agents or skills be allowed into these ecosystems? What will “SEO” for such an agent look like? What could an API for such an agent look like?

How will it work internally? It will likely require some sort of continuously running service with memory (probably multiple types of memory), a task management system, and several models specialized for different tasks.
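To make that guess a bit more concrete, here is a minimal sketch of how the three pieces (memory, a task queue, and specialized models) might fit together. Everything in it is hypothetical: the names AgentService, Task, and Memory, the two memory types, and the stub "models" are illustrative assumptions, not a description of any real system. A real service would run continuously and call actual models rather than plain functions.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class Task:
    kind: str      # hypothetical task category, e.g. "search" or "support"
    payload: str   # free-form description of what the user wants


@dataclass
class Memory:
    # Two illustrative memory types; a real agent would likely need more.
    short_term: Deque[str] = field(default_factory=lambda: deque(maxlen=3))
    long_term: Dict[str, str] = field(default_factory=dict)


# A "model" here is just a function from (payload, memory) to a result string.
Model = Callable[[str, Memory], str]


class AgentService:
    def __init__(self, models: Dict[str, Model]):
        self.models = models            # must contain a "default" entry
        self.memory = Memory()
        self.queue: Deque[Task] = deque()

    def submit(self, task: Task) -> None:
        self.queue.append(task)

    def run_pending(self) -> List[str]:
        """Process queued tasks in order, routing each to a specialized model."""
        results = []
        while self.queue:
            task = self.queue.popleft()
            model = self.models.get(task.kind, self.models["default"])
            result = model(task.payload, self.memory)
            self.memory.short_term.append(result)  # remember recent outcomes
            results.append(result)
        return results


def search_model(payload: str, memory: Memory) -> str:
    # Stub: uses a long-term preference to refine the search.
    size = memory.long_term.get("shoe_size", "unknown size")
    return f"search: {payload} ({size})"


def default_model(payload: str, memory: Memory) -> str:
    # Stub: fallback for task kinds without a specialized model.
    return f"default: {payload}"


agent = AgentService({"search": search_model, "default": default_model})
agent.memory.long_term["shoe_size"] = "EU 43"
agent.submit(Task("search", "running shoes"))
agent.submit(Task("support", "ask about delayed delivery"))
results = agent.run_pending()
```

The routing by task kind is the simplest possible choice. Whether real agents will dispatch like this or let one model decide which specialist to call is exactly the kind of open question raised above.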

How will user data be protected if the agent has access to everything? Authentication can control who gets to use the agent, but what protects all the other data it can see?

How will transparency and fairness in the agent’s operations be ensured?

What will advertising look like in such a world?

How can we prevent these agents from being used for destructive purposes in a world where captchas no longer exist? For example: “Write 100 posts persuading people that <…>, engage with everyone who comments on those posts, and convince them as well.”