LLM+API vs LMM+UI
The two most famous startups focused on building agents seem to be Imbue and Adept. Both aim to have a large model use a computer effectively, but interestingly they seem to be betting on two different approaches.
- Imbue: use LLMs (text-based) with API-based tools.
- Adept: use LMMs (multimodal) with UI-based tools.
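To make the contrast concrete, here is a minimal sketch of what a single agent step might look like under each approach. It is not either company's actual design: the names `run_api_step`, `run_ui_step`, `Click`, and `Button` are hypothetical, the "tool: argument" text format is an assumption, and the model itself is left out, so the strings and clicks below stand in for model output.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


# LLM + API: the model emits text that names a tool and its argument.
def run_api_step(model_output: str, tools: Dict[str, Callable[[str], str]]) -> str:
    """Parse 'tool_name: argument' text and call the matching registered tool."""
    name, _, arg = model_output.partition(":")
    # Every tool has to be registered ahead of time; unknown names raise KeyError.
    return tools[name.strip()](arg.strip())


# LMM + UI: the model looks at the screen and emits a low-level input event.
@dataclass
class Click:
    x: int
    y: int


@dataclass
class Button:
    label: str
    box: Tuple[int, int, int, int]  # left, top, right, bottom, in pixels


def run_ui_step(click: Click, buttons: List[Button]) -> str:
    """Resolve a click at pixel coordinates to whichever button contains it."""
    for button in buttons:
        left, top, right, bottom = button.box
        if left <= click.x < right and top <= click.y < bottom:
            return f"pressed {button.label}"
    return "clicked empty space"


# Hypothetical stand-ins for what each kind of model would output.
tools = {"search": lambda query: f"results for {query}"}
api_result = run_api_step("search: weather in Paris", tools)

buttons = [Button("Search", (100, 40, 180, 70))]
ui_result = run_ui_step(Click(120, 50), buttons)
```

In this sketch the API agent can only reach tools someone has already wired into `tools`, while the UI agent can in principle act on anything drawn on screen but has to get the pixel coordinates right.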
I don't currently hold a view on which is more likely to succeed. The LLM+API approach seems easier to implement but less likely to work, and the LMM+UI approach seems harder to implement but much easier to deploy once working.