LLM+API vs LMM+UI

The two best-known startups focused on building agents seem to be Imbue and Adept. Both aim to have a large model use a computer effectively, but interestingly they seem to be betting on two different approaches.

  • Imbue: use LLMs (text-based) with API-based tools.
  • Adept: use LMMs (multimodal) with UI-based tools.
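
To make the contrast concrete, here is a minimal toy sketch of what each kind of tool interface might look like from the model's side. Everything in it is hypothetical and made up for illustration (ApiCall, UiAction, FakeScreen, the "search" tool, the button coordinates); it is not based on either company's actual system, and no real model is involved.

```python
from dataclasses import dataclass
from typing import Callable, Dict


# LLM+API: the model emits a structured, text-only tool call.
@dataclass
class ApiCall:
    tool: str              # name of a registered tool (hypothetical)
    args: Dict[str, str]   # keyword arguments for that tool


def run_api_call(call: ApiCall, tools: Dict[str, Callable[..., str]]) -> str:
    """Look up the named tool and invoke it with the given arguments."""
    return tools[call.tool](**call.args)


# LMM+UI: the model looks at the screen and emits low-level UI actions.
@dataclass
class UiAction:
    kind: str      # "type" or "click"
    x: int = 0     # pixel coordinates, used by "click"
    y: int = 0
    text: str = "" # characters to enter, used by "type"


class FakeScreen:
    """Toy stand-in for a GUI: one text field and one search button."""

    BUTTON_AT = (100, 50)  # assumed button position

    def __init__(self) -> None:
        self.field = ""
        self.result = ""

    def apply(self, action: UiAction) -> None:
        if action.kind == "type":
            self.field += action.text
        elif action.kind == "click" and (action.x, action.y) == self.BUTTON_AT:
            self.result = f"searched for {self.field}"


# The same task, expressed once per approach.
tools = {"search": lambda query: f"searched for {query}"}
api_result = run_api_call(ApiCall("search", {"query": "cats"}), tools)

screen = FakeScreen()
for action in [UiAction("type", text="cats"), UiAction("click", x=100, y=50)]:
    screen.apply(action)
ui_result = screen.result
```

In the API version someone has to write and register each tool up front, while in the UI version the model has to locate the right pixels itself but can in principle drive any application a person can.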

I don't currently hold a view on which is more likely to succeed. The LLM+API approach seems easier to implement but less likely to work, while the LMM+UI approach seems harder to implement but much easier to deploy once it works.