Best pastry in Tallinn
1. Pulla Bakery
Location: Old Town, Voorimehe 7. Price: 3€ per bun. The front of this tiny cafe is just a few windows along a narrow but busy passageway in the Old Town of Tallinn, making it easy to walk past. Inside, though, are the best buns you'll
Frontier LLMs come every 1.5-2 years
I have a theory that the LLM frontier moves every 1.5-2 years. Let me qualify: I mean that a major leap in whatever is the best model (currently GPT-4) happens at that interval. Incremental improvements don't count: e.g. Claude 3 Opus is claimed to be
Explaining quiet quitting
Quiet quitting is a recent name for something I am sure has been happening for all of time. It probably has something to do with the person's life circumstances, or something about the macro environment being depressing, or whatever. But let me propose a simple economic justification. Consider
SAD lights
It seems like everyone I talk to recently is thinking in the same direction: managing their seasonal affective disorder, or SAD. If you live in Estonia, or anywhere with a long winter, or maybe even the less sunny parts of California, you'll know what I'm talking
Strive for off-grid discipline
We lost a floorball game yesterday. I can't stop thinking about it. Maybe because it felt like a pointless loss, like we lost because of something fully within our grasp. What happened? Two periods into the game, we were ahead 3:1. We had kept our defence intact,
Momentum
An object that has a lot of momentum is hard to stop. A bowling ball. An ocean liner. A person who will not allow themselves to be derailed. Physically, an object with any amount of momentum could be stopped in a very short time, if enormous force is exerted. And
Non-judgemental awareness is curative
Rather than try to fix things, all you have to do is notice, non-judgementally, that you're doing them. Even while you're involved in this non-judgemental noticing, you will notice a barrage of impulses to try different things, to intervene, to try, to fix. That's fine, just notice
Curiosity arises from lack of feed
Here's a hypothesis: I think I'll always find something I am curious about. I am a naturally curious person, and I would actually guess that everyone is? It seems impossible that there would ever be a moment where nothing could interest me. But I think curiosity
Are LLMs deterministic?
No. You can see for yourself: with the temperature parameter set to 0 (meaning you always sample the most likely token from the output distribution), you'd expect GPT-3.5 and 4 to produce the same output every time you call them. However, they don't. Why do LLMs
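To make the premise concrete, here is a toy sketch of what temperature means at sampling time: as temperature goes to 0, sampling degenerates into greedy argmax decoding, which *should* be deterministic. (The function names and logits here are illustrative, not from any real model; the residual nondeterminism seen in practice is commonly attributed to floating-point non-associativity under batched GPU execution.)

```python
import math
import random

def sample_token(logits, temperature, rng):
    # Temperature 0 degenerates to greedy decoding: always pick the argmax.
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Otherwise, sample from the temperature-scaled softmax distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.0, 0.5]
rng = random.Random(42)
# With temperature 0, every pick is index 0 (the highest logit).
greedy_picks = [sample_token(logits, 0, rng) for _ in range(10)]
```

Since GPT-3.5/4 at temperature 0 still vary between calls, something below the sampling step must be nondeterministic.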
Recent LLM launches, and LLM rumors
Llama 3 is already training, according to Zuck. There are conflicting sources & rumors, and the release date claims vary across all of 2024. For GPT-5 there aren't even reliable rumors; if training started within the past few months, then my back-of-napkin estimate is that it may be
A wild speed-up from OpenAI Dev Day
I'll share more thoughts on OpenAI Dev Day announcements soon, but one huge problem for any developer is LLM API latency. And boy, did OpenAI deliver. On a quick benchmark I ran:
* gpt-4-1106-preview ("gpt-4-turbo") runs in 18ms/token
* gpt-3.5-turbo-1106 ("the newest version of gpt-3.
LLM+API vs LMM+UI
The two most famous startups focused on making Agents seem to be Imbue and Adept. Both companies' goal is to have a large model use a computer effectively, but it is interesting how they seem to bet on two different approaches.
* Imbue: use LLMs (text-based) with API-based tools.
* Adept:
Simplicity is essential in a generative world
* "less is more" (proverb)
* "perfection is finally attained not when there is no longer anything to add, but when there is no longer anything to take away" (Antoine de Saint-Exupéry)
* "omit needless words" (Strunk & White)
* "It is vain to do with
Diverge-converge cycles in LLMs
The Double Diamond is, roughly, a design framework consisting of 4 steps:
1. Diverging on problem (Discover). Explore widely to gather a broad range of insights and challenges related to the problem.
2. Converging on problem (Define). Analyze and synthesize the gathered insights to define a clear and specific problem
LLM latency is linear in output token count
All top LLMs, including all GPT-family and Llama-family models, generate predictions one token at a time. It's inherent to the architecture, and applies to models running behind an API as well as local or self-deployed models. Armed with this knowledge, we can make a very accurate model of
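The linear model the excerpt gestures at can be written in a few lines. This is a sketch under stated assumptions: the 18 ms/token figure is borrowed from the Dev Day benchmark entry above, and the fixed overhead value is a hypothetical placeholder for network, queueing, and prompt-processing time.

```python
def estimated_latency_ms(n_output_tokens, ms_per_token=18.0, overhead_ms=500.0):
    # Total latency = fixed overhead (network, queueing, prompt processing)
    # plus a constant per-token cost for autoregressive generation.
    return overhead_ms + ms_per_token * n_output_tokens

short_reply = estimated_latency_ms(20)    # 860.0 ms
long_reply = estimated_latency_ms(1000)   # 18500.0 ms
```

The practical corollary: the cheapest latency optimization is usually asking the model for fewer output tokens.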
RAG is more than just embedding
90% of the time when people say "retrieval-augmented generation" they mean that the index is built using an embedding model like OpenAI's text-embedding-ada-002 and a vector database like Chroma, but it doesn't have to be this way. Retrieval is a long-standing problem in computer science
Retrieval-augmented generation
Retrieval-augmented generation, or RAG, is a fancy term hiding a simple idea: Problem: LLMs can reason, but they don't have the most relevant facts about your situation. They don't know the location of your user, or the most relevant passage from the knowledge base, or what
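The whole RAG idea fits in two functions: retrieve relevant facts, then inject them into the prompt. This is a minimal sketch with made-up documents and a deliberately naive word-overlap retriever standing in for a real one (embeddings, BM25, or a search engine).

```python
def retrieve(query, documents, k=2):
    # Toy retriever: rank documents by word overlap with the query.
    q = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_rag_prompt(query, documents):
    # Inject the retrieved facts into the prompt before calling the LLM.
    context = "\n".join(retrieve(query, documents))
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {query}"

docs = [
    "The user is located in Tallinn, Estonia.",
    "Refunds are processed within 5 business days.",
    "Our office dog is named Muki.",
]
prompt = build_rag_prompt("Where is the user located?", docs)
```

Everything downstream of `build_rag_prompt` is an ordinary LLM call; retrieval quality is what actually varies between RAG systems.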
Properties of a good memory
Apps with no Memory are boring. Compare a static website from the 90s with any SaaS or social network or phone app: the former knows nothing about you, the latter knows a lot. From UI preferences (dark or light mode?) to basic personal data (name?) to your friend list to
Things I've underestimated - Sep 2023
After attending the Ray Summit in San Francisco this week, I realized I had previously discounted several interesting things. Here's what I now want to explore more.
Semantic Kernel
I've gotten so used to langchain that I haven't really considered switching... all the while
Launching OpenCopilot
Across the many experiments I've made this year (and which I've written about here) I've felt the need for better tools. Specifically, for the past few months I have been building copilots, and doing so from scratch takes a bunch of work every time.
Context engineering is information retrieval
The stages of an LLM app seem to go like this:
* Hardcode the first prompt, get the end-to-end app working.
* Realise that the answers are bad.
* Do some prompt engineering.
* Realise the answers are still bad.
* Do some more prompt engineering.
* Discover vector databases!!!1
* Dump a ton of data
Making GPT API responses faster
GPT APIs are slow. Just in the past week, the OpenAI community has had 20+ questions around that. And not only is it rare for users to tolerate 30-second response times in any app, it is also extremely annoying to develop when even basic tests take several minutes to run.
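One mitigation worth sketching: when an app needs several *independent* GPT completions, issuing them concurrently means wall-clock time approaches the slowest single call rather than the sum. The `call_llm` function below is a hypothetical stand-in for a blocking API call, so the pattern can be shown without network access.

```python
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt):
    # Stand-in for a blocking GPT API call (hypothetical; no network here).
    return f"answer to: {prompt}"

def call_llm_batch(prompts, max_workers=8):
    # Independent calls run concurrently via a thread pool, so total wall
    # time is roughly the slowest call, not the sum of all calls.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call_llm, prompts))

answers = call_llm_batch(["summarize doc A", "summarize doc B"])
```

Threads work fine here because the latency is I/O-bound waiting on the API, not CPU work.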
agentreader - simple web browsing for your Langchain agent
This is a short link-post to a new repo I just released. While working on Why AutoGPT fails and how to fix it I created a handy web-browsing Tool for langchain Agents and now finally got around to open-sourcing it. Here is the repository: github.com/taivop/agentreader.
Why AutoGPT fails and how to fix it
A couple weeks after AutoGPT came out we tried to make it actually usable. If you don't know yet, it looks amazing on first glance, but then completely fails because it creates elaborate plans that are completely unnecessary. Even asking it to do something simple like "find
Core innovations of AutoGPT
AutoGPT (repo) went viral on GitHub and looks impressive on Twitter, but almost never works. In the process of trying to improve it I dug into how it works. Really there are two important parts to AutoGPT: a plan-and-execute workflow, and looped Chain-of-thought (CoT).
Plan-and-execute
Say the user input is
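The plan-and-execute control flow described above can be sketched in a few lines. Both `plan` and `execute` are stubs here (invented for illustration): in AutoGPT each would be an LLM call, with `execute` running a looped chain-of-thought (think, pick a tool, act, observe) per step.

```python
def plan(goal):
    # Stub planner: AutoGPT uses an LLM call to decompose the goal into steps.
    return [f"research {goal}", f"write summary of {goal}"]

def execute(step):
    # Stub executor: AutoGPT runs each step through a looped
    # chain-of-thought with tool use; here it just marks the step done.
    return f"completed: {step}"

def plan_and_execute(goal):
    # First produce the full plan, then execute the steps one by one.
    return [execute(step) for step in plan(goal)]

results = plan_and_execute("LLM latency benchmarks")
```

The structure also shows the failure mode: if `plan` produces an elaborate, unnecessary plan, every downstream `execute` call faithfully wastes effort on it.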