stream

The GPU export limit might start to matter
The White House's new chip rules don't seem to have a huge impact right now in "tier 2" countries like Estonia. It is unlikely we would ever have been able to build a $1B data center anyway. But this restriction could still become significant. I don'
Good LLM devtools are also good human devtools
What does a well-designed developer tool (programming language, library, API, CLI utility) look like -- for an LLM? The more we write code with Copilot and Cursor and chat assistants, the more important this question becomes. Off the top of my head, a good tool for LLM programmers:
* Chunks, abstracts
LLMs highlight lack of progress in real world
With LLMs now multimodal and extremely cheap, they can do much of what a human can, as long as the task stays virtual. This rapid progress in the world of bits, however, is a painful reminder of how slowly things improve in the physical world. I need an AI that can:
Creating single-file apps with LLMs
To prototype a mobile-optimized writing app, I used LLMs as code generators. The idea isn't new; I got the inspiration from reading Simon Willison's experiments creating micro-UIs. The idea is to prompt for whatever functionality you need, and ask for a single HTML file containing CSS
Writing on mobile is different
Whenever I write here, I do it on my laptop, almost never on the phone. I do have a Bluetooth keyboard that connects to my phone, but it's rare that I remember to bring it with me, and have a moment to take it out of the bag.
Best pastry in Tallinn
1. Sumi by Põhjala
Location: Kalamaja, Krulli quarter, Kopli 70a. Price: 3.5€ per bun.
An offshoot of Põhjala Tap Room, this place is a dual concept of a bakery and open-fire grill dinner. Põhjala used to only offer pastry on Sundays, but now the amazing French/American inspired pastry
Frontier LLMs come every 1.5-2 years
I have a theory that the LLM frontier moves every 1.5-2 years. Let me qualify: I mean that a major leap in whatever is the best model (currently GPT-4) happens at that interval. Incremental improvements don't count: e.g. Claude 3 Opus is claimed to be
Explaining quiet quitting
Quiet quitting is a recent name for something I am sure has been happening for all time. It probably has something to do with the person's life circumstances, or something about the macro environment being depressing, or whatever. But let me propose a simple economic justification. Consider
SAD lights
It seems like everyone I talk to recently is thinking in the same direction: managing their seasonal affective disorder, or SAD. If you live in Estonia, or anywhere with a long winter, or maybe even the less sunny parts of California, you'll know what I'm talking
Strive for off-grid discipline
We lost a floorball game yesterday. I can't stop thinking about it. Maybe because it felt like a pointless loss, like we lost because of something fully within our grasp. What happened? Two periods into the game, we were ahead 3:1. We had kept our defence intact,
Momentum
An object that has a lot of momentum is hard to stop. A bowling ball. An ocean liner. A person who will not allow themselves to be derailed. Physically, an object with any amount of momentum could be stopped in a very short time if enormous force were exerted. And
Curiosity arises from lack of feed
Here's a hypothesis: I think I'll always find something I am curious about. I am a naturally curious person, and I would actually guess that everyone is? It seems impossible that there would ever be a moment where nothing could interest me. But I think curiosity
Non-judgemental awareness is curative
Rather than try to fix things, all you have to do is notice, non-judgementally, that you’re doing them. Even while you’re involved in this non-judgemental noticing, you will notice a barrage of impulses to try different things, to intervene, to try, to fix. That’s fine, just notice
Are LLMs deterministic?
No. You can see for yourself: if you set the temperature to 0 (meaning you always sample the most likely token from the output distribution), you'd expect GPT-3.5 and 4 to produce the same output every time you call them. However, they don't. Why do LLMs
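One commonly cited culprit (an assumption here, since the excerpt cuts off before the explanation) is floating-point non-associativity: greedy decoding picks the argmax of the logits, but the same values summed in a different order, as happens when GPU batch sizes vary across requests, can produce slightly different logits and flip a near-tie. A minimal sketch of the underlying effect:

```python
# IEEE-754 addition is not associative: summing the same values in a
# different order can give a different result, which can flip the argmax
# between two nearly tied logits even at temperature 0.
left_to_right = sum([0.1, 0.2, 0.3])
reversed_order = sum([0.3, 0.2, 0.1])
print(left_to_right == reversed_order)  # False
```

The two sums differ only in the last bit, but an argmax does not care how small the difference is.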
Recent LLM launches, and LLM rumors
Llama 3 is already training, according to Zuck. There are conflicting sources and rumors, and the release-date claims vary across all of 2024. For GPT-5 there are not even reliable rumors; if training started within the past few months, then my back-of-napkin estimate is that it may be
A wild speed-up from OpenAI Dev Day
I'll share more thoughts on the OpenAI Dev Day announcements soon, but one huge problem for any developer is LLM API latency. And boy, did OpenAI deliver. On a quick benchmark I ran:
* gpt-4-1106-preview ("gpt-4-turbo") runs at 18ms/token
* gpt-3.5-turbo-1106 ("the newest version of gpt-3.
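A figure like 18ms/token is just arithmetic on a timed call: total wall-clock time divided by output token count. A minimal sketch of that calculation (the example numbers are illustrative, not re-measured):

```python
def ms_per_token(elapsed_seconds, n_output_tokens):
    # Average generation speed: wall-clock time divided by output length.
    return elapsed_seconds * 1000 / n_output_tokens

# e.g. a 500-token completion that took 9 seconds end to end:
ms_per_token(9.0, 500)  # -> 18.0 ms/token
```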
LLM+API vs LMM+UI
The two most famous startups focused on making Agents seem to be Imbue and Adept. Both companies' goal is to have a large model use a computer effectively, but it is interesting how they seem to bet on two different approaches.
* Imbue: use LLMs (text-based) with API-based tools.
* Adept:
Simplicity is essential in a generative world
* "less is more" (proverb)
* "perfection is finally attained not when there is no longer anything to add, but when there is no longer anything to take away" (Antoine de Saint-Exupéry)
* "omit needless words" (Strunk & White)
* "It is vain to do with
Diverge-converge cycles in LLMs
The Double Diamond is, roughly, a design framework consisting of 4 steps:
1. Diverging on problem (Discover). Explore widely to gather a broad range of insights and challenges related to the problem.
2. Converging on problem (Define). Analyze and synthesize the gathered insights to define a clear and specific problem
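The diverge/converge shape maps naturally onto LLM calls: sample many candidates, then select the best. A toy sketch where `generate` and `score` are hypothetical stand-ins for a real LLM call and a real evaluation step:

```python
def diverge(prompt, generate, n=5):
    # Explore widely: sample several independent candidates.
    return [generate(prompt, seed=i) for i in range(n)]

def converge(candidates, score):
    # Narrow down: keep the single best candidate under some scoring rule.
    return max(candidates, key=score)

# Toy run with stub functions standing in for LLM calls:
ideas = diverge("name a problem", lambda p, seed: f"candidate-{seed}", n=3)
best = converge(ideas, score=len)  # all equal length here, so the first wins
```

In a real pipeline, the same pair can be run twice, once over the problem and once over the solution, giving the full four-step diamond.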
LLM latency is linear in output token count
All top LLMs, including all GPT-family and Llama-family models, generate predictions one token at a time. It's inherent to the architecture, and applies to models running behind an API as well as local or self-deployed models. Armed with this knowledge, we can make a very accurate model of
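One-token-at-a-time decoding implies a simple latency model: a fixed overhead plus a constant cost per output token. A sketch with illustrative constants (the 18 ms/token ballpark is from the Dev Day benchmark above; the overhead value is an assumption):

```python
def estimated_latency_ms(n_output_tokens, ms_per_token=18.0, overhead_ms=500.0):
    # Decoding is one token at a time, so total latency is linear
    # in output length: fixed overhead + per-token cost.
    return overhead_ms + ms_per_token * n_output_tokens

estimated_latency_ms(200)  # 500 + 18*200 = 4100.0 ms
```

The practical upshot: asking the model for shorter outputs is the cheapest latency optimization there is.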
RAG is more than just embedding
90% of the time, when people say "retrieval-augmented generation" they mean that the index is built using an embedding model like OpenAI's text-embedding-ada-002 and a vector database like Chroma, but it doesn't have to be this way. Retrieval is a long-standing problem in computer science
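To make the point concrete, here is a retrieval index that uses no embeddings at all: plain TF-IDF scoring, the kind of classic IR that predates vector databases (a toy sketch, not a production ranker):

```python
import math
from collections import Counter

def tfidf_best_match(query, docs):
    """Return the index of the doc best matching the query, scored
    with plain TF-IDF -- no embedding model, no vector DB."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many docs does each word appear?
    df = Counter(word for tokens in tokenized for word in set(tokens))
    def score(tokens):
        tf = Counter(tokens)
        return sum(tf[w] * math.log(n / df[w])
                   for w in query.lower().split() if w in df)
    return max(range(n), key=lambda i: score(tokenized[i]))

docs = ["the cat sat on the mat", "dogs bark loudly", "cat food recipes"]
tfidf_best_match("cat food", docs)  # -> 2
```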
Retrieval-augmented generation
Retrieval-augmented generation, or RAG, is a fancy term hiding a simple idea. Problem: LLMs can reason, but they don't have the most relevant facts about your situation. They don't know the location of your user, or the most relevant passage from the knowledge base, or what
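The mechanics are mostly string assembly: look something up, paste it into the prompt, then generate. A minimal sketch where `retrieve` is a stand-in for any index lookup (vector search, SQL, an API call):

```python
def augment_prompt(question, retrieve):
    # The whole RAG trick: fetch the facts the model lacks,
    # then place them in the prompt ahead of the question.
    context = retrieve(question)
    return f"Context:\n{context}\n\nQuestion: {question}"

# Stub retriever standing in for a real index lookup:
prompt = augment_prompt("Where is the user?", lambda q: "User location: Tallinn")
# prompt now carries the retrieved fact alongside the question,
# ready to be sent to any LLM.
```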
Properties of a good memory
Apps with no Memory are boring. Compare a static website from the 90s with any SaaS or social network or phone app: the former knows nothing about you, the latter knows a lot. From UI preferences (dark or light mode?) to basic personal data (name?) to your friend list to
Things I've underestimated - Sep 2023
After attending the Ray Summit in San Francisco this week, I realized I had previously discounted several interesting things. Here's what I now want to explore more.
Semantic Kernel
I've gotten so used to langchain that I haven't really considered switching... all the while
Launching OpenCopilot
Across the many experiments I've made this year (and which I've written about here) I've felt the need for better tools. Specifically, for the past few months I have been building copilots, and doing so from scratch takes a bunch of work every time.