Gemini 2.5 Pro isn't quite Claude 3.7 at coding
Gemini 2.5 Pro was just released and it could be a big deal, if its coding abilities pan out. The current positioning of Gemini has been roughly that it is a tad behind the OpenAI/Anthropic models of the same class, and far behind the coding capabilities of Claude

Hiring as the marriage problem
At one point in my career, our company was hiring for the role of Head of Marketing. I got a strong referral from a friend, reached out, the match was great, and she passed the interview process with ease. And then we just... sat on it. The hiring manager said

Which European institution is running a massive LLM hackathon?
Anthropic, OpenAI and possibly others are doing one in the US with 1,000 participants from public sector research institutes. It's called an "AI Jam" and the companies are obviously doing it out of commercial interest. Even so, the government sector must have such obvious use

Will AI take us towards refinement of the self?
In Would You Rather Have Married Young?, Lillian Fishman writes about the "fundamental ethos that had long governed the young secular woman" in the last 50 years: Experience, we hoped, would broaden us. The new object seems to be the inverse: the contraction and refinement of the self,

Half AI, half random curiosities
I rarely look at the search analytics for my blog, but today I did. I am really amused by the most common search queries that bring people to this website, which definitely show the variety of content I post here. Here are the top ones in the past month, with

Vibe-code with stable infrastructure
Vibe-coding produces close-to-unmaintainable code right now. But it is an acceptable trade-off in places where it produces decent results and maintainability matters less -- for example making simple UIs. So maybe a good approach is to vibe-code with stable infrastructure. This riffs off of Facebook's engineering principle "

The GPU export limit might start to matter
The White House's new chip rules don't seem to have a huge impact right now in "tier 2" countries like Estonia. It is unlikely we ever would have been able to build a $1B data center anyway. But this restriction could still become significant. I don'

Good LLM devtools are also good human devtools
What does a well-designed developer tool (programming language, library, API, CLI utility) look like -- for an LLM? The more we write code with Copilot and Cursor and chat-assistants, the more important this question is. Off the top of my head, a good tool for LLM programmers:
* Chunks, abstracts

LLMs highlight lack of progress in real world
With LLMs now multimodal and extremely cheap, they can do much of what a human can, when limited to only the virtual. This rapid progress in the world of bits, however, is a sore reminder of how slowly things improve in the physical world. I need an AI that can:

Creating single-file apps with LLMs
To prototype a mobile-optimized writing app, I used LLMs as code generators. The idea isn't new; I got the inspiration from reading Simon Willison's experiments creating micro-UIs. The idea is to prompt for whatever functionality you need, and ask for a single HTML file containing CSS

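The post's actual prompt isn't shown here, so the sketch below is only a guess at the shape of the approach: one model call that asks for a complete, dependency-free HTML file, saved straight to disk. It assumes the OpenAI Python SDK with an API key in the environment; the model name, prompt wording, and filename are illustrative.

```python
# A minimal sketch of the single-file prompting approach (assumptions noted above).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Build a mobile-optimized writing app as a single self-contained HTML file. "
    "Inline all CSS and JavaScript, persist drafts to localStorage, "
    "and use no external dependencies. Reply with only the HTML."
)

response = client.chat.completions.create(
    model="gpt-4o",  # any capable code-generation model works here
    messages=[{"role": "user", "content": prompt}],
)

# Save the generated file and open it directly in a browser.
# (In practice you may need to strip markdown code fences from the reply.)
with open("writing_app.html", "w") as f:
    f.write(response.choices[0].message.content)
```
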
Writing on mobile is different
Whenever I write here, I do it on my laptop, almost never on the phone. I do have a Bluetooth keyboard that connects to my phone, but it's rare that I remember to bring it with me, and have a moment to take it out of the bag.

Best pastry in Tallinn
1. Sumi by Põhjala
Location: Kalamaja, Krulli quarter, Kopli 70a. Price: 3.5€ per bun.
An offshoot of Põhjala Tap Room, this place is a dual concept of a bakery and open-fire grill dinner. Põhjala used to only offer pastry on Sundays, but now the amazing French/American inspired pastry

Frontier LLMs come every 1.5-2 years
I have a theory that the LLM frontier moves every 1.5-2 years. Let me qualify. I mean that a major leap in whatever is the best model at the time (currently GPT-4) happens at that interval. Incremental improvements don't count: e.g. Claude 3 Opus is claimed to be

Explaining quiet quitting
Quiet quitting is a recent name for something I am sure has been happening for all time. It probably has something to do with the person's life circumstances, or something about the macro environment being depressing, or whatever. But let me propose a simple economic justification. Consider

SAD lights
It seems like everyone I talk to recently is thinking in the same direction: managing their seasonal affective disorder, or SAD. If you live in Estonia, or anywhere with a long winter, or maybe even the less sunny parts of California, you'll know what I'm talking

Strive for off-grid discipline
We lost a floorball game yesterday. I can't stop thinking about it. Maybe because it felt like a pointless loss, like we lost because of something fully within our grasp. What happened? Two periods into the game, we were ahead 3:1. We had kept our defence intact,

Momentum
An object that has a lot of momentum is hard to stop. A bowling ball. An ocean liner. A person who will not allow themselves to be derailed. Physically, an object of any amount of momentum could be stopped in a very short time, if enormous force is exerted. And

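The physics behind that claim is just the impulse-momentum theorem; a quick sketch of the arithmetic (not from the post itself):

```latex
% Impulse-momentum theorem: force applied over a time interval equals the
% change in momentum. Removing all momentum p therefore takes
\[
  F\,\Delta t = \Delta p = p
  \quad\Longrightarrow\quad
  \Delta t = \frac{p}{F} \to 0 \ \text{as}\ F \to \infty ,
\]
% so any amount of momentum can be stopped arbitrarily fast, given a large
% enough force.
```
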
Curiosity arises from lack of feed
Here's a hypothesis: I think I'll always find something I am curious about. I am a naturally curious person, and I would actually guess that everyone is? It seems impossible that there would ever be a moment where nothing could interest me. But I think curiosity

Non-judgemental awareness is curative
Rather than try to fix things, all you have to do is notice, non-judgementally, that you’re doing them. Even while you’re involved in this non-judgemental noticing, you will notice a barrage of impulses to try different things, to intervene, to try, to fix. That’s fine, just notice

Are LLMs deterministic?
No. You can see for yourself: if you set the temperature parameter to 0 (meaning you always sample the most likely token from the output distribution), you'd expect GPT-3.5 and 4 to produce the same output every time you call them. However, they don't. Why do LLMs

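A minimal sketch of that check, assuming the OpenAI Python SDK; the model name and prompt are placeholders, not the post's exact setup:

```python
# Call the same model with the same prompt at temperature=0 several times
# and count how many distinct outputs come back.
from openai import OpenAI

client = OpenAI()

def sample(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # always sample the most likely token
    )
    return response.choices[0].message.content

prompt = "Write a haiku about determinism."
outputs = {sample(prompt) for _ in range(5)}

# With temperature=0 you'd expect exactly one unique output; in practice
# the set often ends up with more than one.
print(f"{len(outputs)} distinct outputs out of 5 calls")
```
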
Recent LLM launches, and LLM rumors
Llama 3 is already training, according to Zuck. There are conflicting sources & rumors, and the release date claims vary across all of 2024. For GPT-5 there are not even reliable rumors; if training started within the past few months, then my back-of-the-napkin estimate is that it may be

A wild speed-up from OpenAI Dev Day
I'll share more thoughts on OpenAI Dev Day announcements soon, but one huge problem for any developer is LLM API latency. And boy, did OpenAI deliver. On a quick benchmark I ran:
* gpt-4-1106-preview ("gpt-4-turbo") runs in 18ms/token
* gpt-3.5-turbo-1106 ("the newest version of gpt-3.

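For reference, here is a rough sketch of how such a per-token latency number can be measured, assuming the OpenAI Python SDK; the prompt and the exact measurement method are assumptions, not the post's benchmark:

```python
# Time a full completion and divide by the number of completion tokens
# to get an approximate ms/token figure per model.
import time
from openai import OpenAI

client = OpenAI()
prompt = "Explain the CAP theorem in about 200 words."

for model in ["gpt-4-1106-preview", "gpt-3.5-turbo-1106"]:
    start = time.perf_counter()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    elapsed = time.perf_counter() - start
    tokens = response.usage.completion_tokens
    print(f"{model}: {1000 * elapsed / tokens:.1f} ms/token over {tokens} tokens")
```
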
LLM+API vs LMM+UI
The two most famous startups focused on building agents seem to be Imbue and Adept. Both companies' goal is to have a large model use a computer effectively, but it is interesting how they seem to bet on two different approaches.
* Imbue: use LLMs (text-based) with API-based tools.
* Adept:

Simplicity is essential in a generative world
* "less is more" (proverb)
* "perfection is finally attained not when there is no longer anything to add, but when there is no longer anything to take away" (Antoine de Saint-Exupéry)
* "omit needless words" (Strunk & White)
* "It is vain to do with

Diverge-converge cycles in LLMs
The Double Diamond is, roughly, a design framework consisting of 4 steps:
1. Diverging on problem (Discover). Explore widely to gather a broad range of insights and challenges related to the problem.
2. Converging on problem (Define). Analyze and synthesize the gathered insights to define a clear and specific problem