Slab Technical Teardown


This is a retrospective on vibecoding Slab 17 for others interested in the engineering.

I set out to make Slab without writing any of the code myself. I’d done some vibecoding experiments with small toys and concluded that all of the capabilities I would need were entirely within the wheelhouse of existing tools. The only question was how the approach would hold up on a vibecoded project over the course of a month or longer. The answer is in the affirmative: I have written none of the code in this project myself. Beyond that, I haven’t even read most of it. I’ve peeked at things like database RLS policies and web worker sandboxing to audit the most obvious malicious attack vectors, but otherwise my interaction with the app has been through using it and describing the desired functionality to my assistant of choice.

The original design goals were:

  • a cross-platform mobile app
  • that loads a new puzzle every day
  • and stores a user’s progress

Since the graphical component of the game is quite simple, the frontend is entirely web-based in React. To get cross-platform deploys, I chose Tauri. For loading new puzzles and for storage, I chose Supabase. An alternative would have been to make the puzzles up front and package them, using localStorage or Tauri’s SQLite plugin for progress. One of the benefits of using remote storage was the ability to let users create their own puzzles and share them with one another (which was helpful in my early testing with friends, who made puzzles for me so I could experience the game from the player’s seat).

With those choices, React, Tauri, and Supabase, I set off to vibe it up. Here are some of the takeaways.

  • Adding analytics with Plausible took 20 minutes. When my free trial ran out, creating my own logs table and frontend view and migrating off of Plausible took 10 minutes. The marginal value of SaaS is going to plummet. I imagine someone is working on the SaaS-killing app that effectively bootstraps whichever products you use into vibed, usable-enough replacements given your infrastructure.
  • Anything that got into the Tauri plugin ecosystem and required actual mobile builds to test was a real pain compared to the rest of the development lifecycle. That said, the deeplink plugin worked quite easily, handing the link off to React Router. For my next native project I might start out with a particular mobile platform and then see how well code assistants can directly migrate all the code over into idiomatic code for the other platform. This makes maintenance more complex, but I expect once we hit high enough reliability with codegen we’ll start to see specs and prompts as the source of truth, with cheap QA processes to vet the results (which by and large succeed). That is to say, you’ll check in prompts, generate apps, recommit with bug reports, and close a PR with finished, functional builds.
  • Getting responsive displays right (highly custom layout in general) is a pain with vibecoding. Visual artifacts and misalignments are difficult for the model to understand. I wonder what kind of work is being done to align code output with a visual understanding of the rendered HTML/CSS, plus putting vision in the loop so that the agent has context for what things look like (and can silently iterate until it hits a spec).
  • Supabase makes vibing up storage incredibly easy. I expect PostgREST and the frontend client (with whatever remote Postgres you prefer) could do just as well if you got the tooling right. I don’t plan for this to scale to the point of needing to reduce costs, and the ease of authentication makes Supabase a great choice, especially with the addition of anonymous authentication, which they didn’t have three years ago when I made a different game (its absence prompted me to use Firebase back then).
  • Adding a prompt to the puzzle creator so that a user can paste it into an AI and get back executable JavaScript implementing the evaluation for a slab turned out to work really well (most models can one-shot rigorous evaluation functions). I also learned about the ShadowRealm spec, which I expect to be useful for more generated-code sandboxing in the future (it isn’t primarily a security boundary, so it depends on where the output comes from, but the synchronous execution is really nice); for now I’m using a web worker, which works fine, if annoyingly async. (I also gave a talk on another pattern using generative code to bootstrap interactions.)
  • It might have been nice to document my prompting more, but I cared more about making my game than about studying or sharing the method by which it was produced. The tools are also changing fast enough that I don’t expect that information to be long-lived. I am curious about efforts here (I saw a vibed game some months back where they had checked in every single prompt).
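To make the Plausible migration concrete, here is a minimal sketch of what a self-hosted logs table feeds on. The table name (`events`) and its columns are assumptions, not the actual schema; only the commented supabase-js call is a real API.

```typescript
// Hypothetical event shape for a self-rolled analytics table.
type AnalyticsEvent = {
  name: string;                    // e.g. "pageview", "puzzle_solved"
  path: string;                    // current route
  props: Record<string, string>;   // free-form metadata
  created_at: string;              // ISO timestamp
};

// Build one event row; the frontend view then just aggregates these.
function buildEvent(
  name: string,
  path: string,
  props: Record<string, string> = {},
): AnalyticsEvent {
  return { name, path, props, created_at: new Date().toISOString() };
}

// Wiring it up with supabase-js v2 would be a one-liner per event:
//   await supabase.from('events').insert(buildEvent('pageview', location.pathname));
```

A simple frontend view over such a table is enough to replace a hosted dashboard for a game at this scale.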
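The deeplink-to-router handoff is roughly a URL-to-route mapping. A sketch, where the `slab://` scheme and route shape are illustrative assumptions; `onOpenUrl` is the real Tauri plugin API, shown only in comments:

```typescript
// Map a custom-scheme deep link to an in-app route,
// e.g. slab://puzzle/abc123 -> /puzzle/abc123
function routeFromDeepLink(raw: string): string | null {
  try {
    const url = new URL(raw);
    if (url.protocol !== 'slab:') return null;
    return `/${url.hostname}${url.pathname}`;
  } catch {
    return null; // not a parseable URL
  }
}

// In the app shell, the Tauri deep-link plugin hands URLs to React Router:
//   import { onOpenUrl } from '@tauri-apps/plugin-deep-link';
//   await onOpenUrl((urls) => {
//     const route = routeFromDeepLink(urls[0]);
//     if (route) navigate(route);   // navigate from useNavigate()
//   });
```

Keeping the mapping a pure function makes it trivially testable without a mobile build, which matters given how slow that part of the loop is.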
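The progress-saving path with anonymous auth can be sketched as below. The `progress` table and its `(user_id, puzzle_id)` key are assumptions about the schema; `signInAnonymously` and `upsert` are real supabase-js v2 calls, shown in comments:

```typescript
// Hypothetical row shape for per-user puzzle progress.
type ProgressRow = {
  user_id: string;
  puzzle_id: string;
  state: unknown;        // serialized board state
  updated_at: string;
};

function progressRow(userId: string, puzzleId: string, state: unknown): ProgressRow {
  return {
    user_id: userId,
    puzzle_id: puzzleId,
    state,
    updated_at: new Date().toISOString(),
  };
}

// With supabase-js v2, anonymous auth plus an upsert looks like:
//   const { data } = await supabase.auth.signInAnonymously();
//   await supabase.from('progress').upsert(
//     progressRow(data.user!.id, puzzleId, state),
//     { onConflict: 'user_id,puzzle_id' },
//   );
// RLS policies then restrict each row to auth.uid() = user_id.
```

Anonymous sign-in gives every player a stable `user_id` without an account, which is exactly what a daily-puzzle game needs.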
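The generated-evaluator pattern boils down to compiling a pasted function body and running it off the main thread. A sketch, where the evaluator signature and sample body are stand-ins for what a user actually pastes back from the AI:

```typescript
// A generated evaluator takes the puzzle state and says whether it's valid.
type Evaluator = (cells: number[]) => boolean;

// Compile the AI-generated function body (expected to be code over `cells`).
// new Function is NOT a security boundary, hence the worker below.
function compileEvaluator(source: string): Evaluator {
  return new Function('cells', source) as Evaluator;
}

// Stand-in for a model-generated evaluation function body.
const generated = 'return cells.length > 0 && cells.every(c => c > 0);';
const evaluate = compileEvaluator(generated);

// In the app, the worker side is roughly:
//   self.onmessage = (e) => {
//     const fn = new Function('cells', e.data.source);
//     self.postMessage({ ok: fn(e.data.cells) });
//   };
// which keeps generated code off the main thread, at the cost of the async
// round-trips that ShadowRealm's synchronous evaluation would remove.
```

The worker boundary is what makes the async plumbing annoying, but it also means a pathological evaluator can be terminated without freezing the UI.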

All told, the functional code clocks in at around 10k lines. I did maybe two (AI-gen) refactor sessions to make files smaller and break components apart (both for reuse, and because big files seem to distract the AI and lead to a worse dev cycle with buggier implementations and more re-prompting). Hot module reloading made for a really nice developer experience where I was mostly gated by the waiting time of the code generation. If I were to use an asynchronous workflow I would probably have the AI writing and running tests, including visually appraising the effects of what it had done. As is, I wrote no tests and introduced maybe two regressions, which were caught quickly. Knowing the full scope of what I wanted to build before I started let me focus on just getting the game out there rather than on longer-term maintainability and development. That is, treating the ability to call software “done” as a design goal lets you strip away a ton of additional engineering, because what you’re making is basically a prototype, albeit a polished and usable one that doesn’t need to become anything else.

Finally, I want to thank the many people who played and gave feedback throughout this time. Their input was consistently the biggest source of direction and support, making Slab accessible and playable rather than a mathematical toy understood by few. From the friends who played regularly and shared puzzles, feedback, ideas, and screen recordings, to the person in a Discord server who recorded an hour-and-a-half-long play session narrating all the things they wished were different: thank you. It’s been a great lesson for me in gathering community feedback. For anyone who knows someone who is creating, here’s a reminder that your honest experience can be an invaluable source of information to them.

(PS The code is up on GitHub, but I’ve read almost none of it. Have fun.)

