Writing a regular expression engine
The last 4 months have seen me consumed by my latest side project, kleingrep. It’s a regex engine and command-line tool, written completely from scratch in Go. When I say “from scratch”, I mean that all the code was written by me; Russ Cox’s excellent articles on regular expressions and Ken Thompson’s seminal paper on the topic served as the theoretical foundations for my project.
This post accompanies the project. It mostly serves as a walkthrough of my implementation, without any code. If you’re already familiar with regular expressions, feel free to skim through. The final section describes some additional features I added to the engine (and their implementation), which provide feature parity with mainstream regex engines.
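The post itself walks through the implementation without code, but as a taste of the underlying idea, here is a minimal, self-contained sketch (my own illustration, not kleingrep’s code) of the lockstep NFA simulation described in Thompson’s paper and Cox’s articles, with a hand-built NFA standing in for what a real compiler would produce:

```go
package main

import "fmt"

// A state in the NFA. r == 0 marks a split (epsilon) or accepting state;
// splits are how Thompson's construction encodes alternation and repetition.
type state struct {
	r         rune   // rune to match; 0 for split/accepting states
	out, out1 *state // successors (out1 is only used by split states)
	accepting bool
}

// addState adds s and everything reachable via epsilon transitions to the
// set: the "follow all possibilities at once" trick from Thompson's paper.
func addState(set map[*state]bool, s *state) {
	if s == nil || set[s] {
		return
	}
	set[s] = true
	if s.r == 0 { // split or accepting: follow epsilon edges (nil is ignored)
		addState(set, s.out)
		addState(set, s.out1)
	}
}

// match advances every live state in lockstep over the input:
// O(len(input) * number of states), with no backtracking.
func match(start *state, input string) bool {
	current := map[*state]bool{}
	addState(current, start)
	for _, r := range input {
		next := map[*state]bool{}
		for s := range current {
			if s.r == r {
				addState(next, s.out)
			}
		}
		current = next
	}
	for s := range current {
		if s.accepting {
			return true
		}
	}
	return false
}

func main() {
	// Hand-built NFA for the regex ab|ac (normally the engine would build
	// this from the parse tree).
	accept := &state{accepting: true}
	b := &state{r: 'b', out: accept}
	c := &state{r: 'c', out: accept}
	a1 := &state{r: 'a', out: b}
	a2 := &state{r: 'a', out: c}
	start := &state{out: a1, out1: a2} // split between the two branches

	fmt.Println(match(start, "ab")) // true
	fmt.Println(match(start, "ac")) // true
	fmt.Println(match(start, "ad")) // false
}
```

Because every live state advances together over each input rune, matching stays linear in the input length with no backtracking, which is the property that makes the Thompson approach attractive in the first place.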
It is no longer safe to move our governments and societies to US clouds
The very short version: it is madness to continue transferring the running of European societies and governments to American clouds. Not only is it a terrible idea given the kind of things the “King of America” keeps saying; the legal sophistry used to justify such transfers, like the nonsense letter the Dutch cabinet sent last week, has also now been invalidated by Trump himself. And why are we doing this? Convenience. But it is very scary to make yourself 100% dependent on the goodwill of the American government merely because it is convenient. So let’s not.
Brainwash an executive today!
A huge amount of the economy is driven by people who are, simply put, highly suggestible. That is to say that it is very, very easy to get them excited and willing to spend money.
Consider, for example, what it would take to get you to approach your company’s lawyer and suggest software to them, totally unprompted, because you saw an advertisement last night. Scratch that: make it every lawyer at your company, as each and every one of them goes “I… have never heard of that”. But you just keep going, because the next one might tell you that the Shamwow is an awesome product.
There is a massive industry built around gathering people who fit the “thinks LinkedIn is studying” profile, and who also have access to organizational money, into rooms, and then charging sales teams for permission to get into those rooms.
Systematically terraforming a brownfield of cloud infrastructure
Large-scale Architecture must be Functional and Composable
However, Terraform itself is not functional. It is a highly stateful, object-oriented system, with what amounts to remote procedure calls; it complects structure, behaviour, and state. The Ansible system is imperative in the small but functional in the large, because it is an idempotent transform of infra → infra, with UNIX-like composable workflows and dynamic in-process state as a reference (dynamic inventory).
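To make the “functional in the large” framing concrete, here is a toy Go sketch of infrastructure as a value and changes as composable, idempotent transforms. The Infra type and the ensure and compose helpers are illustrative names of my own, not part of Terraform, Ansible, or the article’s tooling:

```go
package main

import (
	"fmt"
	"reflect"
)

// Infra is a toy model of infrastructure state: resource name -> desired spec.
type Infra map[string]string

// A Transform is an idempotent infra -> infra function: applying it twice
// yields the same result as applying it once.
type Transform func(Infra) Infra

// ensure returns a transform that guarantees a resource exists with the
// given spec, regardless of what was there before.
func ensure(name, spec string) Transform {
	return func(in Infra) Infra {
		out := Infra{}
		for k, v := range in {
			out[k] = v
		}
		out[name] = spec
		return out
	}
}

// compose chains transforms left to right, pipeline-style.
func compose(ts ...Transform) Transform {
	return func(in Infra) Infra {
		for _, t := range ts {
			in = t(in)
		}
		return in
	}
}

func main() {
	deploy := compose(
		ensure("vpc", "10.0.0.0/16"),
		ensure("db", "postgres-15"),
	)
	once := deploy(Infra{})
	twice := deploy(once)
	fmt.Println(once)                           // map[db:postgres-15 vpc:10.0.0.0/16]
	fmt.Println(reflect.DeepEqual(once, twice)) // true: re-running converges
}
```

Running deploy once or many times converges on the same state, which is exactly the property that makes such workflows safe to re-run and to compose like UNIX pipelines.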
Constrain change, not people
As a designer, I strongly prefer to build a system that helps us:
Strictly gate-keep, structure, and micro-manage changes to live production, at one extreme of the management spectrum.
And at the other extreme, enable a fresh engineer to play with infrastructure willy-nilly. On day one of employment, a newcomer should be able to clone the codebase and follow a bunch of instructions to spin up, modify, destroy production-like infrastructure without any supervision whatsoever.
Any infrastructure automation system must limit the “blast radius” of changes. The longer a thing lives, the greater the odds that something will go awry. At the same time, one must make experimentation easy and safe, to ease change management and to onboard new people through tons of hands-on practice.
If a system has such a capability, it signals to me that it has solved for strategic risk management as well as daily-driver operator safety.
Use a mono-repo for great good
A single repository can contain a directory for each department. It’s always possible to split out each department into its own repo later, if some hard constraint dictates it be so. Absent that, it’s far better to put it all in one repo. Doing this well is invaluable in helping departments share practices while protecting secrets.
Choosing an init process for multi-process containers
If you are developing containers, you must have heard the “single process per container” mantra. Inherently, there’s nothing wrong with running multiple processes in a container, as long as your ENTRYPOINT is a proper init process. Some use cases are processes that aid each other (such as a sidecar proxy process) or porting legacy applications.
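For intuition about what “a proper init process” actually has to do, here is a rough, Linux-only Go sketch of my own (not tini, dumb-init, or any other real init shim, which handle many more edge cases): it starts the workload, forwards termination signals to it, and reaps orphaned children so zombies don’t accumulate.

```go
// Minimal illustrative init for a container: spawn the workload, forward
// SIGTERM/SIGINT to it, and reap exited children while running as PID 1.
package main

import (
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

func main() {
	if len(os.Args) < 2 {
		os.Exit(2) // usage: init <command> [args...]
	}

	cmd := exec.Command(os.Args[1], os.Args[2:]...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Start(); err != nil {
		os.Exit(1)
	}

	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT, syscall.SIGCHLD)

	for sig := range sigs {
		switch sig {
		case syscall.SIGCHLD:
			// Reap every exited child; as PID 1 we also inherit orphans.
			for {
				var status syscall.WaitStatus
				pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
				if pid <= 0 || err != nil {
					break
				}
				if pid == cmd.Process.Pid {
					os.Exit(status.ExitStatus())
				}
			}
		default:
			// Forward termination signals to the main workload.
			_ = cmd.Process.Signal(sig)
		}
	}
}
```

Setting the container ENTRYPOINT to such a binary lets the remaining arguments specify the real workload, while PID 1 still handles signal forwarding and child reaping correctly.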
ual
ual is a high-level, stack-based language designed for use on minimal or retro platforms with small virtual machines or embedded hardware. It bridges the gap between low-level hardware control and high-level programming abstractions, making it ideal for resource-constrained environments.
ual draws inspiration from several established languages:
From Lua: The clean syntax, function definitions, local variables, tables as the primary data structure, and multiple return values.
From Forth: The stack-based computational model, direct hardware access, and emphasis on small implementation footprint.
From Go: The package system with uppercase/lowercase visibility rules, clear import declarations, and pragmatic approach to language design.
Research debt
It’s tempting to think of explaining an idea as just putting a layer of polish on it, but good explanations often involve transforming the idea. This kind of refinement of an idea can take just as much effort and deep understanding as the initial discovery.
This leaves us with no easy way out. We can’t solve research debt by having one person write a textbook: their energy is spread too thin to polish every idea from scratch. We can’t outsource distillation to less skilled non-experts: refining and explaining ideas requires creativity and deep understanding, just as much as novel research.