Chosen links

Links - 7th May 2023

The “Build Your Own Database” book is finished

Databases are a fascinating topic. They are a foundation of modern computing. Learning how they work should be an important part of software engineering education.

As many of today’s (2023+) coders do not have a formal CS/SE education, basic things such as databases, compilers, operating systems, etc. are often seen as magical black boxes. That’s why I started the “Build Your Own X” book series: to learn and teach the basics in a “from scratch” approach, through succinct & condensed books.

There are some important topics that we can learn from database systems:

  1. Persistence. How not to lose or corrupt your data. Recovering from a crash. (See the first sketch after this list.)

  2. Indexing. Efficiently querying and manipulating your data. (B-tree; see the second sketch below.)

  3. Concurrency. How to serve a large number of clients at once. And transactions. (See the third sketch below.)
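
To make these concrete, here are three small Go sketches (my own illustrations, not code from the book). First, persistence: one classic way to avoid corruption is to never overwrite a file in place; write a temporary file, flush it to disk, then atomically rename it over the old one. The function name SaveAtomic and its error handling are hypothetical:

    package main

    import (
    	"os"
    	"path/filepath"
    )

    // SaveAtomic writes data so a crash leaves either the old file or the
    // new one, never a half-written mix: write a temp file, fsync it, then
    // rename it over the target (POSIX rename atomicity assumed).
    func SaveAtomic(path string, data []byte) error {
    	tmp := path + ".tmp"
    	f, err := os.Create(tmp)
    	if err != nil {
    		return err
    	}
    	if _, err := f.Write(data); err != nil {
    		f.Close()
    		return err
    	}
    	if err := f.Sync(); err != nil { // flush contents to stable storage
    		f.Close()
    		return err
    	}
    	if err := f.Close(); err != nil {
    		return err
    	}
    	if err := os.Rename(tmp, path); err != nil { // atomic replace
    		return err
    	}
    	// Sync the directory so the rename itself survives a crash.
    	dir, err := os.Open(filepath.Dir(path))
    	if err != nil {
    		return err
    	}
    	defer dir.Close()
    	return dir.Sync()
    }

    func main() {
    	if err := SaveAtomic("greeting.txt", []byte("hello\n")); err != nil {
    		panic(err)
    	}
    }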
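
Second, indexing: a B-tree keeps sorted keys in wide nodes, so a lookup costs one binary search per level, and the number of levels stays small even for large data sets. This sketch assumes a simplified in-memory, B+-tree-style layout (values only in leaves); the node and get names are mine, not the book's disk-based design:

    package main

    import (
    	"fmt"
    	"sort"
    )

    // node is a simplified B+-tree node: interior nodes hold separator
    // keys and children; leaves hold keys paired with values.
    type node struct {
    	keys     []int    // sorted
    	children []*node  // empty for leaves; len(keys)+1 for interior nodes
    	vals     []string // leaves only; vals[i] pairs with keys[i]
    }

    // get descends from the root to a leaf, binary-searching each node.
    func get(n *node, key int) (string, bool) {
    	for len(n.children) > 0 {
    		i := sort.SearchInts(n.keys, key) // first separator >= key
    		if i < len(n.keys) && n.keys[i] == key {
    			i++ // equal separators route to the right child here
    		}
    		n = n.children[i]
    	}
    	i := sort.SearchInts(n.keys, key)
    	if i < len(n.keys) && n.keys[i] == key {
    		return n.vals[i], true
    	}
    	return "", false
    }

    func main() {
    	leaf1 := &node{keys: []int{1, 3}, vals: []string{"a", "b"}}
    	leaf2 := &node{keys: []int{5, 7}, vals: []string{"c", "d"}}
    	root := &node{keys: []int{5}, children: []*node{leaf1, leaf2}}
    	fmt.Println(get(root, 7)) // d true
    }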
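
Third, concurrency: the simplest workable scheme is a single readers-writer lock, letting many clients read in parallel while writes are serialized; real databases refine this with finer-grained locking, MVCC, and a log for transactions. Store is again a hypothetical stand-in:

    package main

    import (
    	"fmt"
    	"sync"
    )

    // Store serializes writers behind one RWMutex while readers proceed
    // in parallel: a toy stand-in for real concurrency control.
    type Store struct {
    	mu   sync.RWMutex
    	data map[string]string
    }

    func NewStore() *Store {
    	return &Store{data: make(map[string]string)}
    }

    func (s *Store) Get(k string) (string, bool) {
    	s.mu.RLock() // shared: many readers at once
    	defer s.mu.RUnlock()
    	v, ok := s.data[k]
    	return v, ok
    }

    func (s *Store) Set(k, v string) {
    	s.mu.Lock() // exclusive: one writer at a time
    	defer s.mu.Unlock()
    	s.data[k] = v
    }

    func main() {
    	s := NewStore()
    	s.Set("k", "v")
    	fmt.Println(s.Get("k")) // v true
    }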

Abstract Machine Models - Also: what Rust got particularly right

Ever since 2010, I have studied the “meta” of software by following (and thinking about) the continued dialogue between programming language designers, computer designers, and programmers.

The following constitutes a snapshot of my current thinking.

From there, I focused on the following: “what’s in the mind of programmers when they choose one way of doing things over another that’s functionally equivalent?”

The one thing that was clear from the start is that most programmers “simulate” the behavior of their program in their mind to predict how the program will behave at run-time.

As we’ve determined above, that simulation does not happen in the functional model of the programming language.

Meanwhile, I knew from my teaching practice that nobody really understands hardware computers, and so this mental simulation was also not happening with a model of a hardware platform. In fact, I’ve found that folk would rather not think about hardware at all, and thankfully so: this made it possible, over and over, to port software from one hardware platform to another, without rewriting the software.

This meant that programmers are able to construct a somewhat abstract model of their computer in their mind, but not one so abstract that it becomes purely functional.

That is when I coined the phrase abstract machine model (AMM), and it became the anchor of my subsequent study.

One thing that bothered me a great deal early on was whether AMMs were truly distinct from the programming languages or the computers that we use.

The question was really: when a programmer thinks about the run-time behavior of their program, are they only able to formulate their thoughts within the confines of the language they’re using to write the program or the computer they’re working with?

In summary, I incrementally developed an understanding that:

  • Programmers use AMMs to write software.

  • AMMs exist separately from programming languages, and separately from hardware platforms.

  • There is more than one AMM, and AMMs differ in prediction rules and expressivity.

  • An AMM can sometimes be used to program effectively across multiple languages, but not all.

  • An AMM can sometimes be used to program effectively across multiple hardware computers, but not all.

And so it was interesting to me to wonder: “when do AMMs appear? When does a programming language designer push for a new AMM, and when can they slip into the shoes of an existing community?”

While building the table above and studying PL history, I discovered that language designers come in three groups:

  1. machine-first designers, who start with one or more hardware platforms sufficiently different from everything done before that they need a new AMM, and often a new programming language to program them.

  2. second-language designers, who assume the existence of some machine/language ecosystem, adopt it, and simply add new abstractions / expressivity on top.

  3. AMM-first designers, who want first and foremost to control the way programmers think (usually due to some idea about how this will result in better software quality), and who see hardware diversity merely as an inconvenience to be hidden from programmers.

I am now able to explain that what makes certain programming problems “hard” or “interesting” is not related to oddities in hardware or programming languages, but rather to the way programmers think about machines, i.e. the properties of their AMMs.

This makes me able to connect related software challenges across programming language boundaries, or to recognize when similar-looking programs in different languages have, in fact, extremely different semantics.
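
A toy example of that last point (mine, not the article’s): the snippet below looks like an innocent copy in many languages, but in Go assigning a slice copies only a small header, so a programmer whose AMM assumes copy-on-assignment (as in R, or C structs passed by value) will mispredict the output:

    package main

    import "fmt"

    func main() {
    	a := []int{1, 2, 3}
    	b := a            // copies the slice header, not the elements
    	b[0] = 99         // mutates the backing array shared with a
    	fmt.Println(a[0]) // prints 99 under Go's AMM, not the 1 that a
    	                  // copy-on-assignment AMM would predict
    }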

It also makes me able to estimate how much time or effort it will take me to learn a new technology stack or programming language: if I can track its ancestry and design principles, I can estimate its conceptual distance to AMMs I already know.

It also makes me able to estimate whether an already-written program will work well on a new computer, with or without translation to a different language or instruction set architecture (ISA), depending on what I know of the AMM that its programmer likely had in mind when the program was written.

The Prodigal Techbro

The Prodigal Tech Bro is a similar story, about tech executives who experience a sort of religious awakening. They suddenly see their former employers as toxic, and reinvent themselves as experts on taming the tech giants. They were lost and are now found. They are warmly welcomed home to the center of our discourse with invitations to write op-eds for major newspapers, think tank funding, book deals, and TED talks. These guys — and yes, they are all guys — are generally thoughtful and well-meaning, and I wish them well. But I question why they seize so much attention and are awarded scarce resources, and why they’re given not just a second chance, but also the mantle of moral and expert authority.

Today, when the tide of public opinion on Big Tech is finally turning, the brothers (and sisters) who worked hard in the field all those years aren’t even invited to the party. No fattened calf for you, my all but unemployable tech activist. The moral hazard is clear; why would anyone do the right thing from the beginning when they can take the money, have their fun, and then, when the wind changes, convert their status and relative wealth into special pleading and a whole new career?

Avoiding the rewrite trap

Someone has to run and modify the old system while you’re writing the new one. But that job sucks, and they’re likely to quit before you’re done.

You are imagining that your whole team can swarm on the new thing and just knock it out. If you could do the rewrite in a few weeks, maybe. But more likely, you have to keep some people back to keep the old system running, fix bugs, or even add new features to that old system. If those people think that they are on a sinking ship, they are likely to quit, leaving you with a code base that no one wants to support but is still critical to paying the bills. Sure, you could rotate the team through supporting the old system, but over time the people who know the old system are likely to leave, and the newcomers will disdain learning the legacy stack.

The seven programming ur-languages

But not all languages have the same set of patterns. The patterns for looping in C or Python are very different from the patterns of recursion in Standard ML or Prolog. The way you organize a program in Lisp, where you name new language constructs, is very different from how you organize it in APL, where fragments of symbol sequences are both the definitions of behavior and become the label for that behavior in your mind.
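
A small illustration, with Go standing in for both idioms: the same sum written as an ALGOL-family accumulator loop, and as the base-case-plus-self-call recursion that would be the natural shape in an ML-family language (Go can only gesture at the latter, of course):

    package main

    import "fmt"

    // sumLoop is the C/Python pattern: mutate an accumulator in a loop.
    func sumLoop(xs []int) int {
    	total := 0
    	for _, x := range xs {
    		total += x
    	}
    	return total
    }

    // sumRec is the ML-style pattern: a base case plus a self-call on
    // the rest of the list, with no mutation.
    func sumRec(xs []int) int {
    	if len(xs) == 0 {
    		return 0
    	}
    	return xs[0] + sumRec(xs[1:])
    }

    func main() {
    	fmt.Println(sumLoop([]int{1, 2, 3}), sumRec([]int{1, 2, 3})) // 6 6
    }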

These distinct collections of fundamentals form various ur-languages. Learning a new language that traces to the same ur-language is an easy shift. Learning one that traces to an unfamiliar ur-language requires significant time and effort and new neural pathways.

I am aware of seven ur-languages in software today. I’ll name them for a type specimen, the way a species in paleontology is named for a particular fossil that defines it and then other fossils are compared to the type specimen to determine their identity. The ur-languages are:

  • ALGOL

  • Lisp

  • ML

  • Self

  • Forth

  • APL

  • Prolog

The UX research reckoning is here

There are three types of work that UX Researchers need to do:

  • Macro-research is strategic in nature, business-first, and future-thinking. It provides concrete frameworks that guide macro business decisions.

  • Middle-range research is focused on user understanding and product development.

  • Micro-research is closer to technical usability, eye tracking, and detailed interaction development.

The biggest reason UX Research is facing this reckoning is that we do way, way too much middle-range research.

Middle-range research is a deadly combination: interesting to researchers, but only marginally useful for actual product and design work. It’s disproportionately responsible for the worst things people say and think about UXR. Doing so much of it just doesn’t deliver enough business value.

So many common forms of research questions live in the middle-range:

  • How do users think/feel about X functionality/activity?

  • What are the concerns or challenges with Y?

  • Why are users using/not using Z feature?

Middle-range findings are usually not specific enough. They tend to be too general and descriptive, even when a researcher does an amazing job communicating. They’re hard to turn into specific recommendations and thus easy to poke holes in or ignore. They are most likely to trigger the post-hoc bias, which invokes the stereotype that researchers work for months only to tell us things we already know.

Of course, a talented researcher can mitigate some of these issues. But there’s still the structural disadvantage that comes from asking mid-altitude questions that most cross-functional partners think they already have the answers to anyway. All of this erodes the real and perceived business value of even the best research. And we haven’t even gotten to the worst bit yet.