Chosen links

Links - 5th September 2021

Agile as trauma

The Agile Manifesto is an immune response on the part of programmers to bad management. The document is an expression of trauma, and its intellectual descendants continue to carry this baggage. While the Agile era has brought about remarkable advancements in project management techniques and development tools, it remains a tactical, technical, and ultimately reactionary movement. As long as Agile remains in this position it will be liable to backfire, vulnerable to the very depredations of bad management it had initially evolved to counter.

Partitioning use cases with PostgreSQL

To determine the best execution plan, Postgres makes decisions based on statistics. They are obtained from a sample of the table (the default_statistic_target which is 100 by default).

By default, postgres will collect 300 × default_statistic_target rows, that is 30 000 rows. With a table of several hundred million rows, this sample is sometimes too small.

We can drastically increase the sample size, but this approach has some drawbacks:

  • It increases the planning time

  • It makes the ANALYZE more heavy.

  • Sometimes it is not enough if the data are not well distributed. For example, if you take a few hundred thousand rows from a table with several hundred million rows, you may miss the rows that are in delivery status.

With partitioning, we could have the same sample but per partition, which allows us to increase the accuracy.

This would also be useful when we have correlated data between columns. I will take the example of orders. We have a whole year’s worth of orders: all the orders that are more than one month old are delivered, those of the last month are 90% delivered (10% are in progress).

Intuitively, if I look for an order in progress more than 6 months ago, I should not get any result. On the other hand, if I search for orders in progress for the last month, I should get 10% of the table. But postgres doesn’t know that, for it, the orders in progress are spread over the whole table.

With a partitioning by date, it can estimate that there are no orders in progress for deliveries of more than one month. This approach is mainly used to reduce an estimation error in an execution plan.

Devious SQL: Message Queuing Using Native PostgreSQL

Hmm, let’s think. The DELETE statement is able to return data to us as if it were a SELECT statement via the RETURNING clause, however DELETE lacks a LIMIT clause, nor can we take row locks on it explicitly. That said, we can use the USING clause to join to itself and get both limits and row locks. This gives us a final query as follows:

DELETE FROM
    queue_table
USING (
    SELECT * FROM queue_table
    LIMIT 10
    FOR UPDATE
    SKIP LOCKED
) q
WHERE q.id = queue_table.id
RETURNING queue_table.*;

We are using the self-join method of deletion here with a subquery to both lock the underlying rows being returned, as well as deleting the rows and returning all values for the set of rows as if this were an original SELECT statement.

Writing, technically

I feel like the idea that you can write software that’s so clear, that it doesn’t need any documentation, hinges upon some delusion about how clean you can make the abstractions. I think you can always fantasize about having an extremely fancy type system which would capture more of the properties. People work on type systems that capture performance characteristics so you could know some upper bound on the asymptotic complexity of the operations, just by the type signature. But that really is a fantasy. There are no realistic type systems that do a good job of this. And any type system that people use in a real language that actually is practical has quite severe limitations on its expressiveness, on how much it can tell you about what’s actually going on in the code.

If you write documentation that’s very full of examples and very detail-driven, it can be a real slog to get through. It can just be long. And there’s something lovely about documentation that just says the absolute minimal thing that an educated, or at least a particular kind of educated consumer wants in order to understand what’s going on. I often feel this way when I read the documentation about a new programming language or a new web framework or whatever. And I feel like I understand this space pretty well. If I talked to the right person, they could give me seven sentences and I would suddenly know exactly in what quadrant of the design space this thing was and how they’re approaching it. And I would immediately get a lot of knowledge very quickly and very efficiently. And it would be great for me, and for other people it would be utterly useless.