Chosen links

Links - 8th October 2023

MMO Architecture: Source of truth, Dataflows, I/O bottlenecks and how to solve them

In online games, the source of truth of the state of the world is the in-memory world state, not the database.
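A minimal sketch of what that claim can look like in code, assuming a write-behind design where the game loop mutates memory and the database is only updated in periodic batches. The names here (`World`, `move`, `flush`, the `store` dict standing in for a database) are hypothetical and are not taken from the linked article.

```python
from typing import Dict, Set, Tuple

Position = Tuple[int, int]


class World:
    """In-memory world state; the database is only a lagging copy."""

    def __init__(self, store: Dict[str, Position]) -> None:
        self.players: Dict[str, Position] = {}  # authoritative state
        self.dirty: Set[str] = set()            # changed since last flush
        self.store = store                      # stand-in for a real database

    def move(self, player_id: str, x: int, y: int) -> None:
        # Gameplay only touches memory, so it never waits on database I/O.
        self.players[player_id] = (x, y)
        self.dirty.add(player_id)

    def position(self, player_id: str) -> Position:
        # Reads are also served from memory, never from the store.
        return self.players[player_id]

    def flush(self) -> int:
        """Persist changed entities in one batch; returns how many were written."""
        written = len(self.dirty)
        for player_id in self.dirty:
            self.store[player_id] = self.players[player_id]
        self.dirty.clear()
        return written
```

In this sketch, calling `move` twice for the same player and then `flush` once writes a single row: intermediate states never reach the store, which is one way such a design can relieve an I/O bottleneck. The cost is that anything not yet flushed is lost if the process crashes.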

A shallow survey of OLAP and HTAP query engines

Focused mostly on data layout and query execution. Query planning seems more or less the same as OLTP systems, and I’m ignoring distribution and transactions for now. Also see my full notes here.

It was hard to figure out what systems are even worth studying. There is so much money in this space. Search results are polluted with barely concealed advertising (eg “How to choose between FooDB and BarDB” hosted on foodb.com) and benchmarketing. Third-party benchmarks are crippled by the fact that most databases' TOS prohibit publishing benchmarks. Besides which, benchmarking databases is notoriously error-prone.

Seamless integration of Parquet files into data processing

Relational database systems are still the most powerful tool for data analysis. However, the steps necessary to bring existing data into the database make them unattractive for data exploration, especially when the data is stored in data lakes where users often use Parquet files, a binary column-oriented file format.

This paper presents a fast Parquet framework that tackles these problems without costly ETL steps. We incrementally collect information during query execution. We create statistics that enhance future queries. In addition, we split the file into chunks for which we store the data ranges. We call these synopses. They allow us to skip entire sections in future queries.

We show that these techniques only add a minor overhead to the first query and are of benefit for future requests. Our evaluation demonstrates that our implementation can achieve comparable results to database relations and that we can outperform existing systems by up to an order of magnitude.
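To make the synopsis idea concrete, here is a small sketch in plain Python. It is not the paper's implementation: it assumes a single integer column held in a list, fixed-size chunks, and min/max as the only stored data range. The names `Synopsis`, `ChunkedColumn`, `query_range` and `chunks_scanned` are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass
class Synopsis:
    lo: int  # smallest value seen in the chunk
    hi: int  # largest value seen in the chunk


class ChunkedColumn:
    """One column split into chunks; synopses are collected lazily."""

    def __init__(self, values: List[int], chunk_size: int) -> None:
        self.values = values
        self.chunk_size = chunk_size
        self.synopses: Dict[int, Synopsis] = {}  # chunk index -> data range
        self.chunks_scanned = 0                  # counter for illustration

    def _chunks(self) -> Iterator[Tuple[int, int, int]]:
        starts = range(0, len(self.values), self.chunk_size)
        for idx, start in enumerate(starts):
            yield idx, start, min(start + self.chunk_size, len(self.values))

    def query_range(self, lo: int, hi: int) -> List[int]:
        """Return all values v with lo <= v <= hi."""
        result: List[int] = []
        for idx, start, end in self._chunks():
            syn = self.synopses.get(idx)
            # Skip the chunk if its known range cannot overlap [lo, hi].
            if syn is not None and (syn.hi < lo or syn.lo > hi):
                continue
            self.chunks_scanned += 1
            chunk = self.values[start:end]
            if syn is None:
                # First visit: record the range while the data is in hand.
                self.synopses[idx] = Synopsis(min(chunk), max(chunk))
            result.extend(v for v in chunk if lo <= v <= hi)
        return result
```

With `ChunkedColumn(list(range(100)), chunk_size=10)`, the first call to `query_range` has no synopses and reads all ten chunks, recording a range for each as it goes. A later call such as `query_range(95, 99)` reads only the last chunk, because the other nine ranges cannot overlap the predicate.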

How to clear cache and cookies on a customer’s device

A relatively new HTTP header, available in most modern browsers, allows developers to declaratively clear data associated with a given origin via one simple response header: clear-site-data.
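As a rough sketch of how a server might send this header, the following uses Python's standard `http.server`. The `/logout` path, the `clear_site_data_value` helper and the `LogoutHandler` class are assumptions made for the example. The directive names `"cache"`, `"cookies"` and `"storage"` are standard values for the header, and each one is sent as a quoted string.

```python
from http.server import BaseHTTPRequestHandler

# Directives this sketch allows; the header defines a few others as well.
ALLOWED_DIRECTIVES = ("cache", "cookies", "storage")


def clear_site_data_value(*kinds: str) -> str:
    """Build the header value, e.g. '"cache", "cookies"'."""
    for kind in kinds:
        if kind not in ALLOWED_DIRECTIVES:
            raise ValueError(f"unsupported directive: {kind}")
    # Each directive must be wrapped in double quotes.
    return ", ".join(f'"{kind}"' for kind in kinds)


class LogoutHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: /logout asks the browser to wipe origin data."""

    def do_GET(self) -> None:
        if self.path != "/logout":
            self.send_error(404)
            return
        body = b"Signed out.\n"
        self.send_response(200)
        self.send_header("Clear-Site-Data",
                         clear_site_data_value("cache", "cookies"))
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, pass the handler to `http.server.HTTPServer(("127.0.0.1", 8000), LogoutHandler)` and call `serve_forever()`. Browsers generally honor the header only on secure origins, so a real deployment would sit behind HTTPS.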

TBM 246: why didn’t they say no?

When they said no in the past, it felt like they were being hauled into the courthouse and put on the defendant’s stand. Saying no involves a deep test of wills, being doubted, and being questioned over every minute detail. It gets tiring, so they just say yes to make people disappear.

Someone has told them that saying No is not an option, probably because it would be too painful and politically damaging to have to defend the team’s decision. It is better to say yes and let everything slip a bit than to say No.

pg_stat_io and PostgreSQL 16 performance

Learn about pg_stat_io's debugging power: PostgreSQL 16 blesses users around the world with many features which ensure an even better user experience. One of those features is a system view called pg_stat_io. It offers deep insights into the I/O behavior of your PostgreSQL database server. From PostgreSQL 16 onward, it will make it far easier to debug and assess performance-related problems.

Change data capture for microservices

In this talk, I would like to talk about one concept and one tool, Change Data Capture, which can help us to build software and systems which live up to this real time promise. This is what we are going to talk about, Change Data Capture as a tool, as part of our toolbox. Secondly, a few use cases in the context specifically of microservices for Change Data Capture, or CDC for short. Lastly, I also want to talk a little bit about some of the challenges which you might encounter when putting CDC into practice.