Nix : un modèle de déploiement de logiciel purement fonctionnel

Nix est un gestionnaire de composant pour Linux permettant d’obtenir des déploiements fiables et reproductibles, il est à la base d’une distribution de Linux appelée NixOS.

Le document de référence sur Nix est la thèse de doctorat de son créateur auteur : Eelco Dolstra.

La thèse n’est pas uniquement centrée sur l’outil mais surtout une présentation des différents problèmes qui se posent lorsqu’on veut construire et déployer des composants qui dépendent les uns des autres.

Si le cas étudié est principalement celui de bibliothèques ou d’application Linux, il est tout aussi adapté à d’autres types de composants, comme des services dans un SI.

À l’inverse des solutions de type conteneurs qui visent à des installations isolées, l’idée de Nix est de permettre de faire des installations et des mises à jour sur un système partagé mais sans interférence, ce qui permet le partage efficace de ressource.

L’idée principale est de faire des installations parallèles des différentes versions des composants dans des répertoires séparés, et d’utiliser différents mécanismes — comme la gestion du PATH — pour faire en sorte que chaque élément ne puisse accéder qu’aux bonnes versions de ses dépendances.

L’ambition de Nix est de pouvoir fonctionner avec des briques applicatives existantes, l’écosystème Linux, en ayant un minimum d’impact sur ces briques. Elle prend donc en compte les questions de gestion du legacy.

Deux éléments mis en lumière par le texte et qui m’ont particulièrement parlé :

L’adhérence masquée

Il est très difficile de maîtriser toutes les adhérences entre un composant qui n’a pas été conçu pour ça — comme c’est le cas de l’essentiel des briques logicielles Linux — et l’environnement où il a été construit.

Cela peut porter par exemple par des informations intégrées dans le composant (des chemins de fichiers, variables d’environnements…) ou des dépendances mal identifiées à des bibliothèques installées sur les systèmes servant à la construction mais pas celles qui serviront au déploiement.

Toutes ces inconnues augmentent le risques ou la complexité du déploiement, en plus de rendre certains audits de sécurité plus difficiles.

Nix a participé à donner de la visibilité à ce sujet, qui a depuis été pris en compte par différentes distributions Linux comme Debian.

La compatibilité

Tester la compatibilité entre versions de composants est souvent plus difficile qu’on ne le croit, surtout lorsqu’on ne dispose pas des tests automatisés adéquats.

Les personnes qui ont goûté au DLL Hell en ont probablement gardé de mauvais souvenirs.

Mettre à jour des bibliothèque partagées lorsque les chaînes de dépendances sont longues est un exercice risqué.

Nix propose une approche de mise à jour transactionnelle, avec gestion du retour en arrière, qui s’approche beaucoup de ce qu’on peut faire dans certains SI.

Conclusion

La thèse se lit très bien et les solutions proposées, dans leurs forces et leurs faiblesses, sont convaincantes.

C’est un très bon tour d’horizon aux questions de build et de déploiement.

Je la conseillerai aux personnes, qu’elles soient du côté architecture, développement ou administration système, qui s’intéressent à ces sujets.

Cela m’a donné envie que Nix soit un succès, même si je n’en ai pas l’usage personnellement.

Enfin l’utilisation d’un langage de description purement fonctionnel pour les descriptifs des composants donne un cas d’usage où ce type d’outil semble très bien adapté, assez éloigné de ce que j’ai l’impression de voir d’habitude.

Quelques citations

This thesis is about getting computer programs from one machine to another — and having them still work when they get there. This is the problem of software deployment. Though it is a part of the field of Software Configuration Management (SCM), it has not been a subject of academic study until quite recently. The development of principles and tools to support the deployment process has largely been relegated to industry, system administrators, and Unix hackers. This has resulted in a large number of often ad hoc tools that typically automate manual practices but do not address fundamental issues in a systematic and disciplined way.

This is evidenced by the huge number of mailing list and forum postings about deployment failures, ranging from applications not working due to missing dependencies, to subtle malfunctions caused by incompatible components. Deployment problems also seem curiously resistant to automation: the same concrete problems appear time and again. Deployment is especially difficult in heavily component-based systems—such as Unix-based open source software — because the effort of dealing with the dependencies can increase super-linearly with each additional dependency.

Software deployment is the problem of managing the distribution of software to end-user machines. That is, a developer has created some piece of software, and this ultimately has to end up on the machines of end-users. After the initial installation of the software, it might need to be upgraded or uninstalled.

Presumably, the developer has tested the software and found it to work sufficiently well, so the challenge is to make sure that the software works just as well, i.e., the same, on the end-user machines. I will informally refer to this as correct deployment: given identical inputs, the software should behave the same on an end-user machine as on the developer machine.

This should be a simple problem. For instance, if the software consists of a set of files, then deployment should be a simple matter of copying those to the target machines. In practice, deployment turns out to be much harder. This has a number of causes. These fall into two broad categories: environment issues and manageability issues.

Environment issues The first category is essentially about correctness. The software might make all sorts of demands about the environment in which it executes: that certain other software components are present in the system, that certain configuration files exist, that certain modifications were made to the Windows registry, and so on. If any of those environmental characteristics does not hold, then there is a possibility that the software does not work the same as it did on the developer machine. Some concrete issues are the following:

A software component is almost never self-contained; rather, it depends on other components to do some work on its behalf. These are its dependencies. For correct deployment, it is necessary that all dependencies are identified. This identification is quite hard, however, as it is often difficult to test whether the dependency specification is complete. After all, if we forget to specify a dependency, we don’t discover that fact if the machine on which we are testing already happens to have the dependency installed.

Dependencies are not just a runtime issue. To build a component in the first place we need certain dependencies (such as compilers), and these need not be the same as the runtime dependencies, although there may be some overlap. In general, deployment of the build-time dependencies is not an end-user issue, but it might be in source-based deployment scenarios; that is, when a component is deployed in source form. This is common in the open source world.

Dependencies also need to be compatible with what is expected by the referring component. In general, not all versions of a component will work. This is the case even in the presence of type-checked interfaces, since interfaces never give a full specification of the observable behaviour of a component. Also, components often exhibit build-time variability, meaning that they can be built with or without certain optional features, or with other parameters selected at build time. Even worse, the component might be dependent on a specific compiler, or on specific compilation options being used for its dependencies (e.g., for Application Binary Interface (ABI) compatibility).

Even if all required dependencies are present, our component still has to find them, in order to actually establish a concrete composition of components. This is often a rather labour-intensive part of the deployment process. Examples include setting up the dynamic linker search path on Unix systems, or the CLASSPATH in the Java environment.

Components can depend on non-software artifacts, such as configuration files, user accounts, and so on. For instance, a component might keep state in a database that has to be initialised prior to its first use.

Components can require certain hardware characteristics, such as a specific processor type or a video card. These are somewhat outside the scope of software deployment, since we can at most check for such properties, not realise them if they are missing.

Finally, deployment can be a distributed problem. A component can depend on other components running on remote machines or as separate processes on the same machine. For instance, a typical multi-tier web service consists of an HTTP server, a server implementing the business logic, and a database server, possibly all running on different machines.

So we have two problems in deployment: we must identify what our component’s requirements on the environment are, and we must somehow realise those requirements in the target environment. Realisation might consist of installing dependencies, creating or modifying configuration files, starting remote processes, and so on.

Manageability issues The second category is about our ability to properly manage the deployment process. There are all kinds of operations that we need to be able to perform, such as packaging, transferring, installing, upgrading, uninstalling, and answering various queries; i.e., we have to be able to support the evolution of a software system. All these operations require various bits of information, can be time-consuming, and if not done properly can lead to incorrect deployment. For example:

When we uninstall a component, we have to know what steps to take to safely undo the installation, e.g., by deleting files and modifying configuration files. At the same time we must also take care never to remove any component still in use by some other part of the system.

Likewise, when we perform a component upgrade, we should be careful not to overwrite any part of any component that might induce a failure in another part of the system. This is the well-known DLL hell, where upgrading or installing one application can cause a failure in another application due to shared dynamic libraries. It has been observed that software systems often suffer from the seemingly inexplicable phenomenon of “bit rot,” i.e., that applications that worked initially stop working over time due to changes in the environment.

Administrators often want to perform queries such as “to what component does this file belong?”, “how much disk space will it take to install this component?”, “from what sources was this component built?”, and so on.

Maintenance of a system means keeping the software up to date. There are many different policy choices that can be made. For instance, in a network, system administrators may want to push updates (such as security fixes) to all client machines periodically. On the other hand, if users are allowed to administer their own machines, it should be possible for them to select components individually.

When we upgrade components, it is important to be able to undo, or roll back the effects of the upgrade, if the upgrade turns out to break important functionality. This requires both that we remember what the old configuration was, and that we have some way to reproduce the old configuration.

In heterogeneous networks (i.e., consisting of many different types of machines), or in small environments (e.g., a home computer), it is not easy to stay up to date with software updates. In particular in the case of security fixes this is an important problem. So we need to know what software is in use, whether updates are available, and whether such updates should be performed.

Components can often be deployed in both source and binary form. Binary packages have to be built for each supported platform, and sometimes in several variants as well. For instance, the Linux kernel has thousands of build-time configuration options. This greatly increases the deployment effort, particularly if packaging and transfer of packages is a manual or semi-automatic process.

Since components often have a huge amount of variability, we sometimes want to expose that variability to certain users. For instance, Linux distributors or system administrators typically want to make specific feature selections. A deployment system should support this.

Package management is a perennial problem in the Unix community. In fact, entire operating system distributions rise and fall on the basis of their deployment qualities. It can be argued that Gentoo Linux’s quick adoption in the Linux community was entirely due to the perceived strengths of its package management system over those used by other distributions. This interest in deployment can be traced to Unix’s early adoption in large, advanced and often academic installations (in contrast to the single PC, single user focus in the PC industry in a bygone era).

Also, for better or for worse, Unix systems have traditionally insisted on storing components in global namespaces in the file system such as the /usr/bin directory. This makes management tools indispensable. But more importantly, modern Unix components have fine-grained reuse, often having dozens of dependencies on other components. Since it is not desirable to use monolithic distribution (as is generally done in Windows and Mac OS X, as discussed below), a package management tool is absolutely required to support the resulting deployment complexity. Therefore Unix (and specifically, Linux) package management is what we will look at first.

As we shall see, conventional deployment tools treat the file system as a chaotic, unstructured component store, similar to how an assembler programmer would treat memory. In contrast, modern programming languages impose a certain discipline on memory, such as rigidly defined object layouts and prohibitions against arbitrary pointer formation, to enable features such as garbage collection and pointer safety. The idea is that by establishing a mapping between notions in the two fields, solutions from one field carry over to the other. In particular, the techniques used in conservative garbage collection serve as a sort of apologia for the hash scanning approach used to find runtime dependencies.

The main objective of the research described in this thesis was to develop a system for correct software deployment that ensures that the deployment is complete and does not cause interference. This objective was successfully met in the Nix deployment system, as the experience with Nixpkgs described in Section 7.1.5 has shown.

The objective of improving deployment correctness is reached through the two main ideas described in this thesis. The first is the use of cryptographic hashes in Nix store paths. It gives us isolation, automatic support for variability, and the ability to determine runtime dependencies. This however can be considered an (important) implementation detail — maybe even a “trick”. However, it addresses the deployment problem at the most fundamental level: the storage of components in the file system.

The second and more fundamental idea is the purely functional deployment model, which means that components never change after they have been built and that their build processes only depend on their declared inputs. In conjunction with the hashing scheme, the purely functional model prevents interference between deployment actions, provides easy component and composition identification, and enables reproducibility of configurations both in source and binary form — in other words, it gives predictable, deterministic semantics to deployment actions.