Chosen links

Links - 7th April 2019

The siren song of little languages

Some programming languages languish due to obscurity. They lack breathless blog posts exclaiming how much nicer they are to use.

Other languages are too ambitious. They aspire to support so many features that the original implementers struggle to get a first version working. For example, the type system in Fortress required constraint solving which took exponential time.

Sometimes a usable language struggles simply because it’s too much fun to write your own. Developers end up building their own implementation rather than actually using the language.

The most obvious implementation-focused language is BF. Although BF has many implementations, its programmers have to encourage the implementers to actually try using the language!
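
Part of the pull is how little code a complete implementation takes. As a rough illustration (mine, not the linked post’s), here is a minimal BF interpreter sketch in Python; the bundled program is the widely circulated “Hello World!” example.

    # Minimal Brainfuck (BF) interpreter sketch: eight commands, a byte tape,
    # and a precomputed bracket-matching table. Small enough to write in an
    # afternoon, which is exactly the temptation described above.
    import sys

    def run(program, tape_size=30_000):
        # Match every '[' with its ']' so loop jumps are O(1).
        jumps, stack = {}, []
        for pos, ch in enumerate(program):
            if ch == '[':
                stack.append(pos)
            elif ch == ']':
                start = stack.pop()
                jumps[start], jumps[pos] = pos, start

        tape = bytearray(tape_size)
        ptr = pc = 0
        while pc < len(program):
            ch = program[pc]
            if ch == '>':
                ptr += 1
            elif ch == '<':
                ptr -= 1
            elif ch == '+':
                tape[ptr] = (tape[ptr] + 1) % 256
            elif ch == '-':
                tape[ptr] = (tape[ptr] - 1) % 256
            elif ch == '.':
                sys.stdout.write(chr(tape[ptr]))
            elif ch == ',':
                data = sys.stdin.read(1)
                tape[ptr] = ord(data) % 256 if data else 0
            elif ch == '[' and tape[ptr] == 0:
                pc = jumps[pc]          # skip the loop body
            elif ch == ']' and tape[ptr] != 0:
                pc = jumps[pc]          # jump back to the loop start
            pc += 1

    if __name__ == '__main__':
        # The widely circulated "Hello World!" BF program.
        run("++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]"
            ">>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.")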

How design, architecture, and operation of modern systems conflict with GDPR

In this paper, we review GDPR from a system design perspective, and identify how its regulations conflict with the design, architecture, and operation of modern systems. We illustrate these conflicts via the seven privacy sins: storing data forever; reusing data indiscriminately; walled gardens and black markets; risk-agnostic data processing; hiding data breaches; making unexplainable decisions; treating security as a secondary goal. Our findings reveal a deep-rooted tussle between GDPR requirements and how modern systems have evolved. We believe that achieving compliance requires comprehensive, ground-up solutions, and anything short would amount to fixing a leaky faucet in a burning building.

Lost in translation: exposing hidden compiler optimization opportunities

Producing cost-effective software requires a high degree of productivity without sacrificing quality. The relationship between these two factors is complex. Typically, a careless increase in productivity aiming at reducing software development costs can harm quality, while poor end-product quality can lead to increased costs of deploying and maintaining an application. Furthermore, while application quality in terms of a reduced number of bugs being discovered is easy to quantify, assessing other quality metrics such as execution time and energy consumption is often less straightforward. For example, consider a mobile application that has been extensively tested and offers the end user an almost bug-free experience and a good overall response time. Still, it is difficult to argue that this application has reached its maximum potential with regard to execution time and energy efficiency on a given architecture. Perhaps there is another, more energy-efficient version of the same application that provides the same functionality? Considering that this application may run on millions of devices, the aggregate effect of even small energy savings can be substantial.

Compilers are at the heart of software development. Their primary goal is to increase software productivity. They are a key element of the software stack, providing an abstraction between high-level languages and machine code. The challenge now lies with the compiler engineers, as they have to support a vast number of architectures and programming languages and adapt to their rapid advances. To mitigate this, modern compilers, such as LLVM and GCC, are designed to be modular. For example, they make use of a common optimizer across all the architectures and programming languages supported. The common optimizer exposes to the software developer a large number of available code optimizations via compiler flags; for example, LLVM’s optimizer has 56 documented transformations. The challenge then becomes to select and order the flags to create optimization configurations that can achieve the best resource usage possible for a given program and architecture. Due to the huge number of possible flag combinations and their possible orderings, it is impractical to explore the complete space of optimization configurations. Thus, finding optimal optimization configurations is still an open challenge.
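
To get a feel for that search space, the sketch below (not the paper’s methodology) randomly samples small selections and orderings of LLVM passes through the opt tool and keeps the fastest binaries. The benchmark file name, the pass subset and the sample sizes are placeholders, and the -passes= syntax assumes a recent opt; older releases take the passes as individual -<pass> flags.

    # Sketch only: sample a few selections/orderings of LLVM passes and time
    # the resulting binary. "bench.c" stands in for any benchmark with a main().
    import random
    import subprocess
    import time

    PASSES = ["mem2reg", "sroa", "gvn", "licm", "instcombine", "simplifycfg"]

    def build(pass_order):
        # Emit unoptimized IR, run the chosen pass pipeline, produce a binary.
        subprocess.run(["clang", "-O0", "-S", "-emit-llvm", "bench.c",
                        "-o", "bench.ll"], check=True)
        subprocess.run(["opt", "-S", f"-passes={','.join(pass_order)}",
                        "bench.ll", "-o", "bench.opt.ll"], check=True)
        subprocess.run(["clang", "bench.opt.ll", "-o", "bench.bin"], check=True)

    def measure(runs=3):
        # Best of a few runs, to damp measurement noise a little.
        best = float("inf")
        for _ in range(runs):
            start = time.perf_counter()
            subprocess.run(["./bench.bin"], check=True)
            best = min(best, time.perf_counter() - start)
        return best

    results = []
    for _ in range(10):   # a tiny random sample of an enormous space
        order = random.sample(PASSES, k=random.randint(3, len(PASSES)))
        build(order)
        results.append((measure(), order))

    for elapsed, order in sorted(results)[:3]:
        print(f"{elapsed:.4f}s  {','.join(order)}")

Even this toy subset of six passes admits nearly two thousand distinct ordered selections, which hints at why the full space is impractical to explore exhaustively.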

Another dimension to the problem is the non-disclosure of hardware implementation details by processor vendors. This has two serious implications. First, compilers are slow to adapt to architectural performance innovations. Even worse, in some cases legacy optimization techniques that performed well on previous hardware generations can actually perform poorly on newer hardware (for example, see the interaction between if-conversion and branch prediction among the potential optimizations reported in Section 4.2). Second, programmers often have no clear view of the architecture’s and compiler’s internals and thus may produce code that is neither compiler- nor architecture-friendly.

Retargetable compiler frameworks achieve their generality by abstracting target architecture properties and by relying on cross-target heuristics in the front and middle-end compilation passes. The abstract properties may be parameterized by quantitative characteristics of each actual target used, but the decision heuristics and the actual sequence of optimizations are often defined by experimentation and, once established, are seldom questioned in subsequent releases of the compiler framework. Therefore, evaluating the pertinence and the quality of the heuristics used in a compiler may provide valuable insights into the quality of the current compiler configuration and its potential for further improvement.

The standard approach to tuning a compiler’s common optimizer remains the repetitive testing of the compiler on a variety of benchmarks and mainstream architectures. This approach is typically called nightly regression testing, and it mainly aims at validating benchmark results in terms of correctness and improving performance (or, in some application domains, code size). The output of a nightly regression session is typically a report with information about the compilation time, execution time and the correctness of the output for each test. These results are then compared to a reference point, usually the result of a previous nightly-regression run that passed all the validation tests and exhibited the best execution and compilation times achieved so far. The purpose of nightly regression testing is to continuously monitor the quality of the modifications to a compiler in the run-up to the release of a new version.
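
Stripped to its essentials, the nightly report is a comparison against that reference point. The toy check below (the tests, timings and 5% threshold are invented for illustration) flags correctness failures and slowdowns, but note that it says nothing about why a test regressed.

    # Toy nightly-regression check: compare tonight's results against the
    # reference run and flag correctness failures and significant slowdowns.
    SLOWDOWN_THRESHOLD = 1.05   # flag runs more than 5% slower than reference

    reference = {  # best-so-far run that passed all validation tests
        "bench/fft":    {"passed": True, "exec_time": 1.20},
        "bench/matmul": {"passed": True, "exec_time": 3.40},
    }
    tonight = {
        "bench/fft":    {"passed": True, "exec_time": 1.31},
        "bench/matmul": {"passed": False, "exec_time": 3.38},
    }

    for test, ref in reference.items():
        cur = tonight[test]
        if not cur["passed"]:
            print(f"FAIL      {test}: correctness regression")
        elif cur["exec_time"] > ref["exec_time"] * SLOWDOWN_THRESHOLD:
            ratio = cur["exec_time"] / ref["exec_time"]
            print(f"SLOWDOWN  {test}: {ratio:.2f}x the reference time")
        else:
            print(f"OK        {test}")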

All observed regressions (either correctness failures or significant degradations in the execution time of a benchmark relative to its reference point) have to be investigated by a compiler engineer. However, the detection of a regression does not offer any insights into what actually caused it and requires the engineer to manually examine and track the source of the problem. Depending on the engineer’s experience and the complexity of the issue, the identification of the root cause of a regression can be an extremely time-consuming task. Furthermore, a standard nightly regression system will only report regressions or improvements for individual tests, but will not directly pinpoint behaviors common across multiple benchmarks and architectures that can indicate hidden optimization opportunities.

A conversation with Adam Bosworth

Now, in the world we used to live in, client code generally lived on employees' desks, so when it broke it was bad — but it wasn’t really all that terrible. You just took your new code, redeployed it on all your employees' desks and life went on. But if we’re talking about applications that flow across the Internet, there’s no way you can go and redeploy every other application just because you change your mind about how your app is supposed to work. So applications can’t be designed in such a way that they depend on the implementation of other applications. There has to be absolute confidence that you can change the implementation of your app without breaking others. The funny thing is that when we did object-oriented computing, we assumed we were solving this problem by adding properties and encapsulation. So you didn’t actually know how data were being stored because they were encapsulated and hidden behind methods. And we assumed that protected us from implementation changes. We also assumed we could deal with change because we had interfaces that we thought would support both a new interface and an old interface whenever you evolved an object. So, rather smugly, we thought: Yes, we’ve cracked the code. We now have an evolvable object model.

But we found out otherwise, didn’t we?

Yes, it turns out that there actually are two problems. An object is stateful, and you have very fine-grained access precisely because you also have encapsulation. And what we learned through hard experience is that you can’t just call any method or set any property any time you want to. There are all sorts of rules that dictate how you talk to an object, a choreography of sorts that you really have to observe. If you should ever change the order, you’ll often find that things don’t work at all. And no one really knows what the rules are because of the complexity of the interfaces, which are so fine-grained in the state machine that they’d be much too hard to describe and far too hard for anyone to understand. But given that you don’t know what the rules are, it’s as though you rented a house with a big fat lease full of fine print you couldn’t possibly read or understand.
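
A small, made-up example of the choreography Bosworth is describing: an object whose methods are individually innocuous but only work when called in an order that nothing in the interface spells out.

    # Illustrative only: a stateful object with an implicit calling protocol.
    # The interface does not say that open() must come before send(); you
    # discover the rule by breaking it.
    class Connection:
        def __init__(self):
            self._open = False
            self._sent = 0

        def open(self):
            self._open = True

        def send(self, payload):
            if not self._open:
                raise RuntimeError("send() called before open(): protocol violated")
            self._sent += 1

        def close(self):
            if not self._open:
                raise RuntimeError("close() called before open(): protocol violated")
            self._open = False

    conn = Connection()
    try:
        conn.send(b"hello")   # perfectly legal as far as the interface is concerned
    except RuntimeError as err:
        print(err)            # the hidden choreography only reveals itself at runtime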