Chosen links

Links - 14th April 2019

The deliberate revolution

This article attempts to dive beneath the hype, examining how XML web services differ from existing architectures, and how they might help build customer solutions. Let’s begin by describing features of XML web services, which:

  • Are not remote procedure calls. Now to stir the pot a little: good web services are not modeled as remote procedure calls (RPCs). Section 7 of the SOAP 1.1 specification describes how to express a remote procedure call in the Body element of a SOAP message. This is a useful mechanism for tunneling RPCs between systems, but SOAP-RPC is a poor approach to designing an interoperable web service. RPC is primarily designed to support object invocation between tightly bound but topologically distributed systems. SOAP messages can be written in two styles, “RPC” and “Document”, and can use two serialization formats, “Encoded” and “Literal”; in practice, SOAP-RPC uses “RPC/Encoded”, while document-centric web services use “Document/Literal” (see the sketch after this list). Yes, you can still achieve interoperability while using “RPC/Encoded” messages, but the focus on objects and method invocation is fundamentally contrary to the document-centric philosophy of web services. By failing to focus on the service contract, you end up with brittle integration: minor changes to a method signature automatically propagate to the service, breaking current clients. Great toolkits exist for doing tightly bound RPC over SOAP from various vendors. If this is what you are looking for, find the one that works best for your platform. But if you are trying to design for interoperability, design your messages first and then write your methods to support them, not the other way around.

  • Are not CORBA. Making the messages and the service contracts the design center of web services is the fundamental difference between the web services architecture and CORBA. There are strong analogies between elements of the two architectures, such as the respective roles of WSDL and IDL, but CORBA is fundamentally object-oriented. The messages CORBA passes are manipulated by instantiating an object. The document-style messages used in web services offer more flexibility for manipulation; for example, an interception service (more on this pattern later) might operate on one document header element without having the logic to understand the rest of the document. As web service technologies rapidly evolve in the years ahead, standards bodies, including the W3C and IETF, will be instrumental in smoothing the rougher edges of the web service specifications. But these technologies are mature enough, and widely adopted enough, to be actionable today.
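
To make the style and serialization distinction concrete, here is a minimal Java sketch using JAX-WS annotations (which post-date the article; the OrderService name and its method are hypothetical). The point is only that the binding style is a declared property of the service contract, separate from the code behind it:

import javax.jws.WebMethod;
import javax.jws.WebService;
import javax.jws.soap.SOAPBinding;
import javax.jws.soap.SOAPBinding.Style;
import javax.jws.soap.SOAPBinding.Use;

// Document/Literal binding: clients couple to the XML document schema
// described in the WSDL, not to this particular Java method signature.
@WebService
@SOAPBinding(style = Style.DOCUMENT, use = Use.LITERAL)
public interface OrderService {
    @WebMethod
    String submitOrder(String orderDocument);
}

// RPC-style binding: the exchange is modeled as a remote method call,
// so the wire contract follows the method signature and changes with it.
@WebService
@SOAPBinding(style = Style.RPC, use = Use.LITERAL)
interface OrderServiceRpc {
    @WebMethod
    String submitOrder(String orderId);
}

(JAX-WS implementations only support literal use; the “Encoded” format belongs to the older SOAP-RPC toolkits the article refers to.)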

Building a PostgreSQL load tester

When faced with a technical problem it’s often better to use an existing tool than jump into writing one yourself. Having benchmarked Postgres clusters before, I was already familiar with a tool called pgreplay that I thought could do the job.

My benchmarking strategy with pgreplay is pretty simple: first capture logs from your production cluster that contain all executed queries, then feed them to pgreplay, which will replay those queries against the new cluster. Post-processing the logs from the new cluster will show how the machines performed under production load, helping determine whether the changes are going to degrade performance.
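
As a rough sketch of that workflow (settings and flags quoted from memory; pgreplay’s documentation is the authority, and the host name is a placeholder), the production cluster needs to log every statement, plus connections and disconnections, in a format pgreplay can parse, and the replay is then a single command pointed at the new cluster:

# postgresql.conf on the production cluster during the capture window
log_statement = 'all'
log_connections = on
log_disconnections = on
log_line_prefix = '%m|%u|%d|%c|'   # log line format pgreplay expects for stderr logs (check its docs)

# Replay the captured log file against the new cluster
pgreplay -h new-cluster.internal -p 5432 postgresql.log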

This process had worked well before but broke as soon as I applied it to this migration. Watching Postgres during a replay, there were spikes of activity followed by periods of quiet.

The new cluster was sufficiently different that several queries were now much slower than they had been in the original log capture. pgreplay ensures queries execute in their original order: any queries that take longer to execute in the replay will block subsequent queries. The graph shows how several badly degraded queries caused pgreplay to stall, leading to the periods of inactivity.

Benchmarks can take several hours to execute, and having the replay stall on problematic queries adds more time to an already slow process. The inactivity also impacts the realism of your tests: users don’t respond to a system under load by forming a queue and politely waiting for their first query to finish!

A conversation with Jim Gray

There is this really elegant theory about transactions, having to do with concurrency and recovery. A lot of people had implemented these ideas implicitly. I wrote a mathematical theory around it and explained it and did a fairly crisp implementation of the ideas. The Turing Award committee likes crisp results. The embarrassing thing is that I did it with a whole bunch of other people. But I wrote the papers, and my name was often first on the list of authors. So, I got the credit.

But to return to your question, the fundamental premise of transactions is that we needed exception handling in distributed computations. Transactions are the computer equivalent of contract law. If anything goes wrong, we’ll just blow away the whole computation. Absent some better model, that’s a real simple model that everybody can understand. There’s actually a mathematical theory lying underneath it.

In addition, there is a great deal of logic about keeping a log of changes so that you can undo things, and keeping a list of updates, called locks, so that others do not see your uncommitted changes. The theory says: “If that’s what you want to do, this is what you have to do to get it.”

The algorithms are simple enough so most implementers can understand them, and they are complicated enough so most people who can’t understand them will want somebody else to figure it out for them.

AWS security maturity roadmap

Planning for Disaster Recovery involves a trade-off between how quickly you need to recover and how much money you’re willing to spend.

Not all of the public AMIs (Amazon Machine Images) used to create EC2 instances are vetted by AWS, and there is no easy way to judge the trustworthiness of the source account ID an AMI comes from. There have been issues in the past of public AMIs with malware in them. You should identify which AMIs you want to allow in your environment. You should also document guidelines around the instance types and operating systems to be used, so you don’t end up with a dozen flavors of Linux running in your accounts without good reason.

Separating use and reuse to improve both

In Java, C++, Scala and C#, subclassing implies subtyping. A Java subclass declaration such as class A extends B {} does two things at the same time: it inherits code from B, and it creates a subtype of B. Therefore a subclass must always be a subtype of the extended class. This design choice, where subclassing implies subtyping, is not universally accepted. Historically, there has been a lot of focus on separating subtyping from subclassing. This separation is claimed to be good for code reuse, design and reasoning. There are at least two distinct situations where the separation of subtyping and subclassing is helpful.

Allowing inheritance/reuse even when subtyping is impossible: the enforced subtyping relation rules out situations where inheritance is desirable. A well-known example is the so-called binary method. For example, consider a class Point with a method Point sum(Point o){return new Point(x+o.x,y+o.y);}. Can we reuse the Point code so that ColorPoint.sum would take and return a ColorPoint? In Java/C#, declaring class ColorPoint extends Point{…} would result in sum still taking a Point and returning a Point. Moreover, manually redeclaring a ColorPoint sum(ColorPoint that) would just induce overloading, not overriding, as the sketch below illustrates. In this case we would like to have inheritance, but we cannot have (sound) subtyping.
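
A minimal sketch of that situation (the ColorPoint fields and constructors are filled in only for illustration):

class Point {
    int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
    Point sum(Point o) { return new Point(x + o.x, y + o.y); }
}

class ColorPoint extends Point {
    String color;
    ColorPoint(int x, int y, String color) { super(x, y); this.color = color; }
    // Overloads rather than overrides Point.sum: calling sum through a
    // Point-typed reference still selects the inherited version, and the
    // sum of two ColorPoints held as Points is a plain Point.
    ColorPoint sum(ColorPoint o) {
        return new ColorPoint(x + o.x, y + o.y, color);
    }
}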

Preventing unintended subtyping: for certain classes we would like to inherit code without creating a subtype, even if, from the typing point of view, subtyping is still sound. A typical example is Sets and Bags. Bag implementations can often inherit from Set implementations, and the interfaces of the two collection types are similar and type compatible. However, from the logical point of view a Bag is not a subtype of a Set.
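
For instance, a minimal sketch (hypothetical interfaces, trimmed to two methods): the interfaces are structurally identical, yet add has different semantics, so a Bag value should not be usable where a Set is expected.

interface IntSet {
    // Adding a duplicate leaves the set unchanged.
    void add(int element);
    boolean contains(int element);
}

interface IntBag {
    // Adding a duplicate increases its multiplicity.
    void add(int element);
    boolean contains(int element);
}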

Structural typing may deal with the first situation, but not the second. Since structural subtyping accounts for the types of the methods only, a Bag would be a subtype of a Set if the two interfaces are type compatible. For dealing with the second situation, nominal subtyping is preferable: an explicit subtyping relation must be signalled by the programmer. Thus if subtyping is not desired, the programmer can simply not declare a subtyping relationship.

While there is no problem in subtyping without subclassing, in most nominal OO languages subclassing fundamentally implies subtyping. This is because of what we call the this-leaking problem, illustrated by the following (Java) code, where method A.ma passes this as A to Utils.m. This code is correct, and there is no subtyping/subclassing.

class A{ int ma(){ return Utils.m(this); } }
class Utils{ static int m(A a){/*…*/} }

Now, let’s add a class B:

class B extends A{ int mb(){return this.ma();} }

We can see an invocation of A.ma inside B.mb, where the self-reference this is of type B. The execution will eventually call Utils.m with an instance of B. However, this can be correct only if B is a subtype of A.
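
Concretely (a hypothetical driver, and assuming Utils.m is given a real body):

class Main {
    public static void main(String[] args) {
        // B.mb() calls the inherited A.ma(), which passes `this`
        // (here an instance of B) to Utils.m(A a); that call is only
        // well-typed because B is a subtype of A.
        new B().mb();
    }
}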

Suppose Java code-reuse (the extends keyword) did not introduce subtyping: then an invocation of B.mb would result in a run-time type error. The problem is that the self-reference this in class B has type B. Thus, when this is passed as an argument to the method Utils.m (as a result of the invocation of B.mb), it will have a type that is incompatible with the expected argument of type A. Therefore, every OO language with the minimal features exposed in the example (using this, extends and method calls) is forced to accept that subclassing implies subtyping.

What the this-leaking problem shows is that adopting a more flexible nominally typed OO model where subclassing does not imply subtyping is not trivial: a more substantial change in the language design is necessary. In essence we believe that in languages like Java, classes do too many things at once. In particular they act both as units of use and reuse: classes can be used as types and can be instantiated; classes can also be subclassed to provide reuse of code.

A crash course in compilers

I love languages because, of everything I’ve encountered in computing, languages are by far the weirdest. They combine the brain-bending rigor of abstract math, the crushing pressures of capitalistic industry, and the irrational anxiety of a high school prom. The decision to adopt or avoid a language is always a mix of their perceived formal power (“Does this language even have this particular feature?”), employability (“Will this language get me a job?”), and popularity (“Does anyone important use this language anymore?”). I can’t think of another engineering tool that demands similar quasi-religious devotion from its users. Programming languages ask us to reshape our minds, and that makes them deeply personal and subjective.

Efficiency machines: the operating system recontextualized

Operating Systems are boring, monotonous spaces. Ever since the personal computer was popularized, function has been the standard priority. Attempts to personalize the OS, such as Microsoft Bob and Packard Bell Navigator, never really caught on. Their ideas didn’t translate well to the tech available at the time, and were regarded as failures since they were slower and clunkier to use when compared to the desktop interface. As commercial OS design evolved, interfaces became more branded and conventional. Creative expression was relegated to decoration with wallpapers, themes and widgets.

Treating operating systems as mere productivity tools dismisses how personal they are. We don’t use computers just to work. We play, communicate and create with them. The OS ends up being an environment filled with our memories and art, which means there is a lot of potential to be explored.

The relentless optimism often present in our lives is a tool of oppression, one that aims to shame those who aren’t as productive as society wants them to be. When we reach out for help, the most common answers are: “You must try harder.”, “You’re exaggerating.” and, of course, “Everything is going to be OK.” Instead of learning how to heal from pain, we are taught to hide it and to fight through it. Because no trauma is harsh enough for capitalism to stop milking all the labor it can get.

There’s a parallel here with the design philosophy of operating systems. An illusion of freedom is sold to the user, one where they can do whatever they want inside the OS. While ad campaigns and mission statements attempt to convince us that these systems aim to empower us, their real goal is to increase the amount of labor they can exploit. Cleaner interfaces and faster startup times are the selling points of systems that are also tracking our data, limiting what software we can use, and even slowing themselves down to force us to buy a new version.

This allows them to distract us from who is really controlling these environments, much like neoliberalism champions individuality to divert us from questioning it. If the only thing stopping us from having better lives is our own decisions, we have no one to blame but ourselves. An OS that actively gets in your way, then, forces you to see beyond this illusion of control.