17/09/2017

Are Delayed Issues Harder to Resolve? Revisiting Cost-to-Fix of Defects throughout the Lifecycle

Many practitioners and academics believe in a delayed issue effect (DIE); i.e. the longer an issue lingers in the system, the more effort it requires to resolve. This belief is often used to justify major investments in new development processes that promise to retire more issues sooner.

This paper tests for the delayed issue effect in 171 software projects conducted around the world in the period from 2006–2014. To the best of our knowledge, this is the largest study yet published on this effect. We found no evidence for the delayed issue effect; i.e. the effort to resolve issues in a later phase was not consistently or substantially greater than when issues were resolved soon after their introduction.

This paper documents the above study and explores reasons for this mismatch between this common rule of thumb and empirical data. In summary, DIE is not some constant across all projects. Rather, DIE might be an historical relic that occurs intermittently only in certain kinds of projects. This is a significant result since it predicts that new development processes that promise to faster retire more issues will not have a guaranteed return on investment (depending on the context where applied), and that a long-held truth in software engineering should not be considered a global truism.

Nail Polish Bot
tumblr oso8beQgxs1uo5d9jo1 540

One thing that immediately sets Nail Polish Bot apart from other Twitter bots is that it uses 3D raytracing to generate its images. Which significantly ups the implementation complexity (it uses povray) but also expands the systemic complexity behind the image.

Delivering Billions of Messages Exactly Once

The single requirement of all data pipelines is that they cannot lose data. Data can usually be delayed or re-ordered–but never dropped.

To satisfy this requirement, most distributed systems guarantee at-least-once delivery. The techniques to achieve at-least-once delivery typically amount to: “retry, retry, retry”. You never consider a message ‘delivered’ until you receive a firm acknowledgement from the consumer.

But as a user, at-least-once delivery isn’t really what I want. I want messages to be delivered once. And only once.

Unfortunately, achieving anything close to exactly-once delivery requires a bullet-proof design. Each failure case has to be carefully considered as part of the architecture–it can’t be “bolted on” to an existing implementation after the fact. And even then, it’s pretty much impossible to have messages only ever be delivered once.

In the past three months we’ve built an entirely new de-duplication system to get as close as possible to exactly-once delivery, in the face of a wide variety of failure modes.

The new system is able to track 100x the number of messages of the old system, with increased reliability, at a fraction of the cost. Here’s how.

duplication is far cheaper than the wrong abstraction

I started asking questions and came to see the following pattern:

  1. Programmer A sees duplication.

  2. Programmer A extracts duplication and gives it a name.
    This creates a new abstraction. It could be a new method, or perhaps even a new class.

  3. Programmer A replaces the duplication with the new abstraction.
    Ah, the code is perfect. Programmer A trots happily away.

  4. Time passes.

  5. A new requirement appears for which the current abstraction is almost perfect.

  6. Programmer B gets tasked to implement this requirement.
    Programmer B feels honor-bound to retain the existing abstraction, but since isn’t exactly the same for every case, they alter the code to take a parameter, and then add logic to conditionally do the right thing based on the value of that parameter.
    What was once a universal abstraction now behaves differently for different cases.

  7. Another new requirement arrives.
    Programmer X.
    Another additional parameter.
    Another new conditional. Loop until code becomes incomprehensible.

  8. You appear in the story about here, and your life takes a dramatic turn for the worse.

modules + network = microservices

Really, It is more important to build a system that admits microservices, than it is to built out of them entirely. Once you admit something is running across the network, it isn’t much of a stretch to admit it running on a different service entirely. Without a common framework or ecosystem for microservices, the maintenance burden will outweigh many potential benefits.

A well engineered distributed system will likely have some elements of loose coupling, uniformity, and modularity, all essential for making microservices successful. The real question is not “should I write my system as microservices”, but “What sort of modules should I break my system into” and “what benefit is there from running it as a distinct service”.