Why "collapse" (not "rot") is the way to think about software problems

Originally published at: https://boingboing.net/2019/05/08/tech-debt-shear.html

Sometimes the underlying reason for collapse is rot.


I rather like The Big Ball of Mud and the anti-patterns therein.


Well, he is right that there is a difference between collapse and rot. I like to think of collapse as being the action, and rot as merely one of many factors that can create a collapse. Your building may collapse because of damage during a storm (external attack), because the ground it was built upon has shifted (erosion), or because the building materials have degraded (rot), or simply because you put heavier things inside than the building was designed to handle (stress).


You missed the most likely culprit: the building may collapse if it wasn’t engineered to withstand the use it was put to (quality).


I considered that stress, as quality is more a case of using substandard materials in construction or not properly documenting the limits.

Having read the full article now, the conclusions of this article are kind of off-base. He could have easily managed those same issues with a free CM tool like Subversion to manage his external dependencies.
Just because they introduce a new version of Python doesn’t mean you’re required to move to it, unless there is a feature there you really want. And moving your software to a new platform is not the ground shifting under you; it’s you that’s shifting.
I personally maintain a post-flight telemetry product that’s around 15 years old, which I originally wrote when .NET was still young and which many others have worked on over the years. We’ve never, ever had a collapse. I’m not saying it’s a particularly well-written piece of code; in fact, I cringe at some of the things we did back then.

We’ve made massive internal changes over the years, for example moving to WCF for the client/server communications, but those changes were deliberate, made in order to achieve something, and were often put off until an opportune time arrived. It’s not like the previous versions stopped working all of a sudden. The ground has never shifted under us; instead we choose to shift to new ground when that ground offers advantages. All it takes is a little bit of CM to tame that particular beast.


Yep, that’s basically the equivalent of moving your IKEA furniture into a different apartment. You need to keep track of the screws, hope that you still know how it fits together once you are in the new place, replace the bits that broke off in transit or no longer fit, get some newer and shinier stuff as well…


It seems like the article is focused on small-audience academic software, which is a fairly particular thing. If your application is built on top of Matlab + various open-source specialist libraries and plugins, that’s a very tall stack of blocks, and collapse is indeed its likely failure mode.

Commercial software development is more robust against that sort of thing. Developers mostly avoid depending on third-party components, and OS vendors are pretty serious about maintaining the foundations. Microsoft and Apple take almost diametrically opposite approaches to that, but they both put a lot of work into it.

I agree, though, that software is like buildings in that it lasts if it’s built to last. That’s why the 1990s vision of “rapid application development” tools sank beneath the waves; it turns out just stacking up a bunch of RVs isn’t a clever shortcut to making a decent apartment building.


…we don’t really know how to make software that remains reliable when its underlying substrates change…

I’ve got no quarrel with the “collapse” metaphor, but this leads me to wonder: Do we know how to design any highly engineered, complex artifact that remains stable when the highly engineered, complex underlying substrates change?


Ah, “good code”. I’ve heard of that.

Is this really still true, or is it one of those legendary statistics that people like to toss around that doesn’t actually correspond to reality? I recall it was a major concern with Y2K.


It is true. I work for a company managing over $1B in assets, and our core systems are still Cobol running on a mainframe, and they are stable as a rock.


According to one of the linked articles -

I love how reality conflicts with the narrative we are always fed of technology, the pace of change, and the adoption of new modes, media, and technologies ever increasing. The US Air Force has planes that have been flying for 60+ years; the last ones rolled off the production line in the early ’60s, and they’re predicted to keep flying until the airframes are 100 years old. That would have been unthinkable last century!


Don’t dis COBOL. It may be archaic and crappy to program, but at least it’s stable crap. In contrast, Python has quite a few good traits (esp. if you can get over the syntactically significant whitespace shudder). However, there are major language changes with every version. Right now, I have to keep versions 2.7, 3.5, and 3.7 on my system (and I’m not even developing software) because the apps I use are all built on different versions – which don’t interoperate.
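That 2-to-3 incompatibility is easy to trip over. Here’s a minimal, illustrative sketch (the guard function and its name are mine, not from any of the apps mentioned) of the kind of version gate such apps end up needing:

```python
import sys

def require_python(minimum):
    """Fail fast if the interpreter is older than `minimum`, a (major, minor) tuple.

    The classic break is 2 vs. 3: `print` went from statement to function,
    so one source file often cannot run unchanged on both.
    """
    if sys.version_info[:2] < minimum:
        raise RuntimeError(
            "need Python %d.%d or later, running %d.%d"
            % (minimum + tuple(sys.version_info[:2]))
        )

require_python((3, 0))
```

It doesn’t make the incompatibilities go away, but it turns a mysterious mid-run failure into an immediate, explicit one.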

A while back I was using Visual Python to animate a celestial mechanics phenomenon, until VP was abandoned for something completely different (Jupyter) that’s built on the assumption that the code will run remotely and display in a browser.

So, sometimes the ground under your building collapses because it’s been dug out from below.


or giving up and rebuilding the house every time it falls down.

Sometimes we give up and start over before we are even in production. Giant shit show over here.


Visited a client last month whose “rock solid” mainframe systems are maintained by a 79 year old who comes in 4 hrs a day. There’s nobody in line to replace him and no plan for succession or replacement.

Granted this is a management problem, not a technology problem - but I have seen this over-reliance on old platforms in many areas over my career


What you propose is freezing all dependencies, i.e. never updating anything. That can indeed be a temporary solution (though in scientific computing, which is what my article is about, this rarely works). You can do it at the source code level, as you describe (and nowadays you can even rely on Software Heritage to archive the source for you if it’s public), or you can do it at the binary level and keep a virtual machine or container with your dependencies.

There are three reasons this approach eventually fails:

  1. The lowest-level foundation of your frozen software stack disappears. For example, you can’t get any new hardware that runs the operating system you rely on.
  2. You actually need a later version of some dependency, e.g. because of important bug fixes (security etc.).
  3. You can no longer build your software stack from source because of changes in the build tools (compilers …).

You can try to fight the last cause by also freezing all build tools, but that becomes very difficult in terms of bookkeeping. Nowadays, Guix provides all it takes to do this. Time will tell if it works out in practice.
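Whatever level you freeze at, the bookkeeping starts with recording exactly what you ran. A minimal sketch using only the Python standard library (the function name and the lockfile idea are illustrative, not a Guix feature):

```python
import sys
from importlib import metadata  # standard library since Python 3.8

def snapshot_environment():
    """Pin the interpreter and every installed package, so the
    frozen stack can in principle be reconstructed later."""
    return {
        "python": "%d.%d.%d" % sys.version_info[:3],
        "packages": {
            dist.metadata["Name"]: dist.version
            for dist in metadata.distributions()
        },
    }
```

Writing this dict out next to your results is the cheap version of what Guix does rigorously: it records the versions, but unlike Guix it cannot re-create the binaries when reason 1 or 3 strikes.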


Back in the 1990s I read somewhere (perhaps in an essay by Bruce Sterling about Japan in some tech magazine? The details escape me) that American programmers speak of “bugs”, while Japanese programmers speak of “spoilage”.

I know “spoilage” sounds like “rot” but that’s beside the point here. What I like about “spoilage” is that the blame is inherent in the software – as opposed to a “bug”, which artificially construes the problem as an externality.

I love me some Python - in fact, it’s the first language I’ve used that I can actually say I loved - and I really don’t get the hate on syntactically significant whitespace; needing to keep track of indentation is no more nonsensical than needing to keep track of curly braces, and is much easier to see.
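For anyone who hasn’t seen it, the indentation-vs.-braces point in one tiny made-up example:

```python
def sign(x):
    """Indentation is the block delimiter, doing the job braces do in C."""
    if x < 0:
        return -1   # inside the `if` purely because of its indent
    return 1        # dedenting closes the block, like a `}`
```

The structure you’d otherwise read off the braces is exactly the structure you see on the screen.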
All that being said, however, what the actual hell is up with all the “Ewwwwww! COBOL!” reactions? New and shiny it ain’t, and I doubt that anybody has ever actually loved working with it, but it is stable as a rock.


No, what we do is update dependencies intentionally. Like I said, you choose to move and do it deliberately, often while development continues in the trunk. The author made it sound like his software stopped working because Python changed, when actually he broke his software by moving to the newer version of Python. Said another way, his house didn’t collapse one day; he picked it up and moved it, and when he put it down it collapsed. It’s a completely different discussion than rot.

I haven’t seen number 1 happen in a very, very long time. Windows apps generally can go from version to version without too much difficulty (we have a product developed under Windows XP that still runs under Windows 10) so long as it’s not too hardware dependent. His app sounded like pretty much pure software.

2 is a good reason to move, but not an absolute requirement. For example, in his case it sounds like a standalone app, so security isn’t as big of an issue as with a client/server or cloud application.

3 is a maybe, although we’re still using Visual Studio 2010 here. However, at least in the Microsoft eco-system, moving from 2010 to 2017 was a complete non-issue.

Not sure you need any specific tools for this other than a good CM system and a halfway decent set of development processes.