Rogers (one of the major Internet providers in Canada) is having major issues today. It is also demonstrating the fragility of our infrastructure.
I called Rogers, but apparently they had already turned their system off and turned it back on again.
Look, I know it’s a stressful time for them, but they didn’t need to be that rude.
In the chaos, some return to the old ways.
Having had the…pleasure…of being involved in much smaller-scale cases of “the redundancy you thought you had has just turned into complexity for the root cause analysis and fix to restore a business-critical system that The Business wants back up 10 minutes ago” (thankfully none of them were ones I designed, so I got a pat on the head for helping out rather than a pink slip), I would like to express my empathy for the techs who are no doubt having the time from hell right now.
That out of the way, I would also like to express my sadness and disappointment at the fact that things that are so critical (and are certainly represented as being trustworthy infrastructure) have proven to be so brittle.
Say what you want about the constant, looming threat of global thermonuclear annihilation; but at least it did inspire some thinking about network resilience and redundancy among team ARPANET that today’s focus on cost-optimization just doesn’t hammer home quite the same way.
I’d be interested to know how many of the “old” systems are actually old systems far enough back to HQ to be useful (probably at least some, since surviving payphones most likely survive due to neglect rather than ongoing upgrades), vs. how many look like they stepped out of another era but are actually maybe 100 meters of copper back to a utility box with a fiber converter that has 10 minutes of battery backup, or just an LTE-to-copper converter box that lets you use your beloved rotary phone just like the old days, but which will fall over and die exactly as fast as your cellphone, because it is one.
I picked Verizon’s example specifically because, after Hurricane Sandy destroyed a bunch of copper infrastructure, they pushed to just not replace it and simply offer fixed location cellular to anyone looking for classic PSTN in that area; with the attendant reduction in capital costs and change in regulatory framework (mostly in their favor).
Obviously that anecdote isn’t directly relevant to the status of these payphones; but I suspect that the same incentives are at play: telcos are happy enough to milk legacy copper; but they don’t much like maintaining it; and (depending on their spectrum ownership and competitive situation) generally prefer to either try to ditch it entirely and just sell cell service; or upgrade it to fiber (absolutely better performance; typically a major regression in terms of levels of battery backup versus old-school copper and giant telco office battery farms).
A good observation:
On the other hand, there were a couple of days of quiet when Russia blocked western social media.
I went and had coffee on Friday at a cafe and it was after the peak chaos. The manager was going through the point of sale records with the staff and it was a mix of “they will be back later with cash” and “they gave up and left” and “they ended up having enough cash”. It sounded like they were not as magnanimous with extending credit as the two above and were limiting it to regulars. However, the accounting for that morning sounds like it will involve a bunch of hand waving and saying “close enough”.
For those that did not see: the blame for the outage is being placed on maintenance… Again (it happened 15 months ago as well).
I was on a road trip that day and it was chaos. You couldn’t buy anything on debit, ATMs were down, and a lot of gas pumps were down. I think we all didn’t realize how much back-end financial infrastructure depended on the Rogers cell network. Luckily credit cards are on a different system and those still worked everywhere.
CBC One did a call-in show about the issue that was insightful. IT infrastructure experts called in to say that the issue was that Rogers didn’t have a rollback plan for the huge upgrade they were doing. That’s IT 101: before doing an upgrade (especially a big one), make sure you can roll back instantly to the old system if something goes wrong. This is what Rogers seemingly failed to do.
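The “roll back instantly” rule can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not anything Rogers actually runs: the function name `upgrade_with_rollback`, the `apply_upgrade` and `health_check` callbacks, and the dictionary standing in for a system’s configuration are all hypothetical.

```python
import copy


def upgrade_with_rollback(config, apply_upgrade, health_check):
    """Apply an upgrade to `config`; restore the old state if it fails.

    All names here are hypothetical. `config` is a plain dict standing in
    for whatever state a real system would need to restore.
    """
    # Take the snapshot BEFORE touching anything, so there is always a
    # known-good state to return to.
    snapshot = copy.deepcopy(config)
    try:
        apply_upgrade(config)
        if not health_check(config):
            raise RuntimeError("health check failed after upgrade")
    except Exception:
        # Roll back in place to the known-good snapshot.
        config.clear()
        config.update(snapshot)
        return False
    return True


if __name__ == "__main__":
    routers = {"routes": ["a", "b"]}

    def broken_upgrade(cfg):
        cfg["routes"].clear()  # simulates an update that wipes the routes

    ok = upgrade_with_rollback(
        routers, broken_upgrade, lambda cfg: len(cfg["routes"]) > 0
    )
    print(ok, routers)
```

The point of the sketch is the ordering: the snapshot is taken before the change, and the health check decides whether the change stays. A real network upgrade would also need the rollback path itself to be reachable when the network is down, which a toy like this does not model.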
The other really good point raised is that this should put a bullet in the attempted merger of Rogers and Shaw. The infrastructure is clearly already too concentrated. It’ll get even more fragile and expensive if we let the big three telcos consolidate further.
I guessed an Update From Hell as the most likely. I think also that they’ve centralized the access to the border routers so that a surviving segment can’t skip over to the outside Internet without going through their main (dead) hub. There might be security reasons for that, but it seems more like business logic beating out good technical sense.
Somewhat unlikely. I mean, that never happens. Anywhere. Ever.