MS Excel was created in Hell to torture humanity.
Paging @LutherBlisset for why this shouldn’t have been done in excel. I seem to recall a rant on the topic back in the summer.
Yeah. Take THAT, world! We have some interns, some copies of Excel, a shell company that pays tax in Antigua, and we are a deep data mining company. That will be £40 million (or whatever), please. Ka-ching! One of the interns suggested we ought to have a checksum or something. Soon got shot of that troublemaker, right?
Cronyism at its finest.
The perfect software for the Tories, then. Heavens forfend they spend precious dollars creating a robust and proper database and simple front end to track these things. Why, it might have cost them 150 pounds!
Maybe I’m not cynical enough, but that sounds too ridiculous, even for contractors engaged by this government.
The Guardian’s report is different: it wasn’t the number of cases that tipped things over the edge, it was the number of tests.
In this case, the Guardian understands, one lab had sent its daily test report to PHE [Public Health England] in the form of a CSV file – the simplest possible database format, just a list of values separated by commas. That report was then loaded into Microsoft Excel, and the new tests at the bottom were added to the main database.
But while CSV files can be any size, Microsoft Excel files can only be 1,048,576 rows long. When a CSV file longer than that is opened, the bottom rows get cut off and are no longer displayed. That means that, once the lab had performed more than a million tests, it was only a matter of time before its reports failed to be read by PHE.
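The truncation described in the quote is silent, but it is trivial to catch before the file ever reaches Excel: just count the rows against Excel's hard limits. A minimal sketch in Python (the file path and function name are hypothetical):

```python
import csv

# Excel's hard row limits: 1,048,576 rows for .xlsx,
# 65,536 rows for the legacy .xls format.
XLSX_MAX_ROWS = 1_048_576
XLS_MAX_ROWS = 65_536

def check_csv_fits_excel(path, limit=XLSX_MAX_ROWS):
    """Count records and refuse files Excel would silently truncate."""
    with open(path, newline="") as f:
        rows = sum(1 for _ in csv.reader(f))
    if rows > limit:
        raise ValueError(
            f"{path} has {rows} rows; Excel would silently drop "
            f"{rows - limit} of them (limit {limit})."
        )
    return rows
```

One guard clause like this at the PHE end would have turned a week of missing cases into an immediate, visible error.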
Now let’s be fair here. Excel has a lot of fine qualities which make it useful for any number of tracking and listing uses, and once you know how to create complex formulas, the entire platform really begins to sing. Combine that with nice graphics and fonts, and you can really create some useful documents, like this spreadsheet I created to share with writing friends so they can keep track of their daily/monthly/yearly writing goals.
Really, when you break it down, I… I am… I… sigh, okay fine, I’m an Excel geek and love making spreadsheets. I’ll show myself out now…
I’m not an expert in Excel… but even I wonder WTF they were thinking making them columns. Like… ANY spreadsheet software. Why would you do that???
Yup - came here to say exactly this and post exactly this link. My work here has already been done.
I guess that’s what happens when you hire “Weirdos and misfits with odd skills”, Dom
I can have a go. I have used Excel, and even Lotus 123 in the distant past. I do not like it but it has its fans. The idea behind spreadsheets is you can lay out everything on a giant 2-D piece of squared paper, and everything is visible. This is not great, particularly with 3D data, but you can see all your values in a table, and Excel can add them up and give you a total at the bottom. If you are trying to do something uncomplicated, the normal thing is to pick a nice, well used piece of software, and use that, hoping that all the bugs have been got out.
If you have a table of ASCII data it is often possible to import this into Excel using a CSV (comma separated values) import tool. This will put everything up to the first comma in the first column, everything up to the second comma in the second column, and so on. When it finds a carriage return, it starts on the next record.
Can you see the problem yet? Unless the CSV import actually fails (which is rare, because it is written to do *something*), you have no idea whether it has worked the way it should unless you look at it. Here, the tables had thousands of lines. Suppose the file had two carriage returns in a row. Has it put in a completely blank entry? Do you get a blank entry at the end of the line?
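The "two carriage returns in a row" case above is easy to demonstrate with Python's csv module. The data here is invented for illustration:

```python
import csv
import io

# A CSV fragment with a blank line in the middle,
# as described above. The records are made up.
raw = "id,result\n1,positive\n\n2,negative\n"

rows = list(csv.reader(io.StringIO(raw)))
# The blank line comes through as an empty record, not an error.

# A simple consistency check catches it, which is exactly
# what a silent spreadsheet import does not do:
bad = [i for i, r in enumerate(rows) if len(r) != len(rows[0])]
```

The parser happily produces an empty record and carries on; only an explicit check like `bad` surfaces the damage.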
If you see a flaw, correcting it may make matters worse. You may have the cells set up so the values are all summed at the bottom. You will see this as a chunk of pseudocode in the edit window if you click on the box at the bottom. Now suppose you delete some cells and type in new ones. Does it still sum the old ones? Does it stop the sum at the new ones? You can only tell by looking at the whole chart and seeing whether the totals formula fits what you wanted to do. I found this myself when I used to do my monthly accounts on a spreadsheet, and some minor fix had stopped my totals a little short of the bottom. It is easy to make errors like this, and hard to detect them. Nasty and dangerous.
Suppose I were condemned to write a spreadsheet. I might have stolen fire from the Gods, you never know. I would have some light coloured lines to show the horizontal and vertical ranges of calculation, so you might spot any damage to the operations. I would have some intelligent error checking so you could check whether you had got a 2D grid of similar values, or whether there were some missing. But at this point, the eagle eating my liver also looks good.
This looks like someone who used Excel 20 years ago telling some newbies how to process the data (but not on the good computer because he runs his golf program on that). You can blame Excel, but you can also cut it some slack because it is an old program, and not built for ineptitude on this titanic scale. Or I’m an old softy: you be the judge.
The problem is using Excel to transfer data between systems (never mind critical data like this)
I can’t believe the UK government is relying on batch transfers in 2020, like we did in the late ’90s with diskettes
I can’t believe the testing labs don’t have an automatic data collection system deployed over the internet to upload the results in real time
I can’t believe a critical system, used to make decisions on extreme measures such as lockdown, quarantine, closures of businesses and severe limitation of personal freedoms, has no built-in validation of data transfer
I just can’t…
My job is in building database systems, I can believe it. You can build it like you mentioned, and that would be the correct way, but it would take longer and require someone with expertise.
Also, odds are good that data is generated by one proprietary system and has to be loaded into another, and neither supports direct database access (for safety, this isn’t a bad practice) but everything supports CSV import / export. And if someone in this process then opens the CSV in Excel (which will happen automatically if you double click the file on a Windows machine with Office installed), Excel will mess up the file as mentioned. The usual applies: cheap, fast, good: pick two. They picked cheap and fast here.
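Even a "cheap and fast" CSV hand-off between two closed systems can be made safe with a manifest: the sender records a line count and checksum alongside the export, and the receiver verifies both before loading anything. A hypothetical sketch (function names are mine, not from any real pipeline):

```python
import hashlib

def make_manifest(csv_path):
    """Sender side: record the physical line count and SHA-256 of the export."""
    h = hashlib.sha256()
    lines = 0
    with open(csv_path, "rb") as f:
        for line in f:
            h.update(line)
            lines += 1
    return {"lines": lines, "sha256": h.hexdigest()}

def verify_manifest(csv_path, manifest):
    """Receiver side: refuse to load a truncated or altered file."""
    actual = make_manifest(csv_path)
    if actual != manifest:
        raise ValueError(
            f"transfer check failed: expected {manifest}, got {actual}"
        )
```

This is essentially what the shot-down intern suggested: a checksum. It costs a few lines of code, not the "longer and more expensive" rebuild.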
I can fully understand this happening in batch transfers. If there’s not a need for real time info and you’re dealing with different systems from different labs, then you want a simple, reliable export to map to your setup.
One fuck up is someone using Excel to ingest the CSV file, seemingly without understanding its limitations. What I’m more curious about, though, is that it’s phrased as if the lab is sending all the tests every time, rather than a delta.
(and this is from someone who occasionally deals with TRADACOMS retail data for work)
Ok, out of curiosity I created a 1,200,000-row CSV file and tried importing it into Excel.
This is what happens:
So, double incompetence: first using Excel for the task, second clicking away the warnings.
Edit: as the BBC reports, it seems the problem lay in exporting the data in the old .xls format, which only allows 65,536 rows.
I’m quite sure they could have caught that too.
I think the Conservative party and the Blairites were too, so it’s a good fit.
I’m just a librarian, but we use a diff file generated with PowerShell to do our CSV data transfers, rather than sending the whole data set each time.
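The delta approach described above keeps each transfer small and well under any spreadsheet row limit. A sketch of the idea in Python, keyed on a hypothetical unique `id` column (the column name and file paths are assumptions, not from any real system):

```python
import csv

def csv_delta(old_path, new_path, key="id"):
    """Return records present in the new export but absent from the
    old one, keyed on a unique identifier column ('id' is assumed)."""
    def load(path):
        with open(path, newline="") as f:
            return {row[key]: row for row in csv.DictReader(f)}
    old, new = load(old_path), load(new_path)
    return [row for k, row in new.items() if k not in old]
```

Only the new records cross the wire, so the file to be ingested never grows past a day's worth of results, no matter how many tests the lab has run in total.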
A shocking % of the world runs off excel. Have a friend who was a trader for a while, said that 10 figure portfolios are still managed with Excel and VBA. Even today.
The shit just works. There’s something to be said for that, and for a platform most everyone already knows and has at least some understanding of (even if they don’t get the difference between rows and columns).