A "digital rosetta stone" for translating obsolete computer files


#1

Originally published at: https://boingboing.net/2018/03/13/a-digital-rosetta-stone-fo.html


#2

Years ago I had a client who was hanging on to an ancient Apple computer that had a MacWrite (I think) formatted file on it. He held onto the computer so that, if someday he needed to view that file, he COULD.

He had spent many hours and maybe a hundred dollars trying to convert it to something that would work on his current Mac.

Me: What is on the file?
Him: Just an old list of Club Members
Me: Are you ever going to change the info?
Him: No. I just want to be able to see it at some time. I don’t want it lost forever.
Me: Why don’t you just print it out and then get rid of the computer?
Him: Oh…I hadn’t thought of that.


#3

I have an old Mac that I keep just for Quicken 2007.


#4

For time-travel money laundering?


#5

Ummm… Couldn’t you just use a virtual machine? Unless there is some obscure hardware level interaction, there’s no reason for this, especially on something as new as Windows NT.


#6

Good luck with that. We didn’t even have a good digital Rosetta Stone back in the day…

I remember having file conversion software back in the early ’90s, the days of DOS 5 and Windows 3.1. There was still a competitive software ecosystem back then - people were still using WordStar, WordPerfect, IBM’s DisplayWrite, Lotus 1-2-3, SuperCalc, Lotus Symphony, and more. Hell, this was a time when your graphics might come in PCX, BMP, GIF, LBM, TIF, and plenty of others.

Since then we’ve had something of a monoculture develop. Yes, there are plenty of people using software like LibreOffice, but it’s a minority. Most things produced are going to be in Microsoft Office formats.

Back then, when you needed to get your files converted so that you could get your Deluxe Paint II art into your mate’s WordStar document, it wasn’t peachy. The only real way to ensure full fidelity was to run the original software - you often ended up looking for a format that two applications mutually supported, and would use that as a workaround. Good third-party format conversion software was both expensive and imperfect.

And here’s why I wish them luck - things haven’t actually improved, even with a monoculture. Even today I see old corporate documents that were produced a decade (or three versions) ago, and Word seems to have odd little issues with them. Backwards compatibility for Office isn’t as good as you’d think it is.

If you want the general gist of it, a digital Rosetta stone will be OK. But if you want to see it as it was intended, you should go to the original environment it was produced in.

If that means maintaining a fleet of VMs - well, you only need to build a DOS 5/Windows 3.1/Office 4.3 machine once. Then you can just clone it and keep using it again and again…


#7

Yes, I think that emulating and virtualizing where possible is the solution. Some of those expensive CAD programs use those annoying hardware dongles (HASP?); I could see that requiring actual physical hardware.

What the article described, successively converting the file from one version to the next until it works in the current version, really scared me: how do you know that all the information was indeed transferred correctly?

How could you ever know if anything except for the original code is going to interpret the document the way the original author at least intended it to be? So shouldn’t you try and run the original code when possible?


#8

I have a powerful solution to almost all of these issues, spelled ASCII. No matter how advanced my software is, ASCII still works.

Kind of sucks for video, I’ll admit.


#9

I recently converted a ton of documents using a Windows 95 VM running a very old copy of Word. I would say it isn’t rocket science, but I’ve been in IT and I realize that for most people it would be rocket science.

Dammit! All my old code is in EBCDIC and PETSCII.


#10

Converting between EBCDIC and ASCII is relatively easy. iconv has all of those encodings. PETSCII, however, is more challenging. I’m not sure there are even Unicode code points for all of the PETSCII drawing characters.
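For what it’s worth, the easy direction can be sketched in Python as well, since its standard library ships codecs for several EBCDIC code pages. This is only a rough sketch: it assumes the data is in code page 37 (`cp037`), which may not match what a given machine actually used, and the function names are made up for illustration.

```python
# Sketch: round-trip text through EBCDIC.
# ASSUMPTION: the data uses code page 37 (cp037). Check what the source
# system really used before trusting the output.

def ebcdic_to_text(data: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC bytes into a Python (Unicode) string."""
    return data.decode(codepage)


def text_to_ebcdic(text: str, codepage: str = "cp037") -> bytes:
    """Encode a string as EBCDIC bytes; raises if a character has no mapping."""
    return text.encode(codepage)


if __name__ == "__main__":
    raw = text_to_ebcdic("HELLO")
    print(raw.hex())            # the EBCDIC byte values, in hex
    print(ebcdic_to_text(raw))  # back to readable text
```

Going through a Unicode string in the middle, rather than mapping byte to byte, keeps the two code pages independent of each other.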


#11

Still have a Star Trek game for TRS-80 on cassette tape somewhere…


#12

Just to add a bit of history, back around 1990 two DOS / Windows products that attempted to convert files from one format to another were Data Junction and CrossFile.


#13

Came here for this, leaving satisfied. That S stands for standard.

Wait, no, I’m not leaving yet. Not till I mention that I’ve got an old thesis written on a CP/M VT180 in a word processor called Select that I’d like to get converted some day (the Rosetta software will need to accept 5 1/4" floppies, though)…


#14

I imagine what it might be like to throw the entire computing power of the USS Enterprise at an ancient binary file – let some monstrously powerful AI reconstruct the cultural gestalt at the time of the document’s creation, developing an approximation not just of all the contemporary coding paradigms but also of trends in graphic design, going so far as to interpret the document not just the way the original author intended, but even how the original author might have preferred it to be interpreted if released even slightly from the technical limitations of the time.


#15

Well it would be a start.


#16

Another problem we had back in the day was that there was no single 5 1/4" floppy standard. There were some conversion programs out there – I wrote one that let my Morrow read disks from a machine on campus – but it exacerbated the ‘moving files between systems’ problem.


#17

It’s articles like this that make me wonder if a new kind of programmer will come about as a consequence of all this. One that doesn’t really create new software but rather writes programs to access lost but valuable data. Sorta like archaeology but for data. I know we keep adding new frameworks and libraries all the time, but look at how C++ and C keep chugging along. It’s inevitable that either programmers will not be able to easily access the data directly and thus need virtualization to even load a file, or we’ll have to do other kinds of tricks to interact with ancient technology. It’s really weird to consider, honestly.


#18

This discussion so far is entirely about 80x86 processors. I started in this business before these machines existed, and data were stored in myriad physical formats on floppies of two sizes, with from 8 to 26 sectors per track, or on paper tape (thankfully mostly in ASCII), or on one of a dozen incompatible cassette tape formats.

My brother was able to write a program for a modern computer to accept the cassette tapes from his homebrew 6800 c.1977 and do some DSP to extract the ones and zeroes. It took several hours.


#19

It’s reasonably easy to go from ASCII to EBCDIC, as long as you didn’t use any characters that aren’t in typical mainframe code pages (like, for example, curly braces) but it’s pretty hard to go the opposite direction. Even if you know you’re going to code page 37, you’re still trying to stuff eight EBCDIC bits into seven ASCII bits.
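The eight-bits-into-seven problem can be made concrete with a few lines of Python. This is a rough sketch, again assuming code page 37 (`cp037`); the function names are hypothetical, and other code pages will give a different list of problem bytes.

```python
# Sketch: find the EBCDIC byte values that have no home in 7-bit ASCII.
# ASSUMPTION: code page 37 (cp037); results differ for other code pages.

def bytes_without_ascii_home(codepage: str = "cp037") -> list:
    """Return the byte values whose decoded character is outside ASCII."""
    lossy = []
    for value in range(256):
        char = bytes([value]).decode(codepage)
        if ord(char) > 127:
            lossy.append(value)
    return lossy


def to_ascii(data: bytes, codepage: str = "cp037") -> bytes:
    """Convert EBCDIC to ASCII, substituting '?' for unmappable characters."""
    return data.decode(codepage).encode("ascii", errors="replace")


if __name__ == "__main__":
    lossy = bytes_without_ascii_home()
    print(len(lossy), "byte values cannot be represented in ASCII")
    print([hex(v) for v in lossy[:8]])
```

Substituting `?` is only one policy; failing loudly (`errors="strict"`) is often safer when the data matters, because silent substitution hides exactly the kind of breakage described here.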

The company I’m currently working for processes a lot of mainframe data. Periodically some gigantic org with thousands of employees will accidentally change the code page translation table on their mainframe and suddenly all their EDI will break…

(Another fun translation task is dealing with the packed decimal formats that crop up in mainframe files amongst the EBCDIC text.)
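For anyone who hasn’t met packed decimal: the usual layout puts two decimal digits in each byte, with the final half-byte holding the sign. A minimal decoder might look like the sketch below. It assumes the common convention where a sign nibble of D (or B) means negative and A, C, E, or F means positive or unsigned; the function name and the `scale` parameter are made up for illustration, and the real field length and decimal position have to come from the record layout.

```python
from decimal import Decimal

# Sketch: decode one packed decimal (COMP-3 style) field.
# ASSUMPTIONS: two BCD digits per byte, sign in the last nibble,
# 0xD/0xB = negative, 0xA/0xC/0xE/0xF = positive or unsigned.

def unpack_comp3(data: bytes, scale: int = 0) -> Decimal:
    """Return the field's value; `scale` is the number of implied decimals."""
    if not data:
        raise ValueError("empty packed decimal field")

    # Split every byte into its high and low nibble.
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)

    sign_nibble = nibbles.pop()  # the last nibble carries the sign
    if sign_nibble < 0xA:
        raise ValueError("missing sign nibble")
    if any(digit > 9 for digit in nibbles):
        raise ValueError("invalid digit nibble")

    magnitude = int("".join(str(d) for d in nibbles) or "0")
    value = -magnitude if sign_nibble in (0xB, 0xD) else magnitude
    return Decimal(value).scaleb(-scale)


if __name__ == "__main__":
    print(unpack_comp3(b"\x12\x34\x5c"))           # digits 12345, sign C
    print(unpack_comp3(b"\x12\x34\x5c", scale=2))  # same digits, 2 decimals
```

Note that these bytes must be pulled out before any EBCDIC-to-ASCII translation runs over the record; running a text conversion across a packed field scrambles the digits.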


#20

The Morrow I mention above was a Z80 (running CP/M).