I have a friend who is an archival librarian, and she could talk for hours about this problem. The best way to preserve information is still, it turns out, “books” (which is no doubt why GitHub went with what are essentially bar codes printed on paper for this). Nothing digital lasts more than a few years. All digital media degrade, and they all require mechanical devices to read them, which fail and/or stop being supported. You can’t have the record of humanity depend on the life of the bearings in a CD-ROM drive, or on being able to get Windows 98 drivers for a DAT tape drive in 500 years.
This has turned archiving from a challenge of organizing into one of constant migration. Every few years, all the data has to be moved to whatever the new thing is that they can still get drives and software for. Forget “the cloud,” of course. Nobody who cares about archiving trusts that one bit. The last thing you want is the record of human history depending on an EC2 virtual instance on a box in some server farm in Virginia owned by Amazon. This is creating a whole new class of problem, because physical media is going away entirely. Librarians are faced with either gambling that the internet will somehow outlive us all (unlikely, considering how fragile power grids and corporations are) or inventing new archival formats of their own. Most are just punting at the moment, constantly migrating to bigger and bigger disk farms and swapping out PCs every couple of years.