Couldn’t it be distributed as software? Create a Twitter client that simply follows tweets and writes them to disk. I get that a definitive storehouse is also a useful thing, but with thousands or tens of thousands of people recording tweets you’d have a decent level of depth. Then, as a second layer, have your own API that collects the tweet collections of people who have opted in and stores them in a database. It could even have a “Number of instances recorded” feature that gives people searching the database a confidence rating for each tweet.
Or is there a Twitter thing I don’t understand that prevents this?
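
To make the second layer a bit more concrete, here is a rough Python sketch of the merging step. Everything in it is an assumption for illustration: the function name `merge_collections`, the idea that each opted-in client uploads (tweet id, text) pairs under some recorder id, and the rule that the version seen by the most recorders wins. It does not touch Twitter itself, only the already-recorded collections.

```python
from collections import defaultdict


def merge_collections(collections):
    """Merge per-recorder tweet collections into one store.

    `collections` is an iterable of (recorder_id, tweets) pairs, where
    `tweets` is an iterable of (tweet_id, text) pairs. This input format
    is hypothetical, chosen only to keep the sketch small.
    """
    # Map each distinct (tweet_id, text) version to the set of recorders
    # that saw it. A set means one recorder counts once per version.
    recorders = defaultdict(set)
    for recorder_id, tweets in collections:
        for tweet_id, text in tweets:
            recorders[(tweet_id, text)].add(recorder_id)

    # For each tweet id, keep the version most recorders agree on and
    # store that agreement count as "instances" (the confidence rating).
    merged = {}
    for (tweet_id, text), who in recorders.items():
        best = merged.get(tweet_id)
        if best is None or len(who) > best["instances"]:
            merged[tweet_id] = {"text": text, "instances": len(who)}
    return merged


uploads = [
    ("alice", [("1", "hello"), ("2", "world")]),
    ("bob", [("1", "hello")]),
    ("carol", [("1", "hello"), ("2", "w0rld")]),
]
merged = merge_collections(uploads)
```

With these made-up uploads, tweet "1" ends up with three recorded instances, while tweet "2" has two conflicting versions with one instance each, which is exactly the case where a searcher would want to see a low confidence rating.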