I’m sorry, but this presenter makes some spectacularly specious arguments.
Drug approvals per billion dollars of R&D in the pharmaceutical industry have been going down for over 60 years, starting at a time when there were a dozen computers on the entire planet. Blaming Big Data for a problem that predates it by several decades is ridiculous. Never mind that drug approvals aren’t a particularly good measure of actual health impact.
I can’t understand his reasoning that, because radiation was once thought to be good and is now thought to be bad, data will somehow be considered the new toxic waste. He presented no reason why this would be the case; he just linked the two ideas with some hand waving.
Comparing data to toxic waste would imply that it is piling up like crazy, but you’d be hard pressed to read even a tiny fraction of the data that was created 20 years ago. Data, historically, has degraded quite quickly. Hard drives fail, software gets lost, formats go undocumented, institutional knowledge about the data disappears (making it difficult to understand or even trust), or people just delete it because it is worthless after a few years.
I have no idea what his argument about humans adapting is about. Sure, it means that static predictive models don’t work and that there is error (and sometimes reinforced error), but it doesn’t make them useless. It just means that the models won’t be perfect. That’s not a reason not to do analysis; it is simply a reason to be critical of the results.
Then he brings up Nixon? I mean, wtf? I’m surprised he didn’t bring up Hitler. He’s already got Chernobyl and a bunch of spooky scary fonts to really drive home the message that the only thing there is to fear is data itself.
All in all, I must say that this talk seems to be incoherent FUD.
Anecdotally, back when I was in college, one summer I had a crappy job for an inventory control company. We would go into supermarkets and stores and count what was on the shelves so that stores could sort of know what they actually had in the store instead of what they thought they had (or rather, what the computer told them they had). Then they would write down the losses due to shoplifting, in-store damage or whatever.
While in training, the trainer would yell out a number before someone did the actual count, and would invariably come within 1% of the total for the shelf. I also noticed that after the crew of 12-15 were done counting everything in the store, the final total would change anyway after a discussion with the store manager.
As a programmer, I think the same concept applies - I don’t think we really need “Big Data” so much as we need smart data.
It’s important to separate business from academia here. Every class I’ve
taken on the subject starts with some variation of “it’s still very early
days, we don’t really have any credible far-reaching understanding of how
to think about general optimization, so it’s all still heuristics.”
As far as I can tell, that “same old, same old” is a product of business
pressures, not the discipline itself. For the most part, everyone is
excited and extremely uncertain about what can actually be delivered.
I have some experience with this inside of an Architecture school and I can
tell you that people are starting to apply these strategies to
understanding things like the housing crisis (for humanitarian purposes,
that is) and to understand how to respond in a more sophisticated and
empathetic way to housing needs. Of course, a few billion dollars would be
sure to help a little.
2015 Fulbright Fellow
Center for Media, Data and Society
I work in health care and am particularly interested in feedback processes & safety. The reality is that in medicine, especially when it comes to risk management and compliance (which in many ways overlaps the realms of academia & business), data use is hapless and helpless.
Serious incidents are underreported, minor incidents overreported, classification systems turn out to be worse than useless, and categories are wrongly assigned. All this has serious implications for safety. I have yet to see evidence that collecting big data (which the incident reporting industry, with academic support, is promoting) has significantly helped save lives. Learning and change tend to follow in-depth analysis of small data, i.e. individual incidents.
Aviation is vastly better at analysing big data for improving safety. But in aviation close to 99% of activity is accurately recorded and monitored, as opposed to around 0.1% in medicine. While you can monitor pretty much every breath in a cockpit, you can’t monitor a human like a cockpit, so your evidence will remain skewed and dangerously misleading. And as the Germanwings Flight 9525 disaster tragically showed, even 99% is still not enough to capture everything that might turn out to be significant. Anaesthetics (and a bit of Intensive Care) is the only area where monitoring the data is realistic and effective, but tellingly it’s also the area defined by human inactivity (on the side of the patient).
The evidence in medicine, for the time being, seems to show that big data numbs people’s perception and lulls them into thinking that they have easy access to the information most relevant to their decision making, when in reality it is a London fog obscuring their vision.
Yes, if you know what you need to know (e.g. the birthweight of a newborn) you can collect that data and use it effectively, but when it comes to predicting what you ought to know before disaster strikes, it’s pretty nigh useless.
Please elaborate. If possible, less rudely. At this juncture, your assurance is a mere anecdote.
What social good was achieved? How was Big Data instrumental in achieving that goal?
As to my 99% number: you are right, it’s my wild guess that in a profit-driven, market-fundamentalist world, where the third sector is far from immune to the basic workings of market capitalism, most entities investing in Big Data have some sort of profit motive. There is an interesting argument to be made about the profit-free nature of the non-profit sector, but let us leave it for another place.
Sigh. I don’t have time to pull together an exhaustive list, since that would be rather extensive considering the number of government agencies, companies and academics around the world which use Big Data, but I might as well point out some well-known examples.
NOAA (+ NASA + US Forest Service) - really dozens of agencies, but between them they run quite a few Big Data systems: weather prediction, flood prediction and fire threat prediction services, plus of course all the climate research prediction systems.
The United States Geological Survey - early warning for earthquakes & tsunamis. It uses not only seismographs but also social media monitoring to find epicenters and predict the amount of actual damage faster and more accurately, so that emergency response services can get where they need to be faster and more efficiently.
CDC - Has all sorts of Big Data projects, from its BioSense platform, which tracks health issues as they develop by collecting ER visits, hospitalizations and health info from the VA and the DoD and then provides them to projects running on the platform (like the Dengue Detection project in Florida and Hawaii), to various disease tracking systems that monitor Twitter for everything from flu to Ebola outbreaks. All the data is then used to recommend school/govt building closures, immunizations and supply increases of various drugs.
IBM Watson Health - Helps doctors and nurses with medical diagnosis (largely cancer related) using a large Big Data system holding terabytes of structured and unstructured data, from random websites to cancer patient records. The system is now partnered with Apple and some children’s hospitals to collect data for analysis, to improve its ability to suggest personalized, more effective treatment regimens.
SumAll - I might as well name a smaller project since you mentioned homelessness. This Big Data non-profit predicts which people in New York are at risk of becoming homeless, in order to get them help before they are evicted, since it is far easier to keep someone in their house than to put them in one once they are homeless. They pull data from everything from court records to shelters. Their goal is to go nationwide.
United Nations Global Pulse - Really dozens of Big Data projects that do everything from predict food shortages to analyzing public attitudes towards contraception to figuring out how to change people’s perception of public sanitation in third-world countries.
I could go on and on. There are thousands of these projects. Some are just used to inform public policy and go into briefings to governments. Some are public services. Some just help researchers target where more research is necessary. A huge amount goes into making things more efficient and detecting fraud, so that we can use our limited resources more effectively to help more people.
It is impossible to truly convey the scope we’re talking about here, since Big Data is really just a modern buzzword for large-scale data mining and analytics, which has been going on for decades. The data sets have simply become bigger and the tools more scalable.
OK, @nojaboja has some good points. I work in epidemiology and biostatistics for health research. Most health data qualifies as small. Two notable exceptions are genetic data and medical claims. Those are big. But as for hands-on clinical research, the data tends to be on the small side.
Why? Lots of reasons. Clinical data is hard to collect. Clinical data takes a long time to collect. Clinical research is also often looking at rare conditions, which don’t amount to very many cases over long stretches of time, and studies are routinely halted because of low numbers or because the treatment landscape has changed. There are big clinical studies, but those are the tip of a very deep iceberg.
Another example is “confounding by indication”. In clinical research, and in the absence of lots of funding for a randomized controlled trial, we often build cohorts to study a disease and treatment. These types of study designs call for a treatment group (or groups) and a placebo group. But the problem is that people who get into the treatment arm usually have some condition that predisposes them to be in that arm rather than in the placebo or control arm. That is confounding by indication: a prior set of variables about that person caused them to be in the treatment arm, thus contaminating the study. And ethically there is no way to withhold treatment to force them into the control group. So what do we do? We still have to analyze the data and try to get something meaningful out of it.
The point is that no amount of BIG DATA will solve this problem. This is not a big data problem. This is a stat methods problem. We adjust the model using schemes like inverse probability of treatment weighting. Or any of many different strategies to mathematically level the playing field so that our regression model is a fair look at the data.
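The weighting idea can be made concrete with a toy sketch. Everything below is illustrative only: the function names are mine and the numbers are made up; a real analysis would estimate propensity scores from covariates (e.g. with a logistic regression) rather than assume them.

```python
# Toy sketch of inverse probability of treatment weighting (IPTW).
# Propensity = estimated probability of receiving treatment given covariates.
# Each subject is weighted by the inverse probability of the arm they are
# actually in, so over-represented kinds of patients count for less.

def iptw_weight(treated, propensity):
    """Weight a subject by the inverse probability of their observed arm."""
    return 1.0 / propensity if treated else 1.0 / (1.0 - propensity)

def weighted_mean_difference(subjects):
    """Weighted mean outcome of the treated arm minus the control arm.

    subjects: list of (treated: bool, propensity: float, outcome: float).
    """
    # arm -> [weighted outcome sum, weight sum]
    sums = {True: [0.0, 0.0], False: [0.0, 0.0]}
    for treated, propensity, outcome in subjects:
        w = iptw_weight(treated, propensity)
        sums[treated][0] += w * outcome
        sums[treated][1] += w
    return sums[True][0] / sums[True][1] - sums[False][0] / sums[False][1]

# Hypothetical cohort: sicker patients (high propensity) tend to end up in
# the treatment arm, so the raw comparison is confounded by indication;
# the weights rebalance the two arms before comparing outcomes.
cohort = [
    (True,  0.8, 5.0),
    (True,  0.6, 6.0),
    (False, 0.8, 4.0),
    (False, 0.3, 7.0),
    (False, 0.2, 8.0),
]
print(weighted_mean_difference(cohort))
```

The point of the sketch is only the mechanics of the weights, not the estimate itself; in practice the propensity model, its overlap assumptions and its diagnostics are where all the real statistical work lives.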
Most of health research is this way. It’s massively frustrating that we can’t just throw 400,000 subjects into a regression and see a p-value < 0.00001. That is incredibly rare. We must instead deal with data on a much messier, more personal level that requires a sincere dedication to statistics, both to do the work and to read these papers and understand what other people did.
Hey, more power to big data people. Get the job done. By all means, tweak your ad streams and have fun with geo data and huge piles of words. I’m not saying don’t. It won’t help us with most health research, though.
Let’s run through your Big Data hype list (confirmation bias is really a thing):
Just spent (wasted) 20 min reading through various http://unglobalpulse.org/projects. The standard formulation is that project x “shows potential.” I couldn’t find a single case where an actual correlation between analysing Big Data and achieving social good has been proven. There is no question that influential people fervently believe that Big Data shows potential; that belief is at the root of the hype. The question was, and is: is it fulfilling that potential?
This is considered instrumental in achieving social good? I doubt this will make any significant mark on NYC homelessness. It is about as effective as (or probably less effective than) me standing on a street corner handing out buns to alleviate world hunger. There are well-tested ways to reduce homelessness, e.g., surprise surprise, making housing accessible to poor families, but those of course involve radical rethinking, as opposed to hiding behind the fog of big data.
Shocker. A Big Data company is selling big data as the solution to everything. E.g. have a read of this 2013 press release (there is no access to any actual research anywhere on the site): https://www-03.ibm.com/press/us/en/pressrelease/42214.wss. Apparently IBM’s Oncology Expert Advisor is going to eradicate cancer. There isn’t a single peer-reviewed article on said Expert Advisor, and the vague claim that it helps manage a patient’s treatment “by alerting less-experienced physicians or nonspecialists to aspects of therapy they might not be aware of or overlook” is unlikely to fill anyone with confidence: http://www.mdanderson.org/publications/annual-report/issues/2012-2013/info2.pdf. Given the choice between an experienced oncologist and a nonspecialist with access to IBM’s Expert Advisor, I know which one I would choose.
You are right that the CDC does good with data. But that has been the mission of epidemiology since long before big data; they are doing what they have always been doing, just better.
The CDC’s case also highlights the pitfalls of political interference and bias in big data. The CDC is prevented (by Congress) from studying gun violence, a major cause of death in the US on par with motor vehicle fatalities, thus substantially skewing its findings.
As to the social good in earthquake-prone regions: infrastructure investment saves lives. High-quality (expensive) buildings, good roads and functioning emergency services are effective measures to improve survival; big data is not, nor is it likely to be any time soon.
There is a reason why devastating earthquakes in Japan “only” kill thousands while earthquakes of the same magnitude kill tens of thousands in less developed regions. A small part of that is warning systems; a far greater part is the built environment. Big data is not going to solve that one.
Yep, you are right that big data is key to meteorology, just as it is to epidemiology, but that was true long before the current hype in big data.
But really, that’s about it: fields of study which have always relied on data now have access to far more data to study. That is a good thing, and as those researchers are well versed in using data effectively and prudently, the benefits outweigh the dangers.
However, in other, less clearly defined fields, where data use is haphazard and there is little understanding of such fundamental concepts as confirmation bias, the danger is that a fog of data obscures real progress.
Another important issue is that in health care, data input is patchy at best and fraudulent at worst. There have been major coding scandals in both the US and Germany which suggest that health care records are far from reliable, and thus nigh useless for serious research.