Big Data's religious faith denies the reality of failed promises, privacy Chernobyls


Laughing uncontrollably at this!

“In Soviet times, there was the old anecdote about a nail factory. In the first year of the Five-Year Plan, they were evaluated by how many nails they could produce, so they made hundreds of millions of uselessly tiny nails.”

From the full presentation: “We’re very much in the ‘radium underpants’ stage of the surveillance economy.”

Yeah, that sounds about right. Except that we know that and we are still doing it.

O ye of little faith: had ye but faith the size of a 16-bit integer, ye could move mountains!

Of course, they got smart the next year and measured the nail production by weight…

> Ceglowski also raises a critical point: Big Data has not lived up to its promises, especially in life sciences, where we were promised that deep analysis of data would yield up new science that has spectacularly failed to materialise.

I think this is more a case of “Where’s my jetpack?” syndrome, where people don’t realize just how much science has advanced because they measure it against some unrealistic goal rather than against more important, if less glamorous, advances. Biology certainly has been tremendously advanced by “big data” in the sense of automated DNA sequencing over the past twenty years or so.

The comparison with nuclear mishaps does not entirely fit.

The data breaches do not create beautiful natural preserves.

…or shambling, misshapen mutants.

Well, since they are talking about big data in biology, they might yet.

If you never test your assumptions, then no amount of data will light your way out of the deep, dark statistical woods.

Do you work in data? You sound like you know a thing or two. I have loathed the term “Big Data” ever since people started using it all the time. Most people don’t realize that all data is a sample. Even if you collect exhaustive data, it is still a sample from a putatively infinite population. Because of that, sampling methods, an appreciation of distributions, adjustment, and all the other basic processes flowing from the Central Limit Theorem still hold true. And then there is the reporting… Whenever I hear “Big Data,” my eyes crinkle a bit, the plates running down my spine start to redden, and I brace for battle.

#Bring it, Big Data.

Oh, they will…

So good. Thank you for sharing.

I also get a small whiff of “Where’s my jetpack?” here.

If we replace this sentence:
“Big Data’s advocates believe that all this can be solved with more Big Data.”

With this one:
“Big Data’s advocates believe that all this can be solved with more research on methods of data modeling.”

Do we feel differently about it?

The way the term Big Data is used these days, those two sentences are functionally interchangeable. Data “science” is still mainly heuristics at this point; we haven’t had the computing power to run experiments in this field for very long at all.

I completely agree that collecting lots of data about human behavior is problematic, and I love Maciej’s 90-day expiration plan. That said, I’d be willing to bet that twenty years from now, no one is going to wish that we, as a society, had spent less money researching ways to use machine learning to understand the life sciences, and that no one will consider the contributions therefrom unimpressive.

All of that said, I need to give a talk next month on “Big Data for Social Good,” and I’m having a really hard time coming up with material…

The problem is not that big data isn’t useful in itself; the problem is the adversarial way in which it’s used. It’s a real shame: with a bit of honestly intentioned regulation, the potential for research in the public interest would be far more interesting than all this ad targeting and Skinner Box fine-tuning.

Not all data is a sample, nor are all populations infinite. That’s just ridiculous on its face. If my population is the manufacturers of the socks in my drawer right now, I assure you that not only is it finite, but I have a complete data set for it.

My only problem with “big data” is that people like Cory seem to think all big data sets are related to people.

Until one fine day… a stray sock shows up.

If you are trying to generalize to socks based on your accessible sample, which consists of the socks made by the manufacturers in your drawer, then the larger theoretical population is what your sock sample was drawn from.

I.e. theoretical population > accessible population > sampling distributions if you are running stats on those socks.

Yes, but my population was the manufacturers of the socks that were in my drawer at the time. I assure you that hasn’t changed just because you decided to create a new population of every sock that could ever exist in my drawer, period.

The population of states of an abstract bit is exactly {0, 1}. It does not change. One does not call this invalid just because someone decides to redefine the population to mean the states of every bit that exists, has ever existed, and will ever exist. That’s just an exercise in sophistry.

I see the direction you are coming at this from. But even still, what if you are generalizing about manufacturers? If you are generalizing about manufacturers based on your sample, even though you call it complete, that act of trying to generalize about them requires this theoretical construct of a larger frame of reference. Yes, we can get super esoteric, and I see what you are saying, but I am talking stat theory and you seem to be talking comp sci or maybe set theory? I dunno.

Also, I would note that a bit is not data. It is a datum. ☺️ I’m not arguing sophistically about data, though. I’m talking stat theory, so if we are talking about different things, we oughta acknowledge it, and we can both be correct in our domains.

N.B.: thinking about it a bit more, this back-and-forth summarizes the chicken-and-egg problem with Big Data. Just because Big Data might be exhaustive or somehow total does not mean that inferences gained from analyzing it are necessarily generalizable. When you cross over that line from data to statistics, it turns a corner, and the assumptions underlying the stat methods do not suddenly get suspended because you think you have all the data. It doesn’t work that way. Examples are Google Flu, ad targeting, and use cases gone awry. In each case, there are or were misfires because even exhaustive data doesn’t tell you everything there is to know about something. It’s still just a sample.

Great presentation. Interesting point about the measurement of truckers. The observer effect of quantum physics seems to kick in here: the measurement of a thing impacts the thing itself. Isn’t that applicable to much of big data? Who hasn’t changed how they communicate online around certain topics (politics, religion, relationships), knowing that someone, somewhere will be observing it? Who buys some things with cash so nobody knows they bought them? The mere collection of big data has influenced the behaviours it’s trying to analyse. Doesn’t that lower the value of the analysis itself?
