Originally published at: http://boingboing.net/2015/07/23/when-scientists-hoard-data-no.html


Not saying that pharma companies shouldn’t be more forthcoming with clinical trial data, but isn’t this “MOAR DATA = BETTER” impulse something you routinely bash in the context of Big Data? Not all clinical trial data is created equal; it’s actually quite difficult to design a good study that will tell you what you want to know at the end. Mashing together clinical data from a bunch of different sources seems like a recipe for trouble.


Ben Goldacre has an answer to this too. There are meta-reviews where different studies are compared, and they aren’t a simple average of all the results. Even if everyone is doing the same experiment, you have to find out how many data points each group had, and whether some groups have more scatter than others. There are probably many orders of magnitude more schoolchildren measuring the acceleration due to Earth’s gravity than there are new peer-reviewed experiments that actually come up with a better figure, and yet the single peer-reviewed result is good while the average of all the results is meaningless.
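The weighting described above (more data points and less scatter count for more) is roughly what a fixed-effect, inverse-variance meta-analysis does. Here’s a minimal Python sketch under stated assumptions: the `Study` class, the function names and every number are hypothetical, made up for illustration, and real meta-analyses often use random-effects models plus a quality assessment on top of this.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    """One study's summary statistics (hypothetical structure)."""
    name: str
    mean: float  # the study's estimate
    sd: float    # scatter of the individual measurements
    n: int       # number of data points

    @property
    def standard_error(self) -> float:
        # More data points and less scatter -> smaller standard error.
        return self.sd / math.sqrt(self.n)


def naive_average(studies: list[Study]) -> float:
    """Unweighted mean of the study estimates: every study counts equally."""
    return sum(s.mean for s in studies) / len(studies)


def pooled_estimate(studies: list[Study]) -> tuple[float, float]:
    """Fixed-effect inverse-variance pooling.

    Each study is weighted by 1 / SE^2, so precise studies dominate.
    Returns (pooled mean, standard error of the pooled mean).
    """
    weights = [1.0 / s.standard_error ** 2 for s in studies]
    total = sum(weights)
    mean = sum(w * s.mean for w, s in zip(weights, studies)) / total
    return mean, math.sqrt(1.0 / total)


if __name__ == "__main__":
    # Hypothetical numbers: one careful lab measurement of g (m/s^2)
    # and twenty noisy classroom measurements.
    lab = Study("lab", mean=9.81, sd=0.01, n=100)
    classes = [Study(f"class {i}", mean=9.3, sd=1.0, n=5) for i in range(20)]
    studies = [lab] + classes

    print(f"naive average: {naive_average(studies):.3f}")
    mean, se = pooled_estimate(studies)
    print(f"inverse-variance pooled: {mean:.3f} +/- {se:.4f}")
```

With these made-up numbers the naive average comes out around 9.32, while the pooled estimate stays close to the lab’s 9.81. Note that the weighting only accounts for scatter: a precise but systematically wrong study would still dominate, which is why the human check on whether people did their job properly still matters.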

You also have to do what a computer can’t: work out whether the researchers actually did their job properly, and weight peer-reviewed papers in well-known journals more heavily than a report from the Institute of Crystal Healing that survives only as a dead link on a website. Statistical tools are good for highlighting outliers in a distribution, or hints that data has been cherry-picked, but a meta-review is a lot more than a web crawl.


In a moderately perfect world that still has patents, we would require full disclosure and publication to get a patent. Patents serve the public by getting the science published up front, in exchange for a temporary monopoly, with the design passing into the public domain when the patent expires. That beats keeping it an unpatented trade secret with no state-enforced monopoly and no disclosure.
Of course, in an imperfect world, scientists would be incentivised to record less data, to work from memory and verbal communication, or to buy a patent law that keeps all of the science locked in-house.


Well, this week, NASA flew a spaceship past Pluto, and the truthers appeared right on cue to say it was all a fake. Idiots gave them coverage. And then…the sky did not fall in. NASA was not defunded.

Not saying this is entirely wrong thinking, and I would definitely be on the side of making more data available rather than less. There are counter-examples, however…

After his speech, a member of the audience asked if [Texas Congressman Lamar] Smith could do anything about the National Science Foundation funding for climate research, complaining that “it only goes to one side.” Winning applause, Smith told the crowd that his committee had just cut NASA’s Earth science budget by close to 40 percent and was pushing the National Science Foundation to stop funding research that he perceives as useless.


If a tree falls in the forest, and nobody is around to hear, does it make a sound?

Likewise, science that is not published.


Mashing together clinical data from a bunch of different sources seems like a recipe for trouble

the article [also] talks about gaining access to the original datasets from studies in order to review the statistical results for those studies. so, in that case, they aren’t mashing things together at all. [ edit: they do talk about mashing for the statin studies, but not for the deworming study. ]


in the context of big data, what’s being gathered is a wide variety of seemingly unrelated facts about you: who you called, when you did so, what web sites you visit, what shampoo you purchase, and whether you like gifs of cats.

this seemingly unrelated data is used to build a profile of you, and that profile is then used to assess risk.

that data – because it’s random – is largely noise. coincidental facts, not causative ones. ( all terrorists hate cat gifs. bob hates cat gifs. is socrates a man? )

reporting scientific data, i would argue, is more like traditional criminal investigation: asking customized questions tailored to a specific investigation in an attempt to prove ( or disprove ) a single hypothesis.

every positive and negative result in this context has high information value. high-value information – and the ability to review it openly – is good. noise – especially hidden behind closed, “just trust us” doors – is bad.
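The “largely noise” point can be illustrated with a small simulation: give 50 imaginary people 1000 random yes/no attributes and a random outcome, then count how many attributes appear to “predict” the outcome. This is a Python sketch under made-up assumptions: all the data is coin flips, and the 0.3 correlation cut-off and the function names are hypothetical.

```python
import random


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either sequence has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spurious_hits(n_people: int = 50, n_features: int = 1000,
                  threshold: float = 0.3, seed: int = 0) -> int:
    """Count purely random yes/no 'facts' that correlate with a random label.

    Every feature and the outcome are independent coin flips, so any
    correlation found here is coincidence by construction.
    """
    rng = random.Random(seed)
    # Hypothetical outcome, e.g. "flagged as a risk": also a coin flip.
    outcome = [rng.randint(0, 1) for _ in range(n_people)]
    hits = 0
    for _ in range(n_features):
        # e.g. "likes cat gifs", "bought this shampoo": pure noise.
        feature = [rng.randint(0, 1) for _ in range(n_people)]
        if abs(pearson(feature, outcome)) >= threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    print(f"{spurious_hits()} of 1000 random features look 'predictive'")
```

With only 50 people, a correlation of 0.3 or more turns up by chance in a few percent of purely random features, so a few dozen “hits” out of 1000 is typical, none of them meaningful. A single hypothesis-driven study sidesteps this multiple-comparisons trap by asking one question up front.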


Yes. Trees don’t know if they are or aren’t being watched. You don’t want to know what happens to a tree that gets caught falling without making the requisite noise. There are random inspections and the trees know it.


Only for those who believe in consensus. None of this prevents others from doing their own research.


