Originally published at: https://boingboing.net/2018/09/20/12-ways-to-curve-fit-a-bunch-o.html

…

# 12 ways to curve fit a bunch of random points

Is xkcd still a thing, though?

This is true, and in order to fully understand why, you need to take several PhD courses’ worth of statistics. At some point in these courses you will learn that you can’t really understand what the statistics mean or how they are created. The math is ultra-complicated and requires inordinate amounts of processing, such that if you were to do it by hand, it would take you millions of years to complete one equation. You basically have to accept that the statistics “prove” whatever it is you’ve done.

I am not kidding. We spent weeks creating equations that would approximate all different types of curves and lines, figuring out which algorithm we should use to “solve” whatever the data was.

The vast majority of social science is based on sloppily collected data churned through statistical engines that supposedly find things that are “statistically” significant. Garbage in, garbage out.

xkcd is always a thing.

Really who hasn’t gotten seduced by some new curves and sweet talk?

A relevant comic makes its way into the comment threads every couple of days at least.

Only 12?

Randall mentions another curve in the title text, and Nate Silver tweeted the comic with an additional way: “the increasingly popular no curve at all.”

Didn’t Paul Simon once write a song about this?

There must be 50 ways to fit your data, or something?

Statistics is, ya know, just a bunch of statistics, Ya know?

The interesting problem I find in social science is that a lot of it isn’t math that requires large amounts of processing… and that’s the problem. A lot of the maths used in the field are shoddy specifically because they had to be done by hand. Now we have the computing power to do better analysis, but most social scientists don’t really care about the statistics portion. They would rather just keep sticking numbers into the same old ANOVA model and, as you say, hoping the numbers come out < .05, without even bothering to see how well their data actually fits the assumptions of the model, or what the assumptions of the model are. Hell, in my first graduate-level statistics course, the instructor spent a good deal of time making sure we understood exactly how to explain to professors that they were wrong when they inevitably tried to point out that your predictor variables weren’t normally distributed.
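To make the point about predictors concrete, here is a minimal sketch. It assumes a simple one-predictor linear model, where the usual normality assumption is about the errors (residuals) and not about the predictor. The data is simulated, and the helper names `skewness` and `ols` are made up for this illustration.

```python
import random

def skewness(values):
    """Population skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return third / var ** 1.5

def ols(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Simulated data (an assumption for illustration): a heavily skewed
# predictor, with normally distributed noise added to a straight line.
xs = [rng.expovariate(1.0) for _ in range(2000)]
ys = [2.0 + 3.0 * x + rng.gauss(0.0, 1.0) for x in xs]

slope, intercept = ols(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

skew_x = skewness(xs)
skew_resid = skewness(residuals)

print(f"fitted line: y = {intercept:.2f} + {slope:.2f}x")
print(f"skewness of predictor: {skew_x:.2f}")
print(f"skewness of residuals: {skew_resid:.2f}")
```

The predictor is nowhere near normal, yet the residuals come out roughly symmetric and the fit recovers the line, so the residuals are the thing to inspect. Skewness is only one crude check; a residual plot would tell you more.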

The biggest takeaway I’ve gotten from statistics is this: there is no objective way to do statistics right. There’s lots of objectively wrong ways to do it, but ultimately whatever procedures you choose will have some level of subjectivity, and the answers will never provide a clear yes or no answer. All you can do is make the best argument possible, and hope others evaluate it in-depth instead of looking at an arbitrary number value.

Anscombe’s Quartet is the opposite of this: four sets of points (with nearly the same means, deviations and correlation) that fit the same line.
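For anyone who wants to check the numbers, here is a rough sketch in plain Python. The values are the commonly reproduced ones from Anscombe's 1973 paper; the `summarize` helper is just a name made up for this example.

```python
from math import sqrt

# Anscombe's quartet: sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarize(xs, ys):
    """Return means, least-squares line and Pearson correlation for one set."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "slope": slope,
        "intercept": my - slope * mx,
        "r": sxy / sqrt(sxx * syy),
    }

summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}

for name, s in summaries.items():
    print(f"{name:>3}: mean_x={s['mean_x']:.2f} mean_y={s['mean_y']:.2f} "
          f"line: y = {s['intercept']:.2f} + {s['slope']:.2f}x  r={s['r']:.3f}")
```

All four sets land on roughly y = 3 + 0.5x with a correlation of about 0.82, even though a scatter plot of each looks completely different: one noisy line, one smooth curve, one tight line with a single outlier, and one vertical stack with a single far-off point.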
