Collecting user data is a competitive disadvantage

Originally published at: https://boingboing.net/2019/05/14/minimum-viable-corpus.html

1 Like

yes but boingboing’s lack of a data moat means you were unable to predict I’d make a smart-ass, rather lame comment about this - hoist by your own data petard, Doctorow!

2 Likes

Let’s not forget the people who will never use a service due to its data collection practices.

1 Like

No, let’s. They want to be forgotten.

5 Likes

Netflix collecting data on me may not make their recommendations algorithm much better at this point, true. But my data is the input to the model that allows them to make recommendations specific to me. If they weren’t collecting my data specifically, they couldn’t make those recommendations for me; they’d have to make very generic recommendations based on the things they could guess from immediate context, such as what I’m searching for at this moment.

Also: as noted in the article, freshness of data can be very important. Netflix couldn’t make recommendations about this year’s crop of movies and shows if they decided that their model was good enough in 2017 and stopped feeding it new data back then.

4 Likes

This doesn’t make the case that collecting data is a competitive disadvantage, does it? Only that there are diminishing returns and a short shelf-life. That’s true of most data collection and many other activities, isn’t it?

3 Likes

They have a bit:

It’s been a while since we’ve seen ******** — their last post was 11mon ago.

3 Likes

ANDREESSEN HOROWITZ - their motto is Software Is Eating the World?
What?

Yeah, I feel like the headline here is an almost willful misinterpretation of what they’re actually saying. The article is saying: data has been purported to offer a specific type of competitive advantage that is in practice not so hard to overcome. Data is not a magic bullet, and just having it (without cleaning, maintaining, trimming, and analyzing it) does very little for a resource-strapped startup.

5 Likes

Indeed. Not being a sustainable competitive advantage is NOT the same as being a “competitive disadvantage”. But it’s Cory - what did you expect?

3 Likes

All five of them.

Admit it though: your entire comment was auto-completed as soon as you typed “y”

as was mine

including the spoilers

2 Likes

This reminds me of how, when you first organize and catalog a collection, it can be fun, but if it gets big enough it becomes a chore.

There are categories of knowledge that can only be accessed by data mining. Suppose it turned out that, exactly one week before someone has a heart attack, their heartbeat always does “shave and a haircut” three times. That would be unimaginably valuable knowledge to have – and only a company like Fitbit could ever have the kind of corpus in which you’d find it. By this argument, big data is an unexplored continent with gold mines somewhere in it, and you either have access to that continent or you don’t.

But there are two strong counterarguments to that:

  1. Big data sets are necessary to find certain things, but they are not sufficient. You have to know what patterns to look for. And if you have a good enough reason to suspect a pattern, you can afford to gather the required data, even if you don’t already have it as a side-effect of your core business. A company like Amazon can scan its data for very basic patterns, but that’ll just reveal that hot dog purchasers often want hot dog buns. The value is in having the right question, not owning the library where the answer can be looked up.
  2. Similarly to the idea that everything can be funded by ads, the idea that you can fund a business solely by learning about your customers has an obvious Ponzi-scheme quality to it. Revenue-multiplying market intelligence is only valuable insofar as you have revenue by which to multiply it.

I would also add that businesses with these supposedly priceless corpora often give them away for free. Like, Google’s image database gives it the edge in machine vision? How does that work, when Google spent billions compiling it, and I spent $0.00, yet I can use it to train my 1990s-style neural nets just as well as they can?

And: speaking of recommendation algorithms – which is an area where Amazon and Netflix allegedly have made concerted efforts to exploit their data hoards – has anyone else noticed that the state of the art is still absolute dogshit? Amazon could sell me twice as many Kindle books if they only came up with good suggestions, but their idea of a sophisticated recommendation is literally just to suggest books by authors I’ve read before. Any independent bookstore website could do that.
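To show how low that bar is, here is a minimal sketch of a "more by authors you've already read" recommender. It is an illustration only, not Amazon's actual system: the function name `recommend_by_author`, the titles, and the author names are all made up for the example.

```python
def recommend_by_author(read_books, catalog):
    """Suggest unread catalog titles by authors the reader has already read.

    Both arguments are lists of (title, author) pairs (hypothetical data shape).
    """
    known_authors = {author for _, author in read_books}
    read_titles = {title for title, _ in read_books}
    return [
        title
        for title, author in catalog
        if author in known_authors and title not in read_titles
    ]


# Made-up reading history and store catalog.
read_books = [("Book A", "Author X")]
catalog = [
    ("Book A", "Author X"),
    ("Book B", "Author X"),
    ("Book C", "Author Y"),
]
suggestions = recommend_by_author(read_books, catalog)
print(suggestions)
```

Nothing here needs a large corpus: one customer's purchase history and the store's own catalog are enough.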

1 Like

Which would, of course, probably be a lot more useful to you, since the fact that you watched an awful lot of Peppa Pig probably won’t help Netflix know what kind of slasher horror flick you want to watch on Friday night.

(Although…)

Also the point of the article isn’t whether the data helps you (or even whether it helps a business keep you as a customer), it’s whether the data helps the business keep ahead of the other sharks.

Their conclusion is that after a certain point, it doesn’t.

Your point is, of course, apt in that Netflix et al. still need to keep collecting it in order for their business to function, but beyond a certain scale the data becomes less and less useful and more and more expensive to handle.

Well, it’s Cory’s summary of what he takes from the article. The authors may not think they meant that, and you may not think they meant that, but Cory’s view is a logical corollary of “data is not a perfect moat and beyond a certain minimum scale has strongly diminishing returns on increasing costs”.

If you carry on doing something that costs you more than it gains you, that’s a competitive disadvantage.

So, he’s perfectly entitled to say so. He summarises the article perfectly well and links to it so we can read it for ourselves.

Disagreeing with his conclusions is kind of what the comment section is for…

To put it another way - looking at things and drawing a different (but logically plausible) conclusion from the immediately obvious one is sort of his day job.

Not necessarily (up to a point). I am reminded of the supermarket (Sainsbury’s?) which, soon after it got its shiny new data mining engine, made the serendipitous discovery that at certain times on certain days (e.g. Friday 6-7pm) they had a bump in sales of six-packs of beer and disposable nappies (diapers, to the colonists).

Once they’d spotted it, putting beer on promotion near the exit along with nappies bumped sales even further.

Turns out many dads were phoned up in the afternoon by mums saying they were nearly out of nappies and would dad pick some up on the way home? Dad being dad decided to treat himself to a beer or six as well.

This was back in the nineties IIRC. Nobody was looking for a beer/nappy link. That pattern had occurred to precisely nobody. But I’ll grant that they were looking for something, although as I heard the story it was simply a result of someone noticing an anomalous pattern.
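For what it's worth, the kind of scan that can surface a link nobody was looking for is easy to sketch. The example below counts how often each pair of items shares a basket and compares that with what chance would predict (the "lift" measure from association-rule mining). It is an illustration under stated assumptions, not what the supermarket actually ran: the basket data is invented and `pair_lift` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations


def pair_lift(baskets):
    """Return {(a, b): lift} for every item pair that co-occurs at least once.

    lift = P(a and b) / (P(a) * P(b)). A value above 1 means the pair is
    bought together more often than independent purchases would predict.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate; sorted so pair keys are stable
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return {
        (a, b): (together / n) / ((item_counts[a] / n) * (item_counts[b] / n))
        for (a, b), together in pair_counts.items()
    }


# Invented checkout baskets, purely for illustration.
baskets = [
    ["beer", "nappies"],
    ["beer", "nappies", "crisps"],
    ["milk", "bread"],
    ["milk", "nappies"],
    ["bread", "crisps"],
    ["beer", "nappies"],
]
lifts = pair_lift(baskets)
for pair, lift in sorted(lifts.items(), key=lambda kv: -kv[1]):
    print(pair, round(lift, 2))
```

On a sample this small, rare pairs score high by accident, which is part of the point: the scan throws up candidates, and someone still has to work out which ones mean anything.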

But I don’t disagree with you, really. Just an excuse to tell another apocryphal data mining story. Also (and, I was told, certainly much truer): many years earlier, Sainsbury’s did not put barcode scanning at checkout in place to optimise stock levels. It was an act of faith that it would reduce checkout queue time (something customers told them was their no. 1 bugbear; prior to this, checkout staff typed prices into a till machine). The whole stock optimisation/just-in-time delivery thing came later as another serendipitous by-product.

Beer and nappies, again. :wink:

1 Like
