Originally published at: http://boingboing.net/2017/07/07/metacrap.html
…
It’s a sleazy and pathetic move on the part of the broadcast network executives (what else is new?) but the situation also betrays a real laziness on Nielsen’s part (in line with the complacency of what is effectively a monopoly business). It’s not like spell-checkers with custom corpora haven’t been available for years.
This seems like a really solvable problem.
This is why I only like unfailing shows like “the Bug Bang Theory” or “CIS Miami.”
Feels like someone at Nielsen most certainly would have stumbled upon this. Best case, their analysts aren’t spending enough time reviewing the data; worst case, they were turning a blind eye.
If I mistakenly write “NBC Nitely News,” you can probably still tell what program I’m talking about. Nielsen’s automated system can’t…
Simple enough problem to fix. Google has no problem with spelling errors.
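To make that concrete: a first pass at this is a few lines in any language with a fuzzy string matcher. Here’s a hedged sketch in Python using the standard library’s difflib — the catalog of titles and the 0.8 cutoff are invented for illustration; Nielsen’s actual catalog and matching rules are unknown.

```python
from difflib import get_close_matches

# Hypothetical catalog of registered show titles (illustrative only).
KNOWN_TITLES = [
    "NBC Nightly News",
    "ABC World News Tonight",
    "CBS Evening News",
]

def canonicalize(reported):
    """Map a reported title to the closest known title, if one is close enough."""
    matches = get_close_matches(reported, KNOWN_TITLES, n=1, cutoff=0.8)
    return matches[0] if matches else None

print(canonicalize("NBC Nitely News"))  # matches "NBC Nightly News"
```

With a similarity cutoff like this, “NBC Nitely News” lands on the right show, while genuinely unrecognizable names return nothing instead of being silently guessed at.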
I’ve been doing this ever since my 2012 3Q “likes” on BB underperformed.
Cordially,
Semiotics
If Nielsen is that much of a feckless clown show, then it’d almost be irresponsible for networks not to game their system. These boobs are the gatekeepers for hundreds of millions of dollars’ worth of commerce.
I get that the network misspells a show’s name for a particular week so that it won’t hurt the overall average, but won’t doing so mean that the “real” show doesn’t show up on that week’s ratings list?
If it looks like somebody wants it to fail, one has to wonder who’s making a buck.
Late Stage Capitalism… if there are loopholes, they shall be pwned.
That’s pretty clearly a feature, and not a bug.
It’s not that so much as that Nielsen isn’t an independent auditor. Not if they even -can- be gamed that way.
Torrent sites have found an effective way to sequentially name shows and their episodes. The pretense that Nielsen can’t have done the same, and instead chooses to let this ‘accidentally’ happen, is hilarious. If they stopped, then a competitor who did allow it would likely flourish suddenly, very much because of the hundreds of millions of dollars, and the very few people who steer those currents at this moment in time.
The Networks report their numbers to Nielsen?
I thought Nielsen collected the numbers and published the results themselves.
So, what exactly is it they do?
It’s honestly a bit more surprising (at least to me) than mere laziness:
Maintaining a database, with the intention of useful reporting or analysis, of things whose names are potentially overlapping, ambiguous, non-unique, similar to one another, and sometimes misspelled (deliberately or accidentally) — where the real world spits on your naive ontology and classification — is exactly the situation where you cook up a more robust identification scheme, in order to preserve your own sanity and make the job easier.
Nielsen isn’t exactly an impoverished mom 'n pop shop; and they at least aspire to be taken seriously as a source of audience-engagement statistics on a national scale, even as some of the data and search outfits from the internet have started sniffing for blood, and streaming services (where the server logs provide details Nielsen could only dream of having, basically for free) have become more prominent.
I would have expected them to be using something a bit more… scalable… than “broadcasters self-report their shows, identified by natural-language display name”, which has the virtue of simplicity at a tiny scale but is vulnerable to error (even in the absence of malice) in volume. Some sort of system with UUIDs, or other ugly-but-unambiguous identifiers, seems like it would have become a necessity years ago. Is their “database” made of harried secretaries manually dumping stuff into a big hideous Excel sheet on the shared drive?
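For the sake of illustration, ID-based reporting could look something like this sketch — the registry, names, and workflow are all invented, not how Nielsen actually works:

```python
import uuid

# Hypothetical registry: each show gets an opaque ID once, at registration time.
registry = {}

def register_show(canonical_name):
    """Assign a new show a permanent, unambiguous identifier."""
    show_id = str(uuid.uuid4())
    registry[show_id] = canonical_name
    return show_id

def report_airing(show_id, rating):
    """Broadcasters report by ID; a typo fails loudly instead of
    silently creating a phantom show with its own ratings bucket."""
    if show_id not in registry:
        raise ValueError(f"unknown show id: {show_id}")
    return (registry[show_id], rating)

nightly = register_show("NBC Nightly News")
print(report_airing(nightly, 1.23))  # ('NBC Nightly News', 1.23)
```

The point of the opaque ID is that there is nothing to misspell: a report either references a registered show or it doesn’t, and the mistake surfaces at submission time rather than in the published numbers.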
I would not bet on Nielsen if Google were to get the knife out and come for their market; but the ugly thing about spelling errors is that, since show names can (and I think some do) use deliberately incorrect spellings, you can’t just spell-check your way out of it.
A naive spell checker can transform all misspellings into their closest correct equivalent, but can’t distinguish between errors and intent.
Someone with lots of statistical data, like Google, can transform all inputs into the outputs people like you most often intended; but that still only gets the answer right when the right answer isn’t the improbable one in a given case (think of the occasional topic that is virtually unsearchable without copious quotation marks and exclusion modifiers, because it overlaps with something that uses most of the same strings and is vastly more popular).
They would certainly be right more often than the naive case or the pitifully obtuse case; but when dealing with inputs that may be malformed so as to be underdetermined, overdetermined, or both, there is only a way to the more plausible answer, not a path to the correct one (because there may well be zero, or more than one, correct answers).
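A toy example of that limit, sketched in Python with invented titles: even matching against the registered-title catalog (rather than an English dictionary, which would “correct” a deliberately misspelled title), similarity scores can only rank candidates — they can’t prove which show the reporter meant.

```python
from difflib import SequenceMatcher

# Invented catalog; note the deliberately "misspelled" registered title.
KNOWN_TITLES = ["Numb3rs", "Numbers Rule", "NBC Nightly News"]

def ranked_candidates(reported):
    """Rank known titles by similarity; the caller decides what counts as decisive."""
    scored = [(SequenceMatcher(None, reported, t).ratio(), t) for t in KNOWN_TITLES]
    return sorted(scored, reverse=True)

# "Numbers" sits between the intentionally misspelled "Numb3rs" and
# "Numbers Rule" -- the scores pick a more plausible answer, not a correct one.
for score, title in ranked_candidates("Numbers"):
    print(f"{score:.2f}  {title}")
```

The top-ranked guess here happens to be reasonable, but nothing in the math distinguishes a typo for one show from a correct reference to another.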
Nielsen isn’t new. That they are only now discovering that their system doesn’t maintain data integrity is inexcusable. I get that sometimes user-submitted data can’t be verified (first name: Marry, Mary, Merry, Mari). But the list of shows that exists or has existed is a known quantity. That they don’t check submitted show data against the list of known shows is surprising.
One has to wonder.
On the one hand, it is never safe to bet against the awfulness of a nontrivial data processing system, whether it’s a legacy nightmare with a downright Lovecraftian lifecycle or a painfully half-baked and rushed replacement for one; yes, it can be that bad.
On the other hand, this sort of chicanery seems like something that would be noticed by the humans involved, even if only by chance in the course of other work, often enough that it wouldn’t be a safe trick to use against an organization that actually takes a hard line on the matter, even if their data processing is handled by a time-sharing system hosted within Azathoth the Blind Idiot God. Getting away with it most of the time can still be quite painful if the punishment for the remaining cases is severe (plus, an entity that actually cares would be likely to increase scrutiny of someone discovered to have cheated in the past, making it harder to get away with anything in the future).
This inclines one to wonder if, perhaps, market forces were such that Nielsen, for whatever reasons, identifies more closely with the interests of the broadcasters (or requires their cooperation to a degree that cannot be obtained purely by coercion) than with the interests of the advertisers; and an informal arrangement exists where, within unstated but mutually understood limits, broadcasters are afforded a little ‘professional courtesy’ when it comes to burying certain embarrassments, presumably as long as they don’t do it often enough or blatantly enough to upset the business.
I work on a data-centric system. It is fed by a large number of city and state governments, vendor companies, and other commercial companies. Lovecraftian describes it well. It is often confusing and frustrating, but I have to say that it does a great job of catching bad data and chucking those records into a fallout table, where our data folks review and deal with anything the system is unsure about. But yes, I do understand that making changes to such systems can be very painful.
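The validate-or-divert pattern described there boils down to a few lines. A minimal sketch in Python — the titles, fields, and rules are all invented for illustration, not taken from any real system:

```python
# Hypothetical known-shows list and ingest rules (illustrative only).
KNOWN_TITLES = {"NBC Nightly News", "CBS Evening News"}

accepted, fallout = [], []

def ingest(record):
    """Load clean records; divert anything suspect to a fallout table
    for human review instead of auto-correcting it."""
    if record.get("title") in KNOWN_TITLES and record.get("rating", -1) >= 0:
        accepted.append(record)
    else:
        fallout.append(record)

ingest({"title": "NBC Nightly News", "rating": 1.2})
ingest({"title": "NBC Nitely News", "rating": 1.2})  # lands in fallout
```

The key design choice is that the system never guesses: a near-miss title is neither counted nor discarded, just parked where a person can decide what it meant.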
This sounds very plausible. The system they are using was probably built when there were three networks and it was easy to see if one was gaming the system too much. Now the number of networks is… out of hand.