Microsoft takes down MS Celeb facial recognition database, 10 million+ pics of ~100,000 faces, maybe yours, scraped under Creative Commons

Military research and Chinese firms had access to the data Microsoft scraped under Creative Commons licenses.


Were the pictures themselves licensed under Creative Commons or was the database licensed using Creative Commons… or both? I’m kind of confused by the wording here.

That said, there’s certainly nothing stopping anybody from scraping the entire web for facial images (no matter how they are licensed) and indexing them. There are plenty of free and open source libraries out there for facial recognition and deep learning that can be used.


Speaking of facial recognition:

Auditors slam FBI for shoddy testing of facial-recog tech. But no big deal. It only has 641m images on its systems


So many issues with this sentence.

  1. weasel passive
  2. employee that, rather than employee who
  3. in either case, materially irrelevant content following employee
  4. slightly ambiguous anaphoric placement of ‘removed’ (following employee) is rather sinister

I wonder how they assembled the database of face photos. Did they use a Mechanical Turk, or were they using facial recognition to recognise faces to train facial recognition? If the latter, how many of those photos will be of actual people, rather than paintings, sculptures, CGI and action figures?


The photos are gone, but what about the models that were trained on that data? Are they still available?

Whatever they used, you can be sure the following were true:
-it was a really cheap option
-it was super creepy

