CLIP Interrogator AI will roast your selfies

She’s wearing a “Things Could Be Worse” shirt!


This thing does spit out some interesting stuff. When I fed it my face, aside from the obvious descriptors for hair, gender, and glasses, I also got stuff like:

  • reddit contest winner
  • team ibuypower
  • taco
  • fantasy medium shot
  • hitchcock
  • leblanc

If the AI is being fed the unfiltered internet as the basis for its model, that is what you get. The thing that “machine learning” is actually best at is perpetuating and amplifying existing bias. That is why most conversational AIs race to racism and get shut down.

Two questions:
Did you have a taco in frame?
If not (and feel free to ignore this if it’s too personal), are you of Hispanic/Latinx background? (I’m curious whether including “taco” as a keyword for a picture that does not contain a taco is a sign of racism in the model.)

No tacos or taco-related paraphernalia present, and I am a stereotypical level of white. I am on the heavy side, so maybe it just assumed I like tacos? Which I do!


Tacos are delicious. I am really curious about why the model decided to include it.


OH, great, they are going to put all the assholes online out of business!

Nah. Like DALL-E for some artists, this is a tool that assholes can use for inspiration and to improve their workflow. The assholes will never be put out of a job.


Yes. That’s where I immediately assumed this AI toy would go after it’s “refined” through input from the larger Internet, especially since it’s effectively designed to roast and insult photos of humans. Things could get nasty very quickly if it’s hooked into dating apps.

It says something about a number of facets of Internet corporate and consumer culture that I’d prefer HAL 9000 to the unpleasant AIs we’ve ended up with (art-generating ones excepted). At least HAL was polite and operated on the basis of what it thought was altruism and the common good, even as it was killing humans.


The purpose of CLIP Interrogator is unclear. Is this digital performance art?

I can’t tell if this is sarcasm or actual confusion, but the purpose is pretty self-evident from the name (CLIP refers to an AI model that links images and text, and it is used in text-to-image systems). The point of this is to take a picture and generate a prompt that can produce similar images using a system like Stable Diffusion or Midjourney.

[Edit: There are more serious CLIP interrogators out there, and it is possible that the things people find objectionable in this one are not just inadvertent problems with the training set but were added on purpose by its author.]
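For anyone wondering how a picture turns into a prompt like the ones people are posting, here is a rough sketch of the general idea: start from a plain caption, then append the candidate phrases whose text embeddings sit closest (by cosine similarity) to the image embedding. This is an illustration under assumptions, not this tool's actual code. The 3-number vectors below are made-up stand-ins for what CLIP's image and text encoders would produce, and `cosine` and `build_prompt` are hypothetical names. It also hints at how something like "taco" can sneak in: any phrase whose embedding happens to land near the image's gets ranked up, whether or not a taco is in frame.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_prompt(caption, image_vec, phrase_vecs, top_k=3):
    """Append the top_k candidate phrases whose embeddings best match the image."""
    ranked = sorted(phrase_vecs, key=lambda p: cosine(image_vec, phrase_vecs[p]), reverse=True)
    return ", ".join([caption] + ranked[:top_k])

# Hypothetical toy embeddings standing in for CLIP's image and text encoders.
image_vec = [0.9, 0.1, 0.4]
phrase_vecs = {
    "headshot profile picture": [0.8, 0.0, 0.5],
    "library background": [0.7, 0.3, 0.3],
    "taco": [0.0, 1.0, 0.0],
    "hitchcock": [0.2, 0.1, 0.9],
}

print(build_prompt("a man wearing glasses", image_vec, phrase_vecs, top_k=2))
# prints: a man wearing glasses, headshot profile picture, library background
```

With real CLIP embeddings and candidate lists of thousands of artists, mediums, and "flavor" terms, the same ranking step is what produces those long comma-separated prompts.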

For my faculty profile I got:
a man standing in front of a brick building, a character portrait, inspired by Eric Dinyer, academic art, headshot profile picture, an overweight, summer shirt, power bi dashboard, smiling confidently, anxious steward of a new castle, editorial model, background = library, hi-fructose

