“Mom, Dad, where do emoji come from?” The Unicode Consortium, son


Which gives us the following sports:

  • runner
  • walking
  • dancer
  • rowboat
  • swimmer
  • surfer
  • bath
  • snowboarder
  • ski
  • snowman
  • bicyclist
  • mountain_bicyclist
  • horse_racing
  • tent
  • fishing_pole_and_fish
  • soccer
  • basketball
  • football
  • baseball
  • tennis
  • rugby_football
  • golf
  • trophy
  • running_shirt_with_sash
  • checkered_flag
  • musical_keyboard
  • guitar
  • violin
  • saxophone
  • trumpet
  • musical_note
  • notes
  • musical_score
  • headphones
  • microphone
  • performing_arts
  • ticket
  • tophat
  • circus_tent
  • clapper
  • art
  • dart
  • 8ball
  • bowling
  • slot_machine
  • game_die
  • video_game
  • flower_playing_cards
  • black_joker
  • mahjong
  • carousel_horse
  • ferris_wheel
  • roller_coaster

No rifles, no hand-guns, no target-pistols.

SHOCKED, SHOCKED I AM!

although not as shocked as I will be if this thread manages to remain civil, sane, and on the topic of emojis in general

1 Like

It’s entirely possible that I’m just a grumpy old man who is busy defending his lawn from kids these days; but it has been really depressing to watch the Unicode Consortium somehow get dragged into the business of being fairly close to the leading edge of the process of spewing out new emoji.

It all started innocently enough: Unicode has always balanced a desire for technical sanity and actually-being-implementable-in-finite-time-by-finite-entities with a desire to get adopted, which requires a certain amount of…tolerance…of various legacy encodings.

ASCII was incorporated as a proper subset for that reason, as were various other encodings in common use, even if they introduced duplicate characters or involved choices contrary to the preferred Unicode way of doing things (e.g., ligatures and digraphs are supposed to be handled by combining the appropriate discrete characters, not by giving them their own codepoints; but various legacy encodings had implemented ligatures and digraphs as single characters, and backward compatibility was needed, so they got codepoints; lesser of two evils).
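For anyone who wants to see the compatibility baggage in action, here is a minimal Python sketch using the standard `unicodedata` module. It assumes NFKC normalization is an acceptable way to fold legacy ligature and digraph codepoints back into their discrete letters; the helper name `expand_compat` is made up for illustration.

```python
import unicodedata


def expand_compat(text: str) -> str:
    """Fold compatibility characters (e.g. ligatures) into plain letters via NFKC."""
    return unicodedata.normalize("NFKC", text)


# U+FB01 and U+0132 each occupy a single codepoint for legacy reasons.
for ch in ("\ufb01", "\u0132"):
    expanded = expand_compat(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {expanded!r} ({len(expanded)} codepoints)")
```

Each legacy character is one codepoint going in and two ordinary letters coming out, which is roughly the "preferred Unicode way" described above.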

In the case of emoji, the Japanese handset market was unbelievably dysfunctional. A bunch of emoji were floating around, but the encoding could differ between carriers, between handset vendors, possibly even between different combinations of the two. Implementing translation layers so that messages could pass between users on different handsets or different networks was bad enough; and potential foreign entrants to the market were loath to touch such a quagmire.

So, the Unicode Consortium was called in and, as with other legacy encoding messes, just did what had to be done. All the emoji were lined up, any duplicates culled, and the remainder assigned code points. Ugly, completely idiosyncratic, and based on nothing except the inertia of certain twee little pictures in the Japanese text messaging market; but it was the closest thing to a clean break that could be arranged, and at least cauterized the oozing pustule that was the prior encoding arrangement.

Then things started to go bad: In Apple’s default system font, ‘smiley face’ was yellow. Accusations of racism arose. Apple (cynically and dishonestly) claimed that Unicode was at fault when, in fact, all Unicode did was specify that a given code point was ‘smiley face’ and offered no further clarification or specificity. Now it seems like everyone with some awful bit of clip-art wants their own codepoint, at the same time as a number of actual natural languages remain unincorporated or ill-supported.
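To illustrate how little a code point actually pins down, here is a small Python sketch, assuming a Python build whose `unicodedata` tables include the emoji blocks (any reasonably recent Python 3 should). The `describe` helper is hypothetical. All the character database hands back is a number and a name; the colour and drawing style are entirely up to whichever font renders it.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official character name, which is all Unicode assigns."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe("\u263a"))      # an older smiley from the Miscellaneous Symbols block
print(describe("\U0001F600"))  # a smiley from the Emoticons block
```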

The Unicode Consortium has always had to deal with idiosyncratic and historically contingent situations (they are attempting to tackle natural language, after all); but mayhem was, somewhat, mitigated by the fact that they were in the business of either absorbing legacy encodings that had already been market-tested and become entrenched, where those existed, or designing encodings in consultation with the relevant experts in the case of languages without active IT markets and legacy standards (whether because they are dead, alive but spoken by people without much IT in use, or whatever).

Now, they appear to be the place where every last idiotic proposal gets made first, without first undergoing proof and refinement by real world use. As best I can tell, they aren’t equipped for that. The task of describing the world’s characters is vast enough; but at least you can approach it empirically. The task of incorporating random images thrown at you is unbounded, and largely without any criteria for guiding inclusion and exclusion. When you simply describe the world, “Well, do people use it?” is all you need to know. Once you abandon that criterion, how do you distinguish between emoji that just have to get into the next revision and ones that are pointless?

5 Likes

As an engineer, there are two types of problems that give me night sweats: time and character encoding. Both appear deceptively easy, but are the product of millions of man hours gettin’ it wrong.

3 Likes

Wow. As a civilian, I think I can say this doesn’t affect me in the slightest.

Consider yourself lucky. Character encoding sniffing is a darker art than anything Voldemort ever practiced. Writing Cyrillic-to-English web-scraping bots for hostile websites still makes me want to pop a few Xanax.
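For the curious, a rough Python sketch of why sniffing is so unpleasant. This is not how any particular scraper does it; it is just the naive "try strict decodes in order" approach, with a made-up helper name (`sniff_decode`) and an assumed candidate list of UTF-8 followed by Windows-1251.

```python
def sniff_decode(data: bytes, candidates=("utf-8", "cp1251")):
    """Try each candidate encoding in order; return (text, encoding) for the first strict success."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding, try the next one
    raise ValueError("none of the candidate encodings could decode the data")


raw = "Привет".encode("cp1251")  # pretend this arrived from a page with no charset header
text, guessed = sniff_decode(raw)
print(guessed, text)

# The nasty part: the same bytes also decode without error as KOI8-R,
# they just come out as different Cyrillic letters.
print(raw.decode("koi8_r"))
```

A successful decode only proves the bytes are legal in that encoding, not that the guess is right, which is why real detectors fall back on letter-frequency statistics and still get it wrong.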

3 Likes

effective.
Power
لُلُصّبُلُلصّبُررً ॣ ॣh ॣ ॣ

snort
My phone gets the encoding wrong.

Castle Doctrine FTW

Taking a bath is a sport?!

Also, I wasn’t the only one to read the headline as “The Unicorn Consortium,” was I?

1 Like

So is wearing/listening-to a pair of headphones. YMMV
