Email case sensitivity in Discourse

codinghorror · August 28, 2013, 6:49am

That’s how the spec works, unfortunately:

this-is-case-sensitive@but-this-is-not.com

Note “local-parts normally are”.
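
A minimal sketch of what that split implies, assuming a hypothetical helper named `normalize_domain_only` (this is an illustration, not Discourse's code, and it does no real address validation). Under a strict reading, only the part after the last `@` is safe to lowercase:

```python
def normalize_domain_only(address: str) -> str:
    """Lowercase the domain; leave the local-part exactly as entered.

    Hypothetical helper for illustration only.
    """
    # Split on the LAST "@", since a quoted local-part may itself contain "@".
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return f"{local}@{domain.lower()}"


print(normalize_domain_only("This-Is-Case-Sensitive@But-This-Is-Not.COM"))
# This-Is-Case-Sensitive@but-this-is-not.com
```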

lemonl · August 28, 2013, 3:54pm

gmail isn’t

anon85524460 · August 28, 2013, 4:56pm

By their own choice.

And some of gmail is, some isn’t. The business/education apps for google do have a setting you can use to make them case sensitive or dot included or not.

Medievalist · August 28, 2013, 7:46pm

Sendmail, qmail, exim, procmail, postfix, eudora, courier, dovecot, UWash IMAP, mercury, Microsoft Exchange, scalix, VAXmail, Novell groupwise and every other mail processing software I’ve ever adminned but can’t remember right now were distributed case-insensitive by default. That Wikipedia quote is very misleading in its use of the word “normally”.

The specifications permit case sensitivity, and several of the aforementioned can be configured for it, but far more than 99% of the Internet’s email addresses are case-insensitive. Thus the majority of the world’s software that interacts with email addresses is case-insensitive by default, and can only be configured for case-sensitivity with some difficulty.

Note that the local part of an email address can contain routing information or free form text. These addresses are all the same mailbox (I redacted the domain, though):

root@mailhost.example.com
“Maurice Meisterberger” <morrie+Root@mailhost.example.com>
“Knuckles Malone” <knuck+ROOT@mailhost.example.com>

I think that if you implement email address case-sensitivity as a default, you’re kind of bucking current best practices on a technicality - observing the letter in preference to the spirit. It should only be a discouraged option. Paul Mockapetris once advised me in a similar situation to follow Postel’s dictum of “be generous in what you accept, and conservative in what you send”. Just as email-aware programs should accept domain names terminated by a dot, but never send them, they should be capable of case sensitive comparisons, but never default to that, and always preserve case of addresses entered by humans.
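
One way to read that advice, sketched in Python under stated assumptions: keep the address exactly as the human typed it for sending, and derive a separate lowercased key that is used only for comparisons. The names `StoredEmail`, `store_email` and `same_mailbox` are hypothetical, and the sketch deliberately ignores plus-addressing and other routing conventions, which vary by mail system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredEmail:
    as_entered: str  # sent verbatim, so case is always preserved
    lookup_key: str  # used only for comparisons, never for sending


def store_email(address: str) -> StoredEmail:
    # Assumption: plain lowercasing is an acceptable comparison key here.
    return StoredEmail(as_entered=address, lookup_key=address.lower())


def same_mailbox(a: StoredEmail, b: StoredEmail) -> bool:
    # Case-insensitive by default; the original spelling is untouched.
    return a.lookup_key == b.lookup_key


first = store_email("Root@mailhost.example.com")
second = store_email("root@MAILHOST.example.com")
print(same_mailbox(first, second))  # True
print(first.as_entered)             # Root@mailhost.example.com
```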

Edit: fixed pointy brackets to get past meta parsing

anon85524460 · August 28, 2013, 8:48pm

This is true… but RFC 5321 says the local part is case SENSITIVE, and further, that no assumptions should be made about the installation and configuration options set on the mailhandling machine.

From the RFC:

The local-part of a mailbox MUST BE treated as case sensitive.
Therefore, SMTP implementations MUST take care to preserve the case of
mailbox local-parts.

The fact that it is handled as though case insensitive in many configurations and set ups doesn’t change the fact that applications accessing and using the server are explicitly told that they CANNOT ASSUME THAT. And given that discourse does use those email addresses to mail, they have no choice but to assume case sensitivity if they are using email address as log on.

codinghorror · August 28, 2013, 9:35pm

Personally I blame UNIX. Case sensitivity is bullshit, but it’s 100% inherited from UNIX. Windows was never case sensitive in the filesystem, for example.

Medievalist · August 28, 2013, 9:44pm

That’s a draft, not an actual standard. But in any case it doesn’t really say the local part is case sensitive. We’re splitting hairs now, but read it again.

It says, in the context of mail delivery, the local-part must be treated as case-sensitive, because its case must be preserved. This is very specifically directed at SMTP implementations - MTAs and their submission queues - and it says so. (Note Discourse is at best a MUA, not an MTA.) Mail Transfer Agents cannot tell how local-parts are interpreted because they do not know the behaviour of the MDA at the receiving end. If the parent OS and/or MDA is case-sensitive, mangling the case in MUA output or MTA breaks email, thus the requirement to preserve case.

The same draft says:

However, exploiting the case sensitivity of mailbox local-parts impedes interoperability and is discouraged.

RFC1123 (which is a standard) quotes and elaborates on Postel:

The second part of the principle is almost as important: software on other hosts may contain deficiencies that make it unwise to exploit legal but obscure protocol features. It is unwise to stray far from the obvious and simple, lest untoward effects result elsewhere. A corollary of this is “watch out for misbehaving hosts”; host software should be prepared, not just to survive other misbehaving hosts, but also to cooperate to limit the amount of disruption such hosts can cause to the shared communication facility.

[quote=“Lion, post:154, topic:4432”]
The fact that it is handled as though case insensitive in many configurations and set ups doesn’t change the fact that applications accessing and using the server are explicitly told that they CANNOT ASSUME THAT. And given that discourse does use those email addresses to mail, they have no choice but to assume case sensitivity if they are using email address as log on.[/quote]

You’ve misinterpreted the scope of the RFC. It says SMTP implementations right in the text you quoted - it says that if I supply an email address to Discourse and it sends mail to me using the SMTP protocol it must preserve the case of the address as I supplied it. This is undeniable. However, it most assuredly does not explicitly say that when using an email address as a validation string completely outside of the SMTP layer I should conform to an inapplicable standard that breaks the robustness principle by inverting user expectations.

Medievalist · August 28, 2013, 9:49pm

And you’re right to do so.

Back in the day there were three camps; unix geeks, who never capitalized anything, mainframers, who just as obsessively capitalized everything, and the DEC guys (us in the PDP and later VAX world) who wrote proper fscking English like human beings. If you change case due to grammatical rules, your CLI needs to be case-insensitive.

The mainframers and unix geeks ganged up on us and won. The only victory we managed was on the hostnames, because we literally could not support case-sensitive hostnames in DECnet et al.

AshleyYakeley · August 28, 2013, 11:05pm

It’s intuitively easier to understand, though, since if a file-system is recording case in its names, it’s easier to know when two dir entries that differ only by case are allowed (i.e., always). In particular, it would be a nightmare with Unicode. Would you have to normalise strings as well as case-fold?

Case-insensitivity is hard, and that’s worse.

codinghorror · August 28, 2013, 11:11pm

Doesn’t matter – you’re going to have to compare case at some point in Unicode, aren’t you? Might as well pay the price up front, because steVe is the same person as Steve.
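
For what it's worth, the answer to the normalisation question appears to be yes. A rough sketch of caseless comparison in Python, following the normalise, case-fold, normalise-again pattern described in the Python Unicode HOWTO (the function names here are illustrative, not from any particular codebase):

```python
import unicodedata


def nfd(text: str) -> str:
    # Canonical decomposition, so "é" and "e" + combining accent compare equal.
    return unicodedata.normalize("NFD", text)


def caseless_key(text: str) -> str:
    # Case folding can produce unnormalised output, hence the second nfd().
    return nfd(nfd(text).casefold())


def same_name(a: str, b: str) -> bool:
    return caseless_key(a) == caseless_key(b)


print(same_name("steVe", "Steve"))          # True
print(same_name("Stra\u00dfe", "STRASSE"))  # True: casefold maps "ß" to "ss"
print(same_name("\u00e9", "e\u0301"))       # True: precomposed vs combining
```

Plain `lower()` would miss the second and third cases, which is roughly the price being argued about above.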

sam · August 28, 2013, 11:27pm

You know, we can ignore case in some cases, like “forgotten password” or “login” when there is no clash.

If there is only one sam@somewhere in the system and no Sam@somewhere, we can be somewhat forgiving. Keep in mind that if the email was activated, we already know the one stored in the table is good, so this is not a real security threat.
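
A minimal sketch of that lookup rule, assuming the stored addresses are available as a simple list and using a hypothetical function name `find_login_email` (not Discourse's actual implementation): an exact match always wins, and a case-insensitive match is accepted only when it is unambiguous.

```python
from typing import List, Optional


def find_login_email(entered: str, stored_emails: List[str]) -> Optional[str]:
    """Return the stored address to use for login, or None if unsure."""
    # 1. An exact, case-sensitive match always wins.
    if entered in stored_emails:
        return entered
    # 2. Otherwise be forgiving, but only when there is no clash.
    wanted = entered.lower()
    candidates = [email for email in stored_emails if email.lower() == wanted]
    if len(candidates) == 1:
        return candidates[0]
    # Zero matches, or several that differ only by case: refuse to guess.
    return None


print(find_login_email("SAM@somewhere", ["sam@somewhere"]))
# sam@somewhere
print(find_login_email("SAM@somewhere", ["sam@somewhere", "Sam@somewhere"]))
# None
```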

Medievalist · August 29, 2013, 5:56pm

This kind of serious analysis is precisely what the quoted RFCs are trying to encourage.

To be maximally useful, software needs to accommodate case-sensitivity without forcing it inappropriately.

Most mail software uses configurable settings and optimized defaults to achieve this goal, but fundamentally Discourse is not Mailman - it’s using mail addresses for several purposes distinctly different than SMTP transfer operations, so there’s no reason it can’t have a distinctly different (perhaps superior) way of dealing with case issues. Insisting that you have to do it the same way as everyone else is not any better than insisting on conformance to an indefensibly narrow reading of a standards document.

Man, how did I get in an RFC interpretation argument on bOINGbOING? I do enough of that at work. The craziest one was years ago, when IBM insisted that RFC1179’s specification that “The user identification must be 31 or fewer octets” meant you could not have a zero-length user id field “because technically zero is not a number”. The fact that the entire RFC is a description of existing software that actually does permit zero-length fields moved their position not one whit!

codinghorror · July 24, 2015, 8:59am

Email case sensitivity is no longer a thing in Discourse as of about 6 months ago. So, you guys were right.

riking · July 29, 2015, 11:10pm

I dunno, my JS console disagrees with them on that:

> typeof 0
< "number"
> isNaN(0)
< false
