Email case sensitivity in Discourse


#1

That’s how the spec works, unfortunately:

this-is-case-sensitive@but-this-is-not.com

Note “local-parts normally are”.


#2

gmail isn’t


#3

By their own choice.

And some of Gmail is, some isn’t. Google’s business and education apps have a setting that controls whether addresses are case sensitive and whether dots are significant.


#4

Sendmail, qmail, exim, procmail, postfix, eudora, courier, dovecot, UWash IMAP, mercury, Microsoft Exchange, scalix, VAXmail, Novell groupwise and every other mail processing software I’ve ever adminned but can’t remember right now were distributed case-insensitive by default. That Wikipedia quote is very misleading in its use of the word “normally”.

The specifications permit case sensitivity, and several of the aforementioned can be configured for it, but far more than 99% of the Internet’s email addresses are case-insensitive. Thus the majority of the world’s software that interacts with email addresses by default is case-insensitive, and can only be configured for case-sensitivity with some difficulty.

Note that the local part of an email address can contain routing information or free form text. These addresses are all the same mailbox (I redacted the domain, though):

root@mailhost.example.com
“Maurice Meisterberger” <morrie+Root@mailhost.example.com>
“Knuckles Malone” <knuck+ROOT@mailhost.example.com>

I think that if you implement email address case-sensitivity as a default, you’re kind of bucking current best practices on a technicality - observing the letter in preference to the spirit. It should only be a discouraged option. Paul Mockapetris once advised me in a similar situation to follow Postel’s dictum of “be generous in what you accept, and conservative in what you send”. Just as email-aware programs should accept domain names terminated by a dot, but never send them, they should be capable of case sensitive comparisons, but never default to that, and always preserve case of addresses entered by humans.
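A minimal sketch of that advice, assuming a hypothetical `emailsMatch` helper and a hypothetical `caseSensitiveLocalPart` option (neither is taken from Discourse or any mail package). The domain is always compared case-insensitively with a trailing dot ignored, the local part is compared case-insensitively unless the option is switched on, and the addresses themselves are never rewritten:

```javascript
// Hypothetical helpers for illustration only; not Discourse's implementation.

// Split at the last "@", since the domain itself can never contain one.
function splitAddress(address) {
  const at = address.lastIndexOf("@");
  if (at < 1 || at === address.length - 1) {
    throw new Error("not an email address: " + address);
  }
  return { local: address.slice(0, at), domain: address.slice(at + 1) };
}

// Domain names are case-insensitive; accept a trailing dot but ignore it.
function domainKey(domain) {
  return domain.replace(/\.$/, "").toLowerCase();
}

// Case-insensitive by default; case-sensitive only as an explicit opt-in.
function emailsMatch(a, b, { caseSensitiveLocalPart = false } = {}) {
  const x = splitAddress(a);
  const y = splitAddress(b);
  if (domainKey(x.domain) !== domainKey(y.domain)) return false;
  return caseSensitiveLocalPart
    ? x.local === y.local
    : x.local.toLowerCase() === y.local.toLowerCase();
}
```

With the defaults, `emailsMatch("Root@MailHost.example.com.", "root@mailhost.example.com")` is `true`; passing `{ caseSensitiveLocalPart: true }` makes the same pair compare as different.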

Edit: fixed pointy brackets to get past meta parsing


#5

This is true… but RFC 5321 says the local part is case SENSITIVE, and further, that no assumptions should be made about the installation and configuration options set on the mail-handling machine.

From the RFC:
http://tools.ietf.org/html/rfc5321

The local-part of a mailbox MUST BE treated as case sensitive.
Therefore, SMTP implementations MUST take care to preserve the case of
mailbox local-parts.

The fact that it is handled as though case insensitive in many configurations and set ups doesn’t change the fact that applications accessing and using the server are explicitly told that they CANNOT ASSUME THAT. And given that discourse does use those email addresses to mail, they have no choice but to assume case sensitivity if they are using email address as log on.


#6

Personally I blame UNIX. Case sensitivity is bullshit, but it’s 100% inherited from UNIX. Windows was never case sensitive in the filesystem, for example.


#7

That’s a draft, not an actual standard. But in any case it doesn’t really say the local part is case sensitive. We’re splitting hairs now, but read it again.

It says, in the context of mail delivery, the local-part must be treated as case-sensitive, because its case must be preserved. This is very specifically directed at SMTP implementations - MTAs and their submission queues - and it says so. (Note Discourse is at best a MUA, not an MTA.) Mail Transfer Agents cannot tell how local-parts are interpreted because they do not know the behaviour of the MDA at the receiving end. If the parent OS and/or MDA is case-sensitive, mangling the case in MUA output or MTA breaks email, thus the requirement to preserve case.

The same draft says:

However, exploiting the case sensitivity of mailbox local-parts impedes interoperability and is discouraged.

RFC1123 (which is a standard) quotes and elaborates on Postel:

The second part of the principle is almost as important: software on other hosts may contain deficiencies that make it unwise to exploit legal but obscure protocol features. It is unwise to stray far from the obvious and simple, lest untoward effects result elsewhere. A corollary of this is “watch out for misbehaving hosts”; host software should be prepared, not just to survive other misbehaving hosts, but also to cooperate to limit the amount of disruption such hosts can cause to the shared communication facility.

[quote=“Lion, post:154, topic:4432”]
The fact that it is handled as though case insensitive in many configurations and set ups doesn’t change the fact that applications accessing and using the server are explicitly told that they CANNOT ASSUME THAT. And given that discourse does use those email addresses to mail, they have no choice but to assume case sensitivity if they are using email address as log on.[/quote]

You’ve misinterpreted the scope of the RFC. It says SMTP implementations right in the text you quoted - it says that if I supply an email address to Discourse and it sends mail to me using the SMTP protocol it must preserve the case of the address as I supplied it. This is undeniable. However, it most assuredly does not explicitly say that when using an email address as a validation string completely outside of the SMTP layer I should conform to an inapplicable standard that breaks the robustness principle by inverting user expectations.


#8

And you’re right to do so.

Back in the day there were three camps: unix geeks, who never capitalized anything, mainframers, who just as obsessively capitalized everything, and the DEC guys (us in the PDP and later VAX world) who wrote proper fscking English like human beings. If you change case due to grammatical rules, your CLI needs to be case-insensitive.

The mainframers and unix geeks ganged up on us and won. The only victory we managed was on the hostnames, because we literally could not support case-sensitive hostnames in DECnet et al.


#9

Case sensitivity is intuitively easier to understand, though, since if a file-system is recording case in its names, it’s easier to know when two dir entries that differ only by case are allowed (i.e., always). In particular, case-insensitivity would be a nightmare with Unicode. Would you have to normalise strings as well as case-fold?

Case-insensitivity is hard, and that’s worse.


#10

Doesn’t matter – you’re going to have to compare case at some point in Unicode, aren’t you? Might as well pay the price up front, because steVe is the same person as Steve.
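To make the normalise-and-fold question concrete, here is a rough sketch assuming hypothetical `nameKey` and `sameName` helpers. It uses the standard `String.prototype.normalize` and `toLowerCase` methods; lowercasing is only an approximation of full Unicode case folding, so treat this as an illustration of the two steps rather than a complete answer:

```javascript
// Hypothetical comparison key: canonical normalization (NFC), then lowercasing.
// toLowerCase() is not full Unicode case folding; for example, it does not
// make "ß" and "SS" compare equal.
function nameKey(s) {
  return s.normalize("NFC").toLowerCase();
}

// Two names are treated as the same if their keys are identical.
function sameName(a, b) {
  return nameKey(a) === nameKey(b);
}
```

Here `sameName("steVe", "Steve")` is `true`, and so is `sameName("Ame\u0301lie", "AM\u00c9LIE")`, where one side spells the accented letter as a base letter plus a combining accent. Without the normalization step the second pair would not match.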


#11

You know, we can ignore case in some cases, like “forgotten password” or “login” when there is no clash.

If there is only one sam@somewhere in the system and no Sam@somewhere, we can be somewhat forgiving. Keep in mind that if the email was activated, we already know the one stored in the table is good, so this is not a real security threat.
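Roughly, that forgiving lookup could look like the sketch below. The `findAccountEmail` function and the in-memory array of stored addresses are assumptions made for illustration, not how Discourse actually stores or queries users:

```javascript
// Hypothetical lookup over an in-memory list of stored addresses.
// Returns the stored address (with its original case) or null.
function findAccountEmail(storedEmails, entered) {
  // 1. An exact match always wins.
  if (storedEmails.includes(entered)) return entered;

  // 2. Otherwise accept a case-insensitive match only when it is unambiguous.
  const wanted = entered.toLowerCase();
  const candidates = storedEmails.filter((e) => e.toLowerCase() === wanted);
  return candidates.length === 1 ? candidates[0] : null;
}
```

With only `sam@somewhere.example` stored, entering `Sam@somewhere.example` finds it. If both `sam@somewhere.example` and `Sam@somewhere.example` are stored, entering `SAM@somewhere.example` returns `null` because there is a clash, while an exact spelling still resolves to its own account.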


#12

This kind of serious analysis is precisely what the quoted RFCs are trying to encourage.

To be maximally useful, software needs to accommodate case-sensitivity without forcing it inappropriately.

Most mail software uses configurable settings and optimized defaults to achieve this goal, but fundamentally Discourse is not Mailman - it’s using mail addresses for several purposes distinctly different than SMTP transfer operations, so there’s no reason it can’t have a distinctly different (perhaps superior) way of dealing with case issues. Insisting that you have to do it the same way as everyone else is not any better than insisting on conformance to an indefensibly narrow reading of a standards document.

Man, how did I get in an RFC interpretation argument on bOINGbOING? I do enough of that at work. The craziest one was years ago, when IBM insisted that RFC1179’s specification that “The user identification must be 31 or fewer octets” meant you could not have a zero-length user id field “because technically zero is not a number”. The fact that the entire RFC is a description of existing software that actually does permit zero-length fields moved their position not one whit!


#14

Email case sensitivity is no longer a thing in Discourse as of about 6 months ago. So, you guys were right.


#15

I dunno, my JS console disagrees with them on that:

> typeof 0
< "number"
> isNaN(0)
< false

#16