Request for BB crowdsourcing unit: Anyone know of a thing that parses URLs to check them for hidden code?


Copy of my post to codinghorror:

Hey man BB crowdsourcing unit, just wondering if you know of any tools that exist to parse a URL (or any string of text) and analyse it for information coded in the URL using known substitution ciphers like rot13 or similar.

It just dawned on me that many phishing attacks could be mitigated or more easily detected if you had a browser addon that checks URLs for such coded information before you’re sent to the phishing page, since the phishing pages are usually personalised in some way to seem convincing and since that personalisation is often done through coded URLs.


Man, it's a hard one. I was looking through mysql encodings today and I think I saw several hundred. Not only that, internationalized domains are being encoded. And it is almost never (almost being operative) that the encoding itself is the issue. On top of that, you can pry base64 from my cold dead hands.

There are techniques that could be used, such as trying lots of decodings in an attempt to identify, say, personal info or misdirection attacks. But giant corporations use this legitimately (cough, eBay), which throws another monkey wrench into things.
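To make the "try lots of decodings" idea concrete, here is a rough Python sketch. It assumes the personal info being hunted for is an email address, and it only tries four decoders (plain text, rot13, base64, hex); the function name `find_hidden_emails` and the example host below are made up for illustration. It is nowhere near a real detector, just the shape of one.

```python
import base64
import codecs
import re
from urllib.parse import urlsplit, parse_qsl, unquote

# Deliberately loose pattern for "something that looks like an email address".
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def _b64(s):
    # Accept URL-safe or standard base64 and restore stripped '=' padding.
    padded = s + "=" * (-len(s) % 4)
    try:
        return base64.urlsafe_b64decode(padded).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None


def _hex(s):
    try:
        return bytes.fromhex(s).decode("utf-8")
    except ValueError:
        return None


def _rot13(s):
    return codecs.decode(s, "rot13")


DECODERS = {"plain": lambda s: s, "rot13": _rot13, "base64": _b64, "hex": _hex}


def find_hidden_emails(url):
    """Return (decoder_name, email) pairs found in any decoded URL token."""
    parts = urlsplit(url)
    # Candidate tokens: path segments, query values, and the fragment.
    tokens = [seg for seg in parts.path.split("/") if seg]
    tokens += [value for _, value in parse_qsl(parts.query)]
    if parts.fragment:
        tokens.append(parts.fragment)

    hits = []
    for token in tokens:
        token = unquote(token)
        for name, decode in DECODERS.items():
            decoded = decode(token)
            if decoded:
                hits.extend((name, email) for email in EMAIL_RE.findall(decoded))
    return hits


if __name__ == "__main__":
    token = base64.urlsafe_b64encode(b"victim@example.com").decode()
    print(find_hidden_emails("http://phish.example/login?u=" + token))
```

Even in this toy form the weaknesses show: every extra decoder multiplies the work and the false positives, and a legitimate site that puts your address in a link would trip it just as readily as a phishing page.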

For phishing, DKIM plus phone authentication is the best we have. I’ve had pointed, angry discussions about this and there is a large contingent that wants to “save” email and http. It will be interesting to see if we can.

Tldr: attacking encodings is attacking an adversary's strength. We need other options.


Thanks for the info. Yeah… it seems like a tricky task indeed! There’s a gmail phishing thing going around that my friend got burned by yesterday. As soon as I opened the email that came from his gmail account it looked immediately suspicious. I pasted the link into a private window of a different browser and it loaded (except for the janky domain and URL) an extremely convincing Google login page, filled with my email address (that must have been encoded in the URL). It looked a bit like this:

I figured it would be good for less paranoid/careful users to have a browser plugin that just alerted you anytime it detected a URL that had something encoded in it.

Definitely agree about finding other options cause encoding obviously has its uses!


There are some really convincing ones, and with something like that, if a real password were entered it would be game over.

I’m curious though, what did the phishing attack look like? And did the “gmail page” appear to be encrypted? Most people don’t even know where to look for those indicators, so it isn’t usually a great help :P


Anyone with an understanding of these things would have seen the signs, but obviously my friend did not.

This is the email I received from him:

And the domain of the screenshot from my previous post wasn’t an https page:

It gave me a sweet opportunity to tell people about 2 factor auth since the most likely time someone will implement a security improvement is when they’ve been hit. Bet he still isn’t using it though…


In addition to the mass of possible encodings @japhroaig mentions, there’s also the possibility (at the expense of some additional load on the server hosting the phishing page) that the URL encodes absolutely nothing of interest except a unique ID that the server uses to look up and display the correct personalizations.

That does require storing and looking up the values on the server side; but you don’t need all that many bits before even randomly generated IDs become vanishingly unlikely to collide with one another; and it simply isn’t possible to ‘decode’ the URL without having the stored IDs and values because the data simply aren’t in the URL. Even worse, using little chunks of user ID, session ID, etc. is something that wholly legitimate operators also do to direct you to the correct part of their site. This makes just blocking URLs with obscure looking blobs in them a difficult blanket policy.
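To put a rough number on "not all that many bits", here is a small Python sketch using the standard birthday-bound approximation, p ≈ 1 − exp(−n(n−1)/(2·2^bits)). The function name and the campaign sizes are my own assumptions, picked only for illustration.

```python
import math


def collision_probability(num_ids, bits):
    """Approximate chance that any two of num_ids uniformly random
    IDs of the given bit length collide (birthday bound)."""
    space = 2.0 ** bits
    exponent = num_ids * (num_ids - 1) / (2.0 * space)
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    # Hypothetical campaign of a million targets at several ID lengths.
    for bits in (32, 64, 128):
        print(bits, collision_probability(10**6, bits))
```

With a million targets, 32-bit IDs would almost certainly collide somewhere, while 64-bit IDs already push the chance below one in ten million. A 64-bit ID is only around eleven URL-safe base64 characters, so it looks no different from an ordinary session token.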

(Though, doing it manually in certain cases may actually be good advice… It would be analogous to the old advice for screening phone scammers: if somebody claiming to be your bank calls you, you can’t reasonably verify that they are who they claim to be. However, if your bank is actually calling you about something, that something will be there if you call them, at whatever published or previously established number they had. In the same way, if you allegedly have an email from some site, any links it contains could be anything and probably are; but if you manually go to that site and log in, you skip any potentially malicious link while still seeing whatever the alert is, if genuine.)

A potentially stronger defense (though one that requires trusting a local password store, or at least hash store) might be intercepting inputs that match a given username/password pair on unusual domains.

If I am ‘user’ on some site, with password ‘password’, I should have very, very limited reasons to be sending the user/‘password’ inputs to a different domain (even if I am being wicked and reusing passwords, I’ll still have a small number of domains I actually log into, for which I would be prompted once, and a whole internet of potential phishing domains or IPs that I should have no reason to ever log into).

If the username/password/domain associations, or just their hashes, were known by the browser, it could look for form inputs that matched a given username/password pair but did not match that pair’s associated domain. So attempting to send user/‘password’ to its associated domain would occur normally, while attempting to send user/‘password’ to any other domain would ask me if I was intentionally re-using credentials or if I had been tricked and would like to not submit that form.
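A minimal Python sketch of that check, assuming the browser keeps only salted hashes of username/password pairs, each mapped to the domains they were knowingly used on. The class name `CredentialGuard`, the method names, and the domains in the usage note are all hypothetical; no real browser API is implied.

```python
import hashlib
import os


class CredentialGuard:
    """Toy local store: hash of (username, password) -> approved domains."""

    def __init__(self):
        # Per-store random salt, so the stored digests are useless elsewhere.
        self._salt = os.urandom(16)
        self._known = {}

    def _digest(self, username, password):
        material = username.encode("utf-8") + b"\x00" + password.encode("utf-8")
        # Slow hash so a stolen store is harder to brute-force.
        return hashlib.pbkdf2_hmac("sha256", material, self._salt, 100_000)

    def remember(self, username, password, domain):
        """Record that this pair is intentionally used on this domain."""
        key = self._digest(username, password)
        self._known.setdefault(key, set()).add(domain.lower())

    def check(self, username, password, domain):
        """Return 'ok', 'warn', or 'unknown' for a form about to be submitted."""
        domains = self._known.get(self._digest(username, password))
        if domains is None:
            return "unknown"  # pair never seen before; nothing to compare against
        return "ok" if domain.lower() in domains else "warn"
```

For example, after `guard.remember("user", "password", "accounts.example.com")`, a submission to `accounts.example.com` would come back `"ok"` and the same pair headed to `login.phish.example` would come back `"warn"`, which is where the browser would ask whether the reuse is intentional. Matching on exact domain strings is a simplification; a real version would need some notion of related domains.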

I don’t like the dependence on local memory (particularly troublesome if you are logging in from a different system); but it does massively cut down on the number of dangerous pieces of data you have to look for, as well as ensuring that those pieces will show up in the clear at least once (since your browser handles presenting the form, accepting your input, displaying the characters or little password pips, it already sees your input at least once).

