That’s pretty much what I was thinking – what ISP collects “browser histories” and stores full URLs? Yes, there are legitimate uses for retaining DHCP logs (who had what dynamic IP address assignment, when) and for collecting “Netflow” type records from routers (basically what IP and port was accessed at what time, and number of bytes/packets transmitted and received).
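The two record types above answer different questions. A DHCP log says who held an address, and a Netflow-style record says what that address talked to. Neither one carries a URL. Here is a minimal Python sketch under that framing. `Lease`, `FlowRecord`, `who_had` and their field names are hypothetical, since real DHCP servers and flow exporters each use their own formats.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Lease:
    """One DHCP assignment: which subscriber held which IP, and over what interval."""
    subscriber: str
    ip: str
    start: datetime
    end: datetime


@dataclass
class FlowRecord:
    """A Netflow-style summary: addresses, port, time, and volume; no URL or payload."""
    src_ip: str
    dst_ip: str
    dst_port: int
    start: datetime
    packet_count: int
    byte_count: int


def who_had(leases: List[Lease], ip: str, when: datetime) -> Optional[str]:
    """Return the subscriber holding `ip` at time `when`, or None if it was unassigned."""
    for lease in leases:
        if lease.ip == ip and lease.start <= when < lease.end:
            return lease.subscriber
    return None
```

To attribute a flow to a customer, call `who_had(leases, flow.src_ip, flow.start)`. That join uses nothing beyond IP assignment history and connection metadata.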
There’s zero legitimate reason for an ISP to log full URLs accessed by users.
I run ISPs and Fortune 500 corporate networks, and no ISP I’ve been involved with has stored any record of user activity beyond IP assignment history. Yes, some corporate networks break SSL, filter out porn, and log URLs. But the (mostly Fortune 1000) firms I work with keep that data only long enough to deal with security and performance issues, and they don’t let HR go trawling through browser histories looking for people to fire.
For example, at a Fortune 500 media company, IT Security kept DHCP logs forever, Netflow records for 1-3 months, and full URL logs for about 4 days. We didn’t filter URLs by category or do deep inspection of SSL traffic. Part of the reason was that the company didn’t have the budget. The bigger reason was that the staff reporters would get all worked up about the “First Amendment” if there was even a hint that “Corporate” was considering either one.
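A tiered policy like that one can be written down as a small table plus a check. Here is a minimal Python sketch, assuming a purge job that calls `is_expired` on each record. The `RETENTION` table and function name are hypothetical. Netflow is set to 90 days, the upper end of the 1-3 month range mentioned above.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional

# Hypothetical retention table mirroring the example above; None means "keep forever".
RETENTION: Dict[str, Optional[timedelta]] = {
    "dhcp": None,
    "netflow": timedelta(days=90),  # upper end of the 1-3 month range
    "url": timedelta(days=4),
}


def is_expired(log_type: str, recorded_at: datetime, now: datetime) -> bool:
    """True if a record of `log_type` written at `recorded_at` is past its retention window."""
    window = RETENTION[log_type]
    return window is not None and now - recorded_at > window
```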
Under current HTTPS standards, the server name (the domain-name part of the URL) is sent in clear text. So even on encrypted websites, the name of the site being visited is visible to a DPI (deep packet inspection) sniffer.
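Reading that name takes nothing exotic. Here is a Python sketch using only the standard library. It has the `ssl` module generate a real ClientHello in memory, with no network traffic, and then walks the record byte by byte to pull out the server_name extension. `client_hello_for` and `extract_sni` are hypothetical helper names. The parser assumes one complete, well-formed record and does no bounds checking. A real DPI engine would also have to handle fragmented or malformed input.

```python
import ssl
import struct
from typing import Optional


def client_hello_for(hostname: str) -> bytes:
    """Return the first bytes a Python TLS client would send, generated entirely in memory."""
    ctx = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: the client now waits for a ServerHello that never arrives
    return outgoing.read()


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI host name from a TLS record carrying a ClientHello, else None."""
    # Record header: content type (0x16 = handshake), version (2 bytes), length (2 bytes)
    if len(record) < 5 or record[0] != 0x16:
        return None
    (rec_len,) = struct.unpack("!H", record[3:5])
    body = record[5:5 + rec_len]
    # Handshake header: message type (0x01 = ClientHello) and a 3-byte length
    if not body or body[0] != 0x01:
        return None
    pos = 4 + 2 + 32        # skip handshake header, legacy version, random
    pos += 1 + body[pos]    # session_id has a 1-byte length prefix
    (n,) = struct.unpack("!H", body[pos:pos + 2])
    pos += 2 + n            # cipher_suites has a 2-byte length prefix
    pos += 1 + body[pos]    # compression_methods has a 1-byte length prefix
    (ext_total,) = struct.unpack("!H", body[pos:pos + 2])
    pos += 2
    end = pos + ext_total
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack("!HH", body[pos:pos + 4])
        pos += 4
        if ext_type == 0x0000:  # server_name extension
            # list length (2), name_type (1; 0 = host_name), name length (2), name
            if body[pos + 2] == 0:
                (name_len,) = struct.unpack("!H", body[pos + 3:pos + 5])
                return body[pos + 5:pos + 5 + name_len].decode("ascii")
        pos += ext_len
    return None
```

For example, `extract_sni(client_hello_for("example.com"))` recovers `"example.com"` without any key material.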
The ISP can also correlate three things: a client’s DNS requests, the DNS replies that carry the server IP, and that client’s later connections to that IP address. This gives it a reasonable guess at which hostnames are being accessed, even when the client protocol doesn’t expose the hostname.
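The correlation can be sketched as a join on (client, server IP) that is limited by the DNS answer’s TTL. Here is a minimal Python version. The record types, field names, and the TTL-window rule are all hypothetical simplifications. Clients may cache answers longer than the TTL, and many hostnames can share one CDN address, so the output is a guess, not proof.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class DnsAnswer:
    client: str    # who asked
    hostname: str  # the name they asked for
    ip: str        # the address in the reply
    time: float    # seconds since epoch
    ttl: int       # how long the client may cache the answer


@dataclass
class Connection:
    client: str
    server_ip: str
    time: float


def label_connections(answers: List[DnsAnswer],
                      conns: List[Connection]) -> List[Tuple[Connection, Optional[str]]]:
    """Guess each connection's hostname from the client's latest DNS answer still within its TTL."""
    seen: Dict[Tuple[str, str], List[Tuple[float, int, str]]] = {}
    for a in sorted(answers, key=lambda a: a.time):
        seen.setdefault((a.client, a.ip), []).append((a.time, a.ttl, a.hostname))
    labelled = []
    for c in conns:
        guess = None
        for t, ttl, name in seen.get((c.client, c.server_ip), []):
            if t <= c.time <= t + ttl:
                guess = name  # answers are time-ordered, so the last match is the most recent
        labelled.append((c, guess))
    return labelled
```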
Are you sure? I thought the browser performs the TLS handshake directly with the server IP. Isn’t that one of the reasons SSL certificates and multiple sites on one web server don’t mix very conveniently?
How does this fly with the Trans-Pacific Partnership?
Would 007 get to look through the history of anyone whose packets pass through the London+haggis+potato patch of the internet? What aboot packets passing through Her Majesty’s colonies?
First, if a sniffer captures the client’s DNS query and its reply, you know what name he is looking up and what IP address he will talk to in order to reach it. This works for many protocols, not just the web.
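Classic DNS over port 53 is plaintext, and the question section of each query spells out the requested name. Here is a minimal Python sketch of reading that name from a raw message. `build_query` exists only to produce a sample packet, and both function names are hypothetical. `parse_dns_question` reads only the first question and ignores name compression, which a query’s first question normally doesn’t use. The IP address comes from the reply’s answer section, which this sketch does not parse.

```python
import struct
from typing import Tuple


def build_query(name: str, qtype: int = 1, txid: int = 0x1234) -> bytes:
    """Build a minimal recursive DNS query (RD flag set) for `name`; qtype 1 = A record."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)  # one question, no records
    qname = b"".join(bytes([len(p)]) + p.encode("ascii") for p in name.split(".")) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN


def parse_dns_question(msg: bytes) -> Tuple[str, int]:
    """Return (query name, query type) from the first question of a raw DNS message."""
    (qdcount,) = struct.unpack("!H", msg[4:6])  # QDCOUNT sits in bytes 4-5 of the header
    if qdcount == 0:
        raise ValueError("no question section")
    labels, pos = [], 12  # the fixed header is 12 bytes
    while msg[pos] != 0:  # a zero-length label ends the name
        length = msg[pos]
        labels.append(msg[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    (qtype,) = struct.unpack("!H", msg[pos + 1:pos + 3])
    return ".".join(labels), qtype
```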
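Second, in TLS 1.2 and earlier, the web server sends its certificate unencrypted, so the certificate’s subject name (the site) is visible. From this a sniffer learns what name the web server claims for itself. TLS 1.3 encrypts the certificate, so this particular leak applies only to older handshakes.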
And most recently, the “fix” for hosting multiple HTTPS sites on a single IP is Server Name Indication (SNI). With SNI, the client sends the desired server name, unencrypted, early in the TLS negotiation (in the ClientHello). This may change in a future version.
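On the server side, SNI is what lets one IP address present different certificates. The server reads the requested name from the ClientHello before any keys are agreed, and that timing is also why the name has to travel in the clear. Here is a minimal sketch using Python’s `ssl.SSLContext.sni_callback` (Python 3.7+). The site names and per-site contexts are hypothetical, and loading certificates is left out.

```python
import ssl
from typing import Dict, Optional


def make_sni_callback(sites: Dict[str, ssl.SSLContext]):
    """Return an sni_callback that switches to the per-site context for the requested name."""
    def choose_site(ssl_obj, server_name: Optional[str], default_ctx: ssl.SSLContext):
        # server_name is None if the client sent no SNI; the default context then stays
        site_ctx = sites.get(server_name) if server_name else None
        if site_ctx is not None:
            ssl_obj.context = site_ctx  # present this site's certificate instead
        return None  # returning None lets the handshake continue
    return choose_site

# Hypothetical wiring for two sites sharing one IP (each per-site context would call
# load_cert_chain() with that site's certificate and key):
#   default_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
#   default_ctx.sni_callback = make_sni_callback({"a.example": ctx_a, "b.example": ctx_b})
```

A client that sends no SNI gets the default certificate. That fallback is why, before SNI, each HTTPS site generally needed its own IP address.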