Paranoid Browsing: anti-profiling plugin seeks feedback




Reverse StumbleUpon.


Interesting concept but surely it would contribute needlessly to any data cap that you might have from your ISP? Seems a bit of a waste of resources to me.


There is no way to disguise traffic without deviating from maximally efficient network behavior (since 'maximally efficient' means only transferring what you want transferred and having your packets take the most efficient path, both of which make it about as easy as it can be to see what you are doing and who is doing what). It's the same issue that TOR runs into: 'onion routing' is a deliberately perverse behavior (by networking standards) that introduces considerable latency and consumes bandwidth across additional links; but there is no more efficient method that still cloaks the user.

Whether it is a 'waste' or not depends on whether you value what it does. Bandwidth spent obfuscating traffic, if you want obfuscated traffic, is as valid as bandwidth spent downloading files, if you want those files.


I never liked this type of obfuscation.
First of all it's creating background traffic, which might not be much, but it's there. Also data caps as someone mentions above.

Second, and most importantly, it gives possible interpretations to that fake data. Imagine there is a specific combination of sites that the dataminers assume to match a type of sexuality, religion, etc. Sure, your profile will be false, but it won't be considered useless: whoever is spying on you will take it literally because they don't know better, and will add preferences based on the fake data, instead of the opposite. For better or for worse.

It's like when you have your YouTube suggestions based on what you actually watch, and then someone sends you a nyancat video, you open it, and suddenly your suggestions keep offering more nyancat or internet memes like it. Datamining seems to be additive in what it contains: it'll just add new stuff without bothering to check for conflicts, apparently.

Imagine your profile is exactly the same as before, except with random changes based on someone else's ideology. Imagine you are labeled as the wrong sexual orientation, political ideology, religion, etc. Imagine an employer buys access to that profile and discriminates against you based on wrong data.


My biggest concern would be the difficulty of creating convincing simulated traffic against a sophisticated adversary with access to all traffic.

In particular, the inclusion of a 'default' seed list seems very problematic. I'd be inclined to not include one (because if the default appears to be working, nobody ever bothers changing it). Ideally, a little wizard (say, something with a copy of a spell-check dictionary that does a bunch of web searches and provides the user with a set of domain names, that they can optionally edit) to autopopulate the list would be nice; but even forced-manual would be an improvement.


Ironically, Chrome only. I mean, he could have made it for a browser not made by Google...


Is that a hard requirement, or do de-Googled, but otherwise similar, Chromium-based browsers work?

A hard requirement would be foolish; but getting a defanged version of Chrome isn't exactly difficult.


Hmm, the plugin leads off talking about "Advertisers and government agencies" - color me skeptical.

Agree w/ weissritter that this sort of "chaff" really just raises surface area for false positives. Say you're searching for backpacks and the background browser does a search for pressure cookers, for example...

Beyond that, if tools like XKeyscore work as stream filters as the leaked docs imply, then once you get targeted, you don't really get much protection at all as finer-grained filters are applied (say at a site or keyword level).

For commercial third party datamining, it seems like Disconnect would be fine, or using private browsing for things that you don't want first parties to track. If you want more privacy you could combine that with VPNs and TOR (this, of course, doesn't work against state actors: that'll just get you flagged for extra scrutiny, and as soon as anything leaks you're hosed).


I assume it will work with Chromium. There is no reason why it wouldn't.

It would be more in line with their ideals to use Firefox though.


Ideologically that is probably true; but (just another little detail that makes their problem more difficult), determining which browser somebody is using by examining traffic on the wire or getting logs from a cooperative host is generally not that difficult.

By default, the browser will send an honest UA string, in the clear, with all sorts of requests. Even if you configure the browser to lie, individual sites often sniff more carefully (so they know what ghastly javascript tricks will or won't work, etc.), sometimes with different resources requested from the server depending on the result.

Unless the system is to be trivially obvious, it is probably necessary to have the spoofed traffic generated by the same browser that the user typically uses for real activity.

I don't know how the stats are these days, so I don't know if Chrome is the best starting point or not; but if you want it to have a chance of working, supporting just one browser, any one, is going to be an issue. (And let's not even think about mobile browsers, which are power/data constrained, sometimes locked down, and should be easy to correlate with subscriber data provided by the notoriously privacy-friendly cell companies. People don't necessarily do most of their browsing on phones, because it's unpleasant, but they often hit authenticated services from both phone and PC.) It is not an easy problem.


I was thinking the same thing, do security minded people still use Chrome?


Chrome has very good security, actually. I just prefer an open source project and Mozilla is run by a non-profit foundation.

(Disclosure: I work for the Mozilla Corporation.)


Since the type of background browsing this plugin does is configurable, isn't there a chance a malicious agency could use it on a target (someone who isn't tech savvy, the kind of person with a dozen random toolbars installed, for example) to give them a false profile for browsing terrorist-related or illegal content?

Or am I just being super paranoid?


AFAIK it won't work. Given a long enough history it's easy to filter out the noise from the signal. At least that's what I was told by security experts when suggesting similar techniques to hide other data. A quick google brings up this paper.

I'm sure there's more but basically I was told it's security 101 that adding random noise will not hide the data.
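To make the "long enough history" point concrete, here is a toy sketch, not the method from the paper. It assumes the chaff is drawn uniformly at random from a fixed site list, which is my simplification; the site IDs, session count, and the `recover_interests` helper are all made up for illustration.

```python
import random
from collections import Counter

def recover_interests(history, n_sites, top_k):
    """Rank sites by how far their visit count exceeds a uniform baseline."""
    counts = Counter(history)
    # Expected visits per site if the whole history were uniform noise
    # (a rough baseline; it slightly overestimates the true noise level).
    baseline = len(history) / n_sites
    excess = {site: counts.get(site, 0) - baseline for site in range(n_sites)}
    return sorted(excess, key=excess.get, reverse=True)[:top_k]

rng = random.Random(0)          # fixed seed so the run is repeatable
n_sites = 50                    # hypothetical size of the chaff site list
real_sites = [3, 17, 42]        # hypothetical genuine interests

history = []
for _ in range(200):            # 200 browsing sessions
    history.extend(real_sites)  # one genuine visit to each real interest
    # 30 chaff visits per session: ten times as much noise as signal
    history.extend(rng.randrange(n_sites) for _ in range(30))

recovered = recover_interests(history, n_sites, top_k=3)
print(sorted(recovered))
```

Even with ten noise visits per real one, the real sites stand out because the noise spreads evenly while the signal keeps piling up on the same few sites. Chaff that mimics a plausible non-uniform profile would be harder to subtract, which is where the SNR caveat comes in.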


The Kargupta paper is nicely general, but the amount of noise they add is not enormous: they note that for SNR << 1 the method doesn't do that well.

A more directly relevant paper is Pozenel et al., which shows how to disambiguate interleaved clickstreams using Markov models. This looks harder to spoof: if there is no link from a page on site A to site B and vice versa, then a sequence A1 B1 A2 B2... is pretty easy to separate. Same thing for parts of a site that rarely have transitions between each other. However, this method has problems when users can jump into a site at nearly any point (hard to tell where a session starts) and when sessions overlap in the sequence.
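A minimal sketch of why the A1 B1 A2 B2 case is easy. This is a hard yes/no link-graph check standing in for the idea, not the Markov-model method from the paper (which would use transition probabilities instead). The `links` graph and the `split_clickstream` helper are hypothetical.

```python
def split_clickstream(clicks, links):
    """Greedily assign each click to the first session whose last page links to it."""
    sessions = []
    for page in clicks:
        for session in sessions:
            if page in links.get(session[-1], set()):
                session.append(page)
                break
        else:
            # No open session has a link to this page: treat it as a new session.
            sessions.append([page])
    return sessions

# Hypothetical link graph: site A and site B never link to each other.
links = {
    "A1": {"A2"}, "A2": {"A3"},
    "B1": {"B2"}, "B2": {"B3"},
}
interleaved = ["A1", "B1", "A2", "B2", "A3", "B3"]

sessions = split_clickstream(interleaved, links)
print(sessions)  # [['A1', 'A2', 'A3'], ['B1', 'B2', 'B3']]
```

The greedy split breaks down in exactly the cases mentioned above: if a page is reachable from the last page of more than one open session, or if users can enter a site anywhere, the first-match rule starts guessing.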

So my suggestion for the plugin is to have the option of using URLs in the current browsing history to create more overlapping clickstreams, and maybe find a few more ways of jumping straight into sites (googling some keywords gleaned from past pages and then jumping?)


I think because the extension clicks on the actual links in the page, the method described in Pozenel et al. won't work.


But can I get a plugin to automatically connect me through seven proxies?


Yes, but by tending towards the more popular websites, all it does is make your browsing history more average than it was. And that's the same as not knowing anything about you.

Why? Because if they didn't know anything about you, they would assume you were average.

Imagine the nyancat video were trending, and everyone was clicking on it. In the absence of any information, YouTube would have to assume you liked nyancat. If you get annoyed at it because it's not as relevant to you, that actually means that you secretly like the fact that YouTube knows enough about you to show you videos about makers, or about backpacks and pressure cookers, or whatever it is you tend to search for, and not show you nyancat videos.

If you don't actually like that fact, then you can install this and, sure, you'll start seeing more ads for beer and football. But that's exactly what you would have been seeing in the absence of other information.
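One way to put a number on "more average": treat the observed history as a blend of your real distribution over topics and the population's, and measure how many bits it still differs from the population. This is only a sketch under that blending assumption; the four-topic distributions and the function names are invented for illustration.

```python
import math

def kl_divergence(p, q):
    """Bits by which distribution p differs from reference distribution q."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def blend(user, population, chaff_share):
    """Observed profile when chaff_share of the traffic follows the population."""
    return [(1 - chaff_share) * u + chaff_share * c
            for u, c in zip(user, population)]

# Hypothetical topic shares: beer, football, makers, backpacks
population = [0.5, 0.3, 0.15, 0.05]
user = [0.05, 0.05, 0.2, 0.7]

shares = (0.0, 0.5, 0.9, 1.0)
bits = [kl_divergence(blend(user, population, w), population) for w in shares]
for w, b in zip(shares, bits):
    print(f"chaff share {w:.1f}: {b:.3f} bits of distinguishing information")
```

The divergence shrinks as the chaff share grows and reaches zero only when everything observed follows the population, which is the "they would assume you were average" case.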


Unless it downloads movies, you're unlikely to hit any reasonable* data-cap using this.

* Yeah, okay. Poor choice of word. No data-cap is reasonable.