It's especially funny because Snowden was an NSA IT guy. Sure, using wget, cURL, etc. is hardly rocket surgery for people a great many rungs down the pecking order, and using the slightly friendlier and less flexible mass-backup tools with friendly GUIs is only a couple of steps above just helplessly drooling on the keyboard; but you'd think that even the dustiest of newspapers would give a techie at an organization that builds some of the planet's largest mass-scraping tools credit for not even considering doing it manually if a scraper would work.
(Incidentally, I wonder if the Times' own web people rolled their eyes at this one, probably having used such witchery once or twice...)
Oh my! I do declare, all this nasty talk about scraping is giving me a case of the vapours.
At least they did not invoke the nastiness of rsync....
What officials cannot explain is why the presence of such software in a highly classified system was not an obvious tip-off to unauthorized activity.
Good grief. cURL comes already installed on every recent MacBook.
The key takeaway here is not that the NYT doesn't understand wget, but apparently neither does NSA internal security.
I rolled my eyes at the silliness of the hand-wringing. But then I had a fleeting moment of humor, wondering if (according to the NYT) my decade's worth of familiarity with cURL and wget makes me a 1337 hax0r d00d. woot! I've even edited httpd.conf files in vi...but now I'm back to rolling my eyes and feeling bad for the state of our society. That was a fraught 90 seconds, and now I am exhausted.
This one is my favorite:
The gardener used a "lawn mower" to "mow" the lawn. — Marc Andreessen (@pmarca) February 9, 2014
I hope he didn't really use (an unmodified) wget... every logged request using it leaves a red flag, even after you change the user agent and put in delays between requests: it only makes HTTP/1.0 requests. It passes a Host header, and most servers will respond correctly to a Host header with a 1.0 request, but I've occasionally had trouble using it with things that behave differently depending on the HTTP version of the request. cURL is better--unfortunately, I have every wget option memorized, and never can remember curl's.
Actually curl doesn't have crawler features. Wget does, and is ubiquitous because it's installed with the base system on most Linux distributions.
They don't actually say Snowden used wget, they say Manning did. But I agree the tech illiteracy makes this painful to read. Ending the second paragraph "what officials cannot explain is why the presence of such software in a highly classified system was not an obvious tip-off to unauthorized activity." Come on! But later on, the article is interesting in perhaps shedding a bit more light on NSA security culture, describing it as uncompartmented, and again we get how Snowden benefited from a slow rollout of internal monitoring.
I think the best tidbit here is reporting he set the parameter for "how deeply to follow links." So two possibilities: they actually have a .bash_history file or similar that logs the command lines for when he was collecting stuff! From what I've gathered about Snowden's technical abilities, he knew some stuff, so this would have to be interpreted as a fuck you on departing, look how easy this was kind of thing. Or, they have logs of which files were downloaded and they reconstructed that it's what you'd get following N links. That's probably more likely. Someone higher up must have been doing a periodic manual review of logs, maybe based on some aggregate stats. Doing a full crawl would get you caught, so limit the depth and stay just below radar (and in fact he popped up once but only high enough he needed a reasonable sounding explanation).
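For anyone wondering what a "how deeply to follow links" parameter amounts to, here is a minimal sketch of a depth-limited crawl. It runs over a made-up in-memory link graph rather than a real network, and the names (LINKS, crawl, max_depth) are hypothetical, not taken from wget or from whatever tool was actually used.

```python
from collections import deque

# Hypothetical in-memory "intranet": each page maps to the pages it links to.
LINKS = {
    "index": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": ["e"],
    "d": [],
    "e": [],
}


def crawl(start, max_depth, links=LINKS):
    """Breadth-first crawl that never follows links from pages at max_depth."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # at the depth limit: keep the page, ignore its links
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order
```

With this graph, a depth of 1 fetches only the start page and what it links to directly, while a depth of 3 reaches everything. The set of fetched pages is fully determined by the depth, which is why a reviewer with only download logs could plausibly work the depth back out.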
Quotes can be used to alert the reader that words or phrases are being used in non-standard ways, and I think that is what is happening here - scraping is jargon that the average NY Times reader probably hasn't seen in that context before. Software engineer here - If I was writing documentation for a non-technical audience I wouldn't assume they'd know what scraping means in that context either. If you look past the syntax they are saying how a web crawler is an everyday tool of someone like Snowden. The snark on boingboing and techdirt is extraordinary.
Honestly, I'm a software developer and I still don't get why all this mocking is necessary.
What did the NYT do wrong, or in a tech-illiterate manner in this article? Admit that not everyone reading the article knows what all these tools are?
Let's look at all these quoted tweets mocking the article. #1:
Matt Blaze joked about the fact that children (children!) might download wget:
This has nothing to do with the article. #2:
Snowden, a $120k a year sys admin, knowing how to use web scraping tools, is like a chef knowing how to use a vegetable peeler. Not A1 news.
Exactly. That's why it's interesting. To quote the article itself:
The findings are striking because the N.S.A.’s mission includes protecting the nation’s most sensitive military and intelligence computer systems from cyberattacks, especially the sophisticated attacks that emanate from Russia and China. Mr. Snowden’s “insider attack,” by contrast, was hardly sophisticated and should have been easily detected, investigators found.
Ok, mock #3:
NYT: 'Snowden used "web crawler" software to "scrape" NSA networks...'
The gardener used a "lawn mower" to "mow" the lawn.
The two are not equivalent. The first is jargon that most non-tech-literate people (probably the majority of the US) aren't expected to know off-hand. The second is non-jargon and is something every adult knows.
And... wait, that's it? Those three things are all it takes to call the Times "technologically illiterate?"
That's completely stupid. Does no one actually understand the difference between the Times being technologically illiterate, and the Times recognizing that the majority of adults aren't familiar with these technologies? Does everyone on the internet have a false-consensus bias and think that all of America is just like themselves? Oh wait, we knew the answer to that one already.
Hmmm... Have we mocked Ars Technica for being technologically illiterate yet?
Wget can be and is often used to set up “mirror” websites. (arstechnica: NSA let Snowden crawl all over its intranet unchecked)
Guffaw... 'A photocopier is often used to create "duplicate" copies....' How stupid can this 'arse technica' site be...?
Perhaps the reporters on the case were "technologically illiterate" because the article is based on confidential sources. Deputizing another reporter to ask more intelligent questions was probably not in the offing.
Besides, the article says that it's more complicated than "wget."
While Snowden could set the reporters straight, "sources and methods" could be used to build a criminal case, and I'm sure that Snowden would prefer not to work against himself, and instead direct his attentions toward building a case against surveillance.
I agree that most of the mockery is off-point, but there is a case to answer here.
By acknowledging that readers won't understand the terminology, but never actually explaining any of it, they've robbed the piece of important context, and ultimately leave the reader with little understanding of what exactly took place. Without that, there's no way for the reader to understand statements like the "hardly sophisticated" one you quoted.
The piece goes on at length about how "the process... was quite automated”. You and I say "duh, of course it was automated! He downloaded hundreds of thousands of documents!". But you can't argue that readers are unsophisticated enough to need scare quotes without simultaneously arguing that they're unsophisticated enough to benefit from an explanation.
The article leaves the hypothetical layman reader out in the cold, and after reading the article they'll be none the wiser about what Snowden actually did or how he pulled it off.
I think the point is that wget (''wget'') leaves a fingerprint in access logs (''access logs'') via the User Agent (''User Agent'') string - one that looks unlike any legitimate browser activity the NSA might allow.
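As a rough illustration of that fingerprint, here is a minimal sketch that flags log lines whose user agent starts with a known scraper name. It assumes the common Apache/nginx "combined" log format, where the user agent is the last quoted field; the watch list and function name are hypothetical, and anyone who changes the user agent walks straight past it.

```python
import re

# Assumption: "combined" log format, user agent is the last quoted field.
UA_RE = re.compile(r'"([^"]*)"\s*$')

# Hypothetical watch list of default scraper user-agent prefixes.
SUSPECT_PREFIXES = ("wget", "curl")


def flag_scrapers(lines):
    """Return the log lines whose user agent starts with a suspect prefix."""
    flagged = []
    for line in lines:
        match = UA_RE.search(line)
        if match and match.group(1).lower().startswith(SUSPECT_PREFIXES):
            flagged.append(line)
    return flagged


SAMPLE_LOG = [
    '10.0.0.5 - - [09/Feb/2014:10:00:00 +0000] "GET /a HTTP/1.0" 200 512 "-" "Wget/1.12 (linux-gnu)"',
    '10.0.0.6 - - [09/Feb/2014:10:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]
```

Calling flag_scrapers(SAMPLE_LOG) returns only the first line.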
I would expect the NSA to anticipate the possibility that someone (probably a Chinese spy, but also a disaffected employee) could gain access and attempt to download their whole database.
Thousands of rapid fire download requests from any source should by rights trigger an automatic Predator Drone strike on the geolocated position of the next chump who got the same dynamically assigned IP address as Snowden (or rather Manning). The NSA will have already written the code to do that somewhere. Why was it not deployed? (compare with arxiv.org's policy: click here but not here).
How then did Snowden (or rather Manning) manage to get permission to run ''wget''? I reckon he told them he was doing ''load testing'' and they let him .... sneaky bugger.
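Minus the drone, the "thousands of rapid fire requests" check is not much code. A minimal sketch, assuming you already have (timestamp, source) pairs sorted by time; the function name, the limit, and the window are all made up for illustration and say nothing about what any agency actually runs.

```python
from collections import defaultdict, deque


def burst_sources(events, limit, window):
    """Flag sources making more than `limit` requests within `window` seconds.

    `events` is an iterable of (timestamp_seconds, source), sorted by time.
    """
    recent = defaultdict(deque)  # source -> timestamps inside the window
    flagged = set()
    for timestamp, source in events:
        stamps = recent[source]
        stamps.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while stamps and timestamp - stamps[0] >= window:
            stamps.popleft()
        if len(stamps) > limit:
            flagged.add(source)
    return flagged
```

A crawler with a polite delay between requests stays under any such threshold, which is the "stay just below radar" scenario raised earlier in the thread.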
How then did Snowden manage to get permission to run ''wget''?
Evidence presented during Private Manning’s court-martial for his role as the source for large archives of military and diplomatic files given to WikiLeaks revealed that he had used a program called “wget” to download the batches of files. That program automates the retrieval of large numbers of files, but it is considered less powerful than the tool Mr. Snowden used.
Notwithstanding the detail of who used what for what, thereʼs a trend here of discouraging acceptance of simple utilities. (Which even the Guardian article yesterday did nothing to improve on.) In the UK this may be partly attributable to the ‘lost’ generation of computer-as-appliance thinking promulgated through schools by Microsoft et al. (Approximately, we give you cheap licenses, you pretend computer = Win+MSO+IE+games.)
But now (according to gov.uk) kids are to be taught computer-as-programmable-tool thinking again. So I propose: The next step to tackle this climate of ignorance should be that all schools, or – anyone really, should teach kids (or anyone) to write scripts that use wget to do something simple and obvious, like download a monthʼs worth of BoingBoing articles and sort into daily directories. Or maybe even something creative and useful, who knows. ;-p
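The "sort into daily directories" half of that exercise might look something like this minimal sketch. The download step (wget itself) is left out, and the file names, dates, and function names are invented for illustration.

```python
from datetime import date
from pathlib import PurePosixPath


def daily_path(root, published, filename):
    """Build root/YYYY/MM/DD/filename for an article published on `published`."""
    return (
        PurePosixPath(root)
        / f"{published:%Y}"
        / f"{published:%m}"
        / f"{published:%d}"
        / filename
    )


def plan_layout(root, articles):
    """Map each (filename, publication date) pair to its daily directory path."""
    return {name: daily_path(root, day, name) for name, day in articles}


# Hypothetical downloaded files and their publication dates.
ARTICLES = [
    ("lawn-mower.html", date(2014, 2, 9)),
    ("scraping.html", date(2014, 2, 10)),
]
```

Here plan_layout("archive", ARTICLES) puts the first file at archive/2014/02/09/lawn-mower.html; actually moving the files is one more loop.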
Not sure if this would benefit from an award scheme or anything...? The most creative use of wget by an 8-12 yr old; the most annoying use; the most likely to be imprisoned, etc. Oh, and most likely to land a GCHQ job, couldnʼt leave that out.
(Sorry world outside UK; nothing specific for you here. Also, application in rumpUK and Scotland may vary.)
True. Most Unix utilities are simple -- what makes them powerful is the way they can be strung together in useful ways.
There's an implication, however, that Snowden's spider was more intelligent than your basic instantiation of wget, and looked for documents that were most relevant to his interests.
But they did:
"A web crawler, also called a spider, automatically moves from website to website, following links embedded in each document, and can be programmed to copy everything in its path. "
An explanation is necessary (I agree) but that one is sufficient. The article is about why he had access to tools which could so easily be used towards these ends, not the tools themselves.
Yes we must put installing and using Linux on the school curriculum for every class in the country. Give school kids an exercise where they have to budget for software and hardware costs for an organisation over a 5 year period, price how much it costs to buy licenses vs. use GPL tools.
Also, a choice of coding questions: write the algorithm for bubble sort / heap sort / quicksort in a language of your choice between C, Python, BASIC, etc. Write a function to calculate factorials using a) loops, b) recursion.
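For the factorial question, one possible answer in Python, offered as a sketch rather than a model solution (the function names are my own):

```python
def factorial_loop(n):
    """a) Factorial using a loop."""
    if n < 0:
        raise ValueError("factorial is undefined for negative integers")
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


def factorial_recursive(n):
    """b) Factorial using recursion."""
    if n < 0:
        raise ValueError("factorial is undefined for negative integers")
    return 1 if n < 2 else n * factorial_recursive(n - 1)
```

Both agree on small inputs; the recursive version will hit Python's recursion limit for large n, which is a fair discussion point for the class.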
Explain the principles behind public key cryptography. Explain how hashing can provide a message digest and public key cryptography can authenticate the sender of a message. Explain the birthday problem and derive the constraint on the required length of a hash, in binary digits, to ensure that the probabilities of any two messages colliding are a) less than one in a thousand billion b) greater than one in a thousand.
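The hash-length part can be sketched with the usual birthday approximation: with n messages and a b-bit hash, the chance of any collision is at most about n(n-1)/2 divided by 2^b. The question as posed gives no message count, so n is an assumption here (about a million in the example), and the function name is made up.

```python
import math


def min_hash_bits(n_messages, max_collision_prob):
    """Smallest b such that n(n-1)/2 / 2**b is below max_collision_prob.

    Uses the birthday (union-bound) approximation, which is reasonable
    when the target probability is small.
    """
    pairs = n_messages * (n_messages - 1) / 2
    # Need 2**b > pairs / p, so b is the first integer above log2(pairs / p).
    return math.floor(math.log2(pairs / max_collision_prob)) + 1


# Assumption for illustration: roughly a million (2**20) messages.
N_MESSAGES = 2**20
```

Under that assumption, min_hash_bits(N_MESSAGES, 1e-12) gives 79 bits and min_hash_bits(N_MESSAGES, 1e-3) gives 49; each doubling of the message count costs two more bits.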