It's especially funny because Snowden was an NSA IT guy. Sure, using wget, cURL, etc. is hardly rocket surgery for people a great many rungs down the pecking order, and using the slightly friendlier and less flexible mass-backup tools with friendly GUIs is only a couple of steps above just helplessly drooling on the keyboard; but you'd think that even the dustiest of newspapers would give a techie at an organization that builds among the planet's largest mass-scraping tools credit for not even considering doing it manually if a scraper would work.
(Incidentally, I wonder if the Times' own web people rolled their eyes at this one, probably having used such witchery once or twice...)
Oh my! I do declare, all this nasty talk about scraping is giving me a case of the vapours.
At least they did not invoke the nastiness of rsync...
What officials cannot explain is why the presence of such software in a highly classified system was not an obvious tip-off to unauthorized activity.
Good grief. cURL comes already installed on every recent MacBook.
The key takeaway here is not that the NYT doesn't understand wget, but that apparently neither does NSA internal security.
I rolled my eyes at the silliness of the hand-wringing. But then I had a fleeting moment of humor, wondering if (according to the NYT) my decade's worth of familiarity with cURL and wget makes me a 1337 hax0r d00d. woot! I've even edited httpd.conf files in vi... but now I'm back to rolling my eyes and feeling bad for the state of our society. That was a fraught 90 seconds, and now I am exhausted.
This one is my favorite:
The gardener used a "lawn mower" to "mow" the lawn.
- Marc Andreessen (@pmarca) February 9, 2014
I hope he didn't really use (an unmodified) wget... every logged request using it leaves a red flag, even after you change the user agent and put in delays between requests: it only makes HTTP/1.0 requests. It passes a Host header, and most servers will respond correctly to a Host header with a 1.0 request, but I've occasionally had trouble using it with things that behave differently depending on the HTTP version requested. cURL is better; unfortunately, I have every wget option memorized and can never remember curl's.
Actually curl doesn't have crawler features. Wget does, and is ubiquitous because it's installed with the base system on most Linux distributions.
They don't actually say Snowden used wget, they say Manning did. But I agree the tech illiteracy makes this painful to read. The second paragraph ends with "what officials cannot explain is why the presence of such software in a highly classified system was not an obvious tip-off to unauthorized activity." Come on! But later on, the article is interesting in perhaps shedding a bit more light on NSA security culture, describing it as uncompartmented, and again we get how Snowden benefited from a slow rollout of internal monitoring.
s/rocket/packet
I think the best tidbit here is the reporting that he set the parameter for "how deeply to follow links." So two possibilities: they actually have a .bash_history file or similar that logs the command lines for when he was collecting stuff! From what I've gathered about Snowden's technical abilities, he knew some stuff, so this would have to be interpreted as a "fuck you on departing, look how easy this was" kind of thing. Or, they have logs of which files were downloaded and they reconstructed that it's what you'd get following N links. That's probably more likely. Someone higher up must have been doing a periodic manual review of logs, maybe based on some aggregate stats. Doing a full crawl would get you caught, so limit the depth and stay just below radar (and in fact he popped up once, but only high enough that he needed a reasonable-sounding explanation).
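For anyone wondering what a "how deeply to follow links" parameter amounts to, here is a minimal sketch of a depth-limited crawl. It runs over a made-up in-memory link graph (the `LINKS` dictionary and its page names are purely hypothetical) rather than a real network, and it is not a claim about what tool or settings were actually used.

```python
from collections import deque

# Hypothetical link graph standing in for an intranet: page -> pages it links to.
LINKS = {
    "/index": ["/a", "/b"],
    "/a": ["/a1", "/a2"],
    "/b": ["/b1"],
    "/a1": ["/deep"],
    "/a2": [],
    "/b1": [],
    "/deep": [],
}


def crawl(start, max_depth, links=LINKS):
    """Breadth-first crawl that follows links at most max_depth hops from start."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached: do not follow this page's links
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order
```

With this toy graph, `crawl("/index", 2)` fetches everything except `/deep`, which sits three hops away. That is also roughly how a depth could be inferred after the fact: if the downloaded set matches "everything within N links of some page", N is a plausible setting.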
Quotes can be used to alert the reader that words or phrases are being used in non-standard ways, and I think that is what is happening here - scraping is jargon that the average NY Times reader probably hasn't seen in that context before. Software engineer here - If I was writing documentation for a non-technical audience I wouldn't assume they'd know what scraping means in that context either. If you look past the syntax they are saying how a web crawler is an everyday tool of someone like Snowden. The snark on boingboing and techdirt is extraordinary.
Honestly, I'm a software developer and I still don't get why all this mocking is necessary.
What did the NYT do wrong, or in a tech-illiterate manner in this article? Admit that not everyone reading the article knows what all these tools are?
Let's look at all these quoted tweets mocking the article. #1:
Matt Blaze joked about the fact that children (children!) might download wget:
This has nothing to do with the article. #2:
Snowden, a $120k a year sys admin, knowing how to use web scraping tools, is like a chef knowing how to use a vegetable peeler. Not A1 news.
Exactly. That's why it's interesting. To quote the article itself:
The findings are striking because the N.S.A.'s mission includes protecting the nation's most sensitive military and intelligence computer systems from cyberattacks, especially the sophisticated attacks that emanate from Russia and China. Mr. Snowden's "insider attack," by contrast, was hardly sophisticated and should have been easily detected, investigators found.
Ok, mock #3:
NYT: "Snowden used 'web crawler' software to 'scrape' NSA networks..."
The gardener used a "lawn mower" to "mow" the lawn.
The two are not equivalent. The first is jargon that most non-tech-literate people (probably the majority of the US) aren't expected to know off-hand. The second is non-jargon and is something every adult knows.
And... wait, that's it? Those three things are all it takes to call the Times "technologically illiterate?"
That's completely stupid. Does no one actually understand the difference between the Times being technologically illiterate, and the Times recognizing that the majority of adults aren't familiar with these technologies? Does everyone on the internet have a false-consensus bias and think that all of America is just like themselves? Oh wait, we knew the answer to that one already.
Hmmm... Have we mocked Ars Technica for being technologically illiterate yet?
Wget can be and is often used to set up "mirror" websites. (Ars Technica: "NSA let Snowden crawl all over its intranet unchecked")
Guffaw... "A photocopier is often used to create 'duplicate' copies..." How stupid can this "arse technica" site be...?
Perhaps the reporters on the case were "technologically illiterate" because the article is based on confidential sources. Deputizing another reporter to ask more intelligent questions was probably not in the offing.
Besides, the article says that it's more complicated than "wget."
While Snowden could set the reporters straight, "sources and methods" could be used to build a criminal case, and I'm sure that Snowden would prefer not to work against himself, and instead direct his attentions toward building a case against surveillance.
I agree that most of the mockery is off-point, but there is a case to answer here.
By acknowledging that readers won't understand the terminology, but never actually explaining any of it, they've robbed the piece of important context, and ultimately leave the reader with little understanding of what exactly took place. Without that, there's no way for the reader to understand statements like the "hardly sophisticated" one you quoted.
The piece goes on at length about how "the process... was quite automated". You and I say "duh, of course it was automated! He downloaded hundreds of thousands of documents!". But you can't argue that readers are unsophisticated enough to need scare quotes without simultaneously arguing that they're unsophisticated enough to benefit from an explanation.
The article leaves the hypothetical layman reader out in the cold, and after reading the article they'll be none the wiser about what Snowden actually did or how he pulled it off.
I think the point is that wget ("wget") leaves a fingerprint in access logs ("access logs") via the User Agent ("User Agent") string - one that looks unlike any legitimate browser activity the NSA might allow.
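To make the fingerprint point concrete, here is a rough sketch of scanning an access log for command-line download tools by their default User Agent strings. The log lines, IP addresses, and the `flag_tools` helper are all invented for illustration, and the format is assumed to be the common "combined" layout. Since the User Agent is trivially changed by the client, this only catches the unmodified case.

```python
import re

# Hypothetical access-log lines in the common "combined" format.
LOG_LINES = [
    '10.0.0.5 - - [09/Feb/2014:10:00:01 +0000] "GET /docs/1 HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '10.0.0.7 - - [09/Feb/2014:10:00:02 +0000] "GET /docs/2 HTTP/1.0" 200 734 "-" "Wget/1.12 (linux-gnu)"',
    '10.0.0.7 - - [09/Feb/2014:10:00:02 +0000] "GET /docs/3 HTTP/1.0" 200 120 "-" "Wget/1.12 (linux-gnu)"',
    '10.0.0.9 - - [09/Feb/2014:10:00:03 +0000] "GET /docs/4 HTTP/1.1" 200 88 "-" "curl/7.30.0"',
]

# A combined-format line ends with "referer" "user-agent"; capture client IP and user agent.
LINE_RE = re.compile(r'^(\S+) .* "[^"]*" "([^"]*)"$')
# Default user agents of the two tools start with the tool name and a slash.
TOOL_RE = re.compile(r'^(wget|curl)/', re.IGNORECASE)


def flag_tools(lines):
    """Return {client_ip: set of tool-like user agents seen from that client}."""
    hits = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match and TOOL_RE.match(match.group(2)):
            ip, agent = match.groups()
            hits.setdefault(ip, set()).add(agent)
    return hits
```

Calling `flag_tools(LOG_LINES)` flags the two clients that sent tool-style User Agents and leaves the browser-looking one alone.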
I would expect the NSA to anticipate the possibility that someone (probably a Chinese spy, but also a disaffected employee) could gain access and attempt to download their whole database.
Thousands of rapid fire download requests from any source should by rights trigger an automatic Predator Drone strike on the geolocated position of the next chump who got the same dynamically assigned IP address as ~~Snowden~~Manning. The NSA will have already written the code to do that somewhere. Why was it not deployed? (compare with arxiv.org's policy: click here but not here).
How did ~~Snowden~~Manning manage to get permission to run "wget"? I reckon he told them he was doing "load testing" and they let him ... sneaky bugger.
How then did Snowden manage to get permission to run "wget"?
He didn't.
Evidence presented during Private Manning's court-martial for his role as the source for large archives of military and diplomatic files given to WikiLeaks revealed that he had used a program called "wget" to download the batches of files. That program automates the retrieval of large numbers of files, but it is considered less powerful than the tool Mr. Snowden used.
Notwithstanding the detail of who used what for what, there's a trend here of discouraging acceptance of simple utilities. (Which even the Guardian article yesterday did nothing to improve on.) In the UK this may be partly attributable to the "lost" generation of computer-as-appliance thinking promulgated through schools by Microsoft et al. (Approximately, we give you cheap licenses, you pretend computer = Win+MSO+IE+games.)
But now (according to gov.uk) kids are to be taught computer-as-programmable-tool thinking again. So I propose: The next step to tackle this climate of ignorance should be that all schools, or anyone really, should teach kids (or anyone) to write scripts that use wget to do something simple and obvious, like download a month's worth of BoingBoing articles and sort them into daily directories. Or maybe even something creative and useful, who knows. ;-p
Not sure if this would benefit from an award scheme or anything... The most creative use of wget by an 8-12 yr old; the most annoying use; the most likely to be imprisoned, etc. Oh, and most likely to land a GCHQ job, couldn't leave that out.
(Sorry world outside UK; nothing specific for you here. Also, application in rumpUK and Scotland may vary.)
True. Most Unix utilities are simple; what makes them powerful is the way they can be strung together in useful ways.
There's an implication, however, that Snowden's spider was more intelligent than your basic instantiation of wget, and looked for documents that were most relevant to his interests.
But they did:
"A web crawler, also called a spider, automatically moves from website to website, following links embedded in each document, and can be programmed to copy everything in its path."
An explanation is necessary (I agree) but that one is sufficient. The article is about why he had access to tools which could so easily be used towards these ends, not the tools themselves.
Yes we must put installing and using Linux on the school curriculum for every class in the country. Give school kids an exercise where they have to budget for software and hardware costs for an organisation over a 5 year period, price how much it costs to buy licenses vs. use GPL tools.
Also, a choice of coding questions: write the algorithm for bubble sort / heap sort / quicksort in a language of your choice between C, Python, BASIC, etc. Write a function to calculate factorials using a) loops, b) recursion.
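For what it's worth, the factorial exercise is about this big in Python. This is just one possible answer, and the function names are my own.

```python
def factorial_loop(n):
    """Factorial using a loop: multiply 2 * 3 * ... * n."""
    if n < 0:
        raise ValueError("factorial is undefined for negative numbers")
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


def factorial_recursive(n):
    """Factorial using recursion: n! = n * (n - 1)!, with 0! = 1! = 1."""
    if n < 0:
        raise ValueError("factorial is undefined for negative numbers")
    if n < 2:
        return 1
    return n * factorial_recursive(n - 1)
```

Both give the same answers; the recursive one will hit Python's recursion limit for large `n`, which is itself a decent discussion point for a class.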
Explain the principles behind public key cryptography. Explain how hashing can provide a message digest and public key cryptography can authenticate the sender of a message. Explain the birthday problem and derive the constraint on the required length of a hash, in binary digits, to ensure that the probabilities of any two messages colliding are a) less than one in a thousand billion b) greater than one in a thousand.
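A sketch of the hash-length part, under my own assumptions: the hash behaves like a uniformly random function with `bits` output bits, and we want the chance that any two of `n` messages collide to be at most `p`. There are n(n-1)/2 pairs and each collides with probability 1/2^bits, so the union bound gives P(collision) <= n(n-1)/(2 * 2^bits). Solving for `bits` gives the function below (the name `min_hash_bits` is made up).

```python
import math


def min_hash_bits(n, p):
    """Smallest bit length with n*(n-1)/2 / 2**bits <= p (union bound on collisions).

    Assumes an idealised hash whose outputs are uniformly random.
    """
    if n < 2:
        return 0  # fewer than two messages cannot collide
    pairs = n * (n - 1) / 2
    return max(0, math.ceil(math.log2(pairs / p)))
```

For a single pair of messages (`n = 2`) the bound is exact: `min_hash_bits(2, 1e-12)` comes out at 40 bits for "less than one in a thousand billion", and `min_hash_bits(2, 1e-3)` at 10, so anything of 9 bits or fewer leaves the collision chance above one in a thousand. The required length grows by about two bits for every doubling of the number of messages.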