Turn web articles into readable, printable PDFs


#1

Originally published at: https://boingboing.net/2018/02/05/turn-web-articles-into-readabl.html


#2

I mean… can’t we already do this with print to PDF?


#3

Use Pandoc instead and turn a web page into a PDF, or an EPUB, or a LaTeX document, or about 30 more obscure formats!

wget -q https://somesite.web -O - | pandoc -f html -o ~/thatsite.pdf


#4

Been using this at work for about a year now: https://www.printfriendly.com. It’s fantastic.


#5

https://www.gmail.com/mail/help/paper/


#6

Close. It looks more like printing Safari’s reader view than a straight print of the page. It also attempts a few other layouts (like a headline with a 2-column layout) and picks a “good one”.

Definitely not a “new thing”, but more of a “better version of an existing thing”.


#7

If you just want a 300-word blog post to be spread over 5 pages of paper with broken ads and mangled stylesheets, sure!

It’s basically the “reader” functions of browsers, but it does some extra layout magic to make sure it all fits neatly over however many paper pages would be right.


#8

Because fuck trees. They’ve had it too easy recently. Besides, we have too many trees now and they are making our kids sick. Children need to be exposed to fewer trees so they can grow up healthy. No, I am totally not planning on applying for a job in the Trump EPA. Better yet, print on plastic! Beautiful clean burning plastic.


#9

I like Print Friendly and PDF, too, though it has stopped allowing me to download the PDF after it has been generated.


#10

Really? I’ve never encountered that problem. Just downloaded several on Friday at work. Could it be a browser issue? Also, do you use the browser extension? I use that regularly and have never had an issue.


#11

I am using the browser extension and have yet to figure out why it stopped a few weeks ago.

Thanks.

James


#12

Just tried that with https://boingboing.net/ and it choked on the tracking pixels. I wish I was kidding.
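
If a converter trips on tracking pixels, one possible workaround is to filter the HTML before handing it over. The following is only a minimal sketch using Python's standard library, not something any tool in this thread does itself: the names `PixelStripper` and `strip_trackers` are made up for illustration, and the rule "an image declared as 1x1 or 0x0 is a tracker" is an assumption that will miss pixels sized via CSS.

```python
from html.parser import HTMLParser


class PixelStripper(HTMLParser):
    """Re-emit HTML, dropping <script> elements, comments, and tiny images."""

    def __init__(self):
        # Keep entities such as &amp; untouched so they can be re-emitted as-is.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_script = False

    @staticmethod
    def _is_pixel(attrs):
        # Assumption: a tracking pixel declares width and height of 0 or 1.
        a = dict(attrs)
        return a.get("width") in {"0", "1"} and a.get("height") in {"0", "1"}

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            return
        if tag == "img" and self._is_pixel(attrs):
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing form, e.g. <img ... />
        if tag == "script" or (tag == "img" and self._is_pixel(attrs)):
            return
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.in_script:
            self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_trackers(html_text):
    """Return html_text without scripts and declared 1x1 / 0x0 images."""
    parser = PixelStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

The cleaned string could then be saved to a file or piped into whatever converter you prefer; whether that is enough to get a given page through is untested here.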


#13

It doesn’t work.

For years I’ve tried to find the best way to save a webpage that works offline exactly as it works online. Mostly, I would save the page, then laboriously work through the stylesheets to download the graphics and other missing elements that the browser can’t save, then I’d work through the JavaScript to disable the almost endless calls to shitty advert and aggregator sites.
But that’s a process that can often take hours, and website layouts change with the fickleness of 22-year-old media graduates and their dyed hairstyles. Also: Fuck you, Kinja, for this exact manner of bullshit.
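
The first part of that manual process (finding what a saved page still pulls in from elsewhere) can be roughly sketched with Python's standard library. This is an illustration under assumptions, not a full page archiver: `list_assets` and `third_party` are hypothetical helper names, only `<img>`, `<script>` and stylesheet `<link>` tags are inspected, and references inside CSS files or generated by scripts are not followed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AssetCollector(HTMLParser):
    """Collect absolute URLs of images, scripts, and stylesheets in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def _add(self, ref):
        # Resolve relative references against the page's own URL.
        url = urljoin(self.base_url, ref)
        if url not in self.assets:
            self.assets.append(url)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in ("img", "script") and a.get("src"):
            self._add(a["src"])
        elif tag == "link" and a.get("href"):
            rel = (a.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self._add(a["href"])


def list_assets(html_text, base_url):
    """Return the asset URLs referenced by html_text, in document order."""
    collector = AssetCollector(base_url)
    collector.feed(html_text)
    collector.close()
    return collector.assets


def third_party(assets, base_url):
    """Return only the assets served from a different host than the page."""
    host = urlparse(base_url).netloc
    return [u for u in assets if urlparse(u).netloc != host]
```

The list from `third_party` is a starting point for deciding which calls to strip; actually downloading the first-party assets and rewriting the page to point at local copies is left out of this sketch.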

So I’ve tried image capture extensions, but none of them work 100%, since they all seem to be 32-bit software ‘borrowed’ from SourceForge.net, promising premium features if I pay them to remove a watermark - and they all aggregate what you try to save anyway.

So as an example of this fail:

On a 64-bit Windows 10 PC with 16 GB of DDR5 RAM in Firefox Quantum, using the current release of Nimbus Capture, this page never saves.

Using the software here, Simple Print gives you the text of the article but not any graphics, videos or external links at all.
Almost all webpages - and even many scientific papers - contain both graphics and links to external URLs.

So what’s the fucking point of Simple Print?

Anyone?


#14

Yeah, what have the trees ever done for us?

