Pandoc is a universal document converter

Originally published at: https://boingboing.net/2024/02/02/pandoc-is-a-universal-document-converter.html

6 Likes

Pandoc is probably the first fully fledged program written in Haskell that this sorry pilgrim has encountered (which can have the side effect of needing the whole recursive Haskell library on board if you want to build a local copy, although there are pre-compiled options). Haskell is a quite pure functional language that maths-centered folk take especial pride in (y'know: “A monad is just a monoid in the category of endofunctors” kinda thing).

14 Likes

What, no ClarisWorks support? Damn it …

Fortunately, there’s LibreOffice for that.

It does look like I may have to write my own converters for the old View and Wordplus formats from the BBC Micro though.

7 Likes

Glad the world is discovering this gem. Our current workflow requires us to publish documents in many places. We write it up in Markdown in a git repo and use pandoc to export to:

  • Jira/Confluence format for a few different internal wikis
  • an HTML file for some random proprietary web system
  • Word to be put into a Sharepoint
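
For anyone curious what that looks like in practice, here is a rough sketch in Python that builds one pandoc call per destination. The file name `report.md` and the `TARGETS` mapping are made up for illustration; only the `-f`, `-t` and `-o` options and the `jira`, `html` and `docx` writer names come from pandoc itself. It assumes pandoc is installed and on the PATH.

```python
import shutil
import subprocess
from pathlib import Path

# Destination -> (pandoc writer, output extension).
# The writer names are real pandoc output formats; the destination
# labels and extensions are illustrative assumptions.
TARGETS = {
    "wiki": ("jira", ".jira"),
    "web": ("html", ".html"),
    "sharepoint": ("docx", ".docx"),
}


def pandoc_command(source, target):
    """Build the pandoc argument list for one export target."""
    writer, ext = TARGETS[target]
    output = str(Path(source).with_suffix(ext))
    return ["pandoc", source, "-f", "markdown", "-t", writer, "-o", output]


def export_all(source):
    """Run every export in TARGETS; needs pandoc on the PATH."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    for target in TARGETS:
        subprocess.run(pandoc_command(source, target), check=True)
```

Calling `export_all("report.md")` would then write `report.jira`, `report.html` and `report.docx` next to the source file.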

In the past, I’ve done Word → Jira for vulnerability reports.

Pandoc isn’t perfect, but works wonders.

12 Likes

Ha, first thing I did when someone told me it converts everything was to look up if it supported SpeedScript. Just to be ornery.

Edit: quoted the wrong part of the message

2 Likes

I’m a big fan. I know I only scratch the surface of what it can do (and features are added steadily), but it’s still super-useful.

1 Like

Can I use it to convert an MPEG into a 9th-Century illuminated manuscript?

10 Likes

For converting one set of Markdown files into both an ePub file and a PDF document ready for publication, it works great. I do need to use a few other support files like LaTeX templates and Lua filters to get there, but the end result is fantastic.
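
A minimal sketch of that kind of two-output build, written as a Python helper that only assembles the pandoc command lines. The chapter names, `filters/book.lua` and `templates/book.latex` are hypothetical placeholders, and `xelatex` is just one possible PDF engine; `--lua-filter`, `--template` and `--pdf-engine` are real pandoc options, and the output format is inferred from the `.epub` and `.pdf` extensions.

```python
def build_commands(sources,
                   lua_filter="filters/book.lua",         # hypothetical path
                   latex_template="templates/book.latex"):  # hypothetical path
    """Return pandoc argument lists for the ePub and PDF builds."""
    # Both outputs share the same sources and the same Lua filter.
    common = ["pandoc", *sources, "-f", "markdown",
              "--lua-filter=" + lua_filter]
    epub = common + ["-o", "book.epub"]
    # The PDF goes through LaTeX, so it also takes the custom template.
    pdf = common + ["--template=" + latex_template,
                    "--pdf-engine=xelatex",
                    "-o", "book.pdf"]
    return {"epub": epub, "pdf": pdf}
```

Each list can be handed to `subprocess.run(..., check=True)` once pandoc and a LaTeX installation are available.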

5 Likes

Sure!

9 Likes

ClarisWorks was the first thing I looked for too. Hardly seems “universal” if it supports exactly three word processor formats - but I guess it supports everything useful and current.

Things were wild 30 years ago – I can remember when WordPerfect 6 came out and broke compatibility with existing WP 5.1 converters (such as those included in ClarisWorks). And didn’t ol’ Word for Windows 2.0 have a couple of different versions with mutually-incompatible document types?

2 Likes

I could try it on those old WordStar documents I have buried on my hard drive somewhere.

If it can convert the godawful .pdfs that financial advisors, lawyers and HR depts send us into something that will fit onto our shitty old database, then I’m sold. Gonna fool around with it next week and see.

If you need to produce reports or presentations containing figures and tables, https://quarto.org/ is your tool to fully exploit the possibilities of pandoc. You’ll have the code to produce figures, tables etc. in the same .qmd document as the text explaining stuff. The .qmd file is converted into a Markdown (.md) file by executing the code in the .qmd file and placing the results in the .md file. This .md file can then in turn be converted into all sorts of formats using pandoc.

2 Likes

12 Likes
4 Likes

There’s another project called Panda that lets you include Graphviz and some other diagram markups inline, and does text conversion through Pandoc.

2 Likes

Interesting! Quarto has support for native mermaid code blocks to create diagrams, too.

1 Like

Thanks! I’d not heard of it.

R + knitr is my friend, so I’m curious about yet another markup language knitr can process. Neato.

ETA: Just had a look and I’m curious - what would you say that Quarto + knitr does that Rmd + knitr doesn’t? A quick look seems to show me that cross-referencing is easier, but I suspect there’s more.

1 Like

If your issue is tables in PDFs, I can recommend Tabula. HSBC’s CSV exporter stopped working for a while, and using Tabula on the PDF statements was my stopgap.

2 Likes

Another vote for Tabula. I get a lot of technical specifications with tables in pdf, and Tabula has been amazing.

1 Like