In Search of a Practical PDF Framework
So here I am in that dark place again: I need to produce pretty PDF reports from a PHP web app, with headers and footers and tables that wrap and paginate nicely. Every time I end up here I look for an elegant solution, and frankly every one I've tried has ended up sucking.
My last approach was built around TCPDF. It’s an excellent low-level PDF toolkit, but non-trivial document generation code built on it always ends up ugly as hell. How is it that 25 years after PostScript promised us “device-independent” document layout, I am still dicking around writing layout and pagination logic for PDF reports?
XSL-FO is an XML markup language that lets you specify document layout in a sane way, and Apache FOP will render XSL-FO documents into PDFs. This is really cool: these are the droids I’ve been looking for. They take care of all of the heavy lifting, so we just need to build a PHP interface.
But there’s a wee problem. After years in the wilderness writing PHP code to emit well-formed XHTML, then working on a few Rails projects using HAML… well, what has been seen cannot be unseen. Now I use PHAML in PHP for any non-trivial XHTML generation. HAML lets you neatly isolate your layout logic from your application logic, and leaves your code and your breath minty fresh. So I really can’t get all excited about writing PHP to spew forth well-formed XSL-FO.
FOML - Formatting Objects Markup Language
So, duh, connect the dots. Steal Hampton Catlin’s goddam brilliant insight that became HAML and apply it to XSL-FO. FOML is to XSL-FO as HAML is to HTML. Consider this “hello world” in XSL-FO:
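For readers new to XSL-FO, a minimal hello-world document generally has the following shape. A `fo:layout-master-set` describes the page geometry, and a `fo:page-sequence` flows content into that geometry. This is a generic sketch rather than a specific listing. The master name `A4` and the 2cm margin are arbitrary choices.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Page geometry: a single A4 page master with one body region -->
  <fo:layout-master-set>
    <fo:simple-page-master master-name="A4"
        page-height="29.7cm" page-width="21cm" margin="2cm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <!-- Content: a page sequence whose flow fills the body region -->
  <fo:page-sequence master-reference="A4">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Hello world</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

Saved as `hello.fo`, a document like this can be rendered with FOP's command-line launcher, e.g. `fop -fo hello.fo -pdf hello.pdf`.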
Here’s the same thing expressed in FOML:
Unless you work in a Hello World factory (sweet job!), in practice the layout-master-set block gets more complex and is likely to be common to all of the documents in a project. So let’s use an :include filter to pull in a shared layout-master-set partial. HAML doesn’t have an include filter; FOML does.
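To give a sense of what "more complex" means here, the sketch below is a plain XSL-FO layout-master-set of the kind a shared partial would need to produce. It reserves header and footer regions around the body. The master name `report-page` and all dimensions are hypothetical. Header and footer content would then go in `fo:static-content` elements with `flow-name="xsl-region-before"` and `flow-name="xsl-region-after"` inside each page sequence.

```xml
<!-- Hypothetical shared page geometry, reused by every report in a project -->
<fo:layout-master-set xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:simple-page-master master-name="report-page"
      page-height="29.7cm" page-width="21cm"
      margin-top="1cm" margin-bottom="1cm" margin-left="2cm" margin-right="2cm">
    <!-- Body margins leave room for the header and footer regions -->
    <fo:region-body margin-top="2cm" margin-bottom="2cm"/>
    <!-- Header region at the top of the page -->
    <fo:region-before extent="1.5cm"/>
    <!-- Footer region at the bottom of the page -->
    <fo:region-after extent="1.5cm"/>
  </fo:simple-page-master>
</fo:layout-master-set>
```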
FOML With Inline Code
Concise syntax and automatic closing of tags is nice, but it is not why we are here. FOML, like HAML, supports inline code. The following FOML document iterates through an array of records and generates a nicely wrapped and paginated 3-column multi-page table.
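For comparison, here is a hypothetical hand-written PHP function that emits the kind of XSL-FO table such a template expands to. It shows what the template saves you from. The function name, the `$columns` (key => label) and `$records` shapes, and the equal-width columns are all assumptions for illustration, not FOML's actual output. The wrapping and pagination come from FOP itself. FOP wraps long cell text, breaks the table across pages, and repeats the `fo:table-header` at the top of each continuation page.

```php
<?php
// Hypothetical illustration, not FOML's actual output: builds an XSL-FO table
// from an array of records, with a header row repeated across page breaks.
function records_to_fo_table(array $columns, array $records): string
{
    // Escape values for use as XML text content
    $esc = fn ($value): string => htmlspecialchars((string) $value, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    $fo = '<fo:table table-layout="fixed" width="100%">';
    // One equal-width column per field
    $fo .= str_repeat('<fo:table-column column-width="proportional-column-width(1)"/>', count($columns));

    // Header row: repeated after page breaks because
    // table-omit-header-at-break defaults to "false"
    $fo .= '<fo:table-header><fo:table-row>';
    foreach ($columns as $label) {
        $fo .= '<fo:table-cell><fo:block font-weight="bold">' . $esc($label) . '</fo:block></fo:table-cell>';
    }
    $fo .= '</fo:table-row></fo:table-header><fo:table-body>';

    if ($records === []) {
        // XSL-FO requires at least one row in fo:table-body, so emit a blank spanning row
        $fo .= '<fo:table-row><fo:table-cell number-columns-spanned="' . count($columns) . '">'
            . '<fo:block/></fo:table-cell></fo:table-row>';
    }
    foreach ($records as $record) {
        $fo .= '<fo:table-row>';
        foreach (array_keys($columns) as $key) {
            // Missing fields render as empty cells
            $fo .= '<fo:table-cell><fo:block>' . $esc($record[$key] ?? '') . '</fo:block></fo:table-cell>';
        }
        $fo .= '</fo:table-row>';
    }

    return $fo . '</fo:table-body></fo:table>';
}
```

Even this small version mixes escaping, markup structure and iteration in one place. That mix is exactly what a HAML-style template keeps out of application code.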
And here’s how you would turn that into a PDF from PHP.
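The last step, XSL-FO to PDF, is handled by Apache FOP. Below is a minimal sketch of that step done by hand, assuming FOP's `fop` command-line launcher and a Java runtime are installed. The helper names `build_fop_command` and `fo_to_pdf` are hypothetical and are not the FOML library's API. The sketch stages the FO document in a temporary file, runs `fop -fo <input> -pdf <output>`, and raises an exception on a non-zero exit status.

```php
<?php
// Hypothetical helpers, not the FOML library's API: render an XSL-FO string to
// PDF by shelling out to Apache FOP's command-line launcher.

function build_fop_command(string $fopBinary, string $foPath, string $pdfPath): string
{
    // Quote every argument so paths with spaces or shell metacharacters are safe
    return escapeshellarg($fopBinary)
        . ' -fo ' . escapeshellarg($foPath)
        . ' -pdf ' . escapeshellarg($pdfPath)
        . ' 2>&1'; // fold FOP's stderr into the captured output
}

function fo_to_pdf(string $fo, string $pdfPath, string $fopBinary = 'fop'): void
{
    // Stage the FO document in a temporary file for FOP to read
    $foPath = tempnam(sys_get_temp_dir(), 'fo');
    if ($foPath === false || file_put_contents($foPath, $fo) === false) {
        throw new RuntimeException('could not write temporary FO file');
    }
    try {
        exec(build_fop_command($fopBinary, $foPath, $pdfPath), $output, $status);
        if ($status !== 0) {
            throw new RuntimeException("FOP failed with status $status: " . implode("\n", $output));
        }
    } finally {
        unlink($foPath); // always clean up the staged input
    }
}
```

Every call to `fo_to_pdf` starts a fresh JVM, so each PDF pays the full Java startup cost.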
Easy as, right?
A PHP library implementing FOML is available on GitHub. The library requires Apache FOP, which in turn requires Java. It is complete enough to be useful, but it is not feature-complete; the documentation describes which features are implemented.
Also note that the overhead of starting Java for Apache FOP is non-trivial. On my machines it adds more than 2 seconds, so even a trivial PDF report takes at least that long to generate. At some point I plan to try running FOP as a daemon to eliminate the startup latency, but for now expect XSL-FO to PDF translation to take a few seconds.