The impossible marriage of PDF and HTML

Converting HTML to print-ready PDFs with pdfChip


It is hard to imagine two file formats as different as PDF and HTML.

PDF (the Portable Document Format) was designed as a page description language. It focuses entirely on the visual appearance of each page. Structure is not essential to PDF: the file format has no concept of a "word", a "line of text", or a "paragraph", let alone more complex structures. What matters is that every page element is reproduced faithfully and in exactly the right spot.

HTML (the HyperText Markup Language) is the exact opposite. HTML is concerned with the structure of the content, not its precise reproduction. How the content will look and where it will appear are largely secondary; ideally, they aren't taken into account at all.

The difference, of course, stems from where each format was designed to be used. HTML was created for the Internet, where pages with a fixed size don't exist, so it had to adapt to being displayed in very different environments. PDF, on the other hand, started from the concept of a fixed-size page because it was intended for electronic documents, for distributing fixed-layout content over the Internet, and later for print.

Using HTML to generate PDF?

So these two formats must be completely incompatible, correct? Well… not really. Consider callas pdfChip, which uses HTML to generate print-ready PDF files. How is that possible, and why would you do it?

Let’s start with why you would do it. HTML may be obsessed with structure rather than presentation, but its companion format, CSS, focuses precisely on presentation. So it is entirely possible to format HTML content in precise ways. Moreover, HTML engines (the rendering engines behind modern web browsers such as Safari, Chrome, and Edge) are incredibly flexible and fast.

So, if you have to generate PDF files from information in a database, for example, combining the engine from a web browser with an HTML template is a very smart choice. HTML is much easier to write (and maintain!) than custom software code built on a low-level PDF library. Even better, such an engine also runs JavaScript, which allows fully dynamic templates, or templates that use existing JavaScript libraries from the web, such as jQuery.
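To make the template idea concrete, here is a minimal sketch, in TypeScript, of filling an HTML template with database records before the result is handed to an HTML engine. It assumes the records have already been fetched; the `Product` type, the `renderProductSheet` function, and the `sheet.css` stylesheet are hypothetical illustrations, not part of pdfChip. The sketch builds the HTML up front, whereas the dynamic templates mentioned above would run equivalent JavaScript inside the template itself.

```typescript
// Hypothetical record shape, e.g. one row returned by a database query.
interface Product {
  name: string;
  sku: string;
  price: number;
}

// Escape characters with special meaning in HTML so that database
// content cannot break the template markup. "&" must be replaced first.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Build the HTML for a (hypothetical) product sheet. The resulting file
// would then be rendered to PDF by an HTML engine such as pdfChip.
function renderProductSheet(products: Product[]): string {
  const rows = products
    .map(
      (p) =>
        `<tr><td>${escapeHtml(p.name)}</td><td>${escapeHtml(p.sku)}</td>` +
        `<td>${p.price.toFixed(2)}</td></tr>`
    )
    .join("\n");
  return [
    "<!DOCTYPE html>",
    "<html>",
    // Layout and print colors live in a separate, hypothetical stylesheet.
    '<head><link rel="stylesheet" href="sheet.css"></head>',
    "<body>",
    "<table>",
    rows,
    "</table>",
    "</body>",
    "</html>",
  ].join("\n");
}

const html = renderProductSheet([
  { name: "Widget <A>", sku: "W-001", price: 9.5 },
  { name: "Gadget", sku: "G-002", price: 12 },
]);
console.log(html);
```

Escaping the database values matters: a product name containing `<` or `&` would otherwise be parsed as markup and could corrupt the generated page.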

But HTML isn’t for print!

So, what about all the differences between HTML and PDF? HTML isn’t page-based, doesn’t support print color spaces such as CMYK or spot colors, and knows nothing about overprint, output intents, PDF/X, and so on.

There are various ways of overcoming these limitations. pdfChip extends HTML, CSS, and JavaScript to allow print-specific content. As a very small example, this is how standard CSS sets the text color of paragraphs to red:

p {
 color: red;
}

When making a pdfChip template, you can adjust that to:

p {
 color: -cchip-cmyk("callas Red", 0, 1, 1, 0);
}


This causes pdfChip to generate elements that use a spot color named "callas Red" that looks red. The additions to HTML and CSS are relatively few, yet they allow pdfChip to generate fully compliant PDF/X content.

Admittedly, writing HTML and CSS for pdfChip differs from writing content for the web, but anyone with HTML experience can get up to speed with even complex pdfChip templates relatively quickly. The same cannot be said of a low-level PDF library.
