Publish smart – The Internet Standards Series


Luckily, the times when people tried to get attention by starting discussions about whether HTML or PDF is the better format are over (or should I say hopefully?). Such discussions are as useful as asking whether a phone call is better than an email, or a truck better than a racing car. In today’s digital world with its rich communication capabilities, the question of which channel or format to use has to be asked for each individual type of publication, and it is indeed not always easy to answer.

The RFC Series is the home for internet standards and related best practices and informational documentation developed by the responsible organizations: the Internet Engineering Task Force (IETF), the Internet Research Task Force (IRTF), the Internet Architecture Board (IAB) and independent submission streams. They are published by the RFC Editor, which is not a person (anymore) but an organization. The RFC Series had its 50th anniversary in 2019, which was also celebrated in an RFC that gives a nice overview of how the whole system has evolved.

When you follow the link above you will notice that you can view or download the RFC in four possible formats: HTML, Text, PDF and XML, and not just HTML, which would be the most obvious choice for this kind of document. As you might suspect, we would not have mentioned this if it did not involve PDF. But we are far from saying that PDF is the better format, which would be stupid anyway since we are using web technologies to publish this article. PDF does, however, have certain qualities that web technologies don’t have.

In the past 50 years RFCs were texts limited to ASCII character codes, originally rather informal documents published as Requests For Comments. We all know that ASCII text is not the most powerful format: it limits the character set and disallows umlauts and other diacritical characters, which makes it difficult to, for example, write a specification about how to encode umlauts. It barely allows graphics, let alone positioning of graphics, not to mention pagination … So the organization that publishes the documents, the RFC Editor, was looking for a better solution (almost as long as the organization has existed).
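To make the character limitation concrete, here is a minimal Python sketch. The helper names `is_ascii_only` and `code_points` are made up for this illustration and have nothing to do with the RFC Editor's tooling; the sketch only shows that a word containing an umlaut cannot be encoded as ASCII, while Unicode assigns it a code point.

```python
def is_ascii_only(text: str) -> bool:
    """Return True if every character in text fits into 7-bit ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        # At least one character lies outside the ASCII range.
        return False
    return True


def code_points(text: str) -> list[str]:
    """List the Unicode code point of each character, e.g. 'U+0041'."""
    return [f"U+{ord(ch):04X}" for ch in text]


if __name__ == "__main__":
    # "\u00fc" is the precomposed letter u with diaeresis (an umlaut).
    for word in ["Umlaut", "M\u00fcller"]:
        print(word, is_ascii_only(word), code_points(word))
```

A specification that wants to show such a character literally therefore needs a format that can carry more than ASCII.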

The natural choice is HTML, and it is no surprise that it is one of the formats they are using now. But the organization acknowledged that HTML also has downsides: it can’t easily be downloaded, has no (working) pagination concept, does not natively support vector graphics, is difficult to manage when it comes to versions and updates and, although it has some structure, can’t be compared with XML in this regard.

The RFC Editor came up with what I believe is a very smart way to publish technical specifications: first of all, new documents can be downloaded in a variety of formats (HTML, TXT, XML and PDF), each with its specific advantages. Sidenote: Graphics are still not as nice as they could be in PDF, since they are understandably only created once for all formats (see

RFC 4 file formats

The feature that really makes them stand out, from a PDF point of view, is that they do not use plain PDF but PDF/A-3u. What does that mean?

PDF/A stands for reliability, and conformance level U makes sure that all text has a Unicode representation, which guarantees searchability and text extraction. Part 3 of the standard allows for embedding arbitrary file formats. In various cases PDF/A-3 is combined with structured information, and that is the case here as well: each RFC comes with an embedded XML structure so that interested parties can easily extract structured content and put it into their own data repositories.
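As a rough illustration of what "extract structured content" can look like once the XML is in hand, here is a small Python sketch using only the standard library. It assumes the XML has already been pulled out of the PDF (that step is not shown) and that it follows the RFC XML vocabulary, with `<front>`, `<title>`, `<author fullname="…">`, `<middle>` and `<section>` elements carrying a `<name>` child. The sample document and the function name `summarize_rfc_xml` are invented for this example, and real RFC XML is considerably richer.

```python
import xml.etree.ElementTree as ET

# Tiny, invented sample in the style of RFC XML; not a real RFC.
SAMPLE_RFC_XML = """<rfc version="3">
  <front>
    <title>Example Document</title>
    <author fullname="Jane Doe"/>
    <author fullname="John Roe"/>
  </front>
  <middle>
    <section><name>Introduction</name><t>Some text.</t></section>
    <section><name>Details</name><t>More text.</t>
      <section><name>Nested Topic</name><t>Even more text.</t></section>
    </section>
  </middle>
</rfc>"""


def summarize_rfc_xml(xml_text: str) -> dict:
    """Collect title, author names and section names from RFC-style XML."""
    root = ET.fromstring(xml_text)
    front = root.find("front")
    title = front.findtext("title", default="") if front is not None else ""
    authors = (
        [a.get("fullname", "") for a in front.findall("author")]
        if front is not None
        else []
    )
    # iter() walks the tree in document order, so nested sections
    # appear right after their parent section.
    sections = [s.findtext("name", default="") for s in root.iter("section")]
    return {"title": title, "authors": authors, "sections": sections}


if __name__ == "__main__":
    print(summarize_rfc_xml(SAMPLE_RFC_XML))
```

The resulting dictionary could be written to a database or search index, which is the kind of reuse the embedded XML is meant to enable.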

A very smart way to use PDF features for publishing technical specifications – in fact better than what 'we' do with publishing PDF standards in PDF only (and EPUB) since 'our' ISO procedures do not allow us to do otherwise.