03 Jul 2019 By David van Driessche
Around the world, the rules that define accessible documents, and when you are required to supply them, go by different names. In North America, people talk about Section 508 or WCAG AA; in Europe, they might refer to the EU accessibility directive. Whatever the name, these rules usually do the same thing: they establish specific requirements a document must meet to be labeled accessible. For PDF documents, those requirements boil down to compliance with the ISO standard for accessible PDF: PDF/UA.
The “UA” in PDF/UA stands for “Universal Accessibility”. PDF/UA is an ISO standard that defines the rules a PDF document must follow to be labeled accessible. Those rules were chosen so that the document can easily be used with assistive technology such as screen readers. The most important rules are:
Compliance with most ISO standards for PDF documents can easily be verified by software. The process of checking compliance with an ISO standard is typically called preflight (a term borrowed from aviation, where pilots check the plane before it takes off), and software exists to preflight against those ISO standards.
PDF/UA is a bit of a problem child in this department. Many of the rules in the standard can indeed be verified by software, but unfortunately not all of them. A human is usually required to validate that all rules are properly followed… Why? One small example should make this clear. Imagine a document containing both English and Spanish text. Software can preflight this document and tell me whether or not all text has been labeled with a language. That’s just checking the metadata for the text. But how can the software be certain that the right text is labeled with the right language? Humans are good at this, software… not so much.
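The language example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `TextSpan` and `spans_missing_language` are hypothetical names for a simplified model of labeled text, not part of any real PDF library or preflight tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextSpan:
    """Hypothetical stand-in for a run of labeled text in a PDF."""
    text: str
    lang: Optional[str]  # declared language such as "en" or "es", or None


def spans_missing_language(spans: List[TextSpan]) -> List[TextSpan]:
    """Return the spans that carry no language label at all."""
    return [span for span in spans if not span.lang]


document = [
    TextSpan("Good morning, everyone.", "en"),
    TextSpan("Buenos días a todos.", "en"),  # mislabeled: Spanish marked as English
    TextSpan("Hasta luego.", None),          # no label at all
]

missing = spans_missing_language(document)
print([span.text for span in missing])  # → ['Hasta luego.']
```

The check catches the unlabeled span, but the Spanish sentence marked as English passes without complaint. Spotting that kind of mistake is the part that still needs a human reader.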
The PDF/UA standard contains quite a few such cases, where software has to be assisted by a human in order to fully validate compliance. That doesn’t mean software can’t make the process easier, of course. Two examples of applications in this field are the PDF Accessibility Checker from the Swiss foundation "Zugang für alle", a free tool that checks everything software can check for PDF/UA and is considered the first tool based entirely on the Matterhorn protocol, and callas pdfaPilot, a commercial tool that likewise checks everything software can check for PDF/UA.
Two points about pdfaPilot are worth noting. First, although it is a commercial tool, the PDF/UA verification part is always free. Second, pdfaPilot also helps with the human verification part by converting the PDF document into an HTML representation. This HTML version makes it much easier to check the structure of the document because it is shown in a color-coded format. It also makes it very easy to see whether non-text elements are either labeled as not important or have a proper alternative description.
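To give a feel for why a color-coded HTML view helps a human reviewer, here is a minimal sketch of the general idea. It does not show how pdfaPilot works internally; the `Element` class, the `css_class` and `to_html` functions, and the CSS class names are all assumptions made up for this illustration.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Optional


@dataclass
class Element:
    """Hypothetical, simplified document element; not a real PDF object."""
    role: str                  # e.g. "H1", "P", "Figure"
    text: str = ""
    alt: Optional[str] = None  # alternative description, if any
    artifact: bool = False     # True if labeled as not important


def css_class(el: Element) -> str:
    """Pick a CSS class so a stylesheet can color each kind of element."""
    if el.role != "Figure":
        return "text"
    if el.artifact:
        return "artifact"
    return "described" if el.alt else "needs-review"


def to_html(elements: List[Element]) -> str:
    """Render one <div> per element, showing its role and its text or alt text."""
    rows = []
    for el in elements:
        label = el.alt if el.role == "Figure" and el.alt else el.text
        rows.append(
            f'<div class="{css_class(el)}"><b>{escape(el.role)}</b> {escape(label)}</div>'
        )
    return "\n".join(rows)


page = [
    Element("H1", "Annual report"),
    Element("Figure", alt="Bar chart of revenue per quarter"),
    Element("Figure", artifact=True),  # decorative, labeled as not important
    Element("Figure"),                 # neither described nor labeled
]
print(to_html(page))
```

With a stylesheet that colors `needs-review` differently from the other classes, the one figure that is neither described nor labeled as unimportant stands out at a glance. Whether an existing description is actually a good one remains a judgment for the reviewer.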