More and more, PDF/A (the ISO standard for long-term archival of PDF documents) is becoming the file format of choice to archive documents. pdfaPilot is an expert in creating archive-ready PDF/A documents; read on to get lots of background on what is supported and how to go about this.
Different versions & flavors
The first question when thinking about creating PDF/A documents is usually which version and flavor of the standard to use. The full list is:
- PDF/A-1a, PDF/A-1b
- PDF/A-2a, PDF/A-2b, PDF/A-2u
- PDF/A-3a, PDF/A-3b, PDF/A-3u
- PDF/A-4e, PDF/A-4f
PDF/A-1 was the original version of the standard; it doesn’t allow a number of modern PDF features such as transparency, certain forms of image compression, and layers. PDF/A-2 added support for these features and also made it possible to store other PDF/A files inside a PDF/A-2 file (allowing a PDF/A-2 file to behave a little like a small archive if you want).
PDF/A-3 makes it possible to store any file inside a PDF/A-3 file. This allows, for example, email attachments to be stored in their native form inside the archived email, or electronic invoices in PDF/A-3 format to carry the invoice data embedded as XML.
PDF/A-4 is based on PDF 2.0, and PDF/A-4 documents may or may not contain tags. Unlike in previous parts of the standard, no dedicated conformance level is required for tagged PDF/A-4 documents, which eliminates the previous A/B/U conformance levels. Similarly, PDF/A-4 documents may or may not contain file attachments.
The 'b' or 'basic' flavor is focused entirely on visual reproduction; the only thing that counts is to be able to see the document on screen or print it exactly as it was put in the archive.
The 'a' or 'accessible' flavor adds further requirements. Text must be embedded in such a way that it can easily be extracted (the meaning must be clear, not just the visual appearance), images must have alternative text associated with them (again to make their meaning clear), and all elements in the PDF/A must be tagged (so that heading levels, body text, paragraphs, tables and so on can be told apart).
The 'u' or 'Unicode' flavor sits somewhere in between 'b' and 'a'. It focuses mainly on visual reproduction, but does require that all text can easily be extracted or searched.
The 'f' flavor in PDF/A-4f allows non-PDF/A file attachments similar to how PDF/A-3 extends PDF/A-2.
The 'e' flavor is targeted at the engineering community; it adds RichMedia annotations for 3D content in U3D or PRC format to the base PDF/A-4 format.
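The list of versions and flavors above can be captured in a small lookup table, which is handy when a script has to validate a flavor name before passing it on. The following is a minimal sketch, not part of pdfaPilot; the names `PDFA_FLAVORS`, `parse_flavor` and `all_flavors` are made up for illustration, and the table mirrors only the flavors listed in this article.

```python
# Parts and flavors exactly as listed in this article (illustrative table,
# not taken from any pdfaPilot API).
PDFA_FLAVORS = {
    1: ("a", "b"),
    2: ("a", "b", "u"),
    3: ("a", "b", "u"),
    4: ("e", "f"),
}


def all_flavors() -> list[str]:
    """Return every flavor name in the table, e.g. 'PDF/A-2u'."""
    return [
        f"PDF/A-{part}{level}"
        for part, levels in PDFA_FLAVORS.items()
        for level in levels
    ]


def parse_flavor(name: str) -> tuple[int, str]:
    """Split a name such as 'PDF/A-2u' into (part, level).

    Raises ValueError if the name is malformed or the combination
    is not in the table (for example 'PDF/A-1u').
    """
    prefix = "PDF/A-"
    if not name.startswith(prefix):
        raise ValueError(f"not a PDF/A flavor name: {name!r}")
    rest = name[len(prefix):]
    if len(rest) != 2 or not rest[0].isdigit():
        raise ValueError(f"expected a part digit and a level letter: {name!r}")
    part, level = int(rest[0]), rest[1]
    if level not in PDFA_FLAVORS.get(part, ()):
        raise ValueError(f"unknown combination: {name!r}")
    return part, level
```

For example, `parse_flavor("PDF/A-2u")` yields `(2, "u")`, while `parse_flavor("PDF/A-1u")` raises `ValueError` because PDF/A-1 has no 'u' flavor.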
Which one should you use?
That very much depends on which types of documents you are archiving and which features you want to enable in your archive. Keep in mind that it’s much easier to create a 'b' PDF/A file than an 'a' PDF/A file (because of all of the additional requirements).
The good news is that pdfaPilot supports all versions and all flavors of the standard, so you have total freedom on that front.
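To make the trade-offs concrete, the guidance in this section can be written down as a small decision helper. This is a deliberately simplified sketch under stated assumptions: the function name `suggest_flavor` and its parameters are hypothetical, it only covers PDF/A-1 to PDF/A-3, and it always picks the least demanding flavor that still meets the stated needs. It is not an official selection rule.

```python
def suggest_flavor(
    needs_tagging: bool = False,
    needs_text_extraction: bool = False,
    needs_modern_features: bool = False,
    embeds_pdfa_files: bool = False,
    embeds_other_files: bool = False,
) -> str:
    """Suggest the least demanding PDF/A-1..3 flavor for the given needs."""
    # Pick the part: arbitrary attachments need PDF/A-3; transparency,
    # layers or embedded PDF/A files need at least PDF/A-2.
    if embeds_other_files:
        part = 3
    elif needs_modern_features or embeds_pdfa_files:
        part = 2
    else:
        part = 1

    # Pick the level: 'a' requires tagging, 'u' requires extractable text,
    # 'b' only guarantees visual reproduction.
    if needs_tagging:
        level = "a"
    elif needs_text_extraction:
        level = "u"
        # PDF/A-1 has no 'u' flavor, so move up to PDF/A-2.
        part = max(part, 2)
    else:
        level = "b"

    return f"PDF/A-{part}{level}"
```

With no requirements at all the helper returns `PDF/A-1b`; asking for searchable text alone gives `PDF/A-2u`, and asking for tagging plus arbitrary attachments gives `PDF/A-3a`.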
Interactive verification or conversion of PDF/A
pdfaPilot Desktop provides different ways to verify that a PDF complies with your PDF/A version of choice, and different ways to convert (or attempt to convert) a PDF into that version. The easiest way, however, is the 'PDF/A in one click' window.
Figure 1: The PDF/A in one click window in pdfaPilot Desktop
Using the action button (the button with the gear icon in the top right corner) you can select which PDF/A version and flavor you want to work with. The window then offers two buttons: one to simply check an opened PDF file against the PDF/A standard you selected, the other to convert it to that standard.
Using the pdfaPilot preferences, you can select whether you want to enable fallback methods if regular conversion to PDF/A fails.
Fallback methods for conversion
Fallback methods are used when pdfaPilot has attempted conversion to PDF/A and that conversion failed. As a fallback you can:
- Convert the complete document to PostScript and then back to PDF. This forces a number of newer PDF features to be converted to older features and it rewrites the PDF file from scratch, which sometimes fixes problems.
- Convert pages with problems to images. If specific pages have problems, pdfaPilot will convert just those pages to images and insert them into the PDF instead of the original page.
- Convert all pages into images. If nothing else works, pdfaPilot can create a completely new PDF document and insert an image version of every page of the original PDF into it, in a last-ditch effort to fix the problems that exist in the original PDF document.
Of course these fallback methods take time and often degrade the quality of the document that will be put into the archive. It is up to you to decide whether you want to use them. If clients submit files to you and you absolutely have to put something in the archive, there is sometimes no other choice than to work with fallback conversion for troublesome PDF documents.
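The fallback idea described above is essentially a chain: try the regular conversion first, then each enabled fallback in order of increasing quality loss, and stop at the first one that succeeds. The sketch below illustrates that control flow only. The function `convert_with_fallbacks` and the callables passed to it are hypothetical stand-ins; they are not pdfaPilot functions, and how pdfaPilot orders or implements its fallbacks internally is not specified here.

```python
from typing import Callable, Optional, Sequence

# A converter takes an input path and returns True if conversion succeeded.
Converter = Callable[[str], bool]


def convert_with_fallbacks(
    path: str,
    regular: Converter,
    fallbacks: Sequence[tuple[str, Converter]],
) -> Optional[str]:
    """Return the name of the first method that succeeds, or None."""
    if regular(path):
        return "regular"
    # Only reached when the regular conversion failed.
    for name, method in fallbacks:
        if method(path):
            return name
    return None


if __name__ == "__main__":
    # Stub converters standing in for real conversion steps (assumptions).
    outcome = convert_with_fallbacks(
        "input.pdf",
        regular=lambda p: False,
        fallbacks=[
            ("via PostScript", lambda p: False),
            ("problem pages as images", lambda p: True),
            ("all pages as images", lambda p: True),
        ],
    )
    print(outcome)
```

Returning the name of the method that worked makes it possible to log which documents went into the archive with reduced quality.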
Automating the process
In most cases the volume of files to be put in an archive is huge; pdfaPilot Server can easily handle such volumes and process PDF documents in an automatic, unattended way.
Figure 2: A pdfaPilot job to automatically convert to PDF/A-1b
pdfaPilot Server lets you create jobs, where each job has a watched folder, an associated profile and a number of output folders defined. All files that arrive in the watched folder are automatically picked up and processed with the selected profile. The result is dropped in the success or error folder depending on the processing result.
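The watched-folder pattern described above can be illustrated with a short, generic sketch. This is not how pdfaPilot Server is implemented; `process_watched_folder` and the `process` callable are hypothetical, and for simplicity the sketch moves the incoming file itself into the success or error folder instead of writing a converted result.

```python
import shutil
from pathlib import Path
from typing import Callable


def process_watched_folder(
    watched: Path,
    success: Path,
    error: Path,
    process: Callable[[Path], bool],
) -> dict[str, str]:
    """Run one pass over the watched folder.

    Each *.pdf file is handed to `process`; the file is then moved to the
    success or error folder depending on the outcome. Returns a mapping of
    file name to the name of the folder it ended up in.
    """
    success.mkdir(parents=True, exist_ok=True)
    error.mkdir(parents=True, exist_ok=True)
    outcome: dict[str, str] = {}
    # sorted() materializes the listing first, so moving files is safe.
    # Note: the "*.pdf" pattern is case-sensitive on most Linux systems.
    for pdf in sorted(watched.glob("*.pdf")):
        try:
            ok = process(pdf)
        except Exception:
            # Treat any processing error as a failed job.
            ok = False
        target = success if ok else error
        shutil.move(str(pdf), str(target / pdf.name))
        outcome[pdf.name] = target.name
    return outcome
```

A real unattended setup would call such a function repeatedly (or react to file-system events) and would also need to make sure a file has been written completely before picking it up.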
Often the most efficient way to check PDF files for compliance with your PDF/A standard, or to convert them to it, is to integrate that step into a larger solution such as a web portal or document management system. callas provides both a command-line solution and a real SDK to support these scenarios.
pdfaPilot CLI is a command-line application; it can be started from a terminal or command window, but is typically launched from the application or script it is integrated with. This version of the application supports all of the PDF/A capabilities and can automatically generate detailed reports (in PDF or XML) for further automation.
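Launching a command-line tool from a script usually comes down to building an argument list, running it, and inspecting the exit code and output. The sketch below shows that pattern with Python's standard `subprocess` module. It is deliberately generic: the helper name `run_cli` is made up, and the sketch does not assume any particular executable path or switches for pdfaPilot CLI; those have to be taken from the CLI's own documentation.

```python
import subprocess
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ToolResult:
    exit_code: int
    stdout: str
    stderr: str


def run_cli(command: Sequence[str], timeout_seconds: float = 300.0) -> ToolResult:
    """Run a command-line tool and capture its exit code and text output.

    `command` is the full argument list: the executable followed by its
    switches and file names. Raises subprocess.TimeoutExpired if the tool
    runs longer than `timeout_seconds`.
    """
    completed = subprocess.run(
        list(command),
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode output as text
        timeout=timeout_seconds,
        check=False,          # let the caller decide what a non-zero code means
    )
    return ToolResult(completed.returncode, completed.stdout, completed.stderr)
```

A calling application would pass something like `[path_to_cli, *switches, "input.pdf"]`, where both `path_to_cli` and `switches` are placeholders, and then branch on `exit_code` according to the meanings documented for the tool.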
Using the SDK
pdfaPilot SDK is a solution providing integration on a library level. The SDK contains the necessary libraries, headers, documentation and samples to integrate PDF/A support closely into a C, C++, Java or .NET application. This requires development resources but allows for the closest possible integration in the end.
You can read much more about all of the products mentioned in the product pages on the website. Or simply contact us for a personalized demo or to ask more in-depth questions.