PDF Parallelism - why and how to modify bookmark structures?


PDF is more than digital paper. It has many features that are not always used. These features are based on optional entries in the internal PDF format. Two of these are designed to provide direct access to pages in a PDF file based on additional information related to these pages: Bookmarks (in the internal PDF format called "outlines") and document part metadata (DPart).

Bookmarks are usually used interactively and associate one or more (bookmark) names with pages.

Document part metadata (DPart) is more powerful and is typically used in automated processes. It is not limited to a single name, but uses metadata that consists of arbitrary key-value pairs. In addition, it can be associated with page ranges instead of just a single page.

Bookmarks and DPart are both hierarchical, but this means something completely different in each case. Each bookmark refers to a single page, and the bookmark hierarchy simply determines how the bookmarks are presented to the user, i.e. on which level an item is displayed. For document parts, the hierarchy applies to the pages themselves: You may associate some DPart metadata with a range of pages and other metadata with a sub range of these pages.
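The difference between the two hierarchies can be sketched with a small in-memory model. This is only an illustration: the dictionary keys ("title", "page", "first", "last", "metadata", "children") and the sample values are made up for this sketch and do not reflect the internal PDF syntax or the JSON format of any tool.

```python
# Hypothetical, simplified model; not the internal PDF syntax or any tool's JSON.

# A bookmark points at exactly one page; nesting only controls the display level.
bookmarks = [
    {"title": "Chapter 1", "page": 1, "children": [
        {"title": "Section 1.1", "page": 2, "children": []},
    ]},
    {"title": "Chapter 2", "page": 5, "children": []},
]

# A document part carries key-value metadata for a page range;
# child parts carry other metadata for sub ranges of that range.
dparts = {
    "first": 1, "last": 6, "metadata": {"job": "sample-mailing"},
    "children": [
        {"first": 1, "last": 2, "metadata": {"zip": "10115"}, "children": []},
        {"first": 3, "last": 6, "metadata": {"zip": "80331"}, "children": []},
    ],
}


def bookmark_levels(nodes, level=0):
    """Flatten the bookmark tree into (level, title, page) rows for display."""
    rows = []
    for node in nodes:
        rows.append((level, node["title"], node["page"]))
        rows.extend(bookmark_levels(node["children"], level + 1))
    return rows


def metadata_for_page(part, page):
    """Collect metadata from every part whose range contains the page, outermost first."""
    if not (part["first"] <= page <= part["last"]):
        return []
    found = [part["metadata"]]
    for child in part["children"]:
        found.extend(metadata_for_page(child, page))
    return found
```

In this model, a bookmark's level affects only how it is listed, while a page can pick up metadata from several nested document parts at once.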

In the internal PDF format, both features use a node structure that is parallel to the page structure. That enables software to access pages via these "alternative" structures without having to analyze all pages in their "regular" order. E.g. the bookmark structure can be displayed in a PDF viewer even if it has not read all the pages, and the page a bookmark refers to can be accessed immediately when the user clicks on the bookmark.

For document parts that is even more useful since they have been designed for PDFs with several thousand pages, e.g. individualized documents such as postcards where each postcard has a different recipient. It is useful to encode the ZIP code in the document part metadata, which makes it easy for a processor program to select all postcards for a specified ZIP code, allowing you to print them in the right order for optimized mailing.
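The ZIP code selection described above might look like the following sketch. It assumes the document parts have already been read into a flat list of dictionaries; the keys "first", "last" and "metadata" as well as the metadata key "zip" are hypothetical names chosen for this example.

```python
def pages_for_zip(parts, zip_code):
    """Return the page numbers of all parts whose metadata carries the given ZIP code."""
    pages = []
    for part in parts:
        if part["metadata"].get("zip") == zip_code:
            pages.extend(range(part["first"], part["last"] + 1))
    return pages


def pages_in_zip_order(parts):
    """Return all page numbers, grouped by ZIP code in ascending order."""
    # sorted() is stable, so parts with the same ZIP code keep their document order.
    ordered = sorted(parts, key=lambda part: part["metadata"].get("zip", ""))
    pages = []
    for part in ordered:
        pages.extend(range(part["first"], part["last"] + 1))
    return pages


# Made-up sample data: three two-page postcards (front and back).
postcards = [
    {"first": 1, "last": 2, "metadata": {"zip": "80331"}},
    {"first": 3, "last": 4, "metadata": {"zip": "10115"}},
    {"first": 5, "last": 6, "metadata": {"zip": "80331"}},
]
```

Because the selection works on the metadata alone, the processor never has to look at the content of the pages themselves.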

While these parallel structures are very helpful when present, they create issues when the page order is modified or when pages are added or deleted. You would then have to update these parallel structures too, and that is not always straightforward. If you merge PDF files, each of which has bookmarks, it is not even fully clear what should happen: Do you want to create new top level bookmarks that e.g. use the names of the original files, or would you rather keep the hierarchies as they are? In fact, many programs do not update these parallel structures when the page structure is modified.
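Both merge strategies can be sketched in a few lines. The example assumes a simplified bookmark model with 1-based page numbers and the hypothetical keys "title", "page" and "children"; it is not how any particular program merges bookmarks.

```python
import copy


def shift_pages(nodes, offset):
    """Return a copy of the bookmark tree with every page destination moved by offset."""
    shifted = copy.deepcopy(nodes)
    stack = list(shifted)
    while stack:
        node = stack.pop()
        node["page"] += offset
        stack.extend(node["children"])
    return shifted


def merge_bookmarks(files, wrap_in_file_names):
    """Merge bookmark trees; files is a list of (name, page_count, bookmarks) in merge order."""
    merged, offset = [], 0
    for name, page_count, bookmarks in files:
        shifted = shift_pages(bookmarks, offset)
        if wrap_in_file_names:
            # Strategy 1: a new top level bookmark per file, pointing at its first page.
            merged.append({"title": name, "page": offset + 1, "children": shifted})
        else:
            # Strategy 2: keep the original hierarchies side by side.
            merged.extend(shifted)
        offset += page_count
    return merged
```

Either way, the destinations of every file after the first must be shifted by the number of pages that precede it; skipping that step is what leaves the bookmarks pointing at the wrong pages.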

Is it possible for another application to "repair" these structures later? Yes, provided you understand what happened and in which way the structure has become invalid. That is often the case when the page structure has been modified programmatically in automated or integrated workflows. In such cases, it is helpful to have access to the parallel structures inside the PDF file in order to adjust them to the "new" page structure.

To make this possible, we have updated our free Acrobat plug-in callas pdfDPartner 2 to not only export DPart structures to JSON, but also to import a (modified) structure back into the PDF file. The same is possible with pdfToolbox 14.3, which also supports importing and exporting "JSONized" bookmarks.

Our documentation provides examples for some use cases:

1. Derive bookmarks from headings

You may use the PDF Preflight engine to identify headings (e.g. by means of text size, font type and color). You can then use the identified text to create a proper JSONized bookmark structure and apply that structure to the PDF.

2. Adjust bookmarks for reader spreads

A PDF was turned into reader spreads (pairs of pages are combined into one new page each), but the bookmark structure was left alone. You may extract the existing, dysfunctional structure, adjust the page destinations (divide by 2) and apply the modified structure.

3. Adjust DPart structure

A sample PDF with DPart was modified to contain only parts of the original document. Again, a dysfunctional structure - in this case for DPart - is updated and applied.
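The page adjustment for the reader spread case (use case 2) can be sketched as follows. The sketch assumes 1-based page numbers, that pages 1 and 2 form the first spread, and a simplified bookmark model with the hypothetical keys "title", "page" and "children"; the actual JSON layout is described in the product documentation and may differ. With 0-based page indexes the mapping would simply be page // 2, and a layout that keeps the cover as a single page would need a different formula.

```python
def spread_page(page):
    """Map a 1-based page number to the 1-based spread that contains it."""
    # Pages 1 and 2 land on spread 1, pages 3 and 4 on spread 2, and so on.
    return (page + 1) // 2


def adjust_for_spreads(nodes):
    """Return a new bookmark tree whose destinations point at spreads instead of pages."""
    return [
        {
            "title": node["title"],
            "page": spread_page(node["page"]),
            "children": adjust_for_spreads(node["children"]),
        }
        for node in nodes
    ]
```

The function builds a new tree rather than changing the extracted one, so the original structure stays available for comparison before the modified structure is applied.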

