The IT industry has taken up a new topic: Robotic Process Automation (RPA). RPA automates processes by largely eliminating manual activities so that software solutions can take over. With today's processor speeds and increasingly sophisticated algorithms, software can now handle ever more complex requirements faster and, for recurring tasks, more reliably than any human being.
The RPA model originated in industrial production. In the manufacturing industry, robots carry out the work, and humans are only there to control and monitor the processes. For this to function smoothly, both the processes and the materials to be processed must be standardized as much as possible. This is usually the case in manufacturing, but not necessarily in office processes … As the current crisis will most likely accelerate the adoption of RPA, we would like to take a look at RPA from a PDF point of view!
Basically, what applies to PDF automation applies to any kind of automation: processes and processed materials have to be well-defined and standardized.
Luckily, RPA for office processes can build on experience gained from PDF automation in production. How so? In some vertical markets, PDFs are not just administrative documents but the production material itself. This applies, for example, to the manufacturing and architecture industries, where PDF drawings carry the actual production data. Automated prepress workflows are even more advanced: PDF files from arbitrary sources are prepared in a fully automated process and go straight to the printing press.
Depending on the diversity and quality of the files, this can place high demands on the PDF software. Print shops receive their raw material (PDF files) from numerous companies and agencies, in varying degrees of quality. Since manual corrections should be avoided in RPA, the first step in most RPA processes will be the standardization of the input material. This normalization ensures that the raw material works in all subsequent processing steps, up to printing. Standards such as PDF/X streamline the definition of normalization requirements: once the raw material has been converted to PDF/X, subsequent steps can rely on it.
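To make the "normalize first, then rely on it" rule concrete, here is a minimal Python sketch. It assumes a hypothetical `normalize` callable (standing in for a PDF/X or PDF/A conversion) that marks each job as conforming or not. Jobs that still fail after normalization go to an exception queue instead of being fixed by hand inside the pipeline. The `Job` class and `route_after_normalization` function are illustrative names, not part of any product.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Job:
    """Hypothetical incoming job; `conforms` is set by normalization."""
    name: str
    conforms: bool = False


def route_after_normalization(
    jobs: List[Job], normalize: Callable[[Job], Job]
) -> Tuple[List[Job], List[Job]]:
    """Normalize every job first; only conforming jobs continue.

    Jobs that still do not conform after normalization are collected in an
    exception queue instead of being corrected by hand mid-pipeline.
    """
    ready: List[Job] = []
    exceptions: List[Job] = []
    for job in jobs:
        result = normalize(job)  # hypothetical PDF/X or PDF/A conversion
        (ready if result.conforms else exceptions).append(result)
    return ready, exceptions
```

Every step after this gate only ever sees jobs from the `ready` list, so it can assume the standard's guarantees instead of re-checking them.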
An example of steps and processes:
As mentioned above, normalization should be the first step, at least if the data (PDF) does not always come from the same source. This might even include creating PDF from other source formats:
- Document conversion, office conversion
If the source is an image, OCR (Optical Character Recognition) should be applied:
- Apply OCR for scanned PDFs
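Deciding which preparation steps a file needs can be sketched as a small routing function. This is an illustration under assumptions: the extension lists and the `has_text_layer` flag are hypothetical, and a real workflow would inspect the file content (for example, whether a PDF contains extractable text) rather than trust the file name.

```python
from pathlib import PurePath
from typing import List

# Hypothetical extension groups; real workflows should inspect content.
OFFICE_EXTS = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx", ".odt"}
IMAGE_EXTS = {".tif", ".tiff", ".jpg", ".jpeg", ".png"}


def pre_steps(filename: str, has_text_layer: bool = True) -> List[str]:
    """Return the preparation steps to run before PDF/A normalization."""
    ext = PurePath(filename).suffix.lower()
    if ext in OFFICE_EXTS:
        # Office documents already contain real text: conversion suffices.
        return ["convert_to_pdf"]
    if ext in IMAGE_EXTS:
        # Images carry no text at all: convert, then recognize characters.
        return ["convert_to_pdf", "ocr"]
    if ext == ".pdf":
        # A scanned PDF contains page images but no extractable text.
        return [] if has_text_layer else ["ocr"]
    raise ValueError(f"unsupported input: {filename}")
```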
Since PDF is so flexible, you would then normalize to a PDF of a defined quality. This could be based on internally developed standards, but in most cases an official standard like PDF/A makes developing internal standards unnecessary. If needed, the requirements can be adjusted to internal needs.
Normalization to PDF/A-2u:
- Font embedding
- Make sure Unicode representation is present
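A check for the two PDF/A-2u requirements listed above could look like the following sketch. It assumes the font information has already been extracted by some PDF library into the hypothetical `FontInfo` records below; a real PDF/A validator checks far more than fonts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FontInfo:
    """Hypothetical summary of one font resource in the document."""
    name: str
    embedded: bool
    has_unicode_map: bool  # e.g. a /ToUnicode CMap for the font


def pdfa2u_font_issues(fonts: List[FontInfo]) -> List[str]:
    """List the font problems that would block PDF/A-2u conformance."""
    issues: List[str] = []
    for font in fonts:
        if not font.embedded:
            issues.append(f"{font.name}: font not embedded")
        if not font.has_unicode_map:
            issues.append(f"{font.name}: no Unicode mapping")
    return issues
```

An empty result means the font requirements are met; any entry tells the normalization step which font to embed or which mapping to add.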
And then, the actual processing can take place:
- Extract text
- Embed files
- Merge files
- Split PDF pages at certain keywords
- Normalize page sizes
- Add page numbers if missing
- …
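Two of these steps boil down to simple logic that can be sketched independently of any PDF library, assuming the page texts have already been extracted and the page sizes are known in PDF points (1/72 inch). The hypothetical `split_ranges` starts a new part on every page containing a keyword; the hypothetical `fit_scale` computes the uniform scale factor that fits a page onto A4 (210 × 297 mm).

```python
from typing import List, Tuple

# A4 is 210 x 297 mm; 1 PDF point = 1/72 inch = 25.4/72 mm.
A4_PT = (210 * 72 / 25.4, 297 * 72 / 25.4)  # approx. (595.28, 841.89)


def split_ranges(page_texts: List[str], keyword: str) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive page ranges. A new part starts on every
    page whose text contains `keyword`; the first page always starts one."""
    if not page_texts:
        return []
    starts = [i for i, text in enumerate(page_texts) if keyword in text]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    ends = starts[1:] + [len(page_texts)]
    # Convert 0-based start / exclusive end into 1-based inclusive ranges.
    return [(s + 1, e) for s, e in zip(starts, ends)]


def fit_scale(width: float, height: float,
              target: Tuple[float, float] = A4_PT) -> float:
    """Uniform scale factor that fits a width x height page into `target`
    without distorting it (below 1 shrinks, above 1 enlarges)."""
    return min(target[0] / width, target[1] / height)
```

For example, `split_ranges(["Invoice 1", "details", "Invoice 2"], "Invoice")` returns `[(1, 2), (3, 3)]`, i.e. one part per invoice.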
One of the lessons learned in prepress is that programmers usually cannot create automated processes from scratch. Real processes are better developed, adjusted, and fine-tuned with real files in day-to-day work. That's why callas software has developed a graphical workflow editor in pdfaPilot that lets users design processes visually in a drag-and-drop interface, without any programming. Once a process does what it should, it can be exported seamlessly from the workflow editor to the server responsible for fully automated processing.
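The idea of designing a process once and then handing it to a server can be illustrated with a process definition stored as data. The JSON layout, the step names, and the toy steps below (which work on page texts instead of real PDFs) are all hypothetical and are not the export format of pdfaPilot. The point is only that a generic runner can execute whatever sequence of steps was designed visually.

```python
import json
from typing import Callable, Dict, List

# Hypothetical registry of step implementations. These toy steps operate on
# a list of page texts; real steps would call a PDF engine.
STEPS: Dict[str, Callable[..., List[str]]] = {
    "strip": lambda pages: [p.strip() for p in pages],
    "drop_empty": lambda pages: [p for p in pages if p],
    "stamp": lambda pages, text: [f"{p} [{text}]" for p in pages],
}


def run_process(definition: str, pages: List[str]) -> List[str]:
    """Execute a JSON process definition: its steps run in listed order."""
    for step in json.loads(definition)["steps"]:
        func = STEPS[step["name"]]  # KeyError signals an unknown step name
        pages = func(pages, **step.get("params", {}))
    return pages
```

Because the runner only looks up step names in a registry, new building blocks can be added without changing existing process definitions.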
Automated workflows can become even smarter with the help of intelligent RPA technologies. The prerequisite is that incoming PDF files can be adapted to meet internal requirements and standards. To bring together specialist knowledge and programming, the ideal tools are those that let domain experts build RPA processes in an agile, modular way, step by step, without programming knowledge.