Socially distancing penguins


Sometimes funny concepts are the best way to explain what otherwise could be very dry (but important!) topics. And since everybody understands the concept of social distancing after the last year, it is an ideal way to talk about proximity testing. Wait! What?

Proximity testing… Sometimes you want to make sure that certain objects are not too close together. Perhaps because they have to be cut apart after printing and there needs to be enough spacing to do that without damaging the printed content. Or, to stick to our funny concept, to make sure all penguins are at least 1.5 meters or 6 feet apart at all times.

Sifter

In pdfToolbox v11, callas introduced a new preflight technology called “Context aware object detection” or Sifter for close friends. It’s special because it doesn’t just examine individual objects (“Is the resolution of this image high enough?”, “Is this black text set to overprint?”) but also checks relationships between objects.

Because of that, it can answer questions such as: “Is this object hidden by any other objects?” or, “Does this text come too close to the cut line?”. Very powerful and ideal to weed out the majority of false positives (things reported as errors while they’re actually fine) preflight engines tend to find.

There is one question the Sifter engine could not answer, and our penguins are a perfect illustration: the Sifter engine could not tell you whether a penguin came too close to another penguin. Why not? Because it would take too long!


The math behind socially distancing penguins: If you don’t like math, just skip this part, you don’t need it to understand the rest of this article. If math interests you, this is why detecting socially intrusive penguins is difficult…
In our example, we have 6 penguins. If we want to know whether one of them comes too close, we need to look at all of them.
For the first penguin, we need to measure the distance to 5 others. For the second penguin, we need to measure 4 times (we’ve already measured the distance between 1 and 2). 3 times for the third penguin, and so on, for a total of 5 + 4 + 3 + 2 + 1 = 15 measurements. You can calculate how many measurements you need for any arbitrary number of penguins. For “n” penguins, the formula is: n * (n - 1) / 2.
This isn’t too bad for small numbers: ten penguins would be 10*9/2, or 45 measurements. 20 penguins leads to 20*19/2, or 190 measurements. The problem starts if you don’t know exactly how many objects you may have in your collection. It isn’t impossible for a page to have 500 characters, for example, and it’s not unheard of to have a couple of thousand vector elements on a page. And yes, computers are fast, but if you have 2000 vector elements you want to test, you will already have to measure 2000*1999/2 = 1,999,000 times, just under 2 million. And even using a fast computer, that will take a while.
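For readers who prefer code to formulas, here is a minimal Python sketch of the counting argument and of a brute-force distance check. It is only an illustration: the function names `pair_count` and `too_close_pairs` are made up for this example, penguins are reduced to simple (x, y) points, and nothing here reflects how pdfToolbox works internally.

```python
from itertools import combinations
from math import dist


def pair_count(n: int) -> int:
    """Number of distinct pairs among n objects: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def too_close_pairs(points, min_distance):
    """Brute force: measure every pair once and keep those that are too close."""
    return [
        (a, b)
        for a, b in combinations(points, 2)
        if dist(a, b) < min_distance
    ]


# Four hypothetical penguins, positions in meters.
penguins = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (3.0, 4.0)]

print(pair_count(6))      # 15
print(pair_count(2000))   # 1999000
print(too_close_pairs(penguins, 1.5))  # [((0.0, 0.0), (1.0, 0.0))]
```

The number of measurements grows with the square of the number of objects, which is why an unbounded check can get expensive quickly.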

pdfToolbox 12

In pdfToolbox 12, callas does introduce a proximity test, but there is a precaution built in…

The condition allows you to set a limit on the number of objects you’re checking. If you have a PDF file that contains 2000 penguins, Sifter will either:

Check the first 100 (the limit set in this example) and then report what it found in those first 100 penguins,
Or decide that its limit has been reached and not test anything at all.

Which strategy you follow is up to you. These seem like harsh actions, but they allow you to use this very useful test without having to be afraid you’re going to get some rogue PDF file that’s going to trick pdfToolbox into doing more measurements than there are stars in the known universe…

And very often you know that all PDF files you’re going to use this check on should contain a relatively small number of objects you want to test. The distance between cut lines, for example, is a very useful case, and one where you’re not going to have thousands of objects to check.
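The two strategies described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not pdfToolbox’s actual implementation or API: the function `check_proximity` and its `limit` and `on_limit` parameters are hypothetical names, and objects are again simplified to (x, y) points.

```python
from itertools import combinations
from math import dist


def check_proximity(points, min_distance, limit=100, on_limit="truncate"):
    """Brute-force proximity check with a cap on the number of objects.

    on_limit="truncate": test only the first `limit` points.
    on_limit="skip":     test nothing if there are more than `limit` points.
    Returns (violations, number_of_points_tested).
    """
    if on_limit not in ("truncate", "skip"):
        raise ValueError("on_limit must be 'truncate' or 'skip'")

    if len(points) > limit:
        if on_limit == "skip":
            return [], 0
        points = points[:limit]

    violations = [
        (a, b)
        for a, b in combinations(points, 2)
        if dist(a, b) < min_distance
    ]
    return violations, len(points)


# 2000 hypothetical penguins standing in a row, 1 meter apart.
colony = [(float(i), 0.0) for i in range(2000)]

found, tested = check_proximity(colony, 1.5, limit=100, on_limit="truncate")
print(len(found), tested)   # 99 100

found, tested = check_proximity(colony, 1.5, limit=100, on_limit="skip")
print(len(found), tested)   # 0 0
```

With the cap in place, the worst case is 100*99/2 = 4950 measurements, no matter how many penguins the file contains.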

Conclusion

Making sure penguins are sufficiently far apart is time-consuming if your collection of penguins grows too big. But in pdfToolbox 12 you can still do it with a Sifter condition, and you have a way to refrain from going overboard on the number of calculations.

Even shorter conclusion: don’t let your collection of penguins get out of control!