New Feature: PDF Scrubbrush

PDF is intended to provide an entirely portable mechanism for the exchange of non-trivial documents. In practice, however, documents created by the real-world variety of clients almost inevitably contain deviations from the PDF standard, and these create issues when the document is processed by other applications and platforms. To ensure maximum compatibility, the scrub brush feature re-compiles documents on the server using the Poppler libraries.

[workflow tmp]# pdffonts prescrubbed-document.pdf 
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
CCQPLO+Arial                         CID TrueType      Identity-H       yes yes yes     18  0
ArialMT                              TrueType          WinAnsi          no  no  no      26  0
Arial-BoldMT                         TrueType          WinAnsi          no  no  no      28  0
QVDGUR+MinionPro-Regular             CID Type 0C       Identity-H       yes yes yes     30  0
FZPPKI+MinionPro-Regular             CID Type 0C       Identity-H       yes yes yes     38  0
Helvetica-Bold                       Type 1            Custom           no  no  no      52  0
Helvetica                            Type 1            Custom           no  no  no      58  0
ZapfDingbats                         Type 1            ZapfDingbats     no  no  no     188  0

Text 1: The pdffonts report for a document which references non-standard fonts but does not contain them. This document is unlikely to render correctly in clients on platforms other than the one which created it.

The most common defect is that the PDF references non-standard fonts which are not embedded in the PDF document. Such fonts will either not display when viewed on other clients or may be replaced, often unsuccessfully, based on the viewer's font substitution tables. PDF documents can be examined using the pdffonts tool provided by the Poppler project; the "emb" column of its report shows whether each font is embedded in the document.
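The check described above can be automated. The following Python sketch parses a pdffonts report and lists the fonts that are referenced but not embedded. The function names (parse_pdffonts, unembedded_fonts, run_pdffonts) are hypothetical and not part of the workflow engine, and the parser assumes that font names contain no spaces and that the last five columns of each row are emb, sub, uni, and the two-part object ID, as in the reports shown here.

```python
import subprocess

# A shortened pdffonts report, copied from the pre-scrub listing (Text 1).
SAMPLE_REPORT = """\
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
CCQPLO+Arial                         CID TrueType      Identity-H       yes yes yes     18  0
ArialMT                              TrueType          WinAnsi          no  no  no      26  0
Helvetica-Bold                       Type 1            Custom           no  no  no      52  0
"""


def parse_pdffonts(report):
    """Return one dict per font row found in a pdffonts text report."""
    fonts = []
    in_table = False
    for line in report.splitlines():
        if line.startswith("---"):
            # The dashed rule separates the header from the font rows.
            in_table = True
            continue
        tokens = line.split()
        # A row needs at least: name, type, encoding, emb, sub, uni, object, generation.
        if not in_table or len(tokens) < 8:
            continue
        # The type column may contain spaces ("CID TrueType"), so the
        # yes/no flags are read by counting from the right-hand side.
        fonts.append({
            "name": tokens[0],
            "embedded": tokens[-5] == "yes",
            "subset": tokens[-4] == "yes",
        })
    return fonts


def unembedded_fonts(report):
    """Names of fonts the document references but does not embed."""
    return [f["name"] for f in parse_pdffonts(report) if not f["embedded"]]


def run_pdffonts(path):
    """Run the pdffonts tool (it must be installed) and return its report."""
    result = subprocess.run(["pdffonts", path], capture_output=True,
                            text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(unembedded_fonts(SAMPLE_REPORT))
```

To check a real file, pass the output of run_pdffonts("prescrubbed-document.pdf") to unembedded_fonts; an empty list means every referenced font is embedded.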

[workflow tmp]# pdffonts scrubbed-document.pdf 
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
KOFYDJ+LiberationSans                TrueType          WinAnsi          yes yes yes      5  0
RXEXJO+LiberationSans-Bold           TrueType          WinAnsi          yes yes yes      6  0
IOKNUX+Arial                         TrueType          WinAnsi          yes yes yes      7  0
AADISD+MinionPro-Regular             CID Type 0C       Identity-H       yes yes yes     10  0
AGFLBT+MinionPro-Regular             CID Type 0C       Identity-H       yes yes yes     11  0
THKLNC+NimbusSanL-Bold               Type 1            WinAnsi          yes yes yes     12  0
CIXCHU+NimbusSanL-Regu               Type 1            WinAnsi          yes yes yes     13  0
QAOVNF+Dingbats                      Type 1            Builtin          yes yes yes     14  0

Text 2: The same document after being processed by the scrub brush. Fonts have been substituted based on Poppler's font substitution tables, and those fonts are now all embedded in the document. This document should render consistently regardless of client application or platform.
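A comparable result can be approximated outside of the workflow engine. The following Python sketch re-generates a PDF with Poppler's pdftocairo command-line tool, which writes a new PDF through the Cairo backend. That the scrub brush uses this particular tool is an assumption; the text above only states that the Poppler libraries are used. The function names build_scrub_command and scrub are hypothetical.

```python
import subprocess


def build_scrub_command(source_path, scrubbed_path):
    """Build the pdftocairo invocation that rewrites a PDF as a new PDF."""
    # "-pdf" selects PDF as the output format of pdftocairo.
    return ["pdftocairo", "-pdf", source_path, scrubbed_path]


def scrub(source_path, scrubbed_path):
    """Re-generate source_path as scrubbed_path; requires pdftocairo on PATH."""
    # check=True raises CalledProcessError if pdftocairo reports a failure.
    subprocess.run(build_scrub_command(source_path, scrubbed_path), check=True)


if __name__ == "__main__":
    scrub("prescrubbed-document.pdf", "scrubbed-document.pdf")
```

Running pdffonts on the output file afterwards is a simple way to confirm whether the fonts ended up embedded.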

The scrub brush feature is available in the following workflow actions:

  • searchDocumentsToZIPFileAction
  • folderToZipFileAction

In the future the scrub brush feature will be made available in the messageToINBOXAction and documentToMessageAction workflow actions.

Author: Adam Tauno Williams