EDDP: Page Bursting

One form of EDDP (Event Driven Document Processing) is "Page Bursting". Page bursting applies only to PDF documents [at least as of the current version]. The purpose of page bursting is to decompose a multi-page document into single-page documents so that each page can easily be reviewed individually. One use case is scanning a large collection of invoices, work orders, or receipts on a document center using a document feeder. The scanned document can be posted to OpenGroupware Coils via the SMTP service, REST (AttachFS), or WebDAV, whichever the document center supports. If this EDDP feature is configured on the target folder, the document will be automatically decomposed into single-page documents.

When a PDF document is created in a folder which either has or inherits a burstingTarget object property in the document management name-space, the coils.blob.event component will attempt to burst the document into single-page documents in the indicated folder. Documents are only page-bursted upon creation. To achieve page-bursting upon revision, use the EDDP auto-file feature to file a copy of the document for every revision into a folder which is configured to perform page-bursting; every auto-filed copy of the document will be a new document and thus eligible for bursting.

The value of the burstingTarget object property must be an ogo:// scheme URI specifying a project and a document path. ogo:// URIs for this usage take the form of ogo://ProjectNumber/path...?parameters.... If the project resolves but the corresponding document path does not exist, the path will be created.
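
As an illustration of the URI form described above, the following is a minimal sketch that splits a burst target URI into its project, path, and parameters using only the Python standard library. It is not the Coils implementation; the function name parse_burst_target, the project number 12345, and the folder path /Invoices/Pages are hypothetical values chosen for the example.

```python
from urllib.parse import urlsplit, parse_qs


def parse_burst_target(uri):
    """Split an ogo:// burst target URI into project, path, and parameters."""
    parts = urlsplit(uri)
    if parts.scheme != "ogo":
        raise ValueError("burstingTarget must use the ogo:// scheme")
    if not parts.netloc:
        raise ValueError("burstingTarget must name a project")
    # parse_qs returns a list per key; keep only the first value of each.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "project": parts.netloc,
        "path": parts.path,
        "parameters": params,
    }


# Hypothetical project number and folder path, for illustration only.
target = parse_burst_target(
    "ogo://12345/Invoices/Pages?allcopyenabled=YES&copyonfailure=YES"
)
print(target["project"], target["path"], target["parameters"])
```

A URI with a different scheme, or without a project, raises ValueError in this sketch; how Coils itself reports a malformed burstingTarget value is not described here.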

Bursting occurs within the administrative context, so the creation of documents in the burst target is controlled only by the permissions to create PDF documents in the origin folder. Only documents of MIME types application/pdf or application/x-pdf can be page-bursted. Unless the allcopyenabled parameter is appropriately set in the burst target URI, unburstable documents will be ignored. If allcopyenabled has a value of "YES" then unburstable document types will be copied in their entirety to the specified target; object linking and object property inheritance will still be performed as if the new document had been created by bursting.

The handling of documents which are of a burstable type, but for which bursting fails, is controlled by the copyonfailure URI parameter. By default, documents which fail to burst are not copied; setting the copyonfailure parameter to a value of "YES" will cause documents which fail to burst to be copied to the target in their entirety. When copying documents which failed to burst, the target folder can be overridden by setting the errorfolderid parameter to the object id of an alternate folder. The errorfolderid parameter is only used when the copyonfailure parameter is "YES"; this alternate target is not relevant for non-burstable document types or documents which burst successfully. If a failed document is copied to either the target or an alternate folder, object linking and object property inheritance will still be performed as if the new document were created normally, except that there will be no reference to the source page.
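
The interaction of MIME type, allcopyenabled, copyonfailure, and errorfolderid can be summarized as a small decision function. This is a sketch of the rules as stated above, not the component's actual code; the function name plan_disposition, the action labels, and the exact, case-sensitive comparison against "YES" are assumptions made for the example.

```python
BURSTABLE_TYPES = {"application/pdf", "application/x-pdf"}


def plan_disposition(mimetype, params, burst_ok, target_folder):
    """Return (action, folder) for a newly created document.

    action is "burst", "copy", or "ignore"; folder is None when ignored.
    params holds the parameters taken from the burst target URI.
    """
    if mimetype not in BURSTABLE_TYPES:
        # Unburstable types are copied whole only if allcopyenabled is "YES".
        if params.get("allcopyenabled") == "YES":
            return ("copy", target_folder)
        return ("ignore", None)
    if burst_ok:
        return ("burst", target_folder)
    # Burstable type for which bursting failed.
    if params.get("copyonfailure") == "YES":
        # errorfolderid overrides the target only for failed documents.
        return ("copy", params.get("errorfolderid", target_folder))
    return ("ignore", None)


# A failed PDF with an alternate error folder (hypothetical object id).
print(plan_disposition(
    "application/pdf",
    {"copyonfailure": "YES", "errorfolderid": "98765"},
    False,
    "target",
))
```

Note that errorfolderid has no effect in this sketch unless copyonfailure is "YES", matching the rule that the alternate folder is not relevant to non-burstable types or to successful bursts.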

A notice concerning documents of burstable types for which page bursting failed can be sent to an e-mail address by setting the onfailnotify parameter to a valid e-mail address. The alert will be generated using the "/Templates/BurstingFailure.mako" Mako template from Project 7,000. Sites may modify this template to match their own requirements. If no corresponding template has been created or the template is invalid, an administrative notice will be generated citing the exception which occurred [the intended address will not be notified].

Template variables provided to the "BurstingFailure.mako" template:

Parameter        Description
document         A reference to the source document.
folder           A reference to the target folder specified in the target URI.
mimetype         The MIME type string of the origin document.
copy_on_failure  The value of the copyonfailure parameter from the target URI.
traceback        The trace-back of the exception relating to the failure of the document to be page-bursted.
to_address       The value of the onfailnotify parameter from the target URI. The value will be NULL if no such parameter was specified.
copy_enabled     The value of the allcopyenabled parameter from the target URI.
error_folder     The value of the errorfolderid parameter from the target URI. The value will be NULL if no such parameter was specified.

During the attempt to burst a document, a timer is enforced for processing each page [this avoids a hung component when dealing with documents that contain damaged or recursive references]. If processing a page of a document exceeds the allowed time, the burst operation will be considered a failure and will be rolled back. Only once all pages have been generated as separate documents will the bursting operation be considered successful.

Upon success or failure of bursting the document, an object property of bursted will be set on the source document; a value of "FAIL" indicates bursting of the document failed, while a value of "OK" indicates the document was successfully page-bursted. All documents created by bursting will also have a bursted object property with a value of "TARGET"; this prevents bursting loops in a use-case where a document is page-bursted into a target folder which itself either has or inherits a burstingTarget object property. The page bursting feature will ignore all documents which have a bursted object property of any value.

Documents of a page-burstable type which fail to burst will also be marked using the "{57c7fc84-3cea-417d-af54-b659eb87a046}damaged" object property with a value of "YES". One effect of this property is to prevent the document from being eligible for EDDP auto-print; a damaged document is not expected to print correctly and depending upon print server configuration may stop the print queue.
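
The marker properties described above can be modelled with plain dictionaries. The sketch below is only an illustration of the stated rules: the helper names are hypothetical, object properties are represented as a simple dict, and the bursted key is written without its name-space for brevity.

```python
# Property name as given in the text; the "bursted" key below omits its
# name-space, which is a simplification made for this example.
DAMAGED = "{57c7fc84-3cea-417d-af54-b659eb87a046}damaged"


def is_burst_candidate(properties):
    """A document with a bursted property of any value is ignored."""
    return "bursted" not in properties


def mark_source(properties, succeeded):
    """Return a copy of the source properties with the outcome recorded."""
    marked = dict(properties)
    marked["bursted"] = "OK" if succeeded else "FAIL"
    if not succeeded:
        # Failed documents of a burstable type are also flagged as damaged.
        marked[DAMAGED] = "YES"
    return marked


def mark_page(properties):
    """Flag a document created by bursting so it is never bursted again."""
    marked = dict(properties)
    marked["bursted"] = "TARGET"
    return marked


failed = mark_source({}, succeeded=False)
print(failed["bursted"], is_burst_candidate(failed))
```

Because a generated page carries bursted = "TARGET", is_burst_candidate returns False for it even when its folder has a burstingTarget of its own, which is the loop-prevention behaviour described above.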

Documents created by bursting will have the same file-name, in alpha lower-case, as their origin document but with a "-page-timestamp-random" suffix inserted between the file-name and the file's extension. For example: from a file named "stanley.agenda.pdf" having three pages, documents named:

stanley.agenda-00001-373051543869-92601.pdf
stanley.agenda-00002-373051543869-92601.pdf
stanley.agenda-00003-373051543869-92601.pdf

will be created in the target path. The purpose of the time-stamp + random is to guarantee file-name uniqueness of the created documents [avoiding overwriting an existing document]. The page counter is limited to five digits, creating an effective maximum document length of 9,999 pages; page bursting of documents with more than 9,999 pages is not supported.

Each document created by bursting will have three object properties in the document management name-space: the bursted property described previously, sourceDocumentId with a value of the object id of the source document, and sourceDocumentPage with a value of the page number the document represents from the source document. These properties are added to the collection of any object properties that existed on the source document. In order to preserve all meta-data throughout the life-cycle of the document, all documents created by page bursting initially inherit copies of all the object properties and object links which exist relating to the source document. The properties applied by the bursting operation are then applied to the set of properties copied from the source document [potentially over-writing an inherited object property].
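
The file-naming rule described above can be sketched as follows. This is an illustration only: the function name burst_filename is hypothetical, and because the text does not say how the time-stamp and random components are produced, they are passed in as arguments rather than generated.

```python
import os


def burst_filename(source_name, page, timestamp, random_part):
    """Build the name of a single-page document from its source file name."""
    if not 1 <= page <= 9999:
        # The text states that more than 9,999 pages is not supported.
        raise ValueError("page number out of supported range")
    # Lower-case the name, then split off the final extension.
    stem, extension = os.path.splitext(source_name.lower())
    # Insert "-page-timestamp-random" between the name and the extension.
    return "{0}-{1:05d}-{2}-{3}{4}".format(
        stem, page, timestamp, random_part, extension
    )


print(burst_filename("stanley.agenda.pdf", 1, 373051543869, 92601))
```

Only the last dot-separated component is treated as the extension, so "stanley.agenda.pdf" keeps "stanley.agenda" as its stem.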

In addition to the object properties, the following actions are performed on documents created by page bursting, regardless of whether the generated documents are pages from page-burstable document types or copies of non-page-burstable types due to the allcopyenabled feature:

  • An object-link having a type of coils:burstedFrom will be created where the source of the link is the single-page document and the target of the link is the original [potentially multi-page] document. This link is intended to facilitate easily generated mapping of a document's provenance and life-cycle.
  • Audit messages of type comment are made on the new documents stating their origin. This makes the provenance of the documents readily visible to end-users. If the document was successfully page-bursted [from a page-burstable document type] the audit comment indicates the page number to which the document corresponds; audit comments on non-page-burstable documents copied due to the allcopyenabled feature or documents which failed to burst which were copied due to the copyonfailure feature will not reference a page number, only the identity of the original document.
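
Taken together, the properties, link, and audit comment attached to a new document might be assembled as in the sketch below. The function name, the dictionary layout, and the wording of the comment text are all hypothetical; it is also an assumption here that whole-document copies receive the same bursted and sourceDocumentId properties as bursted pages. Only the property names, the coils:burstedFrom link type, and the ordering (inherited properties first, bursting properties applied on top) come from the description above.

```python
def describe_new_document(new_id, source_id, source_properties, page=None):
    """Assemble the meta-data for a document created by bursting or copying.

    page is None for whole-document copies (allcopyenabled or copyonfailure).
    """
    # Start from copies of the source document's properties...
    properties = dict(source_properties)
    # ...then apply the bursting properties, overwriting inherited values.
    properties["bursted"] = "TARGET"
    properties["sourceDocumentId"] = source_id
    if page is not None:
        properties["sourceDocumentPage"] = page
    # The link runs from the new document to the original document.
    link = {"kind": "coils:burstedFrom", "source": new_id, "target": source_id}
    # Hypothetical comment wording; copies do not reference a page number.
    if page is None:
        comment = "Copied from document {0}".format(source_id)
    else:
        comment = "Page {0} of document {1}".format(page, source_id)
    return {"properties": properties, "link": link, "comment": comment}


# Hypothetical object ids and an inherited property, for illustration only.
meta = describe_new_document(2002, 1001, {"customer": "ACME", "bursted": "OK"}, page=2)
print(meta["properties"]["bursted"], meta["comment"])
```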

The document uploaded and page-bursted will remain in the folder to which it was uploaded. Since bursting occurs due to an event on the entity exchange, it is not transactional; failure of the document to burst will not result in failure of the upload of the document. It is also possible that a document uploaded to a folder with a bursting target may be deleted before page bursting can occur. This will not result in an error; non-existent documents [documents which have been deleted] are ignored by the coils.blob.event service. If a document must be page-bursted in all circumstances, delete and pare permissions should be removed from the folder for the security context performing the upload. Expiration of documents from the source folders, if they are not required, is best managed using document retention policies.