I recently enjoyed the extensive discussion on this blog that followed when Serge Huber, CTO of Jahia Solutions, posted the entry titled “After Flash, why PDF must die.” The post and its comments contained many informative opinions, as well as a lot of actual data on technical issues, information management standards, and so on. It was amazing to see the diversity of concerns about the PDF “standard” as a format for both viewing and archiving information.
Then I decided to make a PDF copy of a web page, and the Adobe PDF printer driver failed, again. Instead of a nice PDF copy of a bill payment receipt, I got gibberish characters that made the PDF document largely unreadable. This had first occurred last year, before a hard disk crash, and most of the technical advice I found on the Adobe and Microsoft sites was not very helpful. Some blog advice had been to reinstall the Adobe Acrobat software. Then the disk drive crashed, and on the new drive I installed a newer version of Adobe Acrobat and the problem went away. Or so I thought.
Then, yesterday, it came back, so I researched it again. This time, the answer was to be found at “Adobe Acrobat 10 Displays/Prints Gibberish” - http://helpspot.business.uconn.edu/index.php?pg=kb.page&id=344. The problem has to do with the Adobe option to “Rely on system fonts only, do not use document fonts” when the PDF file is generated. Whether this box is checked changes how the PDF is created. The example of character gibberish provided there was exactly what my documents had looked like. In some cases the failure was not apparent in the first few lines of the generated document; it only became apparent when one looked at the entire document.
This could cause real problems when automatically rendered PDFs are used as official records, unless human eyeballs catch the anomaly. Possibly an automated OCR check of the PDF could catch the problem, but it would need to run on every rendered PDF to identify specific document creation failures, taking up more CPU cycles during the document conversion process. Finding and deleting these garbage PDF files retrospectively, after many had already been stored in an ECM system, would be costly for a system owner/operator. Imagine the reaction of corporate legal counsel if this were all they could produce during discovery proceedings to attest to the innocence of their client. And if the fonts are altered from the original document, do you really have an archival quality rendition?
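To make the idea of an automated check a little more concrete, here is a minimal sketch in Python. It is not an OCR tool and it is not anything Adobe provides. It assumes the text of each page has already been extracted by some other tool, and it simply measures how many tokens on each page look like ordinary English words or numbers. The function names, the patterns, and the 0.6 threshold are all my own illustrative assumptions and would need tuning against real documents. Scoring every page separately matters because the gibberish may not show up in the first few lines.

```python
import re

# Assumed patterns for "plausible" English tokens; tune these for real documents.
WORD_RE = re.compile(r"^[A-Za-z]+(?:['-][A-Za-z]+)*$")
NUMBER_RE = re.compile(r"^\$?\d[\d,./:-]*$")
EDGE_PUNCT = ".,;:!?()[]\"'"


def is_plausible_token(token: str) -> bool:
    """Return True if a token looks like an ordinary word or number."""
    core = token.strip(EDGE_PUNCT)  # ignore sentence punctuation at the edges
    if not core:
        return False
    if NUMBER_RE.match(core):
        return True
    if WORD_RE.match(core):
        # Real English words almost always contain a vowel (y included).
        return any(ch in "aeiouy" for ch in core.lower())
    return False


def plausible_ratio(text: str) -> float:
    """Fraction of whitespace-separated tokens that look plausible."""
    tokens = text.split()
    if not tokens:
        return 1.0  # an empty page is treated as not suspect
    return sum(is_plausible_token(t) for t in tokens) / len(tokens)


def find_suspect_pages(pages, min_ratio=0.6):
    """Return 1-based numbers of pages whose plausible ratio is below min_ratio.

    The 0.6 default is an arbitrary starting point, not a validated threshold.
    """
    return [
        number
        for number, text in enumerate(pages, start=1)
        if plausible_ratio(text) < min_ratio
    ]


if __name__ == "__main__":
    # Hypothetical page texts, standing in for output from a text extractor.
    sample_pages = [
        "Payment received. Thank you for your payment.",
        "#$% &'( )*+ ,-. /01",
    ]
    print(find_suspect_pages(sample_pages))
```

A check like this is cheap compared with OCR, but it is only a heuristic: it is English-only, and it will misjudge pages that are mostly codes, tables, or names. Pages it flags should go to a human reviewer rather than be deleted automatically.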
So, despite our reliance on automated ECM and ERM solutions to get our daily work accomplished, it is still important not to turn our futures over completely to computer-based robots or automated systems. All of the technical wizardry in the world during systems design will not assure that we have records of evidentiary value and archival quality if the systems do not perform exactly as expected. Human review of software performance must be factored into every automated system, or our futures may depend on archival records of questionable quality due to the unexpected generation of gibberish.
#ERM #ScanningandCapture #PDFs #EnterpriseContentManagement #ArchivalRecords #DocumentEvidence #ElectronicRecordsManagement