Today I’m going to talk about an overlooked part of the content management picture: data. I know this seems a little strange, but bear with me.
Let’s assume for a moment that you work in a regulated industry or that you are generally paranoid (most regulated industries are rife with paranoia). As a result, you want to track every action taken against your content so that you can report on it if required. If you are particularly paranoid, you may need to track every search to identify irregular search patterns.
When your CMS was acquired, auditing was likely listed as a requirement. If you were thorough, you even grilled the vendor on how searches could be tracked (and they said it would take a quick customization). You probably asked about their reporting tools, and they raved about their flexibility.
There are some things that you might not have asked:
What is the performance impact of large volumes of audit records?
Is the information in the audit logs meaningful on its own?
Can I query the audit information without impacting the system?
Is it possible to build triggers that alert me to certain activity patterns?
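To make that last question concrete, here is a minimal sketch of an alerting trigger, using Python’s built-in sqlite3 module as a stand-in for your CMS database. The table names, columns, one-hour window, and threshold of three searches are all hypothetical; a real CMS audit schema, and your database’s trigger syntax, will differ.

```python
import sqlite3

# Hypothetical, simplified audit schema; real CMS audit tables differ by vendor.
SCHEMA = """
CREATE TABLE audit_log (
    log_id    INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL,
    action    TEXT    NOT NULL,
    logged_at TEXT    NOT NULL  -- ISO format: YYYY-MM-DD HH:MM:SS
);
CREATE TABLE alerts (
    user_id   INTEGER NOT NULL,
    raised_at TEXT    NOT NULL,
    reason    TEXT    NOT NULL
);
-- After each search, raise an alert if that user has run more than
-- 3 searches in the preceding hour (example threshold and window).
CREATE TRIGGER search_burst AFTER INSERT ON audit_log
WHEN NEW.action = 'search'
BEGIN
    INSERT INTO alerts (user_id, raised_at, reason)
    SELECT NEW.user_id, NEW.logged_at, 'search burst'
    WHERE (SELECT COUNT(*) FROM audit_log
           WHERE user_id = NEW.user_id
             AND action = 'search'
             AND logged_at > datetime(NEW.logged_at, '-1 hour')
             AND logged_at <= NEW.logged_at) > 3;
END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Simulate five searches by user 7, five minutes apart.
for minute in range(0, 25, 5):
    conn.execute(
        "INSERT INTO audit_log (user_id, action, logged_at) VALUES (?, 'search', ?)",
        (7, f"2009-10-05 10:{minute:02d}:00"),
    )

# The 4th and 5th searches exceed the threshold, so two alerts are raised.
alerts = conn.execute(
    "SELECT user_id, raised_at FROM alerts ORDER BY raised_at").fetchall()
print(alerts)
```

Keep in mind that a trigger like this runs on every insert into the audit table, so it adds work to exactly the table whose performance you are already worried about.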
In many systems, making sense of the audit data requires not just the audit logs but also the list of system users and relevant information about the content. This is because many logs store only IDs for content and users, which are typically meaningless without the related tables.
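Here is a minimal sketch of that problem, again using sqlite3 and a hypothetical three-table schema (`users`, `content`, `audit_log`) with made-up rows. The raw audit row is just a pair of IDs and an action; it only becomes readable once it is joined back to the other tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and sample rows, for illustration only.
conn.executescript("""
CREATE TABLE users     (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE content   (content_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE audit_log (user_id INTEGER, content_id INTEGER, action TEXT, logged_at TEXT);
INSERT INTO users     VALUES (1, 'Richard Prouty'), (2, 'Jane Doe');
INSERT INTO content   VALUES (100, 'Claim 2009-1042');
INSERT INTO audit_log VALUES (1, 100, 'view',    '2009-10-05 09:00:00'),
                             (2, 100, 'approve', '2009-10-05 15:30:00');
""")

# On its own, an audit row such as (1, 100, 'view', ...) says very little.
raw = conn.execute("SELECT * FROM audit_log ORDER BY logged_at").fetchall()

# Joining back to the user and content tables turns the IDs into names.
readable = conn.execute("""
    SELECT u.name, c.title, a.action, a.logged_at
    FROM audit_log a
    JOIN users   u ON u.user_id    = a.user_id
    JOIN content c ON c.content_id = a.content_id
    ORDER BY a.logged_at
""").fetchall()

for row in readable:
    print(row)
```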
Besides, what if you want to query based on a metadata attribute such as ‘update date’, ‘size limit’, ‘content creator’, or any number of other values? Consider this question, which could come up during a fraud inquiry:
Give me all the car insurance claims filed in October of 2009 that were processed by Richard Prouty and approved within one business day of submission. Provide a list of all actions against the claims listed grouped by claim and user.
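Here is one way that question might be answered, as a sketch against a hypothetical schema in which each claim records its type, submission and approval dates, and processor. The claim rows, the second user, and the `next_business_day` helper are illustrative assumptions, and the helper ignores holidays. Note that filtering the claims needs content metadata, while listing the actions needs the audit log plus the users table.

```python
import sqlite3
from collections import defaultdict
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
# Hypothetical schema and sample data.
conn.executescript("""
CREATE TABLE users  (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE claims (claim_id INTEGER PRIMARY KEY, claim_type TEXT,
                     submitted_on TEXT, approved_on TEXT, processor_id INTEGER);
CREATE TABLE audit_log (claim_id INTEGER, user_id INTEGER, action TEXT, logged_at TEXT);
INSERT INTO users VALUES (1, 'Richard Prouty'), (2, 'Auditor A');
INSERT INTO claims VALUES
  (10, 'auto', '2009-10-02', '2009-10-05', 1),  -- Fri to Mon: one business day
  (11, 'auto', '2009-10-06', '2009-10-09', 1),  -- Tue to Fri: three business days
  (12, 'home', '2009-10-07', '2009-10-07', 1),  -- not a car claim
  (13, 'auto', '2009-11-02', '2009-11-02', 1);  -- outside October 2009
INSERT INTO audit_log VALUES
  (10, 1, 'view',    '2009-10-02 09:00:00'),
  (10, 2, 'view',    '2009-10-03 11:00:00'),
  (10, 1, 'approve', '2009-10-05 08:30:00'),
  (11, 1, 'approve', '2009-10-09 16:00:00');
""")

def next_business_day(d: date) -> date:
    """Return the next weekday after d. Holidays are ignored (an assumption)."""
    d += timedelta(days=1)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d += timedelta(days=1)
    return d

# Step 1: narrow the claims using metadata the audit log alone does not hold.
candidates = conn.execute("""
    SELECT c.claim_id, c.submitted_on, c.approved_on
    FROM claims c JOIN users u ON u.user_id = c.processor_id
    WHERE c.claim_type = 'auto'
      AND u.name = 'Richard Prouty'
      AND c.submitted_on >= '2009-10-01' AND c.submitted_on < '2009-11-01'
    ORDER BY c.claim_id
""").fetchall()

# Step 2: keep claims approved no later than the next business day.
fast = [cid for cid, sub, appr in candidates
        if date.fromisoformat(appr) <= next_business_day(date.fromisoformat(sub))]

# Step 3: pull every action against those claims, grouped by claim and user.
placeholders = ",".join("?" * len(fast))
rows = conn.execute(f"""
    SELECT a.claim_id, u.name, a.action, a.logged_at
    FROM audit_log a JOIN users u ON u.user_id = a.user_id
    WHERE a.claim_id IN ({placeholders})
    ORDER BY a.claim_id, u.name, a.logged_at
""", fast).fetchall()

grouped = defaultdict(list)
for claim_id, name, action, logged_at in rows:
    grouped[(claim_id, name)].append((action, logged_at))

for key, actions in grouped.items():
    print(key, actions)
```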
That is a “simple” query that can take many systems down if they are not properly architected.
What if you are already dealing with this problem? There are several things you can do to address it.
Performance-tune your database: CMS databases ship with generic tuning. You can usually improve performance by analyzing how you actually use the system and adding the indexes your queries need.
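As an illustration of the indexing step, the sketch below uses SQLite’s `EXPLAIN QUERY PLAN` to compare a frequently run audit report before and after adding an index. The table, the report query, and the index name `idx_audit_user_time` are hypothetical; your own database will have its own equivalent plan tools.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical audit table filled with synthetic rows.
conn.execute("CREATE TABLE audit_log (log_id INTEGER PRIMARY KEY, user_id INTEGER,"
             " content_id INTEGER, action TEXT, logged_at TEXT)")
conn.executemany(
    "INSERT INTO audit_log (user_id, content_id, action, logged_at) VALUES (?, ?, ?, ?)",
    [(i % 50, i, "view", f"2009-10-{i % 28 + 1:02d} 12:00:00") for i in range(1000)],
)

# A report that is run often: one user's activity since a given date.
REPORT = "SELECT action, logged_at FROM audit_log WHERE user_id = ? AND logged_at >= ?"

def plan(sql, params):
    """Return the 'detail' column of SQLite's query plan for a statement."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

plan_before = plan(REPORT, (7, "2009-10-15"))
# Index the columns the report filters on, equality column first.
conn.execute("CREATE INDEX idx_audit_user_time ON audit_log (user_id, logged_at)")
plan_after = plan(REPORT, (7, "2009-10-15"))

print(plan_before)  # a full table scan
print(plan_after)   # a search using idx_audit_user_time
```

Indexes are not free: each one adds a little work to every insert, which matters on a table as write-heavy as an audit log, so add them for the reports you actually run.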
Process reports nightly: If you have a set of standard reports or datasets that you review regularly, extract the data nightly. You can craft a database job that runs your report and places the results into a separate table for review. This keeps reporting from slowing down daily work.
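A nightly job along these lines might look like the following sketch. It summarizes one day of hypothetical audit records into a separate `daily_activity_report` table; scheduling (cron, or your database’s own job scheduler) is left out. Deleting the day’s rows before inserting makes the job safe to re-run if a night’s run fails partway.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical audit table, report table, and sample rows.
conn.executescript("""
CREATE TABLE audit_log (user_id INTEGER, action TEXT, logged_at TEXT);
CREATE TABLE daily_activity_report (
    report_date  TEXT,
    user_id      INTEGER,
    action       TEXT,
    action_count INTEGER,
    PRIMARY KEY (report_date, user_id, action)
);
INSERT INTO audit_log VALUES
  (1, 'view',   '2009-10-05 09:00:00'),
  (1, 'view',   '2009-10-05 14:00:00'),
  (1, 'search', '2009-10-05 14:05:00'),
  (2, 'view',   '2009-10-06 10:00:00');
""")

def run_nightly_report(conn, report_date):
    """Summarize one day of audit activity (report_date as 'YYYY-MM-DD').

    The day's existing report rows are deleted first, so a failed or
    repeated run can simply be run again.
    """
    with conn:  # single transaction: commits on success, rolls back on error
        conn.execute("DELETE FROM daily_activity_report WHERE report_date = ?",
                     (report_date,))
        # A range on the raw column (rather than date(logged_at) = ?) would
        # let an index on logged_at be used.
        conn.execute("""
            INSERT INTO daily_activity_report
            SELECT date(logged_at), user_id, action, COUNT(*)
            FROM audit_log
            WHERE logged_at >= ? AND logged_at < date(?, '+1 day')
            GROUP BY date(logged_at), user_id, action
        """, (report_date, report_date))

run_nightly_report(conn, "2009-10-05")
run_nightly_report(conn, "2009-10-05")  # re-running replaces rows, no duplicates
summary = conn.execute(
    "SELECT * FROM daily_activity_report ORDER BY user_id, action").fetchall()
print(summary)
```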
Externalize your data: Taking the previous option a step further, you can extract the data you need to report against from your CMS and store it in a separate system. From there you can work to your heart’s content without impacting users.
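A minimal sketch of an incremental extract follows, with two in-memory SQLite databases standing in for the CMS and the separate reporting system. It assumes the audit log is append-only with a steadily increasing `log_id`, so each run copies only rows newer than the last one it saw; the table and function names are hypothetical.

```python
import sqlite3

# Stand-ins: in practice the source is the CMS database and the target
# a separate reporting server or warehouse.
cms = sqlite3.connect(":memory:")
reporting = sqlite3.connect(":memory:")

cms.executescript("""
CREATE TABLE audit_log (log_id INTEGER PRIMARY KEY, user_id INTEGER,
                        action TEXT, logged_at TEXT);
INSERT INTO audit_log (user_id, action, logged_at) VALUES
  (1, 'view',   '2009-10-05 09:00:00'),
  (2, 'search', '2009-10-05 09:10:00');
""")
reporting.execute("CREATE TABLE audit_copy (log_id INTEGER PRIMARY KEY, user_id INTEGER,"
                  " action TEXT, logged_at TEXT)")

def extract_new_rows(cms, reporting):
    """Copy audit rows the reporting store has not seen yet; return the count.

    The highest log_id already copied acts as a high-water mark, so each run
    reads only new rows from the CMS instead of rescanning the whole log.
    """
    (last_id,) = reporting.execute(
        "SELECT COALESCE(MAX(log_id), 0) FROM audit_copy").fetchone()
    rows = cms.execute(
        "SELECT log_id, user_id, action, logged_at FROM audit_log"
        " WHERE log_id > ? ORDER BY log_id", (last_id,)).fetchall()
    with reporting:  # commit the batch as one transaction
        reporting.executemany("INSERT INTO audit_copy VALUES (?, ?, ?, ?)", rows)
    return len(rows)

first = extract_new_rows(cms, reporting)   # copies the two existing rows
cms.execute("INSERT INTO audit_log (user_id, action, logged_at)"
            " VALUES (1, 'approve', '2009-10-05 15:00:00')")
cms.commit()
second = extract_new_rows(cms, reporting)  # copies only the new row
print(first, second)
```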
As you can see, this is a surmountable problem. The sooner you start planning and taking these issues into account, the easier it will be to solve.
#ECM #data #auditing #BusinessIntelligence #InformationGovernance #CMS