The Risk of ECM Project Failure
Say “risk of ECM project failure” in a boardroom and you quickly discover that the scariest scenario is a performance-related failure discovered after a go-live deployment to production. Yet it happens all the time. Some part of the ECM application slows down under production-scale load to the point of not being functional. The damage lands on the budget because your team did not catch it before go-live, and the impact may be a roll-back and redeploy that takes a month or more, depending on the project. Given that ECM deployments typically have a project team of 10 to 20 resources paid between $65 and $125/hr, that can be over $200,000 in cost, plus the impact to the business.
I shake my head and keep wondering why this happens. I see it on project after project. I get a phone call and am told that a project is off the rails and they need my help. They tell me that the front end, or the user interface, of the project was tested to emulate the anticipated load, but when they deployed it to production the system slowed down and then started crashing every few days.
This can be prevented, but it isn’t easy. I would like to help prevent these projects from going off the rails in the first place. Let me explain with some technical depth and an example.
It is relatively easy to set up a Web-based UI tester like LoadRunner to pound on the user interface, but it is much more difficult to load test the back end of the system, meaning the content server and database side of an ECM environment. Testing the user interface is one thing; testing it with a representative repository behind it is quite another, because the user interface is then making the same requests to the back-end servers, but against databases with millions or billions of rows instead of thousands. Slow performance on the user interface side is sometimes caused by a component in the back end, such as a database indexing and tuning issue, a complex security model, deep folder structures, or poorly designed queries, services, workflows, or transformations. However, you don’t catch those issues unless you have a test repository that represents the production-scale repository in both configuration and volume.
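To make “load testing the back end” concrete, here is a minimal sketch of driving concurrent requests at a content service and summarizing latency. The `fetch_document` function is a hypothetical stand-in of my own invention; a real test would call the content server or database API directly, against a production-scale repository.

```python
import time
import random
import statistics
from concurrent.futures import ThreadPoolExecutor

def fetch_document(doc_id: int) -> float:
    """Hypothetical stand-in for a back-end content-service call.

    Returns the observed latency in seconds. In a real test this would
    hit the content server / database tier directly.
    """
    start = time.perf_counter()
    time.sleep(random.uniform(0.001, 0.005))  # simulated service latency
    return time.perf_counter() - start

def run_load_test(concurrency: int = 20, requests: int = 200) -> dict:
    """Drive concurrent requests at the back end and summarize latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(fetch_document, range(requests)))
    return {
        "requests": len(latencies),
        "median_ms": statistics.median(latencies) * 1000,
        "p95_ms": latencies[int(len(latencies) * 0.95)] * 1000,
    }

if __name__ == "__main__":
    print(run_load_test())
```

The point of the sketch is the shape of the harness, not the numbers: the same driver pointed at a thousand-row test database and at a billion-row representative one will produce very different percentiles.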
For example, if you create a drop-down control in a user interface backed by a poorly designed query, the UI may freeze while it waits to be populated by the query result when it runs against a large database. If it makes it to production, the end users don’t understand and don’t care. They just know it is freezing and crashing. If it affects a large or critical portion of the end users, you have to roll it back, cancel the deployment, and schedule a break-fix period with testing and a re-deployment. Explain that to the CFO. I have been addressed by CIOs who have done so, and they tend to be in bad moods when the subject comes up.
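As a small illustration of why this only shows up at scale, here is a SQLite sketch of a typical drop-down-population query (the `documents` table and `doc_type` column are made-up names). The query plan shows a full table scan with a temporary sort before a supporting index exists, and an index-only answer afterwards; on a few thousand rows both are instant, but on millions of rows the difference is exactly the UI freeze described above.

```python
import sqlite3

# Build a table with many rows but few distinct values, as a drop-down
# source typically looks in production.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, doc_type TEXT)")
conn.executemany(
    "INSERT INTO documents (doc_type) VALUES (?)",
    [(f"type_{i % 50}",) for i in range(100_000)],
)

# The classic drop-down query: distinct values, sorted.
query = "SELECT DISTINCT doc_type FROM documents ORDER BY doc_type"

def plan(sql: str) -> str:
    """Return SQLite's query plan as one string (detail column of each row)."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before_plan = plan(query)   # full table scan plus a temp b-tree for DISTINCT
conn.execute("CREATE INDEX idx_doc_type ON documents (doc_type)")
after_plan = plan(query)    # answered directly from the covering index

print("before:", before_plan)
print("after: ", after_plan)
```

This is also why the representative repository matters: the “before” plan is invisible in a small test database and only becomes a production incident at volume.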
What should have been done in this scenario is to insist on a load test against a representative repository, one that replicates conditions similar to those you will find in production. That is easy to say but, from what I have seen, it tends to drop out of scope or get reduced because of scope creep in another aspect of the project, or even in the test repository building tasks themselves.
Now that we know we need to build a representative test repository, how do you build one? There are two scenarios and several strategies:
1) You already have an ECM instance and you are upgrading, integrating, or combining multiple instances.
2) This is a net new deployment of the ECM system.
Existing Production ECM Scenario
With the first scenario you need to build, in the test environment, a representative copy of the existing production system as it will look after deployment. On long-term projects you will also need to update it occasionally. This can be done by cloning, by custom scripting, or by using some form of ETL or migration tool. I will examine the options below, because you need to know the upsides and downsides of each to make the decision that is right for your situation:
Cloning – (Warning – I am not recommending this approach.) This surprisingly common approach to test environment building stems from the typically prescribed approach to moving a production repository from old servers to new ones: you export the database and copy the file store to a new server with a fresh install of the software, then import the database and change the server names and so on. It seems simple, so why not do the same for a load testing repository? There are several risks in having two identical repositories running simultaneously on the same network. Every time I see this used for testing I cringe, because something always screws up. If there is a way for the two systems to “see” each other, or for other services or users to “see” the systems, there is the risk of taking down production servers, of end users or integrations using the wrong instance, or of administrators working on the wrong system. Another risk is that confidential information is exposed in a test environment; compliance with regulation or policy may prevent that. It may also take a lot of storage space to replicate a production system, and on some projects corners have been cut simply because sufficient storage was not allocated in time.
Scripting – (Warning – I am not recommending this approach either.) Custom scripting is frequently used, but in many cases for the wrong reasons. The expense, availability or flexibility of ETL tools for a project may dictate that custom scripting is required. The process is prone to scope creep, validation errors and turnover in resources, which lets the learning-curve investment walk out the door at various phases of the deployment. The time taken to write, test and validate the scripts could have been used to build a proper test repository. Services companies love it, though, since it tends to stretch out the project and burn more hours. The compromise tends to be a partially built testing environment.
ETL / Migration Tools – This is a more supportable model, but care must be taken in the selection of the tools and in the subsequent planning. Some are licensed per repository, with a license required for each repository the tool connects to; this may make them too expensive for a small project and push teams back to the cloning or scripting methods. Other factors are the flexibility of the tool to adapt to custom data sources and to perform transformations on the content; some are flexible and others are not. The time needed to migrate the repository must also be taken into consideration, so that the initial load or build time is correctly estimated. A “sync” capability, where the repository can be updated incrementally, saves the time otherwise spent rebuilding it.
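The “sync” idea can be sketched as an incremental copy keyed on a last-modified timestamp. The dictionary-based repositories below are stand-ins of my own; a real migration tool would read and write objects through each system’s API, and would also handle deletes, versions and ACL changes.

```python
from datetime import datetime, timezone

def sync(source: dict, target: dict, last_sync: datetime) -> int:
    """Copy documents modified since last_sync into target; return the count.

    Repositories are modeled here as {doc_id: {"modified": datetime, ...}}
    dictionaries purely for illustration.
    """
    copied = 0
    for doc_id, doc in source.items():
        if doc["modified"] > last_sync:
            target[doc_id] = dict(doc)  # shallow copy of the metadata
            copied += 1
    return copied

# Example: only the document changed since the last sync point moves.
source = {
    "doc-1": {"modified": datetime(2024, 6, 2, tzinfo=timezone.utc)},
    "doc-2": {"modified": datetime(2024, 5, 1, tzinfo=timezone.utc)},
}
target = {}
moved = sync(source, target, datetime(2024, 6, 1, tzinfo=timezone.utc))
print(moved)
```

The payoff is in the second and later runs: instead of a multi-week rebuild, the test repository is refreshed in proportion to how much production has changed.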
New Repository Scenario
In the second scenario of a net new repository, the representative repository must be built from a statistical analysis of what the new production system will look like: the number of documents, their data structures, the folder structures, the access control lists, the groups and users, and other aspects of the configuration. The folder depth and the number of objects per folder, with versions, renditions and so on, must all be built. In this case there is not much one can do from the cloning perspective, and there are few tools that can help you. At this point the process is a manual scripted one or a combination of ETL tool and scripting.
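A statistical profile like the one described can drive a simple generator. This is a hypothetical sketch under stated assumptions: it produces only document paths from a depth / fan-out / documents-per-folder profile, whereas a real build would also create versions, renditions, ACLs, users and metadata in the target repository.

```python
def build_tree(depth: int, folders_per_level: int, docs_per_folder: int,
               path: str = "/repo") -> list:
    """Return the full paths of every synthetic document in the tree.

    depth is how many folder levels sit below the current one;
    folders_per_level is the fan-out at each level.
    """
    docs = [f"{path}/doc_{i:04d}.pdf" for i in range(docs_per_folder)]
    if depth > 0:
        for f in range(folders_per_level):
            docs += build_tree(depth - 1, folders_per_level,
                               docs_per_folder, f"{path}/folder_{f}")
    return docs

# 1 + 4 + 16 + 64 = 85 folders, each holding 10 documents.
paths = build_tree(depth=3, folders_per_level=4, docs_per_folder=10)
print(len(paths))
```

Dialing the same parameters up to production scale (depth, fan-out, documents and versions per folder taken from the statistical analysis) is what turns this from a toy into a representative repository builder.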
However, we are coming to market with just such a tool soon. Please stay tuned for the announcement. ;)

#Documentum #failure #performance #InformationGovernance #loadtesting #project #ECM