The case for SharePoint Archiving: it’s never too soon to deal with old information

By Dave Martin posted 11-01-2010 11:52


Do you have a lot of old content living in SharePoint? 

You might be asking, “Really, what is old content?”

Okay, fair enough: do you have a lot of “inactive” content living in SharePoint?  Content that is sitting idle, sadly orphaned in some site that has not been touched or viewed in months.  Still thinking no?  Okay, let me outline a scenario:

You’re an organization that works collaboratively on numerous, consecutive or simultaneous projects.  As the SharePoint administrator you get regular requests for new sites which you grant as long as they coincide with a new project.  You get a request to provide a site called Project A which you grant.  Those involved in Project A spend the next three months filling this site with new and then versioned content.  Then Project A comes to an end and you get another request to grant the creation of a new site called Project B.  Much new content is created and versioned in Project B’s site, but as you would imagine, copies of content from Project A are also brought over.  And the cycle continues through the alphabet. 

Even if you are an organization that only started leveraging SharePoint with MOSS 2007, you’ve been creating sites and filling them with content for two or three years now.  So whatever happens to those Project A sites?  Well, they typically become orphaned.  And I can’t tell you how many organizations I’ve talked to that fit squarely into the Project A mold, with literally thousands of sites and terabytes of content just taking up space in SharePoint… very costly, performance-absorbing, risk-increasing space.

Statistically speaking, somewhere between 25% and 30% of SharePoint sites are inactive, so if you have 40 terabytes in your deployment and that content is spread fairly evenly across sites, roughly 10 to 12 terabytes are (again) just taking up space.  So what do we do with this old content?

Currently there are a couple of models we can follow: we can simply externalize content (redirect it to a lower-tier, lower-cost storage device) or we can actually archive the content.  Some folks may say, “What’s the difference here?” and my answer is: a lot.  For the most part, externalizing to a storage device may be enough – simply moving the content to a big old data box outside of the high-performance requirements of the SQL Servers that support SharePoint.  And most of the time externalization is really just that basic: moving content to a bigger, more cost-effective place with no added intelligence.  But let’s say you have regulatory or internal governance mandates with retention requirements and content policies that must be met.  Add to this the fact that you also have email and file system content that you’d like to manage centrally under similar policies.

Behold, archiving! 

In a classic archiving model, we search across repositories of content (in this case SharePoint farms) for the old content we want archived, based on specific rules and policies we’ve defined.  Then we make a copy of that content, place it into the archive and validate its existence.  Once validation has occurred we can generally do one of two things: delete the content from its original repository, or leave the original copy in place.  Some people worry about deleting any content at any time, but if we don’t delete the original we miss the storage management benefits of archival: we now hold two copies and need twice the storage space, whereas deleting the original leaves only the single archived instance of the file.  Another reason for original copy deletion is records management.  For records management to be a success you really need a single copy of record – trust me here, I’ve been browbeaten on this topic by a lot of records managers.
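
To make that search, copy, validate, then delete-or-keep sequence concrete, here is a minimal Python sketch.  It is not how any particular archiving product or SharePoint API works: the `Document` class, the `archive_inactive` function, the 180-day idle rule and the use of a SHA-256 hash as the archive key are all assumptions made for illustration, and plain dictionaries stand in for the SharePoint farm and the archive.

```python
import hashlib
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Document:
    path: str
    content: bytes
    last_touched: date  # last time anyone viewed or edited the item


def archive_inactive(repository, archive, today, max_idle_days=180,
                     delete_original=True):
    """Copy inactive documents into the archive, validate, then optionally delete.

    `repository` maps path -> Document; `archive` maps content hash -> bytes.
    Both are hypothetical in-memory stand-ins for real systems.
    """
    cutoff = today - timedelta(days=max_idle_days)
    archived_paths = []
    for path, doc in list(repository.items()):
        if doc.last_touched >= cutoff:
            continue  # still active under this rule, leave it alone
        digest = hashlib.sha256(doc.content).hexdigest()
        # Single instance: identical content maps to the same archive key.
        archive.setdefault(digest, doc.content)
        # Validate the archived copy before touching the original.
        if hashlib.sha256(archive[digest]).hexdigest() != digest:
            raise RuntimeError(f"archive validation failed for {path}")
        if delete_original:
            del repository[path]
        archived_paths.append(path)
    return archived_paths


today = date(2010, 11, 1)
repo = {
    "ProjectA/spec.docx": Document("ProjectA/spec.docx", b"spec v3", date(2010, 1, 15)),
    "ProjectB/spec.docx": Document("ProjectB/spec.docx", b"spec v3", date(2010, 3, 1)),
    "ProjectC/plan.docx": Document("ProjectC/plan.docx", b"plan v1", date(2010, 10, 20)),
}
archive = {}
moved = archive_inactive(repo, archive, today)
```

In this made-up example the Project A and Project B copies of the same file are both past the idle cutoff, so both are removed from the repository, but the archive ends up holding only one instance of the content.  Passing `delete_original=False` models the leave-in-place option, at the cost of keeping two copies.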

On the topic of records management, once the content is in an archive you can of course attach a life-cycle to it (retention and disposition policies), which results in one of the biggest benefits of archival – risk management.  Being able to attend to compliance and internal governance requirements allows us to do even more house cleaning on an ongoing basis.  The pure externalization example really only moves the content to a device, where technically it will just sit (granted, at a lower cost) and take up space.  But with archiving, we can set disposition policies to have content deleted automatically, once it has met any and all regulatory obligations, of course.  This helps us reduce the risk associated with information, because the last thing we want to see is a big scary lawsuit come up and cost us an arm and a leg because we kept the wrong information, or worse, didn’t keep the right stuff!
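
As a rough illustration of what an automatic disposition policy does, here is a small Python sketch.  The `ArchivedItem` fields, the per-item retention period counted in years from the archive date, and the `legal_hold` flag are assumptions chosen for the example; real retention schedules and hold rules come from your records managers and legal team, not from code like this.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchivedItem:
    name: str
    archived_on: date
    retention_years: int      # hypothetical policy: keep this many years
    legal_hold: bool = False  # hypothetical flag: never dispose while set


def due_for_disposition(item, today):
    """Return True once the retention period has fully elapsed and no hold applies."""
    if item.legal_hold:
        return False
    target_year = item.archived_on.year + item.retention_years
    try:
        expires = item.archived_on.replace(year=target_year)
    except ValueError:
        # Archived on Feb 29 and the target year is not a leap year.
        expires = date(target_year, 3, 1)
    return today >= expires


def apply_disposition(items, today):
    """Split items into those still retained and the names of those disposed."""
    kept = [i for i in items if not due_for_disposition(i, today)]
    disposed = [i.name for i in items if due_for_disposition(i, today)]
    return kept, disposed


today = date(2010, 11, 1)
items = [
    ArchivedItem("contract.pdf", date(2003, 6, 1), retention_years=7),
    ArchivedItem("memo.docx", date(2008, 1, 10), retention_years=3),
    ArchivedItem("email.msg", date(2002, 1, 1), retention_years=7, legal_hold=True),
]
kept, disposed = apply_disposition(items, today)
```

Here only `contract.pdf` is disposed of: the memo has not yet reached the end of its retention period, and the email is past it but is kept because of the hold.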

Lastly, we have to consider the fact that although SharePoint is and will continue to be widely used, we likely have more than just SharePoint content in our broader information infrastructure.  Email, for one, is still a bigger deal in terms of corporate content growth, and we all know of the volumes of data sitting in file systems.  I think it would be fair to say we don’t want to deploy three separate archival solutions; it simply makes more sense, and again reduces our risk, if we archive all of our content to one centrally managed location where we can take advantage of consolidated policies.

But the truth is, we’re really not quite there yet in terms of archiving SharePoint content.  We haven’t hit that critical mass or had the EVENT that wakes everyone up – like Zubulake did for email archiving and eDiscovery solutions – but we will; we always do.  So before we find ourselves looking at billion-dollar fines, maybe we should think about the merits of archiving SharePoint content today.

Again, I’ll end on a question to anyone who reads this post: have you, or any of your customers, been thinking or talking about SharePoint archival?

#SharePoint #archival #InformationGovernance #SharePointstoragemanagement #sharePointexternalization #sharepoint #SharePointarchiving