Should We Delete Dark Data?

By Lisa Ricciuti posted 10-21-2014 17:07

  

A few months ago I attended an event sponsored by HP. The attendees were predominantly from IT rather than IM (information management), so I was never quite sure what the presenters meant when they said “archiving.” I strongly suspect they meant it in the IT sense of moving data to cheaper disk space, rather than in the records sense of a select group of records designated for long-term preservation because of their enduring value. The event was running behind schedule, so I never got a chance to ask.

The first speaker talked about the usual topics: data storage and archiving. What I found particularly interesting about his presentation was how he intermingled two diametrically opposed points. The event was some time ago, so I’ve done my best to recreate the main ideas from my notes. I remember silently cheering at the first point below, thinking, “Yes, IT and RIM (records and information management) can work together as a team, with mutual benefits.”

Point #1: Applying retention to even a small portion of records can have a significant impact in the long run, for a variety of reasons (storage costs, restoration, migration, retrieval, etc.).

Point #2: Saving all your data is a good idea because in the future you may find a way to use it. To illustrate the benefits, he gave the example of a company that collected soil statistics for years. The statistics were retained for three years before being destroyed. They weren’t used for much, mostly because nobody could figure out what to do with them. Then one day somebody figured out how to use them. Afterward, everyone thought it a great shame that only three years’ worth of data existed.

Even without knowing anything else about this company, it’s pretty easy to guess what happened once the soil statistics became analyzable. Any hope of implementing or maintaining a retention policy was likely thrown out the window. Old data had proven useful, so who knows what other treasures might be discovered in the future? “Save everything” becomes the new trend.

I recently read two articles about finding or creating value in data. One, in the New York Times, was titled “For Big Data Scientists ‘Janitor Work’ Is Key Hurdle to Insights.” The article describes the hours of tedious effort scientists spend just to make data usable. Although the title refers to big data rather than dark data, in many contexts big data is also dark data, because it can’t easily be identified, deciphered, or used.

The other article, titled “American Scientists Unearth Lost Polar Satellite Images Worth Billions,” describes one scientist’s efforts to catalog, index, and describe over 200,000 satellite images from the 1960s. According to the article, the scientist who requested the images received them in “25 boxes full of tins containing several thousand 60-metre rolls of photos, and quickly-deteriorating magnetic film with infrared imagery – unopened, and labeled with useless information on orbit numbers rather than locations.” One can only imagine the time and effort spent organizing this vast quantity of data into something useful and searchable.

I think this is a particularly interesting dilemma for us as information professionals as we devise methods to manage huge volumes of information. Now that many companies are seeing real successes from data analytics, it is harder to make a compelling argument that enforcing retention and disposition is a good idea.

If we can’t convince companies to throw out data they are keeping just in case it proves magically useful later, maybe we can leverage those analytics successes to make a stronger argument for creating structured data. That way, when companies do come to see the value of retention and disposition, it will be much easier to identify what needs to be purged. In the meantime, the cost of producing and storing the data will be justified by getting some use out of it.
