The Battle of Little Big Data

By Marc Solomon posted 05-06-2013 09:15


If you're someone who's paid to help your organization work smarter, odds are you get invited to lots of trade shows and conferences. There, legions of knowledge professionals are encouraged to spend down their training and travel budgets on the exhibit booths and break-out sessions of big thought on little gadget boot camps. This year's model has evolved from a variation on social enterprise 2.0. The enterprise content crowd continues to buzz over that swelling monster of saturated disk space called Big Data.

None dare call it by the name it had before it got so ... big. Probably out of concern that it could appear less new and therefore more persistent, nagging, even chronic. If we break big data all the way down, it's essentially a subatomic galaxy of...

1) Integers: semi-structured inside databases that bracket the movement and transaction of merchandise and commodities. BD's 'semi' status is visited on us by the forensics in our browser logs and SIM cards -- or everything we process prior to consummating the purchase.

2) Text: scantily-structured disorganization of network shares, resulting in bedlam, pestilence, disease-ridden tribes of orphaned and disenfranchised files. Mayhem of biblical proportions.

House of Text

Witness the decaying lump of unmarked assets no one wants to touch. Why the cold shoulder? Well, for one, there's the reasonable fear that these assets will compromise the same records that outlive institutional memory, meaning the knowledge of whether they're worth preserving in the first place. Incomplete migrations of wikis and decommissioned applications are also justifiable concerns. So is the seduction of a single repository to house all reference and process-based materials.

The conundrums and exaggerated theatrics are here to drive three points:

* Everyone has digital landfill growing like weeds on steroids in the alleyways of their data centers.

* No one knows what's in them, at least no one whose years of service fall short of the last modified dates.

* No one has the time to get their head around these legacy assets or the stomach to send them to the trash bin.

Tipping Points

For some deviant reason, some of us work-smarter folks are drawn to the disaster that awaits us behind the text box. That's our collective little big data, or LBD, problem. It's little because there's not enough structure or upside to resonate in keynotes. But can this mundane and widespread disaster justify priority status in your enterprise?

While Big D hogs the limelight, LBD darkens the door of every former data warehouse this side of the expert panel round tables -- remember them? To simplify the formula as an opportunity cost, think of your data dumps as your information surplus. A resource that could feed an ever-growing appetite for informed histories and a potential gap-closer around our recurring knowledge deficits.

Rescue Mission

Just because those wordy detailed documents are devoid of metadata doesn't mean you need to crack the form completion whip. While it helps to automate the tagging process, you can also retrofit your unsexiest file formats and mislabeled folders. That's right. You can bring shape and purpose to the most obscure corners of your enterprise through a hybrid of manual and automated efforts. Here are three we plan to cycle through in our upcoming proof-of-concept:

1) Infrastructure – Think time stamps, maintenance schedules, permissions structures, and other machine-generated minutiae. These are the basic storage transactions we need to track before better governance policies are in place, at-risk resources can be moved, and unstable boxes can be powered down for good.

2) Surface Details – These are the trace elements of your garden-variety file share: the original authors, sponsoring units, and folder hierarchies. These values get to the heart of sense-making, namely who was inspired to create this artifact and who consumed it (both internally and externally). This step is also an opportunity to import the list-making properties of your organization, from products and locations to functional roles, events, and milestones.

3) Algorithms – Vendor-developed enhancements like entity extraction, auto-categorization, and query suggest will augment efforts to develop ad hoc vocabularies that bring context and value to the overlooked assets we're trying to index and ultimately leverage.
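
For readers who want to picture the first two passes, here is a minimal sketch in Python of what an inventory crawl might look like. It is an illustration only, not the proof-of-concept itself: the function names (inventory_share, stale_files), the record fields, and the five-year staleness cutoff are all assumptions made for the example. It gathers the machine-generated details from step 1 (time stamps, size, permissions, owner ID) and the folder hierarchy from step 2. Original authors and sponsoring units are not stored by the file system and would have to come from another source.

```python
import os
import stat
from datetime import datetime, timezone
from pathlib import Path


def inventory_share(root):
    """Walk a directory tree and return one metadata record per file.

    Hypothetical helper: the field names below are illustrative choices.
    """
    root = Path(root)
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                # Unreadable or vanished files are skipped rather than fatal.
                continue
            relative = path.relative_to(root)
            records.append({
                # Surface details: where the file sits and what kind it is.
                "path": str(relative),
                "folders": list(relative.parts[:-1]),
                "extension": path.suffix.lower(),
                # Infrastructure: machine-generated storage facts.
                "size_bytes": info.st_size,
                "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
                "permissions": stat.filemode(info.st_mode),
                "owner_id": info.st_uid,  # numeric ID; always 0 on Windows
            })
    return records


def stale_files(records, years=5, now=None):
    """Return records last modified more than `years` ago (assumed cutoff)."""
    now = now or datetime.now(timezone.utc)
    cutoff_days = years * 365
    return [r for r in records if (now - r["modified"]).days > cutoff_days]
```

Pointing inventory_share at the root of a file share and passing the result to stale_files would give a first list of candidates for review. The cutoff is a placeholder; a real policy would come from whoever owns records retention.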

Perhaps the ultimate benefit of bringing LBD under control is that our users will opt in to more productive information practices. The best way to unify an enterprise is not to send your dump to the dumpster or overburden your content producers. It's to liberate your users from the tyranny of knowing where the knowledge lives. That alone is a powerful reminder that work-smarter folks must free up creativity -- not disk space. It's that nod towards real and practical innovation that prepares us for using information, not being used by it.

#Index #repository #crawl #ui #usability #attrition #metadata #intellectualcapital #Taxonomy #storage