
Big data and content management: why size shouldn’t matter

By Steve Weissman posted 06-06-2012 08:19


Is it me, or is the concept of big data gaining prominence in conversations about content management? I certainly hear about it more since I launched my CIP classroom prep course, which covers data management because (A) data occupies a huge portion of the content landscape, and (B) it is on the CIP exam! But I'm curious to know whether you are finding it more visible than before as well … and if you are, how much time you and the people you're talking to are spending, and to what end, defining what is "big" and what is not.

As reported in an article last month in Network World, Amazon.com Principal Engineer John Rauser told an audience at a recent Big Data and High Performance Computing Summit that big data is “any amount of data that's too big to be handled by one computer.” To me, that's a good colloquial answer, but it leaves too much to interpretation to be a formal one. For instance, whose computer are we talking about: mine? IBM's? The US Department of Commerce’s?

Part of the problem is that “bigness” comes in many flavors and needs to be grappled with on many levels. Among the most common examples are: the total amount of raw data to be managed, the number of servers and/or applications involved in storing and processing that data, the geography over which those servers and applications are spread, and the number of users who access those applications, either at once or in general (there being licensing ramifications associated with the latter). As before, these are all good examples of definitions that meet the “we know what we mean” criterion, but their very variety speaks to the same lack of specificity as Rauser’s response.

Here's what everybody's friend Wikipedia has to say about it: big data is “a loosely-defined term used to describe data sets so large and complex that they become awkward to work with using on-hand database management tools … Though a moving target, as of 2008 limits were on the order of petabytes to zettabytes of data.” That's better, because it cites an actual metric, but it's still more of a grand concept than a quantified delimiter.

Okay, smarty-pants, I hear you cry: how would you define it? Well, here's my answer: I wouldn't, and I don't.

The reason is that, for me and my followers, the issue isn't "man, I sure have an awful lot of data to tend to"; it's "man, it sure can be hard to find the information I need because we have so much, and sometimes I can't even get to it because it's stored in some isolated database someplace.” The point is that this is a business issue, not a technology matter.

You see, exactly how much constitutes "so much" is important only in that it will help determine which tools are needed to throw open all the data doors and windows. The real issue is the impact that having an amorphous mass of data can have on operations and, ultimately, on business success. Everything else, as the T-shirt says, is just detail, which is why it is part of the course and test curriculum.



#ECM #BigData #planning