Learning How to Crawl -- Doing Bake-offs Right: An Enterprise Search Conversation with Lynda Moulton

By Marc Solomon posted 03-21-2013 09:21


Most information managers are back office-based. Our customers are externally facing. We build the arguments and the rationales that our colleagues use to close deals and secure work. Our marketing efforts are internal.  Our resources tend to be in-house – as much for our missions and charters as for our reporting and cost structures.

The exception in all this is the bake-off – that dueling battle between rival vendors to see whose search solution is best suited for lighting up the central knowledge switchboard behind your firewall, in your cloud, or both.

Everything AND the Kitchen Sink

Julia Child once said: “Always start out with a larger pot than what you think you need.” She might as well have been describing the process for picking an enterprise search engine. This proof of concept, or PoC, is your due diligence for matching internal priorities and selection criteria to your bake-off results. PoCs demand an improbable mix of referenceable work product – no matter where it comes from, which applications produced it, or how poorly it was cataloged against the organizational roles and responsibilities it reflects.

While the practice of staging PoCs is common, the process is mired in mystery. Selecting the right vendor resists simple risk-to-benefit reductions. It rejects gut calls, as if Siri could simply say, “do this.” And wipe that Google web smirk off your enterprise requirements: the Google Search Appliance is a different engine entirely. It’s not just the winning formulas that are elusive – the actual know-how for staging a competent and successful search engine showdown is no less so.

If your enterprise resembles a potluck more than a catered sit-down, then you need to listen closely to Lynda Moulton. Moulton is the Julia Child of enterprise search bake-offs. She literally wrote the book on requirements gathering as the lead enterprise search analyst for the Gilbane Group (see her curated list of further readings below). Five years later, her Enterprise Search Markets and Applications remains essential grounding for any project lead looking to procure the right licensing fit between competing offerings and our own deal-breaker criteria.

Enterprise Assets … or Liabilities

Talking with Lynda, it quickly becomes apparent that no proof of concept can go forward without this sobering realization: most would-be answers to our colleagues’ questions are kept as anything but “prized” assets. According to Moulton, it’s plain wrong to assume that most organizations have their acts together. If they did, what would that look like? Moulton ticks off some basics:

  1. Users tag their stuff
  2. Management assigns documentation roles to content contributors
  3. Product development maintains professionally managed technical documentation
  4. Information professionals…
    • Establish metadata where none exists at the point of content capture through auto-categorization and extraction tools
    • Normalize naming conventions across product lines, research and development processes, lines of business, business units, etc.

So how do we get our houses in order in preparation for the PoC? Here are some essential building blocks:

  • How big? What constitutes a fair test sample when a vendor solution crawls across diverse operating systems, repositories, and legacy apps (with licenses no one bothered to renew)?
  • How open? What's bothering your colleagues about access to stuff? What are the content bottlenecks that hinder a silo-bound organization?
  • How meaningful? What's the important stuff we can get access to if we need it? What mix of landing pages, wikis, network shares, and Dropbox folders gets tapped to address foundational issues and solve routine problems?
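The “how big” question above can be answered with a simple proportional draw. As a minimal sketch (the repository names and sample size here are illustrative assumptions, not recommendations), a stratified sample keeps every repository represented in rough proportion to its share of the corpus:

```python
import random

def stratified_sample(repos, total=500, seed=42):
    """Draw a proportional test sample across repositories.

    `repos` maps a repository name to a list of document ids.
    Returns a dict of sampled ids per repository, sized in
    proportion to that repository's share of the whole corpus.
    """
    random.seed(seed)
    corpus_size = sum(len(docs) for docs in repos.values())
    sample = {}
    for name, docs in repos.items():
        # Every repository contributes at least one document,
        # so no silo is accidentally left out of the test.
        k = max(1, round(total * len(docs) / corpus_size))
        sample[name] = random.sample(docs, min(k, len(docs)))
    return sample

# Hypothetical corpus: a small wiki and a much larger file share.
repos = {"wiki": [f"wiki-{i}" for i in range(100)],
         "shares": [f"share-{i}" for i in range(300)]}
drawn = stratified_sample(repos, total=40)
```

Tuning `total` up or down is a judgment call; the point is that the fairness of the test sample is decided by you, not by whichever crawler happens to finish first.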

Putting a Stake in the Ground

The end result is a map identifying pockets of underperforming assets and a whole lot of worthless post-expiration trivia. In other words, the crawl (or indexing of these far-flung repositories) helps us chart the ocean without necessarily trying to boil it. Even more critical is a series of use cases that addresses how these required materials are handled to resolve business cases. To Moulton, no bake-off is complete without the lightning round: a PoC that not only inventories collective assets but demonstrates the impact of the search tool on the bottom line:

“What are the points in our business where lack of retrieval capability presented or resulted in a major business risk? We need to know where the content exists that would have ‘saved the day’ had it been accessible.”

Next, it’s important to leverage these test results to plan for the eventual implementation.

“You can exploit the PoC outcome by analyzing what shows up in searches that you might have neglected as being important or significant.” Conversely, you can use the PoC to reveal content that needs to be excluded from universal access or that needs special ACL (“access control list”) consideration.

Pricing is another factor that cuts two ways.

Moulton believes it makes sense to pay as you go in the PoC process. There’s an added incentive for vendors to conform to your schedule when the host pays for the set-up. Letting each participant know who they’re up against entices them to draw PoC lessons about their own differentiators and detractions. Internally speaking, PoC commitments lend urgency to the undertaking: actual sponsors inspire end-users to respect the process and participate more fully. Ultimately, you’ll lessen your exposure if you tie PoC costs to future licensing fees.

Try This at Home

According to our master bake-off chef, here are a few time-tested recipes to put your best stuff in the path of that crawl and test the mettle of your search providers:

  • Factor in funky exceptions that turn up in the test sample to see how the algorithm parses them
  • Plant some high caliber documents that are resistant to discovery
  • Try to uncover results important to you based on your program knowledge. Have various experts test for discovering them, sharing only the barest of generic description, such as, "We have a document from the 'Phoenix' program in the database.”
  • Do extensive testing to try to find proprietary information, client confidential information, or personnel information that should not be indexed (e.g. social security numbers, health information about employees, government classified material).
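The last recipe lends itself to a small automated sweep. A minimal sketch, assuming your PoC engine can dump retrieved document text as (id, text) pairs; the two patterns below are illustrative, not a complete catalog of sensitive data:

```python
import re

# Patterns for data that should never surface in search results.
# SSN and credit-card formats are examples; extend for your own
# confidential, personnel, and classified markings.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def audit_results(documents):
    """Scan retrieved document texts for sensitive patterns.

    `documents` is a list of (doc_id, text) pairs pulled from the
    PoC engine's result set. Returns (doc_id, label) hits that
    belong in the 'issues to be addressed' report.
    """
    hits = []
    for doc_id, text in documents:
        for label, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(text):
                hits.append((doc_id, label))
    return hits

sample = [
    ("hr/payroll-2012.txt", "Employee SSN on file: 123-45-6789"),
    ("wiki/phoenix-overview.txt", "Phoenix program kickoff notes."),
]
flagged = audit_results(sample)  # flags only the payroll file
```

A sweep like this won’t replace expert eyes, but it turns “should not be indexed” from a hope into a repeatable check you can run against every vendor’s index.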

Likely Suspects

Moulton suggests writing taxonomy and metadata management standards ahead of selecting your test set. This ensures consistency and meaningful comparisons between solutions. Narrow your PoC to vendors who respect top-of-mind metadata and can channel date ranges, authors, companies, and other entities into facets.  Then go them one better and share your own topic maps and categorical schemes. Determine how you want them handled by the search engine. According to Moulton, one example germane for PoCs is for file shares: “Request that they use folder names as metadata. The rules need to be figured out for all crawl-gathered repositories.”
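Moulton’s folder-names-as-metadata rule can be prototyped before any vendor touches the share. A minimal sketch, assuming POSIX-style paths; the field names are hypothetical stand-ins for your own taxonomy:

```python
from pathlib import PurePosixPath

def folder_metadata(path, fields=("line_of_business", "project", "doc_type")):
    """Derive metadata fields from a file-share path.

    Maps leading folder names onto the given field names so the
    search engine can expose them as facets. The default field
    names are assumptions; substitute your own categorical scheme.
    """
    parts = PurePosixPath(path).parts[:-1]  # drop the file name itself
    return dict(zip(fields, parts))

meta = folder_metadata("pharma/phoenix/reports/q3-summary.docx")
# {'line_of_business': 'pharma', 'project': 'phoenix', 'doc_type': 'reports'}
```

Running this over your file shares before the PoC shows you exactly which facets a vendor should be able to recover from folder names alone, and where your naming conventions break down.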

Regardless of what’s planted and what seedlings sprout from our compost bin of legacy assets, there are a few outcomes likely to be favored in the aftermath:

  1. More stuff than you knew you had:
    the kitchen sink talking back
  2. More unique assets than expected:
    not just copies of copies
  3. Some stuff doesn't show up:
    Push those admin tools from the feature sets to unearth those assets
  4. Collect discoveries in your ‘issues to be addressed’ report and plan for implementation:
    Discuss and debate with management the significance of your findings.
  5. Consider governance implications in those unwanted discoveries.

“ACL rules are always complex,” says Moulton. “It's a whole other level of implementation.” She advises begging off until after the search kinks are worked out.
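When ACL work does begin, one common approach is late-binding security trimming: filter results at query time against the querying user’s group memberships. A minimal sketch, assuming each result carries an `allowed_groups` set (a hypothetical field; real connectors map repository ACLs differently):

```python
def trim_results(results, user_groups):
    """Drop documents whose ACL does not intersect the user's groups.

    `results` is a list of dicts, each carrying an 'allowed_groups'
    set; `user_groups` is the set of groups the querying user holds.
    """
    return [r for r in results if r["allowed_groups"] & user_groups]

hits = [
    {"id": "hr-1", "allowed_groups": {"hr"}},
    {"id": "wiki-9", "allowed_groups": {"all-staff", "eng"}},
]
visible = trim_results(hits, {"all-staff"})  # only 'wiki-9' survives
```

Late binding is simple but re-checks permissions on every query; indexing ACLs alongside documents trades freshness for speed. Either way it is its own implementation phase, which is exactly why Moulton keeps it out of the PoC.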

My Place or Yours

Another big risk is going with a platform vendor. Oracle, Microsoft, Google, and IBM are not simply bake-off candidates but ecosystems in their own right. They don’t all play in the same sandbox or even on the same playground. Picking from this select group could well mean throwing away a lot of assets just to conform to the demands of becoming a platform X shop. But if conformity performs better than a best-of-breed provider, take note: make sure they have ‘connectors’ for indexing the applications you’re trying to federate through a global search capability.

Finally, Lynda Moulton strongly endorses hosting PoCs on local hardware. That way your environment gets tested for gaps and your technical people actively participate, learn, and gain an understanding of what they will be dealing with in the final product. Either way, it’s your business cases that will carry the most weight, regardless of familiarity or comfort with a specific IT brand or search interface.

After all, it should be serving up a meal you can make at home with ingredients you can't find anywhere else.

Selective Links

Podcast: Who should be in charge of an enterprise search deployment?

Article: Effective enterprise search strategy more than a technology problem

Video: Set up an enterprise search platform on-premises or in the cloud


#platform #ACL #Google #enterprisesearch #metadata #vendorselection #rolesandresponsibilities #bakeoff #governance #SharePoint #IBM #pricing #microsoft #Search #Oracle #proofofconcept #Taxonomy