From Conception to Proof: How to Best-Fit Your Enterprise Search Options

By Marc Solomon posted 07-11-2013 16:58


Over the past several months our knowledge management team has entertained some potential suitors to piece together the far-flung resources of a global engineering organization. Our Proof of Concept (or PoC) has centered on the adoption of competing enterprise search providers to help unify our documentation and process know-how.

With so many resources to bridge and ways to connect them, two seriously central questions arise in a PoC:

1. How much does it take to define the right search fit for your organization?

If you want your PoC to be representative, then a snapshot of some selective sources will do the trick. But if you raise the stakes to include real use cases, think again:

“Oh, you want to do actual research with this?”

You'll need a more comprehensive crawl of the archives to satisfy that requirement. This can be especially vexing when comparing two or more offerings – one of the surface crawl vintage and the other much deeper.

2. The other deal-bender (and sometimes breaker) is whether the PoC is hosted under the wall or in the cloud. Both options pose advantages and challenges.

Here's a closer look at our own experience and the implications for licensing the search applications in your own shop.

Categorical Assessments

A) Crawls:

Setting up a crawl is as hard and as simple as deciding what's worth indexing (and who’s interested). Be careful what you wish for. If you desire simplicity you may get hosed. That’s because it's easy to set up a crawl for all site extensions that fly under the same site root as your company's intranet. However you could be boiling the ocean in no time:

  • No standing governance models?
  • No limits to the organic growth of permission-enabled content?

You’ll wish your PoC was buried in the same enterprise obscurity as all the triviality you’re indexing.

Pros: If you have several resources that most everyone can get to now, you're fine. For example, you may be sparing yourself a whole lot of complexity by federating with Google Search Appliance (GSA) instead of, say, FAST Search on your community’s SharePoint. But you could be out of your depth if you point the crawler at renegade and often delinquent servers within your firewall. The better way? Add specific site extensions within an overgrown CMS or an unmanaged file share to your crawl.
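To make the "add specific site extensions" idea concrete, here is a minimal sketch of a crawl-scope check. It is not GSA or Coveo configuration syntax; the host name, the prefixes, and the function name in_crawl_scope are all hypothetical, and the only point is that exclusions win over inclusions and that anything not listed is left alone.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: only these areas under the intranet root get crawled.
INCLUDE_PREFIXES = [
    "intranet.example.com/engineering/specs/",
    "intranet.example.com/wiki/process/",
]

# Hypothetical carve-outs: an exclusion always beats an inclusion.
EXCLUDE_PREFIXES = [
    "intranet.example.com/wiki/process/archive/",
]


def in_crawl_scope(url: str) -> bool:
    """Return True if the URL sits under an included prefix and no excluded one."""
    parts = urlparse(url)
    target = (parts.netloc + parts.path).lower()
    if any(target.startswith(prefix) for prefix in EXCLUDE_PREFIXES):
        return False
    return any(target.startswith(prefix) for prefix in INCLUDE_PREFIXES)


if __name__ == "__main__":
    for candidate in (
        "https://intranet.example.com/engineering/specs/pump.pdf",
        "https://intranet.example.com/wiki/process/archive/old.html",
        "https://intranet.example.com/hr/benefits.html",
    ):
        print(candidate, in_crawl_scope(candidate))
```

Starting from an explicit list like this, rather than from the site root, is what keeps the ocean from boiling: every new source has to be argued into the crawl.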

Cons: The GSA is at its best in a homogeneous environment – even on a rival's platform. But look out when it's pointed at a hodgepodge of file shares and multiple applications (all running multiple versions). It becomes less of a doorway into disparate repositories and more like an unwelcome visitor where the host spends more time changing the locks than hanging with their house guests.

This intrusive reality was visited on us when configuring a crawl of wiki-based documentation and status reports. Here our service partner raised the specter of the GSA as content reaper. In this scenario our standard security settings are powerless to stop the inadvertent elimination of content in the collaboration-based environment the GSA is crawling. Talk about delivering the opposite of no evil – Yikes!

B) Connectors:

In enterprise search most of the potential for mayhem is curbed by the use of connectors – middleware that enables the index to capture the original content structures through the API of the sources it crawls. Connectors are literally the critical link in reconciling multiple versions of the same application used by the siloed business units within your enterprise. 

Pros: You want to deal with a buffet style menu of add-ons without being trapped into revisiting the search license down the road. Where are these traps? Think of upgrades to other apps you're indexing that may knock the crawler off its updating schedule.

Therein lies both the content and the cost rationale (and a huge, overlooked justification of enterprise search): to put all those insurgent customizations out of their misery. It’s easier in enterprise search land to achieve this with a turnkey shop like Coveo. There are no barriers in its business model between the sales engineer configuring the PoC and the development team threading your code together. Big plus for reliable pricing and responsive service.

Cons: Unfortunately with the GSA your access to plug and play connectors depends less on a standard set of third party plug-ins and more on the domain expertise of the services partner. You're starting from scratch if they haven't done a prior implementation. The irony is that building all that bridge work from scratch into a PoC would raise the cost of the test into the pricing proximity of the actual license. Double irony – Retaining GSA to index Google Drive assets is still a no-go. Not good.

C) Licensing Standards:

Where do you expect any surcharges to go for the extra legwork associated with connectors? More to the point: what's considered standard and what's custom when you're dealing with so few dollars chasing so many APIs?

Pros: In the case of our PoC candidates there was widespread agreement that file shares, SharePoint, and Active Directory are all standard fare. They’re all bundled into the off-the-shelf versions of each search solution. There are also work-arounds for crawling connector-free content. RSS anyone?

Cons: If it sounds like your content spills out of those parameters consider yourself the rule, not the exception. Unfortunately there are few if any standards when it comes to ball-parking a total package deal in relationship to your crawl expanse:

  • Are you including Salesforce?
  • Are you trying to leverage under-used licenses for dormant platforms like Documentum or Notes that have tons of treasure buried under an intractable architecture?

Your best position may be to limit the PoC to familiar organizational footprints, i.e. front vs. back office, or sales/marketing vs. development/manufacturing, etc.

D) Relevance:

There is no greater PoC work-in-progress than separating the must-haves from the also-rans: the ranking formula that sequences the all-determining first page of search results. Some of this burden can be shouldered by facets – those groups of pre-filtered term sets that reinforce the intentions of the searcher. That said, there is no established business model for internally-based search rankings other than the same reliable suspect: reverse chronological date. That ain't much for establishing a pecking order for what users want to see.

Pros: A relevancy gauge enables a formulaic approach to settling search scores. For large enterprises, pooling assets from multiple units should mean that peer-provided content scores higher than the stuff coming out of other groups. Whether the work is tagged to this purpose or not, your enterprise search tool should be able to aggregate department-specific content by referencing AD ("Active Directory") credentials within the author field.
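As a rough illustration of that formulaic approach, the sketch below bumps up hits whose author belongs to the searcher's own department. Everything in it is assumed for the example: the Hit fields, the PEER_BOOST multiplier, and the idea that the author's department has already been resolved from AD. Neither vendor's actual ranking settings are shown here.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    base_score: float       # score from the engine's own ranking (assumed given)
    author_department: str  # assumed to be looked up in AD from the author field


PEER_BOOST = 1.5  # illustrative multiplier, not a vendor default


def rerank(hits, searcher_department):
    """Sort hits so that content from the searcher's own department is favored."""
    def adjusted(hit):
        is_peer = hit.author_department == searcher_department
        return hit.base_score * (PEER_BOOST if is_peer else 1.0)

    return sorted(hits, key=adjusted, reverse=True)


if __name__ == "__main__":
    results = [
        Hit("Quarterly pipeline review", 1.0, "Sales"),
        Hit("Valve test procedure", 0.8, "Engineering"),
    ]
    for hit in rerank(results, "Engineering"):
        print(hit.title)
```

A multiplier is only one way to do it; the appeal is that nobody has to vet documents one by one for the peer signal to work.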

Cons: The more thankless and labor-intensive business of internal search rankings means going doc-by-doc to vet the highest quality deliverables. Good luck with that:

  • Who's doing the vetting?
  • How do they define quality?
  • Which keywords cohere to the content in question?

It's a slippery and mostly downward slope. Coveo enables you to build Top Results (their term for what Microsoft and most information architects refer to as “Best Bets”).

On the plus side you can create a long string of keywords associated with any URL or document. But then your top results balloon into pagefuls of starred search results, thus undermining the selective nature of best bets. GSA has the opposite problem. Its keyword match prevents you from connecting a list of phrases or search constraints that most closely define the need met by the deliverable you're referencing.
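One way to picture a middle ground between those two problems is a best-bets table keyed on whole phrases, with a hard cap on how many curated results get shown. This is a sketch under my own assumptions, not how Coveo's Top Results or the GSA's keyword match is configured; the table contents, MAX_TOP_RESULTS, and the function names are hypothetical.

```python
MAX_TOP_RESULTS = 3  # hypothetical cap that keeps best bets selective

# Hypothetical curated table: phrase -> URLs that best answer it.
BEST_BETS = {
    "expense report": [
        "https://intranet.example.com/finance/expenses",
        "https://intranet.example.com/finance/expense-faq",
    ],
    "travel policy": ["https://intranet.example.com/hr/travel-policy"],
}


def normalize(query: str) -> str:
    """Lowercase the query and collapse runs of whitespace."""
    return " ".join(query.lower().split())


def top_results(query, best_bets=BEST_BETS, limit=MAX_TOP_RESULTS):
    """Return curated URLs whose phrase appears as whole words in the query."""
    padded = f" {normalize(query)} "
    matches = []
    for phrase, urls in best_bets.items():
        if f" {phrase} " in padded:
            for url in urls:
                if url not in matches:  # the same URL may serve several phrases
                    matches.append(url)
    return matches[:limit]


if __name__ == "__main__":
    print(top_results("How do I file an Expense   Report"))
```

The cap is the blunt part of the design: it forces whoever maintains the table to decide which three deliverables really deserve the star.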

Functional Comparisons

While those are the core four categories for determining best fit, there are many additional factors for comparing actual search performance. Here are an additional four that factored into our PoC:

1) Speed:

This is a two-pronged factor – (1) is about query responses, and (2) refers to how long it takes the index to rebuild.

Verdict: While Coveo has a better designed admin console for statusing the progress of your builds, the GSA offers the faster response times.

2) Security:

The age-old mantra of enterprise governance: "who can see what" is in full view within the admin settings of all enterprise search tools.

Verdict: Coveo's Quickview function enables users to see a cache result that matches their keywords to their recurrence in the text of their search hits. This is useful for establishing a transparent governance model and a level playing field where in the words of Boston Police Commissioner Ed Davis:

"Everyone is sharing equally."

It also satisfies the why factor – why I got what I got. Conversely it could compromise sensitive materials they would otherwise not be able to access within the native application. Either way all enterprise candidates respect site permissions – so long as you let them crawl your access control lists too.
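The permission check itself boils down to something like the sketch below: a hit is shown only if the user's groups overlap with the access control list captured at crawl time. The field name allowed_groups and the group names are invented for the example, and a hit with no crawled ACL is withheld here on the assumption that failing closed is the safer default.

```python
def visible_hits(hits, user_groups):
    """Keep only hits whose crawled ACL shares at least one group with the user.

    Hits with a missing or empty ACL are dropped (fail closed), an assumption
    made for this sketch rather than documented vendor behavior.
    """
    return [
        hit for hit in hits
        if hit.get("allowed_groups", set()) & user_groups
    ]


if __name__ == "__main__":
    index = [
        {"title": "Pump spec", "allowed_groups": {"engineering"}},
        {"title": "Salary bands", "allowed_groups": {"hr"}},
        {"title": "Orphaned wiki page"},  # ACL was never crawled
    ]
    for hit in visible_hits(index, {"engineering", "all-staff"}):
        print(hit["title"])
```

The same filter would need to run before any cached preview is served, otherwise the preview becomes the side door around the native application's permissions.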

3) Design:

Look-and-feel is one of those qualitative measures that gets tossed around shopping cart circles more than enterprise search environments. Do you want your interface to be an extension of your org chart? Do you want what identifies search for your users, hence Google? Or do you want form to follow function and let your content weave the navigation path to the prized assets in your crawl? Honestly, that doesn't sound like either of the first two choices.

Verdict: Both GSA and Coveo offer faceted search that filters down to manageable, more qualified, and exacting search results. They do this with a combination of XML-based dictionaries that perform keyword matches and retroactively tag stuff otherwise devoid of any context or metadata. Coveo goes a step further by validating terms based on usage (i.e. not just a throwaway mention but an article about X). They also offer a UI design kit that enables closed dropdowns that don't gum up the UI but post persistent details like file types, dialects, and date ranges.
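For a feel of how dictionary-driven tagging with a usage check might work, here is a small sketch. The XML layout, the facet values, and the min_mentions threshold are all made up for illustration; neither vendor's dictionary format nor Coveo's actual validation logic is reproduced here. The threshold simply stands in for "an article about X" versus a passing mention.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical dictionary format: each facet value lists the terms that signal it.
DICTIONARY_XML = """
<dictionary>
  <facet name="Topic" value="Hydraulics">
    <term>pump</term>
    <term>valve</term>
  </facet>
  <facet name="Topic" value="Licensing">
    <term>license</term>
  </facet>
</dictionary>
"""


def load_dictionary(xml_text):
    """Parse the XML into {facet value: [lowercased terms]}."""
    root = ET.fromstring(xml_text.strip())
    return {
        facet.get("value"): [term.text.lower() for term in facet.findall("term")]
        for facet in root.findall("facet")
    }


def tag_document(text, dictionary, min_mentions=3):
    """Tag a document with a facet value only if its terms recur often enough."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    tags = []
    for value, terms in dictionary.items():
        mentions = sum(words.count(term) for term in terms)
        if mentions >= min_mentions:
            tags.append(value)
    return tags


if __name__ == "__main__":
    doc = "The pump feeds the valve. Replace the pump yearly. See license."
    print(tag_document(doc, load_dictionary(DICTIONARY_XML)))
```

Note the match is on exact words, so plurals and misspellings slip through; a real dictionary would need stemming or synonym lists to cover them.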

4) Size / Cost:

How many assets will you crawl? The direct nature of the question offers nothing in the simplicity of the answer. After all we're doing the PoC in large part to answer for not having an informed response to our doc counts, our gaps and redundancies, and our content consumption (little of which any search vendor goes near).

Verdict: Our Coveo crawl was 15 times greater than the GSA's. This is not reflective of either engine's capacity to index so much as a contrast in pricing models: all you can eat versus a la carte. GSA is more competitive price-wise, if you have a limited scoping of assets and connectors alike. As our experience with a heterogeneous environment suggests, that can be a mighty big IF.


So that's one KM manager's take on picking the right search tool for trawling your apps, shares, and docs. I'd like to tell you that's a wrap, but you know better. That would be more of an overreach than the notion that you need to limit your options to the same competing vendors we did. Competition is a good thing regardless. A healthy dose of it will increase confidence in your choice while reducing the cost of that investment.

#Coveo #usertesting #PoC #licensing #GoogleSearchAppliance #Search #enterprisesearchassessment #vendormanagement #InformationGovernance #usability #bakeoff #SharePoint #comparingsearchengines #proofofconcept #GSA #negotiation #Collaboration