
Faceted Search: how to go from a Static to a Dynamic Taxonomy

By Johannes Scholtes posted 05-27-2010 11:03

  

Many search engines used in portals are optimized to retrieve information that matches predefined, specific and precise search terms. In those cases, one must know exactly which words to use, and the results for those words will be very precise and accurate. This is “focalized” search. However, if one does not know exactly which words to use, traditional search tools will not help. In those cases, one needs “exploratory” search. For more information on the differences between focalized and exploratory search, see http://zylab.wordpress.com/2010/03/26/understand-the-two-different-faces-of-search-exploratory-search-and-focalized-search/

Focalized search techniques provide little to no ability to explore data; they assume the user knows the exact terms to look for. They fit very well in a basic retrieval model, but an exploratory model needs techniques that can deal with imprecise specifications and, more importantly, that are dynamic and self-adapting to changing environments and data sets.

This is where a taxonomy can make a huge difference: when a collection of documents is full-text searchable, the end user can also search with the synonyms and other related terms that the taxonomy suggests.
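To make the idea concrete, here is a minimal Python sketch of taxonomy-driven query expansion. It assumes a toy taxonomy stored as a dictionary that maps a preferred term to its synonyms and narrower terms; the terms and the names `TAXONOMY`, `expand_query` and `matches` are hypothetical, chosen for illustration, and not part of any particular product. A document then matches if it contains any of the expanded terms.

```python
import re

# Hypothetical toy taxonomy: preferred term -> synonyms and narrower terms.
TAXONOMY = {
    "contract": {"synonyms": ["agreement"], "narrower": ["lease", "license"]},
    "archaea": {"synonyms": ["archaebacteria"], "narrower": ["euryarchaeota", "crenarchaeota"]},
}


def expand_query(query: str, taxonomy: dict) -> list[str]:
    """Return the query terms plus any synonyms and narrower terms the taxonomy suggests."""
    expanded: list[str] = []
    for term in query.lower().split():
        candidates = [term]
        entry = taxonomy.get(term)
        if entry is not None:
            candidates += entry.get("synonyms", []) + entry.get("narrower", [])
        for candidate in candidates:
            if candidate not in expanded:  # keep first-seen order, drop duplicates
                expanded.append(candidate)
    return expanded


def matches(document: str, terms: list[str]) -> bool:
    """OR semantics: the document matches if it contains any of the terms."""
    words = set(re.findall(r"[a-z0-9]+", document.lower()))
    return any(term in words for term in terms)


if __name__ == "__main__":
    terms = expand_query("contract dispute", TAXONOMY)
    print(terms)  # ['contract', 'agreement', 'lease', 'license', 'dispute']
    print(matches("The lease was terminated early.", terms))  # True
```

With this toy taxonomy, a query for "contract dispute" also searches for "agreement", "lease" and "license", so a document that only mentions a lease is still found. A real taxonomy is much larger and usually deeply hierarchical, so a production system might walk several levels of narrower terms rather than just one.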

A great example of the usefulness of a taxonomy for search is the NCBI Taxonomy Browser: http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?name=Archaea. In addition to a simple search box, it lets you browse potential search phrases in an organized, hierarchical list of related terms (i.e., the taxonomy). Compare this to the limited “one search box” approach, where you have to guess which words to use for your query.

This is a huge help to researchers, especially when they did not catalogue, classify, create or produce the documents themselves. It is exactly why various governments around the world require government websites to include a taxonomy that helps visitors navigate the information and find the right documents. Here is an example from the Victorian Government in Australia: http://www.egov.vic.gov.au/website-practice/information-architecture/information-architecture-taxonomies-archive.html

If one has multiple taxonomies, each can also be treated as a search facet that provides a different view of the data and allows the user to refine search results. This is called faceted search (http://zylab.wordpress.com/2010/04/28/how-to-find-more/). By combining multiple facets in one search, dynamic filtering becomes a reality and searching becomes much easier!
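The dynamic filtering described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: each document carries hypothetical values for three facets (`year`, `country`, `type`), and selections are combined with one common convention, AND across facets and OR within a single facet. After each filter step, `facet_counts` recomputes how many of the remaining documents carry each value, which is what lets a user refine the result set step by step. All names here are illustrative.

```python
from collections import Counter

# Hypothetical documents, each tagged with values for several facets (taxonomies).
DOCS = [
    {"id": 1, "facets": {"year": ["2009"], "country": ["Netherlands"], "type": ["email"]}},
    {"id": 2, "facets": {"year": ["2010"], "country": ["Australia"], "type": ["contract"]}},
    {"id": 3, "facets": {"year": ["2010"], "country": ["Netherlands"], "type": ["email"]}},
    {"id": 4, "facets": {"year": ["2010"], "country": ["Netherlands"], "type": ["contract"]}},
]


def apply_filters(docs: list[dict], selections: dict[str, set[str]]) -> list[dict]:
    """Keep documents that match every selected facet.

    AND across facets, OR within a facet: a document passes a facet if it
    carries at least one of the values selected for that facet.
    """
    return [
        doc
        for doc in docs
        if all(set(doc["facets"].get(facet, [])) & wanted for facet, wanted in selections.items())
    ]


def facet_counts(docs: list[dict], facet: str) -> Counter:
    """Count how many of the given documents carry each value of one facet."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(set(doc["facets"].get(facet, [])))
    return counts


if __name__ == "__main__":
    hits = apply_filters(DOCS, {"year": {"2010"}})
    print([doc["id"] for doc in hits])  # [2, 3, 4]
    print(dict(facet_counts(hits, "country")))
    narrowed = apply_filters(hits, {"country": {"Netherlands"}})
    print([doc["id"] for doc in narrowed])  # [3, 4]
```

Selecting the year 2010 leaves documents 2, 3 and 4, and the country facet then shows Netherlands (2) and Australia (1); adding Netherlands narrows the set to documents 3 and 4. Recomputing the counts on the current result set, rather than on the whole collection, is what makes the filters dynamic.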

A great example of faceted search and faceted navigation can be found here: http://www.flickr.com/photos/morville/sets/72157623085918037/. By the way, the book Search Patterns by Peter Morville and Jeffery Callender is a must-read for anybody interested in these topics.

Taxonomies do come with their challenges, namely creation and maintenance, and in the multi-taxonomy environment that faceted search requires, this problem only becomes more complex. This is where text mining and content analytics become especially valuable (http://zylab.wordpress.com/2010/01/26/finding-relevant-information-without-knowing-exactly-what-is-available-or-what-you-are-looking-for/). In a text-mining system, named entities and the relationships among them can be recognized and classified automatically. This means that once you have determined which facets you want to offer, you can fill in their values by running text-mining algorithms on the content you want to disclose. Typically, one would have different taxonomies (facets) for specific named entities, covering different dimensions such as chronology, geography, spatial/geometric, increasing quantity/quality, simple to complex, etc.
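As a rough illustration of filling facet values automatically, the Python sketch below populates a chronology facet and a geography facet for each document. It is a simplified stand-in, not the text-mining algorithms referred to above: a real system would use a trained named-entity recognizer, whereas this sketch assumes a small hypothetical gazetteer (`GAZETTEER`) for place names and a regular expression for four-digit years between 1900 and 2099. The names `extract_facets` and `build_index` are likewise illustrative.

```python
import re

# Hypothetical gazetteer standing in for a trained named-entity recognizer.
GAZETTEER = {
    "geography": {"amsterdam", "melbourne", "victoria", "netherlands", "australia"},
}
# Four-digit years from 1900 to 2099; the group is non-capturing so findall returns whole matches.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_facets(text: str) -> dict[str, list[str]]:
    """Fill the chronology facet and every gazetteer-based facet for one document."""
    facets: dict[str, set[str]] = {"chronology": set(YEAR_PATTERN.findall(text))}
    for facet in GAZETTEER:
        facets[facet] = set()
    # Single-word lookup: each lowercase token is checked against every gazetteer list.
    for token in re.findall(r"[a-z]+", text.lower()):
        for facet, names in GAZETTEER.items():
            if token in names:
                facets[facet].add(token.title())
    return {facet: sorted(values) for facet, values in facets.items()}


def build_index(docs: dict[str, str]) -> dict[str, dict[str, list[str]]]:
    """Run the extraction over every document in a collection (id -> text)."""
    return {doc_id: extract_facets(text) for doc_id, text in docs.items()}


if __name__ == "__main__":
    print(extract_facets("The lease signed in Melbourne in 2009 was renewed in 2010."))
    # {'chronology': ['2009', '2010'], 'geography': ['Melbourne']}
```

Because the facet values are derived from the content itself, re-running the extraction when documents are added or changed keeps the facets current without manual cataloguing. The single-word matching used here would miss multi-word names such as "New South Wales", which is one reason real systems rely on proper entity recognition instead.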

What we see happening in the industry is an increased awareness of the need for faceted search. To keep the cost of ownership down, it is essential to use advanced text mining and other content analytics to generate and maintain the content of the facets. As a result, we will end up with so-called Self-Adapting Exploratory Search Structures that provide multiple choices and dynamic filters. Users can easily define sets, and summarize and focus search results on the fly.

This is how static taxonomies will evolve into dynamic ones: a completely new dimension to the original application of taxonomies in, for example, libraries and biology (http://en.wikipedia.org/wiki/Linnaean_taxonomy), but just as useful!


