Last week, Stephen Arnold wrote a great blog post about an older ZyLAB eDiscovery promotion video (http://www.youtube.com/watch?v=n6MhdqHlGzU), which I posted on YouTube a while ago (although we did not call it eDiscovery in those days!). You can find the blog post here: http://arnoldit.com/wordpress/2010/08/10/zylab-marketing-since-1994/.
The author is absolutely right that much of what the video shows about search technology is still more or less standard practice today. So what happened in search in the last 15 years? Nothing? Is search dead, or has it become a commodity or embedded functionality? These are all valid questions.
In a way, there are many similarities with what we do today, but there are also some significant differences. Development in search has not completely stood still:
1. The amount of data we have to deal with these days is at least 2 to the power of 10 (1024) times larger than 15 years ago. Being able to search within such vast data populations required significant R&D, and that type of R&D investment will continue, as there are no signs that data growth will stop (http://zylab.wordpress.com/2010/07/02/to-infinity-and-beyond-how-to-avoid-ediscovery-3d-2/).
2. Text mining (both statistical and linguistic) and other exploratory search types such as faceted search (http://zylab.wordpress.com/2010/05/28/faceted-search-how-to-go-from-a-static-to-a-dynamic-taxonomy/) have contributed significantly to the usability of search interfaces. Fifteen years ago, there was not enough electronic data to train the statistical algorithms, and there was not enough coverage of languages to implement proper disambiguation of, for instance, pronouns, co-references and entity boundaries (http://zylab.wordpress.com/2010/04/28/how-to-find-more/).
3. Advanced data visualization was not possible 15 years ago, unless you had a lot of time and a very large mainframe. Huge progress has been made in this field (http://zylab.wordpress.com/2010/02/23/automatic-email-and-social-network-analysis/).
4. Natively searching multimedia, such as sound and the audio component of videos, is becoming a reality. Phone-based search is showing acceptable precision and recall measures, and the development of large libraries of objects for visual search is reaching levels where it is becoming really useful.
5. But most importantly, content analytics and other search technology are now being built into specific search applications for eDiscovery, compliance, auditing, and other real-world uses. There is more than enough search technology available, but it often lacks the libraries of useful rules and examples needed to apply that technology in specific fields. We see more and more of these ready-to-use libraries for sentiment mining, categorization of documents in a large eDiscovery process, automatic clean-up of legacy information, and automatic filing and records management in Enterprise Information Archiving.
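For readers less familiar with the precision and recall measures mentioned in point 4, here is a minimal Python sketch of how they are usually computed for a single query. The function name and the document IDs are made up for illustration; this is not how any particular product measures search quality.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = share of retrieved documents that are relevant
    recall    = share of relevant documents that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Guard against empty sets to avoid dividing by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the search returned four documents,
# while three documents in the collection are actually relevant.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc4", "doc7"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.50 recall=0.67
```

In this made-up case, half of what the search returned was relevant (precision), and two of the three relevant documents were found (recall).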
Fifteen years ago, ZyLAB was often still evangelizing full-text search. Now we are all used to full-text search, but we have forgotten about many other tools that a proper search interface needs, such as taxonomies (http://zylab.wordpress.com/2010/07/20/6-practical-tips-for-designing-taxonomy/) and other aids that help a user find the right search terms.
Unfortunately, not everybody is aware of these new techniques or uses them. Too many search interfaces are still far too limited or do not fully leverage what is possible today. This will have to change, as data sizes and data types (multimedia) will continue to change and will continue to drive investment in new search technology and capabilities.
Search is dead, long live the new search!
#enterprisesearch #Text-Mining #searchdrivenapplications #ElectronicRecordsManagement
#Content Analytics #informationmanagement #e-discovery