Semantic Analysis and Mapping of Search-Related Information

By Andrew Brom posted 05-30-2011 20:39


Information overload can be a daunting problem, one that can bring an entire business process to a halt. Developing methods that help us organize and analyze information in a concise and meaningful way is therefore of paramount importance. Businesses, non-profits, and government organizations nowadays use various approaches to structuring information, from elaborate database applications to filtering procedures.

In the scientific world, statistical analysis, which is usually applied to numerical and pseudo-numerical data, is now widely used to find patterns and major contextual links in textual information. Be it a hundred tweets, a thousand search engine results, or a list of article or book titles, semantic analysis can provide more than just a few insights.

There are still not many business applications of this technique, so I thought I would share what a powerful tool it can be when used properly.

While working for a government body, I used this approach to analyze the patent structure of Argentina. L. Leydesdorff first employed this method to study the technological base of a country, which he considers one of the main pillars of modern knowledge-based economies. The method revealed "sectors of innovation" within the Argentinian economy and allowed us to look into opportunities for international cooperation between countries.

Social media and Internet search provide yet another splendid application if one wants to look at various aspects of an information-based society. While working on a Web 2 app for looking up U.S. cell phone numbers, I wanted to see which niches existed within this search vertical. People use a search engine to find information, products, and services that meet their needs, so the first step in building a successful marketing strategy is to find a niche for our offer.

Google provides detailed data on what people search for on the Web and which keywords they type in to find websites. These data are hard to digest without some sort of sorting. A semantic representation of keyword data gives us a clear snapshot of search intent and niche structure.

Using semantic mapping software, one can produce a graph that shows the links within clusters of textual data. Take a look at one such semantic map of keywords related to cell phone numbers. I used only high-frequency words (the top 100 words) and dropped keywords with no commercial intent (those with fewer than 2 Google AdWords ads). For example, the keyword "cell phone numbers go public" was dropped from the list. Also, the word "number" represents both the singular and plural forms.
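The filtering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure the mapping software follows: the `(phrase, ad_count)` input format, the function names, and the ad counts in `sample` are all made up for the example, and the plural merge handles only the single word mentioned in the text.

```python
from collections import Counter

def normalize(word):
    # Assumption: a crude merge of singular and plural, for illustration only.
    return "number" if word == "numbers" else word

def top_words(keywords, min_ads=2, top_n=100):
    """keywords: list of (phrase, ad_count) pairs (hypothetical input format).

    Returns the kept phrases as word lists and the top_n most frequent words.
    """
    counts = Counter()
    kept = []
    for phrase, ads in keywords:
        if ads < min_ads:
            continue  # no commercial intent: fewer than min_ads ads
        words = [normalize(w) for w in phrase.lower().split()]
        kept.append(words)
        counts.update(words)
    vocab = [word for word, _ in counts.most_common(top_n)]
    return kept, vocab

# Made-up ad counts; only the third phrase appears in the text above.
sample = [
    ("find cell phone numbers", 5),
    ("reverse cell phone number lookup", 8),
    ("cell phone numbers go public", 1),
]
kept, vocab = top_words(sample)
```

With this sample, the third phrase is dropped because it has fewer than 2 ads, and "numbers" is counted under "number".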

The level of significance (correlation) used for this graph is >= 0.15, which helps reduce clutter. Solid lines represent the most significant links, the core of the niche. This picture gives a deep insight into the niche structure, which can greatly help in developing and marketing the service or product, as well as in learning which searches are most relevant to it.
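To make the thresholding step concrete, here is a minimal sketch of how word-to-word links might be computed and filtered. The post does not say which measure the mapping software uses, so cosine similarity between word occurrence vectors is an assumption here, as are the function name and the sample phrases.

```python
from itertools import combinations
from math import sqrt

def cosine_links(phrases, threshold=0.15):
    """phrases: list of word lists.

    Returns {(word_a, word_b): similarity} for pairs at or above threshold.
    Assumption: cosine similarity over binary "word occurs in phrase" vectors.
    """
    # Map each word to the set of phrase indices that contain it.
    occurs = {}
    for i, words in enumerate(phrases):
        for w in set(words):
            occurs.setdefault(w, set()).add(i)
    links = {}
    for a, b in combinations(sorted(occurs), 2):
        shared = len(occurs[a] & occurs[b])
        sim = shared / sqrt(len(occurs[a]) * len(occurs[b]))
        if sim >= threshold:
            links[(a, b)] = sim  # weaker links are treated as clutter
    return links

# Made-up phrases for illustration.
phrases = [
    ["find", "cell", "phone", "number"],
    ["reverse", "cell", "phone", "number", "lookup"],
    ["free", "reverse", "lookup"],
]
links = cosine_links(phrases)
```

Pairs that never occur together, such as "cell" and "free" in this sample, score 0 and are left out of the graph, while pairs that always occur together score 1.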

From this picture we see that the main need related to cell phone numbers is finding or tracing them via reverse lookups (directories). People want to find the name, owner, address, etc. associated with a mobile number, and they want to get this information for free.

We can adjust the level of significance to see some smaller niches within this vertical.
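As a rough illustration of how the picture changes when the threshold moves, the sketch below groups words into clusters connected by links at or above a given level. The link values and the function name are hypothetical, and grouping by connected components is an assumption about what a "niche" looks like in the graph, not something the mapping software is known to do.

```python
def clusters(links, threshold):
    """links: {(word_a, word_b): similarity}.

    Returns sorted word groups connected by links at or above threshold.
    """
    neighbors = {}
    for (a, b), sim in links.items():
        neighbors.setdefault(a, set())
        neighbors.setdefault(b, set())
        if sim >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, groups = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first walk over the links that survive the threshold.
        stack, group = [start], set()
        while stack:
            word = stack.pop()
            if word in group:
                continue
            group.add(word)
            stack.extend(neighbors[word] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

# Made-up similarity values for illustration.
links = {
    ("cell", "phone"): 0.9,
    ("phone", "lookup"): 0.3,
    ("lookup", "reverse"): 0.8,
}
loose = clusters(links, 0.15)  # one large cluster
tight = clusters(links, 0.5)   # splits into two smaller clusters
```

In this toy example, the weak "phone" to "lookup" link holds everything together at 0.15, and removing it at 0.5 leaves two separate groups.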

This analysis can be applied to a wide range of information-heavy processes in marketing, the social sciences, and other fields. It is an excellent tool for examining the structure of textual data.

#textualinformationstructure #semantickeywordmaps #ScanningandCapture