We are publishing for our readers a chapter extracted from a conference at the 2007 Online Information Show. The last chapter is on its way: How do new Web 3.0 search services help information professionals mine the web for valuable open access information?
The billions of pages that make up the WWW are multiplying faster and faster. Over the past few months, this growth has sped up as the democratisation of social media (e.g. blogs) allows anyone to produce and broadcast content.
This huge amount of information makes it harder and harder to identify specific content, because today's search engines take a general and exhaustive approach. Content identification on Google or MSN is based on keyword matching.
Moreover, the top ranks of their organic results are filled with commercial content. Slowly but surely, Search Engine Marketing (SEM) and Search Engine Optimisation (SEO) strategies have done their job, making it more difficult for users to find information that nobody has promoted.
In five years, the average length of business information searches on the Internet has grown by 40%, according to Outsell Inc. This increase, which comes at the expense of analysis time, has an estimated cost to companies of €300 M worldwide.
The emergence of tools for identifying and disseminating business information is a natural consequence of this situation, and such tools are necessary for every company, whatever its size (multinational or SME).
Specialised search engines, such as vertical ones, were the first to simplify their index. They chose to cover a restricted scope of information, but with much better search features than any generalist search engine can offer.
Today we will present their main features and identify the technologies behind them. Next week, we will present two of these new search engines in the business information market (Zoominfo and Reportlinker).
A/ Why are vertical search engines emerging?
While the amount of information published on the Internet keeps growing, it is also more and more fragmented. Each piece of information on its own has limited value compared with the value of all the pieces linked together.
Because a vertical search engine understands its domain, it can make precise use of semantic technologies to offer users genuine and innovative value-added services.
Specific information processing is carried out to fit each vertical:
- Indexing both the visible and the invisible web
- Analyzing and extracting the concepts specific to each vertical (company names, market segments, executive names...)
- Making information from heterogeneous sources uniform
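As an illustration of the last point, here is a minimal sketch of how records from heterogeneous sources could be mapped onto one uniform schema. The source names (`directory_a`, `feed_b`), the field names and the `CompanyRecord` schema are all hypothetical and are not taken from any of the search engines discussed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompanyRecord:
    """Uniform record shared by every source (hypothetical schema)."""
    name: str
    sector: str
    country: str


# Hypothetical per-source mappings: uniform field -> field name used by the source.
FIELD_MAPS = {
    "directory_a": {"name": "company_name", "sector": "industry", "country": "country"},
    "feed_b": {"name": "org", "sector": "market_segment", "country": "hq_country"},
}


def normalise(source: str, raw: dict) -> CompanyRecord:
    """Rename source-specific fields and apply the same casing rules to all sources."""
    mapping = FIELD_MAPS[source]
    values = {field: str(raw.get(key, "")).strip() for field, key in mapping.items()}
    return CompanyRecord(
        name=values["name"],
        sector=values["sector"].lower(),
        country=values["country"].upper(),
    )


records = [
    normalise("directory_a", {"company_name": " Acme Corp ", "industry": "Software", "country": "fr"}),
    normalise("feed_b", {"org": "Globex", "market_segment": "ENERGY", "hq_country": "us"}),
]
```

Once every source goes through the same `normalise` step, the index only ever sees one record shape, whatever the original layout was.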
Vertical search engines aim to address homogeneous user needs (business executive search, industry report search...) and create value-added services from their knowledge of a restricted scope of user expectations.
That is the best formula for reducing Internet users' search time. In doing so, they provide user-friendly applications that allow users to mine the web more efficiently for value-added information.
B/ Vertical search engine features and technologies: one vertical axis, three features
Vertical search engines offer three main features: a vertical index, vertical search features and vertical contextualisation tools.

C/ Semantics is back at the heart of the Internet
Three technologies characterize Web 3.0 search engines: semantic search, thesauri and concept extraction.
Semantic search engine:
“A semantic search engine is a search engine that takes the sense of a word as a factor in its ranking algorithm or offers the user a choice as to the sense of a word or phrase.”
By taking the meaning of a text corpus into account, semantic analysis enables the pre-processing and filtering of search results:
- Clustering search results into thematic categories (categorization)
- Automatically adding tags to document descriptions
- Displaying additional stories linked to the document, even when they do not share the same keywords
Semantic technologies play a very important role in vertical search engines, as they make it possible to organise information precisely along a finite number of dimensions.
For instance, Farechase.com, the travel search engine, organises its results along the following dimensions:
- Prices
- Airlines
- Departure Time
- Flight duration
- Direct flight or not
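To make the idea of a finite set of dimensions concrete, here is a minimal sketch of results organised along such dimensions. It is only an illustration: the `Flight` fields, the sample data and the `search` and `facet_counts` helpers are assumptions made for this example and say nothing about how Farechase.com is actually built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flight:
    airline: str
    price_eur: float
    departure_hour: int  # 0-23
    duration_min: int
    direct: bool


# Hypothetical sample results.
FLIGHTS = [
    Flight("AirOne", 120.0, 7, 95, True),
    Flight("AirOne", 90.0, 15, 180, False),
    Flight("SkyTwo", 150.0, 9, 90, True),
    Flight("SkyTwo", 80.0, 22, 240, False),
]


def facet_counts(flights, key):
    """Count how many results fall under each value of one dimension."""
    counts = {}
    for flight in flights:
        value = getattr(flight, key)
        counts[value] = counts.get(value, 0) + 1
    return counts


def search(flights, max_price=None, airline=None, direct_only=False):
    """Filter along the known dimensions, then sort by price."""
    results = [
        f for f in flights
        if (max_price is None or f.price_eur <= max_price)
        and (airline is None or f.airline == airline)
        and (not direct_only or f.direct)
    ]
    return sorted(results, key=lambda f: f.price_eur)


cheap_direct = search(FLIGHTS, max_price=130, direct_only=True)
```

Because every result carries the same small set of fields, filtering and counting by dimension is trivial. A general-purpose index has no such guarantee about its documents.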
Such precise management of document context is almost impossible in a general search engine, as it would require creating as many indexes as there are specific points of view from which users might want to analyze the data (example: Webfountain Technology from IBM).
Thesaurus
Semantic analysis is based on a thesaurus, a structured organisation of keywords. Building and maintaining a thesaurus makes it possible to define the semantic dimension of a document. This structured organisation is hierarchical, but also transversal: the thesaurus establishes links between concepts. That is how artificial intelligence grasps the general sense of a document.
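A thesaurus with both hierarchical and transversal links can be sketched as a small data structure. The terms and links below are invented for illustration, and the `broader` and `related` keys are naming assumptions borrowed from common thesaurus vocabulary, not from a specific product.

```python
# Hypothetical mini-thesaurus: each term has at most one broader term
# (hierarchical link) and a list of related terms (transversal links).
THESAURUS = {
    "energy": {"broader": None, "related": []},
    "renewable energy": {"broader": "energy", "related": ["sustainability"]},
    "solar power": {"broader": "renewable energy", "related": ["photovoltaics"]},
    "wind power": {"broader": "renewable energy", "related": []},
    "photovoltaics": {"broader": None, "related": ["solar power"]},
    "sustainability": {"broader": None, "related": ["renewable energy"]},
}


def ancestors(term):
    """Walk up the hierarchy from a term to its most general concept."""
    chain = []
    current = THESAURUS[term]["broader"]
    while current is not None:
        chain.append(current)
        current = THESAURUS[current]["broader"]
    return chain


def narrower(term):
    """List the terms placed directly under a term in the hierarchy."""
    return sorted(t for t, entry in THESAURUS.items() if entry["broader"] == term)


def expand(term):
    """Expand a query term with its broader and related concepts."""
    return {term, *ancestors(term), *THESAURUS[term]["related"]}
```

With such a structure, a document about "solar power" can be connected to a query about "renewable energy" even when the two share no keyword.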
Concept extraction
Semantic technologies are able to automatically recognize and extract concepts, based on different elements of a sentence: syntax, grammar, meaning, context... They are thus able to recognize specific entities, such as:
- dates
- places
- people
- companies
- ...
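The kind of entity recognition described above can be approximated, very roughly, with patterns and lookup lists. The sketch below is a deliberate simplification: it relies only on surface patterns (capitalisation, titles, company suffixes) and a tiny hypothetical list of places, whereas real concept extraction also uses syntax, grammar and context. All names and patterns are assumptions made for this example.

```python
import re

# Hypothetical lookup lists; a real system would use far larger ones.
PLACES = {"London", "Paris"}
COMPANY_SUFFIXES = ("Inc", "Ltd", "Corp")
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Dates written as "4 December 2007".
DATE_RE = re.compile(r"\b\d{1,2} (?:" + MONTHS + r") \d{4}\b")
# One or more capitalised words followed by a company suffix.
COMPANY_RE = re.compile(
    r"\b((?:[A-Z][a-z]+ )+(?:" + "|".join(COMPANY_SUFFIXES) + r"))\b"
)
# A title (Mr, Ms, Dr) followed by capitalised words.
PERSON_RE = re.compile(r"\b(?:Mr|Ms|Dr)\.? ([A-Z][a-z]+(?: [A-Z][a-z]+)*)")
WORD_RE = re.compile(r"\b[A-Z][a-z]+\b")


def extract_entities(text):
    """Return the entities found in the text, grouped by type."""
    return {
        "date": DATE_RE.findall(text),
        "place": [w for w in WORD_RE.findall(text) if w in PLACES],
        "people": PERSON_RE.findall(text),
        "company": COMPANY_RE.findall(text),
    }


sample = "Dr. Jane Smith of Acme Corp spoke in London on 4 December 2007."
entities = extract_entities(sample)
```

Here `entities` groups "4 December 2007" under dates, "London" under places, "Jane Smith" under people and "Acme Corp" under companies. Anything that does not fit these narrow patterns is simply missed, which is why production systems combine such rules with linguistic analysis.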