The billions of pages that make up the WWW are multiplying faster and faster.
Over the past few months, this growth has sped up as the democratisation of
social media (e.g. blogs) allows anyone to produce and broadcast content.
This huge amount of information makes identifying specific content more and
more difficult, as today’s search engines take a general and exhaustive
approach. Content identification on Google or MSN is based on keyword
matching.
Moreover, the top ranks of their organic result lists are now dominated by
commercial content. Slowly but surely, Search Engine Marketing (SEM) and Search
Engine Optimisation (SEO) strategies have done their job, making it more
difficult for users to find information that nobody has promoted.
1. At first there was The Semantic Web
Remedies exist. Tim Berners-Lee imagined how "meaning" could be added to the
search engine index: he recommended using semantics. This was in 1999,
another century in WWW terms. He wanted to add a semantic layer to search
engines to make it easier to create links between documents using concept tags
(based on meaning), rather than relying only on hyperlinks, as Google does (the
famous PageRank).
“We are going from a Web of connected documents to a Web of
connected data.” Nova Spivack, RADAR NETWORKS
This new approach also adds another objective: reinforcing the contextual
dimension of information searching.
Documents will be classified and put into contexts. Those contexts will
themselves be linked to other documents. This approach became known as “The
Semantic Web”. Behind this idea, Tim thought that publishers and content
producers would manually create tags allowing internet users to surf a
structured semantic universe. Later, the social media and Web 2.0 revolution
did the job: bloggers create tags for the content they publish, and Digg users
manually describe, in their own words, each website they find interesting
enough to share with others.
But this approach raises a number of issues:
1/ To work perfectly, every publisher needs to share and use the same
universal, structured vision of the world, so that the same words are always
used for the same concepts.
2/ Each publisher must be able and willing (i.e. take the time) to describe
its content in each of its dimensions.
3/ The temptation of SEO and SEM manipulation remains.
For all of these reasons, the Semantic Web remains a concept, and its
industrialisation a utopia. Eventually, start-ups reinvented this approach
under the Web 3.0 concept.
The principle is based on automatic information processing, using semantic
search engines to extract concepts and link them together. Because this
approach is automatic, it can be applied to a vast amount of content, since the
work no longer rests solely on publishers or readers, as before.
Here are some examples:
- Extract dates from a document to place it on a time axis
- Extract company names
- Extract executive names and details
- Extract places...
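
To make the extraction idea concrete, here is a minimal Python sketch of two of
these tasks: pulling dates and company names out of raw text, then ordering
documents on a time axis. The regular expressions, the function names and the
sample sentences are all assumptions made for illustration. A real semantic
engine would rely on far richer linguistic analysis than two patterns.

```python
import re
from datetime import date

# Hypothetical patterns, chosen only for illustration:
# ISO-style dates (YYYY-MM-DD) and capitalised names ending in a legal suffix.
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
COMPANY_RE = re.compile(r"\b((?:[A-Z][A-Za-z]+ )+(?:Inc|Corp|Ltd|SA))\b")


def extract_dates(text):
    """Return every valid ISO-style date found in the text, in order."""
    found = []
    for year, month, day in DATE_RE.findall(text):
        try:
            found.append(date(int(year), int(month), int(day)))
        except ValueError:
            continue  # looks like a date but is not a real calendar day
    return found


def extract_companies(text):
    """Return capitalised names that end with a known legal suffix."""
    return COMPANY_RE.findall(text)


def timeline(documents):
    """Place each document on a time axis using its first extracted date."""
    dated = []
    for doc in documents:
        dates = extract_dates(doc)
        if dates:
            dated.append((dates[0], doc))
    return sorted(dated, key=lambda pair: pair[0])
```

Calling `timeline()` on a list of texts returns `(date, text)` pairs in
chronological order, and documents without a recognisable date are simply left
out.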
But to be really effective, this approach has to be applied within a finite,
bounded domain. Web 3.0 will thus need to be vertical.
2. The birth of vertical search engines
A new generation of vertical search engines has emerged in recent months. Each
of them follows the Web 3.0 trend. They address homogeneous user needs
(airline ticket search, industry report search...) and create value-added
services from public information gathered from the open web (and therefore
free).
1/ Vertical index
The specialised approach of these engines allows them to build a specialized
index and to discard peripheral content not directly linked to their theme.
In doing so, they eliminate the noise experienced in general search
engines.
EXAMPLE: Their specialization allows them to index documents from the deep
web, documents that do not appear in general search engine results.
EXAMPLE: In the travel industry, recent search engines such as Sprice.com or
Farechase.com automatically and simultaneously query a number of airline
websites to offer users the best prices without their having to search these
websites one by one.
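
The "query several sites at once, return the best price" pattern can be
sketched in a few lines of Python. This is only an illustration under stated
assumptions: the three `airline_*` functions are hypothetical stand-ins for
real airline websites, the fares are made-up sample data, and nothing here
describes how Sprice.com or Farechase.com actually work.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for airline websites. Each takes a route and
# returns a list of fares; the values are invented sample data.
def airline_a(origin, destination):
    return [{"airline": "A", "price": 320.0}, {"airline": "A", "price": 275.0}]


def airline_b(origin, destination):
    return [{"airline": "B", "price": 260.0}]


def airline_c(origin, destination):
    raise TimeoutError("source unavailable")  # simulates a site that is down


def search_fares(origin, destination, sources):
    """Query every source concurrently and return all fares, cheapest first."""
    fares = []
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = [pool.submit(src, origin, destination) for src in sources]
        for future in futures:
            try:
                fares.extend(future.result())
            except Exception:
                continue  # one failing site should not break the whole search
    return sorted(fares, key=lambda fare: fare["price"])
```

With the sample sources above, `search_fares("PAR", "NYC", [airline_a,
airline_b, airline_c])` returns three fares, the cheapest one first, and the
unavailable source is silently skipped.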
2/ Vertical search features
These new search engines also offer users search and filtering features
dedicated to the specific needs they answer.
Travel industry: Sprice.com allows users to compare airline fares by price,
duration, departure and arrival times, type of flight...
Executive search: Zoominfo allows users to search among company executives by
industry, geographic zone, company revenue...
Consumer electronics: Retrevo.com helps users find, for each product (MP3
players...), the related web resources by type (documentation, consumer
reviews...).
Market research: Reportlinker.com allows users to find open-access market
research reports, dynamically organizes them by industry, geographic zone,
date..., and lets users preview a document before downloading it.
Specific information processing is needed to build these vertical features:
- Indexing the visible and invisible web
- Analysing each document and extracting its concepts along each dimension the
user can search by through the tool's key features
- Normalising information from heterogeneous sources
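
As a rough sketch of the last two steps, the Python below maps records from
two differently structured sources onto one common schema, then filters the
result by any combination of dimensions. The field names, the two source
layouts and the sample records are hypothetical, invented for illustration
only.

```python
# Hypothetical raw records, as two different sources might expose them.
SOURCE_A = [
    {"title": "Wind Energy Outlook", "sector": "Energy",
     "country": "FR", "year": "2007"},
]
SOURCE_B = [
    {"name": "Retail Banking Trends", "industry": "finance",
     "region": "fr", "published": 2008},
]


def normalise_a(record):
    """Map a source A record onto the common schema."""
    return {
        "title": record["title"],
        "industry": record["sector"].lower(),
        "region": record["country"].upper(),
        "year": int(record["year"]),
    }


def normalise_b(record):
    """Map a source B record onto the common schema."""
    return {
        "title": record["name"],
        "industry": record["industry"].lower(),
        "region": record["region"].upper(),
        "year": int(record["published"]),
    }


def filter_reports(reports, **facets):
    """Keep the reports whose fields match every requested facet value."""
    return [
        report for report in reports
        if all(report.get(key) == value for key, value in facets.items())
    ]


# The unified index that the search features operate on.
INDEX = [normalise_a(r) for r in SOURCE_A] + [normalise_b(r) for r in SOURCE_B]
```

Once every record shares the same fields, a call such as
`filter_reports(INDEX, region="FR", year=2008)` works the same way regardless
of where each report originally came from.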
3/ Vertical context
Information contextualisation is the third axis of these value-added vertical
search services. To generate new information, the idea is to extract each
piece of information and cross it with another one classified under the same
concept.
EXAMPLE: Using press releases, company websites and other free publications,
Zoominfo automatically generates an original company profile.
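
Crossing pieces of information classified under the same concept can be
pictured as grouping extracted facts by company. The Python sketch below is an
assumption-laden illustration, not a description of Zoominfo's actual method:
the fact tuples, company names and source labels are all invented.

```python
from collections import defaultdict

# Hypothetical extracted facts: (company, field, value, source).
FACTS = [
    ("Acme Widgets", "ceo", "Jane Doe", "press release"),
    ("Acme Widgets", "headquarters", "Lyon", "company website"),
    ("Acme Widgets", "ceo", "Jane Doe", "trade publication"),
    ("Globex", "headquarters", "Boston", "company website"),
]


def build_profiles(facts):
    """Group facts by company, recording which sources support each value."""
    profiles = defaultdict(lambda: defaultdict(dict))
    for company, field, value, source in facts:
        profiles[company][field].setdefault(value, []).append(source)
    # Convert the nested defaultdicts to plain dicts for easier handling.
    return {
        company: {field: dict(values) for field, values in fields.items()}
        for company, fields in profiles.items()
    }
```

Each profile combines facts that no single source contained on its own, and
keeping the list of sources per value shows how many independent documents
agree on it.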
The amount of information published on the internet keeps growing, but it is
also more and more fragmented. Each piece of information taken alone has
limited value compared with the value of all of them linked together. The
understanding that each Web 3.0 search engine has of its own domain therefore
lets it make precise use of semantic technologies and offer users original,
innovative value-added services.