Semantic search

From Wikipedia, the free encyclopedia

Semantic search attempts to improve on traditional research searches by using XML and RDF data from semantic networks to disambiguate search queries and web text, with the aim of increasing the relevance of results. Hildebrand et al. [1] provide an overview that lists semantic search systems and identifies other uses of semantics in the search process.

Guha et al. [2] distinguish two major forms of search: navigational and research. In navigational search, the user uses the search engine as a tool to reach a particular intended document. Semantic search is not applicable to navigational searches. In research search, the user provides the search engine with a phrase intended to denote an object about which the user is trying to gather information. There is no particular known document that the user is trying to reach. Rather, the user is trying to locate a number of documents which together will provide the information sought. Semantic search lends itself well to this case.

Rather than using ranking algorithms such as Google's PageRank to predict relevance, semantic search uses semantics, the study of meaning in language, to produce highly relevant search results. In most cases, the goal is to deliver the information the user asked for rather than have the user sort through a list of loosely related keyword results.

Other authors primarily regard semantic search as a set of techniques for retrieving knowledge from richly structured data sources such as ontologies, as found on the Semantic Web [3]. Such technologies enable the formal articulation of domain knowledge at a high level of expressiveness and could enable users to specify their intent in more detail at query time.


Disambiguation

To understand what a user is searching for, word sense disambiguation must occur. When a term is ambiguous, that is, when it can have several meanings (for example, the lemma "bark" can be understood as "the sound of a dog" or as "the skin of a tree"), a disambiguation process is started, which selects the most probable meaning from all those possible.

Such processes make use of other information present in a semantic analysis system and take into account the meanings of the other words in the sentence and in the rest of the text. The determination of each meaning in turn influences the disambiguation of the others, until a state of maximum plausibility and coherence is reached for the sentence. All the fundamental information for the disambiguation process, that is, all the knowledge used by the system, is represented in the form of a semantic network organized on a conceptual basis. [4]
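As a rough illustration of the idea, the following sketch picks the sense of "bark" whose related words overlap most with the rest of the sentence. The `SENSES` table and the `disambiguate` function are hypothetical and invented for this example. It is a simplified overlap heuristic in the spirit of the Lesk algorithm, it handles one ambiguous word at a time, and it is not the method of any particular system mentioned in this article.

```python
# Toy sense inventory (invented for illustration): each sense of a lemma
# is described by a small set of related words.
SENSES = {
    "bark": {
        "sound of a dog": {"dog", "sound", "loud", "growl", "noise"},
        "skin of a tree": {"tree", "trunk", "wood", "skin", "branch"},
    }
}


def disambiguate(lemma, sentence):
    """Return the sense of `lemma` whose related words overlap most with the sentence."""
    # Lowercase the words and strip basic punctuation to form the context.
    context = {w.strip(".,;!?").lower() for w in sentence.split()}
    context.discard(lemma)
    best_sense, best_overlap = None, -1
    for sense, related in SENSES[lemma].items():
        overlap = len(related & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(disambiguate("bark", "The dog gave a loud bark at the stranger."))
print(disambiguate("bark", "Moss grew on the bark of the old tree trunk."))
```

A fuller system would, as described above, let the chosen sense of each word feed back into the disambiguation of the others rather than treating each word in isolation.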

In a structure of this type, every lexical concept therefore corresponds to a node of the semantic network and is linked to others by specific semantic relationships in a hierarchical structure with inheritance. In this way, each concept is enriched with the characteristics and meaning of the nearby nodes.

Every node of the network, called a synset, groups a set of synonyms that represent the same lexical concept and can contain:

  • single lemmas ('seat', 'vacation', 'work', 'quick', 'quickly', 'more', etc.)
  • compounds ('non-stop', 'abat-jour', 'policeman')
  • collocations ('credit card', 'university degree', 'treasury stock', 'go forward', etc.).

The semantic relationships (links) between the synsets are the organizing principles of the semantic network's concepts. Startup companies involved in this sector include Powerset, Yedda and Hakia.[5]
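The synset structure described above can be sketched as a small data type. This is a minimal illustration only: the `Synset` class, its `traits` and `parent` fields, and the sample concepts are assumptions made for the example, and a single "is-a" parent link stands in for the richer set of semantic relationships a real network would hold.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Synset:
    """A network node grouping lemmas that express the same lexical concept."""
    lemmas: set
    traits: set = field(default_factory=set)
    parent: Optional["Synset"] = None  # hypothetical "is-a" link to a broader concept

    def inherited_traits(self):
        """Collect the traits of this node and of every ancestor node."""
        node, collected = self, set()
        while node is not None:
            collected |= node.traits
            node = node.parent
        return collected


# Invented sample data: a broad concept and a narrower one linked to it.
animal = Synset(lemmas={"animal"}, traits={"living"})
dog = Synset(lemmas={"dog", "hound"}, traits={"barks"}, parent=animal)

print(sorted(dog.inherited_traits()))
```

Walking the parent links is one simple way to model how a concept is enriched with the characteristics of nearby nodes.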

Applications

At "ShowStoppers" CTIA 2007, AskMeNow demonstrated semantic search technology across multiple kinds of consumer content, such as directory service, sports and Wikipedia, as well as enterprise database search and extensibility, highlighting Mobile Help Desk, CRM and Market Intelligence.

hakia.com, Powerset and Trovix are three well-funded (over $10M) startups that tackle the semantic search problem.

Semantic Web search engines index RDF data stored on the Web and provide an interface to search through the crawled data.
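To make the indexing step concrete, here is a minimal sketch of an in-memory index over RDF-style triples. The triples, the `ex:` prefix and the `build_index` and `search` functions are all invented for illustration; a real engine would crawl and parse actual RDF documents and support far richer queries.

```python
from collections import defaultdict

# Hypothetical crawled statements as (subject, predicate, object) triples.
TRIPLES = [
    ("ex:doc1", "ex:topic", "ex:SemanticSearch"),
    ("ex:doc1", "ex:title", "Intro to semantic search"),
    ("ex:doc2", "ex:topic", "ex:PageRank"),
    ("ex:doc3", "ex:topic", "ex:SemanticSearch"),
]


def build_index(triples):
    """Map each (predicate, object) pair to the set of subjects that have it."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[(predicate, obj)].add(subject)
    return index


def search(index, predicate, obj):
    """Return, in sorted order, the subjects matching a predicate/object pair."""
    return sorted(index.get((predicate, obj), set()))


index = build_index(TRIPLES)
print(search(index, "ex:topic", "ex:SemanticSearch"))
```

Indexing by predicate and object is only one possible design choice; it makes "find everything about X" lookups cheap at the cost of not supporting other query shapes directly.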

Examples of semantic web search engines include:

Examples of health domain specific semantic search engines:

See also

References

Several scientific events cover the topic of semantic search explicitly, such as the Semantic Search 2008 Workshop at ESWC'08 and the Workshop on Exploiting Semantic Annotations in Information Retrieval at ECIR'08.
