YaCy

From Wikipedia, the free encyclopedia

YaCy
Developer: Michael Christen
Latest release: 0.49 / December 2, 2006
OS: Platform independent (runs on the Java JRE/J2SE)
Use: Search engine
License: GPL
Website: yacy.net

YaCy (read "ya see") is a distributed search engine built on the principles of peer-to-peer (P2P) networks. Its core is a computer program written in Java that, as of September 2006, runs on several hundred computers, the so-called YaCy-peers. Each YaCy-peer independently crawls the Internet, analyses and indexes the web pages it finds, and stores the indexing results in a common database (the so-called index), which is shared with other YaCy-peers using P2P principles.

The YaCy network has a decentralised architecture: all YaCy-peers are equal and no central server exists. A peer can be run either in crawling mode or as a local proxy server that indexes the web pages visited by the person running YaCy on their computer. (Several mechanisms are provided to protect the user's privacy.)

The search functions are accessed through a locally running web server, which provides a search box for entering a query and returns the results as a web page, as is usual for other search portals and engines.

The program is released under the GPL.

Architecture

The YaCy search engine is based on five elements:

  • Crawler - a search robot which traverses from web page to web page and analyses their content.
  • Indexer - creates a Reverse Word Index (RWI), i.e. each word in the RWI has a list of relevant URLs and ranking information. Words are stored in the form of word hashes.
  • Reverse Word Index Database - an AVL-tree structure that keeps URLs ordered and supports the efficient table join needed to search for a given combination of words.
  • Search and Administration interface - a web interface provided by a local HTTP servlet with a servlet engine.
  • P2P network - used to store the Reverse Word Index Database.
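
The interplay of the Indexer and the Reverse Word Index Database can be illustrated with a small sketch. It is not YaCy's actual code: the class name ReverseWordIndexSketch, the methods addPage and search, and the hash scheme (the first 12 hex characters of an MD5 digest) are assumptions made for illustration, and java.util.TreeSet (a red-black tree) stands in for the AVL tree as the ordered URL structure. Ranking information is omitted.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration of a reverse word index; not taken from YaCy.
class ReverseWordIndexSketch {
    // Maps a word hash to the ordered set of URLs containing that word.
    private final Map<String, TreeSet<String>> index = new HashMap<>();

    // Assumed hash scheme: first 12 hex characters of the word's MD5 digest.
    public static String wordHash(String word) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(word.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.substring(0, 12);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Indexer step: record the URL under the hash of every word on the page.
    public void addPage(String url, String text) {
        for (String word : text.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) continue;
            index.computeIfAbsent(wordHash(word), k -> new TreeSet<>()).add(url);
        }
    }

    // Search step: join (intersect) the ordered URL sets of all query words.
    public List<String> search(String query) {
        TreeSet<String> result = null;
        for (String word : query.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) continue;
            TreeSet<String> urls = index.getOrDefault(wordHash(word), new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(urls);   // copy, so the index is not modified
            } else {
                result.retainAll(urls);         // keep only URLs present in both sets
            }
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }

    public static void main(String[] args) {
        ReverseWordIndexSketch idx = new ReverseWordIndexSketch();
        idx.addPage("http://a.example/", "distributed search engine");
        idx.addPage("http://b.example/", "distributed file system");
        System.out.println(idx.search("distributed search"));
    }
}
```

Keeping each URL set ordered is what makes the join for a multi-word query cheap: the sets for the individual words can be intersected without first sorting them.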

Advantages

As there is no central server, the results cannot be censored, and the reliability is (at least theoretically) higher.

Because the engine is not owned by a company, there is no advertising and no manipulated ranking.

Because of its design, YaCy can also index parts of the hidden web, such as Tor or I2P (and possibly Freenet).

Disadvantages

As there is no central server and the YaCy network is open to anyone, malicious peers are (theoretically) able to insert inaccurate or commercially biased search results. Also, at present, YaCy returns on average significantly fewer results and is much slower than large commercial search engines.

