STAVIES (algorithm)

STAVIES is a proposed algorithm for extracting information from the World Wide Web.

Its main contribution is to treat the structural hierarchy of a page's tags as a signal and to segment the page using hierarchical clustering techniques. STAVIES can operate without human intervention and does not require any training.
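The general idea of reading the tag hierarchy as a signal and clustering it can be illustrated with a short sketch. The code below is not the published STAVIES method: it is a simplified illustration that assumes the "signal" is the nesting depth of each text node in document order, and that segments are formed by merging adjacent text nodes agglomeratively while their mean depths stay close. The names `DepthSignal`, `depth_signal`, `segment` and the threshold `max_gap` are hypothetical and chosen only for this example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class DepthSignal(HTMLParser):
    """Records (nesting depth, text) for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.samples = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth = max(0, self.depth - 1)

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.samples.append((self.depth, text))


def depth_signal(html):
    """Return the depth signal of a page as a list of (depth, text) pairs."""
    parser = DepthSignal()
    parser.feed(html)
    parser.close()
    return parser.samples


def mean_depth(cluster):
    return sum(depth for depth, _ in cluster) / len(cluster)


def segment(samples, max_gap=1.0):
    """Agglomerative clustering restricted to adjacent clusters.

    Assumption for this sketch: the distance between two neighbouring
    clusters is the difference of their mean depths, and merging stops
    once the closest pair is farther apart than max_gap.
    """
    clusters = [[sample] for sample in samples]
    while len(clusters) > 1:
        gaps = [abs(mean_depth(clusters[i]) - mean_depth(clusters[i + 1]))
                for i in range(len(clusters) - 1)]
        i = min(range(len(gaps)), key=gaps.__getitem__)
        if gaps[i] > max_gap:
            break
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters


if __name__ == "__main__":
    page = ("<html><body><h1>Title</h1>"
            "<ul><li><b>A</b></li><li><b>B</b></li></ul>"
            "<p>Footer</p></body></html>")
    for cluster in segment(depth_signal(page)):
        print([text for _, text in cluster])
```

In this toy page the two list items sit at the same depth and are grouped together, while the heading and the footer remain separate segments. A real system would need a richer signal and a principled way to choose where to cut the cluster hierarchy.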

Sources

Papadakis, Nikolaos; Skoutas, Dimitrios; Raftopoulos, Konstantinos; Varvarigou, Theodora (December 2005). "STAVIES: A System for Information Extraction from Unknown Web Data Sources through Automatic Web Wrapper Generation, using Clustering Techniques". IEEE Transactions on Knowledge and Data Engineering 17 (12): 1638-1652.