IBM WebFountain

WebFountain is an Internet analytical engine implemented by IBM for the study of unstructured data on the World Wide Web. IBM describes WebFountain as:

. . . a set of research technologies that collect, store and analyze massive amounts of unstructured and semi-structured text. It is built on an open, extensible platform that enables the discovery of trends, patterns and relationships from data.[1]

The project represents one of the first comprehensive attempts to catalog and interpret the unstructured data of the Web in a continuous fashion. To this end its supporting researchers at IBM have investigated new systems for the precise retrieval of subsets of the information on the Web, real-time trend analysis, and meta-level analysis of the available information of the Web.

Factiva, an information retrieval company owned by Dow Jones and Reuters, licensed WebFountain in September 2003 and built software that used the WebFountain engine to gauge corporate reputation.[2] Factiva reportedly offered yearly subscriptions to the service for $200,000. Factiva later decided to explore other technologies and severed its relationship with WebFountain.

WebFountain is developed at IBM's Almaden research campus in the Bay Area of California.

IBM has developed software called UIMA (Unstructured Information Management Architecture) that can be used to analyze unstructured information. It may help perform trend analysis across documents, determine the theme and gist of documents, and allow fuzzy searches on unstructured documents.[3]

References

  1. ^ http://www.redbooks.ibm.com/abstracts/redp3937.html
  2. ^ IBM sets out to make sense of the Web - CNET News. News.com.com. Retrieved on 2010-10-18.
  3. ^ IBM Open Sources WebFountain (UIMA) – Unstructured Text Analysis software.

IBM WebFountain and WebFountain Appliance Overview

WebFountain is a Web-scale mining and discovery platform that combines text analytics technology and very large, heterogeneous data sources with custom solutions. It is an on-demand solution delivered through the IBM Business Consulting Services organization or through IBM business partners. The WebFountain platform identifies patterns, trends, and relationships across many kinds of documents, including Web pages, weblogs, bulletin boards, newspapers, and other structured data feeds. WebFountain technology enables companies to find timely, global information. This knowledge offers executives and analysts insights into the core opportunities and challenges affecting their business.

What is WebFountain?

WebFountain is a high-performance, scalable, and distributed platform that supports the data gathering, storing, indexing, and querying needs of analysis agents, called miners. A miner is a software component that extracts, analyzes, parses, and merges data from a WebFountain data store. The data storage layer consists of a large data store and several supporting components. The data store is responsible for data storage and retrieval. Additional performance and functionality are provided by other components within the data storage layer. WebFountain not only provides the data, but also the simple, yet rich, programmatic interface to interact with the data and associated metadata. WebFountain contains an array of application-specific miners that enable you to develop insights into such areas as:

• Business strategy
• Marketing strategy
• Product design
• Public relations
• Brand management
• Product management
• Advertising
• Temporal trends
• Government affairs
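The miner concept described above can be illustrated with a minimal sketch. The source does not document the actual WebFountain programmatic interface, so everything here is an assumption made for illustration: the `Document`, `DataStore`, and `MentionMiner` names, the `scan` and `annotate` methods, and the in-memory store are all hypothetical. The sketch only shows the general shape of a component that reads documents from a store, extracts and analyzes data, and merges the result back as metadata.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Document:
    """A stored page plus the metadata miners attach to it (hypothetical shape)."""
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)


class DataStore:
    """Toy in-memory stand-in for a data store; not the WebFountain interface."""

    def __init__(self, documents):
        self._docs = {d.doc_id: d for d in documents}

    def scan(self):
        """Return every stored document."""
        return list(self._docs.values())

    def annotate(self, doc_id, key, value):
        """Merge one metadata entry into a stored document."""
        self._docs[doc_id].metadata[key] = value


class MentionMiner:
    """Hypothetical miner: counts term mentions and writes them back as metadata."""

    def __init__(self, terms):
        self.terms = [t.lower() for t in terms]

    def run(self, store):
        totals = Counter()
        for doc in store.scan():
            words = doc.text.lower().split()                  # parse
            counts = {t: words.count(t) for t in self.terms}  # extract and analyze
            store.annotate(doc.doc_id, "mentions", counts)    # merge into metadata
            totals.update(counts)
        return totals


store = DataStore([
    Document("d1", "Acme launches widget and Acme stock rises"),
    Document("d2", "Review of the new widget"),
])
totals = MentionMiner(["acme", "widget"]).run(store)
print(dict(totals))
```

In this toy design, each miner leaves its per-document result in the store as metadata, so a later miner could build on it rather than reprocessing the raw text.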

Data sets found in Web mining tend to be extremely large and complex. WebFountain offers a broad assortment of techniques for accessing and interacting with the information found on the Web. With WebFountain, you can perform a sophisticated analysis of all relevant data sources and recognize important associations within the data. This kind of knowledge discovery is comparable to finding needles in haystacks, and WebFountain provides the means to streamline that process. Unlike simple search engines, WebFountain can perform advanced text analytics on a Web scale. Previously, performing advanced text analytics meant significantly narrowing and filtering the data being mined to achieve any usable results. WebFountain is designed to ingest and generate rich metadata for the entire Web to offer insights that affect both enterprise-wide and industry-wide business processes.

You can tailor the WebFountain system to fit a specific application area. With WebFountain Appliance, documented Web Service application programming interfaces (APIs), and sample code, you can develop application-specific data mining software that fits your business needs, rapidly and seamlessly. For all application vendors and content providers who seek to leverage Web content as an asset, WebFountain offers a new platform for innovation. For more information regarding the IBM WebFountain project refer to the following Web site:

http://www.almaden.ibm.com/webfountain/

Where does WebFountain Appliance fit in?

The WebFountain Appliance lets software vendors develop applications that take advantage of the WebFountain system’s mining capabilities. The WebFountain Appliance exposes a well-defined set of Web Services based on Simple Object Access Protocol (SOAP) that communicate securely over HTTPS, either with the entire WebFountain cluster or with externally developed applications. Physically, the Appliance is a single machine hosting a miniature version of WebFountain, referred to as the WebFountain cluster in a box.

With the Appliance, you can port existing applications rapidly and develop new applications at a third-party location. The Appliance retains all of the WebFountain functionality, including infrastructure, miners, and Web Services. The main difference is the limited amount of data present in the data store. The Appliance’s architecture lets you write and test mining applications using local machines only. When complete, you can deploy the application for use on the WebFountain cluster. This design lets WebFountain keep its production environment separate for performance reasons, and lets you prototype and develop custom applications powered by WebFountain faster.

The Appliance also allows you to create on-topic stores for different business or logical domains. An on-topic store for a domain consists of pages and associated metadata from the open Web, blog posts, bulletin board posts, and news articles. The on-topic store is the starting point for running more complex analytics specific to that domain. The more topic-focused the data store is, the faster the domain-specific analytics can be. The resulting “database” of pages and facts interacts with external applications via Web Services.
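To make the SOAP-based Web Services mentioned above more concrete, the following sketch builds a SOAP 1.1 request envelope with the Python standard library. The source does not list the Appliance's actual operations, so the `QueryStore` operation, its `store` and `query` parameters, and the `urn:example:webfountain-appliance` namespace are hypothetical placeholders; only the SOAP 1.1 envelope namespace is standard. The sketch stops at producing the XML and does not send it.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # standard SOAP 1.1 namespace
SERVICE_NS = "urn:example:webfountain-appliance"         # hypothetical, illustration only


def build_query_envelope(store_name, query):
    """Build a SOAP 1.1 envelope for a hypothetical 'QueryStore' operation."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    operation = ET.SubElement(body, f"{{{SERVICE_NS}}}QueryStore")
    ET.SubElement(operation, f"{{{SERVICE_NS}}}store").text = store_name
    ET.SubElement(operation, f"{{{SERVICE_NS}}}query").text = query
    return ET.tostring(envelope, encoding="unicode")


# Example: a request against a hypothetical on-topic store.
request_xml = build_query_envelope("consumer-electronics", "widget AND review")
print(request_xml)
```

A client application would send such an envelope in the body of an HTTPS POST to the service endpoint and parse the SOAP response; the real operation names and message schemas would come from the Appliance's documented Web Service APIs.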

External links