WebQL is a software platform produced by QL2 Software that automates data collection and integration from structured and unstructured sources, including the Web, PDF and Word documents, spreadsheets, email repositories, and corporate data stores.
WebQL has been on the market since 2001. The most current version, WebQL 3.1, was released in November 2006. WebQL was named a "Trend Setting Product for 2006" by KM World[1]. WebQL customers include 5 of the top 10 pharmaceutical companies and 7 of the top 10 airlines.
In addition to handling textual content, WebQL can perform optical character recognition (OCR), which enables it to retrieve text embedded in images.
In many web data integration tasks, the desired data is located on a web page that can be reached only by completing a form. WebQL can populate such forms automatically to gain access to the “deep” Web. The data can then be extracted by WebQL and transformed into a format suitable for a variety of analytical operations.
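The idea of filling in a form programmatically can be illustrated outside WebQL. The following Python sketch shows one common case, a form submitted with the GET method, where the field values are URL-encoded and appended to the form's target address. It is an illustration only and does not describe how WebQL works internally; the function name build_form_url and the address http://example.com/search are hypothetical.

```python
from urllib.parse import urlencode

def build_form_url(action_url, values):
    """Append URL-encoded form values to action_url as a query string.

    Hypothetical helper for illustration; it only covers forms
    that are submitted with the GET method.
    """
    # Use "&" if the address already carries a query string, "?" otherwise.
    separator = "&" if "?" in action_url else "?"
    return action_url + separator + urlencode(values)

# Hypothetical form address and field name, chosen for the example.
url = build_form_url("http://example.com/search", {"q": "deep web"})
print(url)  # http://example.com/search?q=deep+web
```

Forms submitted with the POST method send the same encoded values in the request body rather than in the address.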
WebQL features novel URL schemes that allow greater flexibility when accessing data sources external to WebQL. WebQL also supports XML data of arbitrary size and provides APIs for embedding WebQL in C, Java or .NET programs.
WebQL is driven by a programming language similar to standard SQL. The language has a number of operations designed to simplify complex data integration tasks. By providing a virtual database layer, WebQL shields developers from the complexity of specific data formats and network protocols. WebQL programmers can use their existing SQL skills to access, transform and integrate data with minimal effort. WebQL can also be operated by less technical users through WebQL Desktop. In addition to licensing the WebQL software for deployment in a customer’s environment, QL2 Software offers prebuilt solutions, develops customer-specific solutions built on WebQL technology, and hosts them in the company’s secure online data center.
Below are several sample WebQL scripts. While scripts that perform real-world data integration tasks are generally much larger, these scripts give a sense of the language’s capabilities.
The following script examines every document within two links of the QL2 Software home page and retrieves every phrase of the form “the X”:
select item1 from pattern '(the \w+)' within crawl of http://www.ql2.com/ to depth 2
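For readers unfamiliar with regular expressions, the extraction step of this script can be approximated in Python. The sketch below applies the same pattern to a fixed sample string instead of crawled pages; the function name extract_phrases and the sample text are assumptions made for the example, and WebQL's own regular expression dialect may differ in detail.

```python
import re

# Same pattern as in the WebQL script: "the", a space, then one word.
PATTERN = re.compile(r"(the \w+)")

def extract_phrases(text):
    """Return every non-overlapping match of PATTERN in text, in order."""
    return PATTERN.findall(text)

# Made-up sample text standing in for the content of a crawled page.
sample = "Over the hills and through the woods, the river runs."
phrases = extract_phrases(sample)
print(phrases)  # ['the hills', 'the woods', 'the river']
```

Because the pattern has no word boundary, it would also match inside longer words, for example "the now" in "bathe now"; a stricter version would begin with \b.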
The following script searches blogs for discussions about Wikipedia:
select URL, clean(CONTENT) as TITLE from links within http://blogsearch.google.com submitting values 'wikipedia' for 'q' where url_host(URL) not matching 'google'
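The where clause of this script keeps only links whose host does not contain 'google'. A rough Python equivalent of that filter is sketched below. The helper names and the sample addresses are hypothetical, and a plain substring test stands in for WebQL's matching operator, whose exact semantics are not described here.

```python
from urllib.parse import urlparse

def url_host(url):
    """Return the lower-cased host part of url, or "" if it has none."""
    return urlparse(url).hostname or ""

def external_links(urls, excluded="google"):
    """Keep only the URLs whose host does not contain `excluded`."""
    return [u for u in urls if excluded not in url_host(u)]

# Made-up link list standing in for the links found on a results page.
links = [
    "http://blogsearch.google.com/blogsearch?q=wikipedia",
    "http://example.org/post/wikipedia-review",
    "https://www.google.com/intl/en/about",
]
print(external_links(links))
```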
The following script generates three-sentence summaries of current news stories:
select source_content as DOCUMENT, source_title as TITLE, source_url as URL from crawl of http://news.google.com to depth 2 following if url_host(URL) not matching 'google' join where URL not matching 'google' select URL, TITLE, summarize(clean(ARTICLE_BODY), 3) as SUMMARY from articles within inline DOCUMENT
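The algorithm behind WebQL's summarize function is not described here. As a deliberately naive stand-in, the Python sketch below produces an n-sentence summary by keeping the first n sentences of a text. The sentence splitter is a simple assumption (it breaks on whitespace that follows '.', '!' or '?') and will mis-split abbreviations such as "Dr. Smith".

```python
import re

def summarize(text, sentence_count):
    """Return the first `sentence_count` sentences of text.

    Naive illustration only; this is not WebQL's summarization method.
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:sentence_count])

# Made-up article text for the example.
article = ("Prices rose on Monday. Analysts were surprised. "
           "Trading was heavy. Markets closed early.")
summary = summarize(article, 3)
print(summary)
```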