HtmlUnit
From Wikipedia, the free encyclopedia
HtmlUnit is a headless web browser written in pure Java. It allows high-level manipulation of web pages, such as filling in forms, clicking hyperlinks, and accessing the attributes and values of specific elements within the pages. You do not have to build lower-level TCP/IP or HTTP requests: you simply call getPage(url), find a hyperlink, call click(), and you have the resulting HTML, with JavaScript and AJAX processed automatically.
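The getPage/click workflow can be sketched as follows. This is a minimal example, assuming HtmlUnit 2.x is on the classpath (package com.gargoylesoftware.htmlunit; version 3.x renamed the package to org.htmlunit). To avoid needing a real server, it serves two hypothetical pages through HtmlUnit's MockWebConnection. The URLs, page contents and the class name HtmlUnitSketch are invented for illustration.

```java
import com.gargoylesoftware.htmlunit.MockWebConnection;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

import java.net.URL;

public class HtmlUnitSketch {

    // Loads a start page, follows a hyperlink by its text, and reports the
    // title of the new page plus text that an inline script wrote into it.
    public static String followLink() throws Exception {
        // Hypothetical URLs; MockWebConnection answers them without any network access.
        URL start = new URL("http://example.test/");
        URL next = new URL("http://example.test/next.html");

        MockWebConnection connection = new MockWebConnection();
        connection.setResponse(start,
                "<html><head><title>Start</title></head><body>"
                + "<a href='next.html'>Next page</a>"
                + "</body></html>");
        connection.setResponse(next,
                "<html><head><title>Next</title></head><body>"
                + "<p id='msg'></p>"
                + "<script>document.getElementById('msg').textContent = 'set by script';</script>"
                + "</body></html>");

        // WebClient is the headless browser; JavaScript is enabled by default.
        try (WebClient webClient = new WebClient()) {
            webClient.setWebConnection(connection);

            HtmlPage page = webClient.getPage(start);              // fetch and parse the start page
            HtmlAnchor link = page.getAnchorByText("Next page");   // locate the hyperlink by its text
            HtmlPage result = link.click();                        // follow it like a user would

            // The paragraph's text was filled in by the page's own script during loading.
            String msg = result.getElementById("msg").getTextContent();
            return result.getTitleText() + "|" + msg;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(followLink());
    }
}
```

Against a live site, the same calls work without the MockWebConnection lines; only the URL passed to getPage changes.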
The most common use of HtmlUnit is automated testing of web pages, even those built on complex JavaScript libraries (for instance, the Google Web Toolkit 1.4.60 tests now pass), but it can also be used for web scraping or for downloading website content.
Version 2.0 includes many enhancements, such as a W3C DOM implementation, use of Java 5 features, better XPath support, and improved handling of incorrect HTML, in addition to the usual assortment of JavaScript improvements. Version 2.1 mainly addresses performance issues reported by users.