Beautiful Soup (HTML parser)
Original author(s) | Leonard Richardson |
---|---|
Stable release | 4.6.0 / May 7, 2017 |
Repository | code |
Written in | Python |
Platform | Python |
Type | HTML parser library, Web scraping |
License | Python Software Foundation License (Beautiful Soup 3, an older version); MIT License (Beautiful Soup 4+)[1] |
Website | www |
Beautiful Soup is a Python package for parsing HTML and XML documents, including documents with malformed markup such as unclosed tags (the name comes from "tag soup"). It builds a parse tree for each parsed page, which can then be used to extract data from the HTML; this makes it useful for web scraping.[1]
It is available for Python 2.6+ and Python 3.
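As a minimal sketch of the behaviour described above, the snippet below parses a small fragment whose `<div>` and `<p>` tags are never closed and then extracts data from the resulting parse tree. The markup string and the variable names are made up for illustration, and the parser is assumed to be Python's built-in `html.parser`; other parsers may repair the same markup into a differently shaped tree.

```python
from bs4 import BeautifulSoup

# Hypothetical "tag soup": the <div> and both <p> tags are never closed.
broken_html = (
    '<div><p>Unclosed paragraph <a href="/one">first</a>'
    '<p>Another <a href="/two">second</a>'
)

# Parsing does not fail on the missing end tags; a tree is still built.
soup = BeautifulSoup(broken_html, 'html.parser')

# Extract data from the tree: link targets, link texts, paragraph count.
links = [a.get('href') for a in soup.find_all('a')]
link_texts = [a.get_text() for a in soup.find_all('a')]
paragraph_count = len(soup.find_all('p'))

print(links)
print(link_texts)
print(paragraph_count)
```

Only the anchors, which are well formed here, are read for their text. How the unclosed `<p>` elements end up nested depends on the parser, so the sketch counts them and does not rely on their exact contents.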
Code example
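# Anchor extraction from an HTML document
from bs4 import BeautifulSoup
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

# Fetch the page and parse it with Python's built-in HTML parser
webpage = urlopen('http://en.wikipedia.org/wiki/Main_Page')
soup = BeautifulSoup(webpage, 'html.parser')
# Print the href of every anchor, falling back to '/' when it is missing
for anchor in soup.find_all('a'):
    print(anchor.get('href', '/'))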
References
- [1] "Beautiful Soup website". Retrieved 18 April 2012. "Beautiful Soup is licensed under the same terms as Python itself."