Beautiful Soup (HTML parser)

Beautiful Soup
Original author(s) Leonard Richardson
Stable release 4.6.0 / May 7, 2017
Repository code.launchpad.net/beautifulsoup/
Written in Python
Platform Python
Type HTML parser library, Web scraping
License Python Software Foundation License (Beautiful Soup 3, an older version); MIT License (Beautiful Soup 4 and later)[1]
Website www.crummy.com/software/BeautifulSoup/

Beautiful Soup is a Python package for parsing HTML and XML documents, including documents with malformed markup such as non-closed tags (it is named after "tag soup", a term for such markup). It builds a parse tree for each parsed page, which can be used to extract data from the HTML. This makes it useful for web scraping.[1]
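The sketch below illustrates the tolerance for malformed markup described above. It feeds a made-up HTML string in which neither the `<p>` tags nor the `<b>` tag are ever closed, then queries the resulting tree with `find_all` and `find`. The function name `summarize_malformed` and the input string are hypothetical and serve only as an illustration. Python's built-in `html.parser` is assumed here. Other parsers that Beautiful Soup can use, such as lxml or html5lib, may repair the same markup into differently shaped trees.

```python
from bs4 import BeautifulSoup


def summarize_malformed(html):
    """Parse possibly malformed HTML; return (paragraph texts, first bold text)."""
    # 'html.parser' is Python's built-in parser; Beautiful Soup still builds
    # a tree from it even when closing tags are missing.
    soup = BeautifulSoup(html, "html.parser")
    # find_all searches the whole tree, so nested <p> tags are found as well.
    paragraphs = [p.get_text() for p in soup.find_all("p")]
    bold = soup.find("b")
    return paragraphs, (bold.get_text() if bold else None)


# Hypothetical input: the <p> and <b> tags are never closed.
broken = "<html><body><p>First paragraph<p>Second <b>bold text</body></html>"
paragraphs, bold_text = summarize_malformed(broken)
print(len(paragraphs), bold_text)
```

Both paragraphs and the bold text remain reachable. The exact nesting of the two `<p>` elements is decided by the parser, not by the document.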

It is available for Python 2.6+ and Python 3.

Code example

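# Anchor extraction from an HTML document (Python 3; urllib2 existed only in Python 2)
from urllib.request import urlopen

from bs4 import BeautifulSoup

with urlopen('https://en.wikipedia.org/wiki/Main_Page') as webpage:
    soup = BeautifulSoup(webpage, 'html.parser')

# Print the href of every <a> tag, falling back to '/' when an anchor has none
for anchor in soup.find_all('a'):
    print(anchor.get('href', '/'))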
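The example above downloads a live page, so its output changes whenever that page changes. The following sketch runs the same `find_all('a')` loop over an inline document so that the result is predictable. The function name `extract_hrefs` and the sample HTML are hypothetical. As in the original loop, `'/'` is used as the default for anchors that have no `href` attribute.

```python
from bs4 import BeautifulSoup


def extract_hrefs(html):
    """Return the href of every <a> tag, using '/' when the attribute is missing."""
    soup = BeautifulSoup(html, "html.parser")
    # Tag.get works like dict.get on the tag's attributes.
    return [anchor.get("href", "/") for anchor in soup.find_all("a")]


# Hypothetical sample document with one anchor that lacks an href.
sample = (
    '<p><a href="/wiki/Python">Python</a> '
    '<a name="top">no link</a> '
    '<a href="https://www.crummy.com/">site</a></p>'
)
for href in extract_hrefs(sample):
    print(href)
```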


References

  1. "Beautiful Soup website". Retrieved 18 April 2012. Beautiful Soup is licensed under the same terms as Python itself.


This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.