Some experiments in website scraping using Python 2.7 with BeautifulSoup 3.2. The first function here shows various manipulations of an HTML page, including saving a scrubbed file to disk. The second function shows a simple crawler that attempts to traverse a domain and build a sitemap from hyperlinks encountered in the pages. Includes some commentary […]