"Tip: Rescue terrible HTML with TagSoup"

Well, since I've so emphatically broken my Weblogging pause for The Cup, I'd better post some professional items.

Subtitle: Turn poorly formed HTML into valid XHTML
Synopsis: XHTML is a friendly enough format for parsing and screen-scraping, but the Web still has a lot of messy HTML out there. In this tip Uche Ogbuji demonstrates the use of TagSoup to turn just about any HTML into neat XHTML.

TagSoup is very handy. Even though it's a Java project, I put it to use from Python code fairly often. It also recently went full 1.0.
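
For the curious, here is a rough sketch of one way to drive TagSoup from Python: shell out to the jar and use it as a filter. This is not necessarily how the article does it. The jar path `tagsoup.jar` and the helper names `tagsoup_command` and `tidy_html` are made up for illustration, and the sketch assumes that `java` is on your PATH and that TagSoup, run with no file arguments, reads HTML on standard input and writes the cleaned-up markup to standard output.

```python
import subprocess
from pathlib import Path

# Hypothetical location of the TagSoup jar; point this at your own copy.
TAGSOUP_JAR = Path("tagsoup.jar")


def tagsoup_command(jar=TAGSOUP_JAR, java="java"):
    """Build the argument list that runs TagSoup as a stdin/stdout filter."""
    return [java, "-jar", str(jar)]


def tidy_html(html, jar=TAGSOUP_JAR, java="java"):
    """Pipe messy HTML (a str) through TagSoup and return its output as bytes.

    Assumes TagSoup reads standard input when no files are named.
    Raises subprocess.CalledProcessError if the Java process fails.
    """
    result = subprocess.run(
        tagsoup_command(jar, java),
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

Something like `tidy_html("<p>unclosed <b>tags")` should then hand back markup you can feed to any XML parser. Starting a JVM per document is slow, so for bulk work you would probably want to batch files or call TagSoup in-process from Jython instead.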

[Uche Ogbuji]

via Copia