Finding URLs in plain text

John Gruber put in some good work to derive and test a regex to extract URLs from plain text.

"An Improved Liberal, Accurate Regex Pattern for Matching URLs"

I needed to use it today and found that it takes a bit of care to translate for use in Python, especially with regard to its Unicode characters. Here is my Python version, with a super-simple harness that uses Gruber's test page:

I'm not entirely sure I've translated the original with 100% fidelity, but this has worked fine for my purposes.  I'm open to tweaks or suggestions, and will keep the Gist updated.
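
To illustrate the Unicode point on a smaller scale, here is a minimal sketch. It is a deliberately simplified pattern, not Gruber's full regex and not the Gist itself; the names URL_RE and find_urls are made up for this example, and it assumes Python 3, where str is already Unicode. The non-ASCII quote characters that must not end a URL are written as \u escapes so the pattern does not depend on the source file's encoding.

```python
import re

# Simplified, hypothetical pattern for illustration only.
# It is NOT Gruber's full regex and handles no parentheses inside URLs.
URL_RE = re.compile(
    r"""
    \b
    (?:https?://|www\.)     # scheme, or a bare "www." prefix
    [^\s()<>]+              # run of characters allowed inside the URL
    # Final character: anything except whitespace, brackets, and trailing
    # punctuation, including the guillemets and curly quotes (as \u escapes).
    [^\s`!()\[\]{};:'".,<>?\u00ab\u00bb\u201c\u201d\u2018\u2019]
    """,
    re.VERBOSE | re.IGNORECASE,
)


def find_urls(text):
    """Return every substring of text matched by the simplified pattern."""
    return URL_RE.findall(text)


if __name__ == "__main__":
    sample = "Visit http://example.com/foo, or \u201chttps://example.org/bar\u201d."
    print(find_urls(sample))
    # → ['http://example.com/foo', 'https://example.org/bar']
```

The trailing comma and the closing curly quote are left out of the matches because the last character class refuses them. On Python 2 the pattern and the input would both need to be unicode objects for the same escapes to behave this way.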
1 response
FYI Uche, I've run into text that gives this RE some trouble; it takes minutes or hours to return anything. For example ''. I doubt that will survive the comment formatter, but the key seems to be the open parentheses followed by a character entity, separated by and followed by character data. Alas, I don't know enough about REs to know how to debug it, but thought I'd at least report it here.
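
One plausible cause for this kind of slowdown, offered as an assumption since the triggering text did not survive, is nested quantifiers in a balanced-parentheses sub-pattern: a repeated group that itself contains "+" can be split up in exponentially many ways when an open parenthesis is never closed. The sketch below uses two hypothetical sub-patterns (NESTED and FLAT, names invented here) that accept the same strings, and a small helper for timing them.

```python
import re
import time

# Hypothetical sub-patterns for "parentheses, nested at most one level".
# NESTED repeats a group that itself contains "+", so a failed match can
# backtrack exponentially. FLAT consumes one character per iteration instead.
NESTED = re.compile(r"\((?:[^\s()<>]+|\([^\s()<>]+\))*\)")
FLAT = re.compile(r"\((?:[^\s()<>]|\([^\s()<>]+\))*\)")


def timed_search(pattern, text):
    """Return (match, seconds) for pattern.search(text)."""
    start = time.perf_counter()
    match = pattern.search(text)
    return match, time.perf_counter() - start


if __name__ == "__main__":
    # An opening parenthesis that is never closed forces the failure case.
    # Keep n small: the nested form roughly doubles its work per extra character.
    for n in (8, 12, 16):
        text = "(" + "a" * n
        _, slow = timed_search(NESTED, text)
        _, fast = timed_search(FLAT, text)
        print(n, f"nested={slow:.4f}s", f"flat={fast:.4f}s")
```

If the timings for the nested form grow sharply with n while the flat form stays steady, that points at backtracking rather than at the Unicode handling. Whether this is what happens with the text reported above cannot be confirmed without the original example.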