feedparser bug in &lt; &gt; tag handling
Daniel Drake
dsd at gentoo.org
Thu May 4 02:03:35 EST 2006
Hi,
I recently added this weblog to Planet Gentoo:
http://my.opera.com/taviso/xml/rss/blog/
The layout got messed up, and every post following the one in the above
feed was italicized.
Instead of writing literal HTML tags into the RSS (e.g. <i>), this
provider uses the entity-escaped form &lt;i&gt; instead (not sure if
that's relevant, but it seems unusual).
feedparser got a little confused by this and changed this:
</b><br/><br/><i>terse</i>
into:
</b><br />><br /></b><br />><br /><i>terse<i>
Marien Zwart spent some time looking into this, and produced this patch,
which solves the issue:
http://dev.gentoo.org/~marienz/feedparser.diff
He says that the old regex tried to turn <br/> into <br></br>.
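To illustrate how a rewrite like that can go wrong, here is a minimal
sketch. Both patterns are hypothetical and written only for this example;
they are not copied from feedparser or from Marien's patch, and the real
regex may differ. The assumption is that the tag-name part of the pattern
is loose enough to run across a tag boundary.

```python
import re

# Hypothetical patterns, for illustration only (not feedparser's code).
# LOOSE lets the "tag name" contain '<' and '>', so one match can span
# several tags. STRICT keeps the name inside a single tag.
LOOSE = re.compile(r'<(\S+?)\s*/>')
STRICT = re.compile(r'<([^<>\s]+?)\s*/>')


def expand(pattern, data):
    """Rewrite every self-closing <x/> matched by pattern as <x></x>."""
    return pattern.sub(
        lambda m: '<%s></%s>' % (m.group(1), m.group(1)), data)


sample = '</b><br/><br/><i>terse</i>'

# With LOOSE, the first match starts at '</b>' and captures '/b><br'
# as if it were a tag name, so the surrounding markup gets duplicated.
loose_out = expand(LOOSE, sample)

# With STRICT, only the two <br/> tags are rewritten.
strict_out = expand(STRICT, sample)

print(loose_out)
print(strict_out)
```

The damage in this sketch is not identical to what Planet showed, but it
has the same shape: a closing tag gets pulled into the match and the
markup after it no longer balances.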
I checked feedparser CVS, and they have switched to a different regex
entirely. Can we either merge their recent changes, or apply Marien's patch?
Thanks,
Daniel