feedparser bug in < > tag handling

Daniel Drake dsd at gentoo.org
Thu May 4 02:03:35 EST 2006


Hi,

I recently added this weblog to Planet Gentoo:

http://my.opera.com/taviso/xml/rss/blog/

The layout got messed up, and every post following the one in the above 
feed was italicized.

Rather than writing literal HTML tags into the RSS (e.g. <i>), this 
provider escapes them as &lt;i&gt; (not sure if that's relevant, but it 
seems unusual).

feedparser got confused by the escaped tags and changed the following:

	&lt;/b&gt;&lt;br/&gt;&lt;br/&gt;&lt;i&gt;terse&lt;/i&gt;

into:

	</b><br />><br /></b><br />><br /><i>terse<i>
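For comparison, here is a minimal sketch of what the escaped input should 
decode to. It uses html.unescape from the Python standard library rather 
than anything in feedparser itself, and the variable names are only 
illustrative:

```python
import html

# The escaped markup exactly as it appears in the feed.
escaped = "&lt;/b&gt;&lt;br/&gt;&lt;br/&gt;&lt;i&gt;terse&lt;/i&gt;"

# Decoding the entities should give back the literal tags, with the
# closing </i> still intact.
decoded = html.unescape(escaped)
print(decoded)
```

The important detail is that the closing </i> survives the decoding step, so 
the italics should end after "terse".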

Marien Zwart spent some time looking into this, and produced this patch, 
which solves the issue:

http://dev.gentoo.org/~marienz/feedparser.diff

He says that the old regex tried to turn <br/> into <br></br>.
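To illustrate how this kind of rewrite can go wrong, here is a rough 
sketch. Both patterns are hypothetical and written for this example; they 
are not copied from feedparser or from Marien's patch. The first uses a 
greedy \S+ for the tag name, the second restricts the tag name and leaves 
void elements such as <br/> alone (VOID_ELEMENTS is an assumed, partial 
list):

```python
import re

# Hypothetical patterns for illustration only (not feedparser's code).
GREEDY = re.compile(r"<(\S+)/>")
STRICT = re.compile(r"<([^<>\s/]+)\s*/>")

# Assumed, partial set of elements that never take a closing tag.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


def expand_greedy(markup):
    # \S+ may run across '>' and '<', so one match can span several tags.
    return GREEDY.sub(r"<\1></\1>", markup)


def expand_strict(markup):
    def repl(match):
        tag = match.group(1)
        if tag.lower() in VOID_ELEMENTS:
            # Keep void elements self-closing instead of adding </br>.
            return "<%s />" % tag
        return "<%s></%s>" % (tag, tag)

    return STRICT.sub(repl, markup)


sample = "</b><br/><br/><i>terse</i>"
print(expand_greedy(sample))
print(expand_strict(sample))
```

With the greedy pattern, everything between the first < and the last /> is 
treated as a single tag name, so the output contains duplicated fragments. 
The stricter pattern only touches real self-closing tags.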

I checked feedparser CVS, and they have switched to a different regex 
entirely. Can we either merge their recent changes, or apply Marien's patch?

Thanks,
Daniel

