feedparser bug in < > tag handling

Daniel Drake dsd at gentoo.org
Thu May 4 02:03:35 EST 2006


I recently added this weblog to Planet Gentoo:


The layout got messed up, and every post following the one in the above 
feed was italicized.

Instead of writing literal HTML tags into the RSS (e.g. <i>), this 
provider escapes them as &lt;i&gt; (not sure if that's relevant, but it 
seems unusual).
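
For reference, here is a minimal sketch of what the escaped form looks like 
to a consumer. It uses Python's standard xml.etree module rather than 
feedparser, and the <description> element and its content are made up for 
illustration; they are not taken from the feed in question. Once the XML is 
parsed, the entity-escaped text comes back as ordinary markup:

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS fragment: the HTML inside is entity-escaped,
# so the XML parser sees plain text rather than child elements.
fragment = "<description>&lt;i&gt;terse&lt;/i&gt;</description>"

element = ET.fromstring(fragment)

# The parser decodes the entities, so the text is literal HTML again.
decoded = element.text
print(decoded)

# No child elements were created from the escaped tags.
child_count = len(list(element))
print(child_count)
```

So any tag cleanup that runs afterwards is working on the decoded string, 
and has to cope with whatever markup the author typed.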

feedparser got a little confused by this and changed this:



	</b><br />><br /></b><br />><br /><i>terse<i>

Marien Zwart spent some time looking into this, and produced this patch, 
which solves the issue:


He says that the old regex tried to turn <br/> into <br></br>.
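
To illustrate how that kind of expansion can go wrong, here is a rough 
sketch. Both patterns below are hypothetical: NAIVE is not copied from 
feedparser, and STRICT is not Marien's patch. The point is only that a 
pattern which lets the tag name contain "<" and ">" can match across two 
adjacent tags:

```python
import re

# Hypothetical naive pattern: the "name" may be any run of non-space
# characters, so it can swallow a preceding tag. Not feedparser's code.
NAIVE = re.compile(r'<(\S+?)\s*?/>')

# Hypothetical stricter pattern: the name may not contain <, >, / or
# whitespace, so a match cannot span more than one tag.
STRICT = re.compile(r'<([^<>\s/]+)\s*/>')


def expand(pattern, markup):
    """Rewrite self-closing tags such as <br/> as <br></br>."""
    return pattern.sub(r'<\1></\1>', markup)


simple = '<br/>'
adjacent = '</b><br />'

# Both patterns handle the simple case the same way.
print(expand(NAIVE, simple))
print(expand(STRICT, simple))

# With a closing tag directly in front, the naive pattern captures
# "/b><br" as the tag name and produces broken markup.
print(expand(NAIVE, adjacent))
print(expand(STRICT, adjacent))
```

With the naive pattern the second input comes out mangled, while the 
stricter one only touches the <br /> itself.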

I checked feedparser CVS, and they have switched to a different regex 
entirely. Can we either merge their recent changes, or apply Marien's patch?


