<img> Problems

Baz brian.ewins at gmail.com
Wed Feb 22 10:57:58 EST 2006


I tried reading your feed with the nightly planet, no problems. I also
took a look at feedparser to see if it had ever had a problem like the
one I mentioned; it seems it may have done, back in early 2004 -
http://diveintomark.org/projects/feed_parser/version_271.html (that's
before it reached sf.net's cvs, but older versions than that are
floating about the net).

So: planet doesn't mangle your feed on my machine, but does on planet
glug - is it possible that somehow an *extremely* old copy of
feedparser has got in there? Or is it just an extremely old copy of
planet? (the arch revisions go back almost as far as this bug)

Here's what you get with a real old feedparser I found at
http://svn.nuxeo.org/trac/pub/file/CPSRSS/trunk/feedparser.py?rev=13190:
> python
>>> import feedparser
>>> d = feedparser.parse("http://beerandspeech.org/index.php?/feeds/index.rss2")
>>> d['items'][1]
{'date': 'Mon, 20 Feb 2006 22:00:00 +0000', 'guid':
'http://beerandspeech.org/index.php?/archives/24-guid.html',
'content_encoded': '\n    Every now and then you see a piece of
technology which leaves you breathless - <a
href="http://www.youtube.com/watch?v=zp-y3ZNaCqs" >this</a> is mine
for this year so far.<br />\r\n<br />\r\n<a
href=&#039;http://www.youtube.com/watch?v=zp-y3ZNaCqs&#039;><img
width=&#039;455&#039; height=&#039;374&#039; border=&#039;0&#039;
hspace=&#039;5&#039;
src=&#039;http://beerandspeech.org/uploads/200602/touchscreen.jpg&#039;
alt=&#039;&#039; /></a><br />\r\n<br />\r\n  \n    ', 'link':
'http://beerandspeech.org/index.php?/archives/24-I-found-this-amazing....html',
'title': 'I found this amazing...'}

Notice how the double quotes in there - which were also encoded - have
been decoded, but the numeric entities (&#039;) haven't. Looks
suspicious to me.
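
To illustrate how that kind of half-decoded output could come about,
here is a minimal sketch. It assumes (this is a guess, not a reading of
the old feedparser source) a decoder that only knows a few named
entities and ignores numeric character references. The name
naive_unescape and the example.org URL are made up for illustration;
html.unescape is from the standard library of current Python 3.

```python
import html

# Hypothetical stand-in for an old-style decoder: it knows a handful of
# named entities and leaves numeric character references untouched.
def naive_unescape(text):
    # &amp; is handled last so that nothing gets decoded twice.
    for name, char in (("&lt;", "<"), ("&gt;", ">"),
                       ("&quot;", '"'), ("&amp;", "&")):
        text = text.replace(name, char)
    return text

# A fragment shaped like the escaped markup in the feed: double quotes
# encoded as &quot;, single quotes encoded as &#039;.
escaped = ("&lt;a href=&quot;http://example.org/&quot;&gt;"
           "&lt;img alt=&#039;&#039; /&gt;")

half_decoded = naive_unescape(escaped)    # &#039; is left as-is
fully_decoded = html.unescape(escaped)    # &#039; becomes '
```

With the naive decoder the double quotes come back but &#039; is left
in place, which is the same pattern as in the output above. A decoder
that handles numeric references treats &#039; and &apos; identically.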


On 2/21/06, Baz <brian.ewins at gmail.com> wrote:
> On 2/21/06, Chris Dolan <chris at chrisdolan.net> wrote:
> > It looks like your html attributes are being quoted twice.  I believe
> > the problem is that you are incorrectly using single quotes for
> > attributes on your blog.  Most browsers allow that, but it's not
> > right.
>
> Really? It says exactly the opposite in the HTML and XML specs:
> http://www.w3.org/TR/REC-xml/#NT-AttValue (EBNF allows both)
> http://www.w3.org/TR/REC-html40/intro/sgmltut.html#h-3.2.2
> "...SGML requires that all attribute values be delimited using either
> double quotation marks (ASCII decimal 34) or single quotation marks
> (ASCII decimal 39). "
>
> I'm more inclined to think it's the feed reader misinterpreting &#039;
> - not recognizing this as being the same as &apos;.
>
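
The quoting point above is easy to check with any conforming XML
parser. A small sketch, using Python's standard xml.etree.ElementTree
purely as an example parser (an assumption for illustration, not
something planet or feedparser is known to use here); the file name is
a shortened stand-in for the image in the feed:

```python
import xml.etree.ElementTree as ET

# The same element written with single-quoted and double-quoted
# attribute values; the XML grammar (AttValue) allows both.
single = ET.fromstring("<img src='touchscreen.jpg' alt='' />")
double = ET.fromstring('<img src="touchscreen.jpg" alt="" />')

# Both parse to identical attribute dictionaries.
same = single.attrib == double.attrib
```

Either quoting style gives the parser the same attributes, so the
single quotes on their own shouldn't be what causes the mangling.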

