Cache filename error with | "pipe" characters

Harry Fuecks hfuecks at gmail.com
Mon Oct 16 19:05:48 EST 2006


Seeing the following error in the log, specific to this single feed:

ERROR:planet.runner:Error processing http://feeds.feedburner.com/randomfoo
ERROR:planet.runner:IOError: [Errno 2] No such file or directory:
'D:\\www\\planet\\cache\\ew.com,ew,report,0,6115,1545453_1|114184||0_0_,00.html'
ERROR:planet.runner:  File "D:\py\venus\planet\spider.py", line 286,
in spiderPlanet
    spiderFeed(feed)
ERROR:planet.runner:  File "D:\py\venus\planet\spider.py", line 235,
in spiderFeed
    write(output, cache_file)
ERROR:planet.runner:  File "D:\py\venus\planet\spider.py", line 51, in write
    file = open(out,'w')

The feed contains the following;

<link>http://www.ew.com/ew/report/0,6115,1545453_1|114184||0_0_,00.html</link>

This looks like a problem with the cache filename: the | characters
(\x7C) in the name are the likely cause. This may be specific to NTFS /
Windows, where | is not allowed in filenames.
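
As a rough check of that diagnosis, the sketch below lists which characters
in the failing cache filename fall in the set Windows reserves for filenames.
The helper name reserved_chars and the constant WINDOWS_RESERVED are made up
for illustration and are not part of spider.py.

```python
# Characters that Windows does not allow in a file name component.
WINDOWS_RESERVED = set('<>:"/\\|?*')

def reserved_chars(name):
    """Return the sorted reserved characters that occur in name (hypothetical helper)."""
    return sorted(set(name) & WINDOWS_RESERVED)

# Cache filename taken from the IOError in the log above (directory part omitted).
name = 'ew.com,ew,report,0,6115,1545453_1|114184||0_0_,00.html'
found = reserved_chars(name)
print(found)
```

Only the pipe shows up here, which fits the theory that the existing
substitution already handles the other characters this URL contains.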

Was able to fix it by changing line 14 in spider.py from;

re_slash         = re.compile(r'[?/:]+')

to;

re_slash         = re.compile(r'[?/:|]+')
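
To show the effect of the change, here is a minimal sketch that applies both
patterns to the path part of the link from the feed. It assumes the matched
characters are replaced with a comma, which is what the cache filename in the
log suggests; the sanitize helper is hypothetical and is not the actual code
in spider.py.

```python
import re

# Patterns as quoted above: before and after the change.
re_slash_old = re.compile(r'[?/:]+')
re_slash_new = re.compile(r'[?/:|]+')

def sanitize(name, pattern):
    """Replace each run of matched characters with one comma (assumed behaviour)."""
    return pattern.sub(',', name)

# Link from the feed, with the scheme and "www." prefix already removed.
path = 'ew.com/ew/report/0,6115,1545453_1|114184||0_0_,00.html'

old_name = sanitize(path, re_slash_old)  # still contains the | characters
new_name = sanitize(path, re_slash_new)  # | runs collapsed to commas

print(old_name)
print(new_name)
```

With the old pattern the result matches the filename in the IOError above;
with the new one the name no longer contains any | characters.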

BTW, following from here (Win32 file locking issue):
http://lists.planetplanet.org/archives/devel/2006-September/001072.html
- problem solved - all unit tests now pass - many thanks

