Cache filename error with | "pipe" characters
Harry Fuecks
hfuecks at gmail.com
Mon Oct 16 19:05:48 EST 2006
Seeing the following error in the log, specific to this single feed:
ERROR:planet.runner:Error processing http://feeds.feedburner.com/randomfoo
ERROR:planet.runner:IOError: [Errno 2] No such file or directory: 'D:\\www\\planet\\cache\\ew.com,ew,report,0,6115,1545453_1|114184||0_0_,00.html'
ERROR:planet.runner: File "D:\py\venus\planet\spider.py", line 286, in spiderPlanet
    spiderFeed(feed)
ERROR:planet.runner: File "D:\py\venus\planet\spider.py", line 235, in spiderFeed
    write(output, cache_file)
ERROR:planet.runner: File "D:\py\venus\planet\spider.py", line 51, in write
    file = open(out,'w')
The feed contains the following;
<link>http://www.ew.com/ew/report/0,6115,1545453_1|114184||0_0_,00.html</link>
This looks like a problem with the cache filename: the | characters
(\x7C) from the link end up in the name, and | is not allowed in
Windows filenames, so open() fails. This may be specific to Windows/NTFS.
Was able to fix it by changing line 14 in spider.py from;
re_slash = re.compile(r'[?/:]+')
to;
re_slash = re.compile(r'[?/:|]+')
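
A minimal sketch of what the change does, for anyone who wants to check it outside of Venus. The cache_name helper and the comma replacement are my assumptions for illustration (guessed from the filename in the error above), not the actual code in spider.py; only the two patterns are taken from the patch.

```python
import re

# The two patterns from the patch: the original, and the one that also matches '|'.
re_slash_old = re.compile(r'[?/:]+')
re_slash_new = re.compile(r'[?/:|]+')

def cache_name(url, pattern):
    """Hypothetical helper: replace each run of matched characters with one comma."""
    return pattern.sub(',', url)

# Link from the feed, with the scheme and 'www.' already stripped (assumed).
link = 'ew.com/ew/report/0,6115,1545453_1|114184||0_0_,00.html'

print(cache_name(link, re_slash_old))  # '|' characters are still present
print(cache_name(link, re_slash_new))  # '|' runs are replaced by ','
```

With the old pattern the result still contains |, which matches the name in the IOError. Note that Windows also reserves < > " \ and *, which neither pattern covers, so other feeds could in principle trip over those as well.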
BTW, following up on the Win32 file locking issue discussed here:
http://lists.planetplanet.org/archives/devel/2006-September/001072.html
That problem is solved and all unit tests now pass. Many thanks.