Too long a filename...
Amit Chakradeo (अमित चक्रदेव)
chakradeo+planet at gmail.com
Fri Oct 6 11:50:29 EST 2006
On 10/5/06, Sam Ruby <rubys at intertwingly.net> wrote:
>
> === modified file 'planet/spider.py'
> --- planet/spider.py
> +++ planet/spider.py
> @@ -34,6 +34,16 @@
> filename = re_initial_cruft.sub("", filename)
> filename = re_final_cruft.sub("", filename)
<stuff deleted>
That patch works great. I am still trying to understand how the cache
is organized, but would it be okay if I just replaced the whole filename
with an md5 hash of the long name, without regard to commas? (It seems
like they are used to separate the feed and article id?)
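
A minimal sketch of the whole-name hashing idea asked about above. It is
not the code in planet/spider.py; the function name cache_filename and the
250-character limit are assumptions made only for illustration.

```python
import hashlib

# Hypothetical limit; the real threshold depends on the filesystem and
# on whatever planet/spider.py actually uses.
MAX_LEN = 250


def cache_filename(name, max_len=MAX_LEN):
    """Return name unchanged if it is short enough, otherwise its md5 hex digest.

    Hashing the whole name ignores the commas, so the feed/article
    separation visible in the original name is lost for long names.
    """
    if len(name) <= max_len:
        return name
    return hashlib.md5(name.encode("utf-8")).hexdigest()
```

The digest is always 32 hex characters and the same long name always maps
to the same cache file, but the result no longer shows which feed an entry
came from.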
> > Is there a general way to skip something in a feed ? I looked at scrub
> > function which looks at ignore_in_feed config options, but that probably
> > is for skipping some field within a post ??? Maybe we can just add a
> > regexp for a particular feed in the config which when matched to an
> > entry will make the spider to skip that entry ?
>
> That's exactly what filters are designed for. Filters are arbitrary
> programs written in the programming language of your choice. Each time
> they are invoked, they are passed a single entry which has been
> sanitized and normalized to UTF-8, XHTML, and Atom 1.0.
>
> Normally, filters copy stdin to stdout. Clearly they can modify the
> data in transit. More importantly to you, if zero bytes are output, the
> entry is ignored.
>
> Filters can be defined at the [planet] level, or at the individual
> [feed] level in the configuration file.
Wow! That worked great too. Somehow I missed this feature. The sample
filters helped a great deal!
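
For reference, a minimal sketch of a filter that skips entries, based only
on the behaviour described above (the entry arrives on stdin, and writing
zero bytes to stdout makes the entry be ignored). The pattern and the name
filter_entry are hypothetical; this is not one of the sample filters.

```python
import re
import sys

# Hypothetical pattern: drop entries whose title starts with "[ad]".
SKIP_PATTERN = re.compile(r"<title[^>]*>\s*\[ad\]", re.IGNORECASE)


def filter_entry(entry_xml, pattern=SKIP_PATTERN):
    """Return the entry unchanged, or an empty string so the entry is skipped."""
    if pattern.search(entry_xml):
        return ""
    return entry_xml


def main():
    # Copy stdin to stdout unless the entry matches the skip pattern.
    sys.stdout.write(filter_entry(sys.stdin.read()))


if __name__ == "__main__":
    main()
```

Matching with a regexp on the raw text keeps the sketch short; a filter
that needs to inspect specific elements would more likely parse the entry
as XML first.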
BTW, how do I check out the venus branch? I am still new to bzr and
couldn't find the answer, and
bzr get http://intertwingly.net/code/planet
didn't seem to provide the same files as those in the tgz file...
Also, is there any interface planned for updating the config file over the
web? I looked at planetplus, but that code doesn't work with the latest
cherrypy and sqlobject... I will see if I can patch together something
similar for this and share it if there is interest...
Thanks a lot!
Amit