Too long a filename...

Amit Chakradeo (अमित चक्रदेव) chakradeo+planet at gmail.com
Fri Oct 6 11:50:29 EST 2006


On 10/5/06, Sam Ruby <rubys at intertwingly.net> wrote:
>
>    === modified file 'planet/spider.py'
>    --- planet/spider.py
>    +++ planet/spider.py
>    @@ -34,6 +34,16 @@
>         filename = re_initial_cruft.sub("", filename)
>         filename = re_final_cruft.sub("", filename)

 <stuff deleted>

That patch works great. I am still trying to understand how the cache
is organized, but would it be okay if I just replaced the whole filename
with the MD5 hash of the long name, without regard to commas? (It seems
they are used to separate the feed and article id?)
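
To make the question concrete, here is a minimal sketch of the whole-name
replacement. The function name shorten_filename and the 250-character
limit are assumptions for illustration only; neither comes from
planet/spider.py.

```python
import hashlib

# Assumed limit for illustration; the real limit depends on the filesystem.
MAX_FILENAME_LEN = 250


def shorten_filename(filename, max_len=MAX_FILENAME_LEN):
    """Return filename unchanged if it fits, else the MD5 hex digest of it.

    The digest replaces the whole name, so any commas in the original
    name are lost for names that exceed max_len.
    """
    if len(filename) <= max_len:
        return filename
    return hashlib.md5(filename.encode("utf-8")).hexdigest()
```

Short names pass through untouched, so only the over-long cache entries
would lose their comma-separated structure.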

> > Is there a general way to skip something in a feed? I looked at the scrub
> > function, which looks at the ignore_in_feed config options, but that is
> > probably for skipping some field within a post? Maybe we can just add a
> > regexp for a particular feed in the config which, when matched against an
> > entry, will make the spider skip that entry?
>
> That's exactly what filters are designed for.  Filters are arbitrary
> programs written in the programming language of your choice.  Each time
> they are invoked, they are passed a single entry which has been
> sanitized and normalized to UTF-8, XHTML, and Atom 1.0.
>
> Normally, filters copy stdin to stdout.  Clearly they can modify the
> data in transit.  More importantly to you, if zero bytes are output, the
> entry is ignored.
>
> Filters can be defined at the [planet] level, or at the individual
> [feed] level in the configuration file.

Wow! That worked great too. Somehow I missed this feature. The sample
filters helped a great deal!
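
For anyone else who missed it, a skip-by-regexp filter can be sketched
roughly as below. It relies only on the behaviour described above: the
entry arrives on stdin, and writing zero bytes to stdout drops it. The
pattern and the names filter_entry and run are hypothetical, not part of
the shipped sample filters.

```python
import re
import sys

# Hypothetical pattern: skip any entry whose text contains "[sponsored]".
SKIP_PATTERN = re.compile(r"\[sponsored\]", re.IGNORECASE)


def filter_entry(entry, pattern=SKIP_PATTERN):
    """Return the entry unchanged, or an empty string if it should be skipped."""
    if pattern.search(entry):
        return ""
    return entry


def run(stdin, stdout):
    """Copy one entry from stdin to stdout unless it matches the pattern."""
    stdout.write(filter_entry(stdin.read()))

# As a standalone filter script, finish with: run(sys.stdin, sys.stdout)
```

Keeping the match logic in filter_entry makes it easy to try different
patterns against a saved entry before wiring the script into the config.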

BTW, how do I check out the venus branch? I am still new to bzr and
couldn't find the answer, and
    bzr get http://intertwingly.net/code/planet
didn't seem to provide the same files as those in the tgz file...


BTW, is there any interface planned for updating the config file over the
web? I looked at planetplus, but that code doesn't work with the latest
cherrypy and sqlobject... I will see if I can patch together something
similar for this and share it if there is interest...

Thanks a lot!
Amit

