Too long a filename...

Sam Ruby rubys at intertwingly.net
Fri Oct 6 12:47:59 EST 2006


Amit Chakradeo (अमित चक्रदेव) wrote:
> On 10/5/06, Sam Ruby <rubys at intertwingly.net> wrote:
>>
>>    === modified file 'planet/spider.py'
>>    --- planet/spider.py
>>    +++ planet/spider.py
>>    @@ -34,6 +34,16 @@
>>         filename = re_initial_cruft.sub("", filename)
>>         filename = re_final_cruft.sub("", filename)
> 
> <stuff deleted>
> 
> That patch works great. I am still trying to understand how the cache
> is organized, but would it be okay if I just replaced the whole filename
> with the md5 hash of the long name, without regard to commas? (It seems
> like they are used to separate the feed and article id?)

That would work, but I personally find having a name I have a chance of 
recognizing helps me in debugging.
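
A rough sketch of that trade-off, keeping a readable prefix and hashing
only to stay under a length limit. This is not the patch quoted above:
the shorten() name and the 250-character limit are assumptions made for
illustration, and the limit is assumed to be well above the 33 characters
the digest and separator need.

```python
import hashlib

MAX_LEN = 250  # assumed limit, not taken from the actual patch


def shorten(filename, max_len=MAX_LEN):
    """Keep a readable prefix and replace the overflow with an MD5 digest."""
    if len(filename) <= max_len:
        return filename
    # Hash the whole name so two long names sharing a prefix stay distinct.
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()  # 32 hex chars
    keep = max_len - len(digest) - 1  # leave room for the comma separator
    return filename[:keep] + "," + digest
```

Names that already fit are returned unchanged, so only the overlong cache
entries lose part of their recognizable name.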

>> > Is there a general way to skip something in a feed? I looked at the
>> > scrub function, which looks at the ignore_in_feed config options, but
>> > that is probably for skipping some field within a post? Maybe we can
>> > just add a regexp for a particular feed in the config which, when
>> > matched against an entry, makes the spider skip that entry?
>>
>> That's exactly what filters are designed for.  Filters are arbitrary
>> programs written in the programming language of your choice.  Each time
>> they are invoked, they are passed a single entry which has been
>> sanitized and normalized to UTF-8, XHTML, and Atom 1.0.
>>
>> Normally, filters copy stdin to stdout.  Clearly they can modify the
>> data in transit.  More importantly to you, if zero bytes are output, the
>> entry is ignored.
>>
>> Filters can be defined at the [planet] level, or at the individual
>> [feed] level in the configuration file.
> 
> Wow! That worked great too. Somehow I missed this feature. The sample
> filters helped a great deal!

Venus does need more documentation.
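
In the meantime, here is a minimal sketch of the filter contract described
above: read one entry on stdin, write it back to stdout, or write nothing
to have the entry ignored. It is not one of the sample filters; the
SKIP_PATTERN regexp and the filter_entry() name are hypothetical.

```python
import re
import sys

# Hypothetical pattern: entries whose text matches it are dropped.
SKIP_PATTERN = re.compile(r"sponsored", re.IGNORECASE)


def filter_entry(entry_xml, pattern=SKIP_PATTERN):
    """Return the entry unchanged, or an empty string to drop it."""
    if pattern.search(entry_xml):
        return ""  # zero bytes of output means the entry is ignored
    return entry_xml


if __name__ == "__main__":
    # A filter receives a single entry on stdin and writes to stdout.
    sys.stdout.write(filter_entry(sys.stdin.read()))
```

Saved as a script, something like this could be listed as a filter at the
[planet] level or for an individual [feed].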

> BTW how do I check out the venus branch? I am still new to bzr and
> couldn't find the answer, and
> bzr get http://intertwingly.net/code/planet
> didn't seem to provide the same files as those in the tgz file...

Try bzr get http://intertwingly.net/code/venus

> BTW is there any interface planned to update the config file using the
> web? I looked at planetplus, but that code doesn't work with the latest
> cherrypy and sqlobject... I will see if I can patch together something
> similar for this and share it if there is interest...

I'm not aware of anyone working on that.

> Thanks a lot!
> Amit

- Sam Ruby

