Venus and a single cache directory?

Sam Ruby rubys at intertwingly.net
Wed Oct 4 10:07:51 EST 2006


David L. Sifry wrote:
> Hey, love Venus (and Planet!) I've been playing around a lot with it.  I 
> was wondering if there were any plans to allow for a single cache 
> directory that a number of Venus (or Planet) installations could share 
> - It sure would be a waste of bandwidth to have a few installations that 
> shared the same set of base RSS feeds, for example, but had a few that 
> were unique.
> 
> Any plans? Or could someone point me to the right place in the code that 
> would need to be extended to allow for this behavior? I can go brush off 
> my python skills and take a few hacks at it. :-)

Venus allows you to invoke 'spider' and 'splice' operations separately.

     http://intertwingly.net/code/venus/docs/venus.svg

So... if you had (for example) three config.inis which specified three 
(possibly overlapping) sets of feeds, but specified the same 
cache_directory, and you defined a fourth config.ini which contained the 
union of feeds and specified the same cache_directory, you could 
"spider" the latter (fetching the data), and then serially "splice" 
using the original definitions (producing the output).
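
The fourth, "union" config.ini could be generated rather than maintained 
by hand.  What follows is only a rough Python sketch and not code from 
Venus.  It assumes that every section other than [Planet] names a feed 
(real configurations may carry other sections, such as per-template 
settings, which would need to be added to RESERVED).  The names 
feed_sections, union_config and RESERVED are made up for illustration.

```python
import configparser

# Section names that hold settings rather than a feed subscription.
# This set is an assumption for illustration; extend it for your setup.
RESERVED = {"planet", "default"}


def feed_sections(ini_text):
    """Return {section_name: {option: value}} for every non-reserved section."""
    # interpolation=None keeps '%' characters in URLs from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return {
        name: dict(parser.items(name))
        for name in parser.sections()
        if name.lower() not in RESERVED
    }


def union_config(ini_texts, cache_directory):
    """Build one config holding the union of feeds found in ini_texts."""
    merged = configparser.ConfigParser(interpolation=None)
    merged["Planet"] = {"cache_directory": cache_directory}
    for text in ini_texts:
        for url, options in feed_sections(text).items():
            # A feed listed by several configs is kept only once.
            if not merged.has_section(url):
                merged[url] = options
    return merged
```

The result can be written out with merged.write(open_file), spidered 
once, and each original config.ini then spliced against the shared cache.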

  - - -

To make things a bit easier to manage, one thing you can do today is to 
split out the subscriptions into a separate file, and subscribe to that 
file, specifying a content_type.  Content types supported today are 
"opml" and "foaf".  A working example can be found here:

     http://intertwingly.net/code/venus/examples/opml-top100.ini

And some of the outputs produced:

     http://planet.intertwingly.net/top100/
     http://planet.intertwingly.net/top100/mobile.html

The point being that it might be easier to maintain if the subscriptions 
lists are kept separate from the rest of the configuration.  Each OPML 
or FOAF file could be referenced twice, once from a "splice" 
configuration, and once from the common "spider" configuration.
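
To see which feeds a given OPML file would contribute before pointing 
both configurations at it, something like the following can help.  This 
is a minimal sketch using only the Python standard library, not the code 
Venus itself uses to read subscription lists.  It assumes the usual OPML 
convention of <outline> elements carrying an xmlUrl attribute, and the 
function name feeds_from_opml is made up for illustration.

```python
import xml.etree.ElementTree as ET


def feeds_from_opml(opml_text):
    """Return the feed URLs in an OPML document, in order, without repeats."""
    root = ET.fromstring(opml_text)
    feeds = []
    # iter() walks nested outlines too, so grouped feeds are included.
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url and url not in feeds:
            feeds.append(url)
    return feeds
```

Outlines without an xmlUrl (folders, for example) are skipped.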

Note: if there were interest, additional formats (like XOXO) could easily 
be supported, leading to...

  - - -

One format that would be trivial to support would be the same config.ini 
format that the rest of planet uses.  This means that the "spider" 
configuration would reduce to a series of sections that merely listed 
the other configurations as input.  If this is of interest, let me know.

Notes:

  * While subscriptions referenced by config.ini files are normally URLs,
    they can also simply be relative or absolute file paths.

  * If you reference multiple subscription lists, it is OK for
    individual subscriptions to appear multiple times; what you will
    get is a proper union (i.e., the feed will only be fetched one time).

  * What you put in the various config.inis is actually up to you.  I've
    described a use case where you put subscriptions in there, but you
    actually could factor out the common [planet] definitions (like the
    cache_directory, for example) into a separate config.ini.

What I like about all this is that you can start simple (like with 
"classic" planet) and put everything in one file, but as your needs 
grow, you can re-arrange to your heart's content.

- Sam Ruby

