Venus and a single cache directory?

Sam Ruby rubys at intertwingly.net
Wed Oct 4 12:09:22 EST 2006


David L. Sifry wrote:
> Sam,
> 
> Thanks for the response.  I've got a few questions inline:
> 
> Sam Ruby wrote:
>> David L. Sifry wrote:
>>> Hey, love Venus (and Planet!) I've been playing around a lot with 
>>> it.  I was wondering if there were any plans to allow for a single 
>>> cache directory that a number of Venus (or Planet) installations 
>>> could share - It sure would be a waste of bandwidth to have a few 
>>> installations that shared the same set of base RSS feeds, for 
>>> example, but had a few that were unique.
>>>
>>> Any plans? Or could someone point me to the right place in the code 
>>> that would need to be extended to allow for this behavior? I can go 
>>> brush off my python skills and take a few hacks at it. :-)
>>
>> Venus allows you to invoke 'spider' and 'splice' operations separately.
>>
>>     http://intertwingly.net/code/venus/docs/venus.svg
>>
>> So... if you had (for example) three config.inis which specified three 
>> (possibly overlapping) sets of feeds, but specified the same 
>> cache_directory, and you defined a fourth config.ini which contained 
>> the union of feeds and specified the same cache_directory, you could 
>> "spider" the latter (fetching the data), and then serially "splice" 
>> using the original definitions (producing the output).
>>
> OK, I am already using the OPML capabilities for each config.ini, but if 
> I use the same cache directory, the results look very strange, showing 
> feeds in one planet that aren't listed in the OPML of its config.ini.  
> Is there any additional information that is kept in the cache directory 
> to preserve state? I'm using the latest code from 
> http://intertwingly.net/code/venus/.

Looking at the code, I can't explain this behavior.  Would it be 
possible for you to share with me the config.ini file(s) you are working 
with?  Either to this list, or directly to me would be fine...

>>  - - -
>>
>> To make things a bit easier to manage, one thing you can do today is 
>> to split out the subscriptions into a separate file, and subscribe to 
>> that file, specifying a content_type.  Content types supported today 
>> are "opml" and "foaf".  A working example can be found here:
>>
>>     http://intertwingly.net/code/venus/examples/opml-top100.ini
>>
>> And some of the outputs produced:
>>
>>     http://planet.intertwingly.net/top100/
>>     http://planet.intertwingly.net/top100/mobile.html
>>
>> The point being that it might be easier to maintain if the 
>> subscriptions lists are kept separate from the rest of the 
>> configuration.  Each OPML or FOAF file could be referenced twice, once 
>> from a "splice" configuration, and once from the common "spider" 
>> configuration.
> Could you show me what a "spider" configuration would look like, and 
> what a "splice" configuration would look like? Are these different 
> config options in the config.ini files?

In your existing config.ini files, you have pairs of lines that look 
like the following:

[http://example.com/file1.opml]
content_type=opml

If you create one additional file that collects up all of these pairs, 
and adds a single [Planet] section containing the definition for 
cache_directory and optionally things like log_level, you can use this 
additional config file as input to spider.py.
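
As a rough sketch of the idea above (not part of Venus itself), the collected file could be generated instead of maintained by hand. The code below assumes Python 3's standard configparser module, and the function name build_spider_config, the sample paths, and file2.opml are hypothetical. It copies only sections that carry a content_type option, and a subscription list named in more than one input ends up in the result once.

```python
import configparser
import io


def build_spider_config(config_texts, cache_directory):
    """Collect the subscription-list sections of several config.ini texts.

    Hypothetical helper: returns the text of a combined config that holds
    one [Planet] section plus every section that has a content_type option.
    """
    merged = configparser.ConfigParser(interpolation=None)
    merged["Planet"] = {"cache_directory": cache_directory}
    for text in config_texts:
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(text)
        for section in parser.sections():
            # Only copy the [url] + content_type pairs; skip everything else.
            if parser.has_option(section, "content_type"):
                merged[section] = {
                    "content_type": parser.get(section, "content_type")
                }
    out = io.StringIO()
    merged.write(out, space_around_delimiters=False)
    return out.getvalue()


# Sample inputs (assumed contents, for illustration only).
config1 = """[Planet]
name = Planet One
cache_directory = /var/cache/venus

[http://example.com/file1.opml]
content_type=opml
"""

config2 = """[Planet]
name = Planet Two
cache_directory = /var/cache/venus

[http://example.com/file1.opml]
content_type=opml

[http://example.com/file2.opml]
content_type=opml
"""

spider_ini = build_spider_config([config1, config2], "/var/cache/venus")
print(spider_ini)
```

The printed text could be saved as configAll.ini and handed to spider.py. Options such as log_level would still need to be added to the [Planet] section by hand.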

So.... if you have three existing config.ini files, and define an 
additional configAll.ini file as described above, the script your cron 
job runs could look like the following:

python spider.py configAll.ini
python splice.py config1.ini
python splice.py config2.ini
python splice.py config3.ini
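
The same sequence could also be driven from a small Python wrapper, sketched here under assumptions: the helper names build_commands and run_all are hypothetical, and stopping at the first failing command is a design choice of this sketch, not something Venus requires. It relies only on the standard subprocess module.

```python
import subprocess


def build_commands(spider_config, splice_configs):
    """Return the spider command followed by one splice command per config."""
    commands = [["python", "spider.py", spider_config]]
    commands += [["python", "splice.py", cfg] for cfg in splice_configs]
    return commands


def run_all(commands, runner=subprocess.run):
    """Run the commands one after another, in order.

    With the default runner, check=True raises CalledProcessError as soon
    as a command exits with a non-zero status, so later steps are skipped.
    """
    for cmd in commands:
        runner(cmd, check=True)


# Example use, run from the directory that holds spider.py and splice.py:
#   run_all(build_commands("configAll.ini",
#                          ["config1.ini", "config2.ini", "config3.ini"]))
```

Spidering runs once against the combined configuration, and each splice then reads the shared cache with its own configuration.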

>> Note: if there was interest, additional formats (like XOXO) could 
>> easily be supported, leading to...
>>
>>  - - -
>>
>> One format that would be trivial to support would be the same 
>> config.ini format that the rest of planet uses.  This means that the 
>> "spider" configuration would reduce to a series of sections that 
>> merely listed the other configurations as input.  If this is of 
>> interest, let me know.
>>
>> Notes:
>>
>>  * While subscriptions referenced by config.ini files are normally URLs,
>>    they can also simply be relative or absolute file paths.
>>
>>  * If you reference multiple subscription lists, it is OK for
>>    individual subscriptions to appear multiple times; what you will
>>    get is a proper union (i.e., the feed will only be fetched one time).
>>
>>  * What you put in the various config.inis is actually up to you.  I've
>>    described a use case where you put subscriptions in there, but you
>>    actually could factor out the common [Planet] definitions (like the
>>    cache_directory, for example) into a separate config.ini.
>>
>> What I like about all this is that you can start simple (like with 
>> "classic" planet) and put everything in one file, but as your needs 
>> grow, you can re-arrange to your heart's content.
>>
> I love this philosophy, but maybe I just need a bit more handholding - 
> or if you could show me where in the code these options are defined, 
> I'll go experiment...
>> - Sam Ruby
> Thanks again!
> 
> Dave

- Sam Ruby

