Dealing with RSS dates vs. Filesystem dates in the cache?

Sam Ruby rubys at intertwingly.net
Thu Oct 5 07:10:48 EST 2006


David L. Sifry wrote:
> Perhaps I'm misunderstanding the code, but it seems that one of the 
> strange bugs I'm seeing is planet-based pages that include items that 
> are not in date-of-publication order. Near as I can tell, this often 
> happens when either (a) building a new planet when not all feeds have 
> been indexed in time order, and (b) when a feed is just added to a 
> preexisting planet.
> 
> I'm working on providing you guys with replicable details, but wanted to 
> post an anecdotal description of the problem here first.

Pretty close.  :-)

Feeds without entry level dates (generally pubDates in RSS 2.0, dc:dates 
in RSS 1.0, or updated dates in Atom 1.0) are problematic.  Planet 2.0 
provides an option to throw away all but the last 'n' entries in such 
circumstances.  Venus doesn't have that option yet.  Instead every such 
entry gets 'now'.
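
To make the current behaviour concrete, here is a minimal Python sketch of 
that fallback.  It is not the Venus source: the function name entry_date and 
the dictionary keys are hypothetical stand-ins for whatever the feed parser 
actually exposes.

```python
from datetime import datetime, timezone

# Hypothetical field names (assumption); a real feed parser may use others.
DATE_KEYS = ("published", "dc_date", "updated")

def entry_date(entry, now):
    """Return the entry's own date if it carries one, otherwise 'now'."""
    for key in DATE_KEYS:
        value = entry.get(key)
        if value is not None:
            return value
    # No entry-level date at all: the entry is stamped with the fetch time.
    return now

now = datetime(2006, 10, 5, tzinfo=timezone.utc)
dated = {"published": datetime(2006, 1, 1, tzinfo=timezone.utc)}
undated = {"title": "no date here"}
stamps = [entry_date(dated, now), entry_date(undated, now)]
```

Every undated entry fetched in the same run ends up with the same timestamp, 
which is why such entries sort to the top together.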

Instead of throwing away entries, I'm inclined to impose an "assumed 
frequency" on such feeds.  No dates?  The first one gets now, the second 
gets yesterday, the third gets the day before that...
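
The idea can be sketched in Python roughly as follows.  This is only an 
illustration under stated assumptions: entries arrive in feed order (newest 
first), each entry is a dictionary with an optional "date" key, and the 
function name assign_assumed_dates and the one-day default interval are 
hypothetical.

```python
from datetime import datetime, timedelta, timezone

def assign_assumed_dates(entries, now, interval=timedelta(days=1)):
    """Give undated entries dates spaced `interval` apart, newest first."""
    result = []
    undated_seen = 0
    for entry in entries:  # assumed to be in feed order, newest first
        date = entry.get("date")
        if date is None:
            # First undated entry gets now, the next gets now - 1 interval, ...
            date = now - undated_seen * interval
            undated_seen += 1
        # Copy the entry so the caller's dictionaries are left untouched.
        result.append({**entry, "date": date})
    return result

now = datetime(2006, 10, 5, tzinfo=timezone.utc)
feed = [{"title": "a"}, {"title": "b"}, {"title": "c"}]
assumed = assign_assumed_dates(feed, now)
```

A real implementation would presumably also need to keep the date first 
assigned to an entry once it is cached, so that it does not shift on every 
run; that part is not shown here.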

Meanwhile, this does work itself out as such entries 'age'.

> Note this is using a shared cache directory, many thanks to Sam for his 
> fixes this morning...

Cool!  Ultimately, some sort of database backing and/or indexing will be 
necessary (I'm concerned that you are planning a 'Technorati scale' 
deployment of Venus).

> Dave

- Sam Ruby


