Dealing with RSS dates vs. filesystem dates in the cache?

David L. Sifry dsifry at technorati.com
Thu Oct 5 07:53:37 EST 2006


Hrm,

That's not what I'm seeing. I'm seeing items from valid RSS/Atom feeds 
with valid entry level pubDates. Here's one:

http://bayosphere.com/blog/dan_gillmor/feed

This seems to occur only when using a shared cache that has irregularly 
updated feeds.  I'm still investigating.

Dave

Sam Ruby wrote:
> David L. Sifry wrote:
>> Perhaps I'm misunderstanding the code, but it seems that one of the 
>> strange bugs I'm seeing is planet-based pages that include items that 
>> are not in date-of-publication order. Near as I can tell, this often 
>> happens when either (a) building a new planet when not all feeds have 
>> been indexed in time order, or (b) when a feed is just added to a
>> preexisting planet.
>>
>> I'm working on providing you guys with replicable details, but wanted 
>> to post an anecdotal description of the problem here first.
>
> Pretty close.  :-)
>
> Feeds without entry level dates (generally pubDates in RSS 2.0, 
> dc:dates in RSS 1.0, or updated dates in Atom 1.0) are problematic.  
> Planet 2.0 provides an option to throw away all but the last 'n' 
> entries in such circumstances.  Venus doesn't have that option yet.  
> Instead every such entry gets 'now'.
>
> Instead of throwing away entries, I'm inclined to impose an "assumed 
> frequency" on such feeds.  No dates?  The first one gets now, the 
> second gets yesterday, the third gets the day before that...
>
> Meanwhile, this does work itself out as such entries 'age'.
>
>> Note this is using a shared cache directory, many thanks to Sam for 
>> his fixes this morning...
>
> Cool!  Ultimately, some sort of database backing and/or indexing will 
> be necessary (I'm concerned that you are planning a 'technorati scale' 
> deployment of Venus).
>
>> Dave
>
> - Sam Ruby
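A minimal sketch of the "assumed frequency" idea described above, written in Python (the language Venus is written in). It is an illustration under stated assumptions, not Venus code: the entry dicts and the field names `published`, `dc_date`, and `updated` are hypothetical stand-ins for entry-level pubDate, dc:date, and Atom updated values, and the one-day spacing is simply the example from the message. Entries that carry their own date keep it; undated entries are spaced backwards from "now" in feed order, and the result can then be sorted into date-of-publication order.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical field names standing in for RSS 2.0 pubDate, RSS 1.0 dc:date,
# and Atom 1.0 updated; a real feed parser exposes its own attribute names.
DATE_FIELDS = ("published", "dc_date", "updated")


def assign_dates(entries, now=None, interval=timedelta(days=1)):
    """Pair each entry of ONE feed with a date.

    Entries with an entry-level date keep it. Undated entries get an
    assumed frequency: the first undated entry gets `now`, the second
    `now - interval`, the third `now - 2 * interval`, and so on.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    dated = []
    undated_seen = 0
    for entry in entries:
        # Use the first entry-level date field that is present and non-empty.
        date = next((entry[f] for f in DATE_FIELDS if entry.get(f)), None)
        if date is None:
            date = now - undated_seen * interval
            undated_seen += 1
        dated.append((entry, date))
    return dated


def in_publication_order(dated):
    """Sort (entry, date) pairs newest first."""
    return sorted(dated, key=lambda pair: pair[1], reverse=True)
```

For example, with `now` fixed at 2006-10-05, a feed of three entries where only the second has an `updated` date of 2006-10-01 yields dates of Oct 5, Oct 1, and Oct 4, which sort to the order first, third, second. Because the assumed dates only move further into the past, undated entries still "age" naturally on later runs instead of all clustering at "now".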

-- 
David L. Sifry
Founder and CEO, Technorati, Inc.
dsifry at technorati.com
415 846-0232 (Mobile)


