Dealing with RSS dates vs. Filesystem dates in the cache?
Sam Ruby
rubys at intertwingly.net
Thu Oct 5 07:10:48 EST 2006
David L. Sifry wrote:
> Perhaps I'm misunderstanding the code, but it seems that one of the
> strange bugs I'm seeing is planet-based pages that include items that
> are not in date-of-publication order. Near as I can tell, this often
> happens either (a) when building a new planet where not all feeds have
> been indexed in time order, or (b) when a feed has just been added to a
> preexisting planet.
>
> I'm working on providing you guys with replicable details, but wanted to
> post an anecdotal description of the problem here first.
Pretty close. :-)
Feeds without entry-level dates (generally pubDates in RSS 2.0, dc:dates
in RSS 1.0, or updated dates in Atom 1.0) are problematic. Planet 2.0
provides an option to throw away all but the last 'n' entries in such
circumstances. Venus doesn't have that option yet. Instead every such
entry gets 'now'.
Instead of throwing away entries, I'm inclined to impose an "assumed
frequency" on such feeds. No dates? The first one gets now, the second
gets yesterday, the third gets the day before that...
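
A rough sketch of what that "assumed frequency" rule might look like, purely for illustration. This is not code from Venus or Planet: the function name, the dict-shaped entries, and the "updated" key are all assumptions, as is the one-day interval taken from the description above.

```python
from datetime import datetime, timedelta, timezone


def assign_assumed_dates(entries, now=None, interval=timedelta(days=1)):
    """Return copies of `entries` with synthetic dates filled in.

    Hypothetical helper: `entries` is a list of dicts in feed order
    (newest first). Any entry lacking an "updated" value gets `now`
    minus one `interval` per position in the feed.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    dated = []
    for position, entry in enumerate(entries):
        if entry.get("updated") is None:
            # First entry gets now, second gets yesterday, and so on.
            entry = dict(entry, updated=now - position * interval)
        dated.append(entry)
    return dated
```

Entries that already carry a date are left alone, so a feed that later starts publishing dates would not be rewritten by this.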
Meanwhile, this does work itself out as such entries 'age'.
> Note this is using a shared cache directory, many thanks to Sam for his
> fixes this morning...
Cool! Ultimately, some sort of database backing and/or indexing will be
necessary (I'm concerned that you are planning a 'technorati scale'
deployment of Venus).
> Dave
- Sam Ruby