Dealing with RSS dates vs. filesystem dates in the cache?
David L. Sifry
dsifry at technorati.com
Thu Oct 5 07:53:37 EST 2006
Hrm,
That's not what I'm seeing. I'm seeing items from valid RSS/Atom feeds
with valid entry level pubDates. Here's one:
http://bayosphere.com/blog/dan_gillmor/feed
This seems to occur only when using a shared cache that has irregularly
updated feeds. I'm still investigating.
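One way to check whether cache timing, rather than the feeds, is driving the ordering is to sort strictly on the entry-level date and use the cache file's modification time only for entries that have no date at all. The sketch below assumes each cached entry exposes both values. `CachedEntry`, `sort_key` and `order_for_planet` are hypothetical names for illustration, not Venus's actual cache API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class CachedEntry:
    """Hypothetical record for one entry read back from a shared cache."""
    entry_id: str
    entry_date: Optional[datetime]  # pubDate / dc:date / updated, if present
    file_mtime: datetime            # when the cache file was last written


def sort_key(entry: CachedEntry) -> datetime:
    # Prefer the date the feed itself reported; fall back to the
    # filesystem time only when the entry carries no date at all.
    if entry.entry_date is not None:
        return entry.entry_date
    return entry.file_mtime


def order_for_planet(entries: List[CachedEntry]) -> List[CachedEntry]:
    # Newest first, the usual order for a planet page.
    return sorted(entries, key=sort_key, reverse=True)


if __name__ == "__main__":
    utc = timezone.utc
    entries = [
        # Published in September, but its cache file was rewritten today.
        CachedEntry("a", datetime(2006, 9, 1, tzinfo=utc), datetime(2006, 10, 5, tzinfo=utc)),
        CachedEntry("b", datetime(2006, 10, 4, tzinfo=utc), datetime(2006, 10, 1, tzinfo=utc)),
        CachedEntry("c", None, datetime(2006, 10, 3, tzinfo=utc)),
    ]
    print([e.entry_id for e in order_for_planet(entries)])
```

Under this ordering, an old entry whose cache file happens to be rewritten recently ("a" above) still sorts by its publication date, so an irregularly refreshed shared cache should not reshuffle entries that carry valid pubDates.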
Dave
Sam Ruby wrote:
> David L. Sifry wrote:
>> Perhaps I'm misunderstanding the code, but it seems that one of the
>> strange bugs I'm seeing is planet-based pages that include items that
>> are not in date-of-publication order. Near as I can tell, this often
>> happens either (a) when building a new planet before all feeds have
>> been indexed in time order, or (b) when a feed has just been added to a
>> preexisting planet.
>>
>> I'm working on providing you guys with replicable details, but wanted
>> to post an anecdotal description of the problem here first.
>
> Pretty close. :-)
>
> Feeds without entry level dates (generally pubDates in RSS 2.0,
> dc:dates in RSS 1.0, or updated dates in Atom 1.0) are problematic.
> Planet 2.0 provides an option to throw away all but the last 'n'
> entries in such circumstances. Venus doesn't have that option yet.
> Instead, every such entry gets 'now'.
>
> Instead of throwing away entries, I'm inclined to impose an "assumed
> frequency" on such feeds. No dates? The first one gets now, the
> second gets yesterday, the third gets the day before that...
>
> Meanwhile, this does work itself out as such entries 'age'.
>
>> Note that this is using a shared cache directory; many thanks to Sam for
>> his fixes this morning...
>
> Cool! Ultimately, some sort of database backing and/or indexing will
> be necessary (I'm concerned that you are planning a 'Technorati-scale'
> deployment of Venus).
>
>> Dave
>
> - Sam Ruby
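Sam's proposed "assumed frequency" for feeds without entry-level dates can be sketched as below, next to the current behaviour he describes (every undated entry gets 'now'). This is a minimal illustration, not Venus code: `all_now_dates`, `assumed_frequency_dates` and the `step` parameter are hypothetical, and the one-day default simply follows his "now, yesterday, the day before" example.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def all_now_dates(count: int, now: datetime) -> List[datetime]:
    # Current behaviour as described: every undated entry gets 'now',
    # so the dates carry no information about the feed's own order.
    return [now] * count


def assumed_frequency_dates(count: int, now: datetime,
                            step: timedelta = timedelta(days=1)) -> List[datetime]:
    # Proposed alternative: the i-th entry in feed order (i = 0, 1, 2, ...)
    # is dated now - i * step, i.e. now, yesterday, the day before, ...
    return [now - i * step for i in range(count)]


if __name__ == "__main__":
    now = datetime(2006, 10, 5, 7, 53, tzinfo=timezone.utc)
    for d in assumed_frequency_dates(3, now):
        print(d.isoformat())
```

Because the assigned dates strictly decrease, a date-sorted merge keeps the feed's own order, and older entries from a newly added undated feed no longer all land at the top of the page. With every entry at 'now' they tie, so nothing in the dates themselves preserves that order.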
--
David L. Sifry
Founder and CEO, Technorati, Inc.
dsifry at technorati.com
415 846-0232 (Mobile)