http://planete.websemantique.org/ and user defined content filtering

Thu Oct 12 20:08:35 EST 2006

Hi,

We have created a new planet for semantic web oriented blogs in French:
http://planete.websemantique.org/

Right now, this is just a plain vanilla installation of the Venus
flavor. The install was really trouble-free and I'd like to thank
you for the quality of this software.

Our users have two requests that involve filtering the blog entries that
appear on the planet.

The first one is common to most planet sites: most of the blogs are
not focused on a single topic, so planets carry a lot of entries that
are irrelevant to the planet's main topic.

I personally find that this is a feature more than a bug, since it's
nice to have a broad view of what bloggers write outside the scope of
the planet topic, but I can also understand that it would be useful to
give users the ability to get a filtered view of the planet (I am not
so keen).

The other one is more specific to multilingual planets. Although the
planet is primarily targeted at French-speaking visitors, most of the
blogs that we federate contain both French and English posts. Again, I
personally prefer to see entries in both languages but can also
understand that some users might prefer to see only posts in a single
language.

The main issue is that we'd like to add these features to existing
feeds. These feeds do not always include subjects or categories that
can be used for topic filtering, and none of them includes any specific
information about the language.

My idea would be to detect whether an item is relevant to the planet's
main topic by checking for a number of keywords (for the semantic web,
this seems quite doable). Other algorithms could be used, including
Bayesian filters like those used by anti-spam systems, but these would
require a training phase.
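
A minimal sketch of the keyword check, assuming Python 3. The keyword
list, the function name and the threshold are all hypothetical and only
meant to illustrate the idea:

```python
import re

# Hypothetical keyword list; a real planet would tune this by hand.
KEYWORDS = {"rdf", "owl", "sparql", "ontology", "ontologie",
            "semantic web", "web sémantique"}

def is_on_topic(text, keywords=KEYWORDS, threshold=1):
    """Return True if at least `threshold` distinct keywords occur in text."""
    lowered = text.lower()
    hits = 0
    for keyword in keywords:
        # \b word boundaries keep "owl" from matching inside "bowl".
        if re.search(r"\b" + re.escape(keyword) + r"\b", lowered):
            hits += 1
    return hits >= threshold
```

Raising the threshold makes the check stricter at the cost of missing
short posts that mention the topic only once.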

For the language detection, I would try to find an open source system
that does this. Another option would be to spell-check each entry
against the different languages (in our case there are only two) and
pick the language for which the fewest errors are detected.
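
The "fewest errors" idea could be sketched as follows, assuming
Python 3. The tiny word lists stand in for real spelling dictionaries
and are purely illustrative, as are the function names:

```python
import re

# Tiny illustrative word lists (an assumption); a real setup would load
# full spelling dictionaries for each language.
WORDLISTS = {
    "fr": {"le", "la", "les", "un", "une", "des", "et", "est", "dans",
           "pour", "que", "qui", "sur", "avec"},
    "en": {"the", "a", "an", "and", "is", "in", "for", "that", "which",
           "on", "with", "of", "to"},
}

def count_errors(words, wordlist):
    """Count the words that the given word list does not know."""
    return sum(1 for word in words if word not in wordlist)

def guess_language(text, wordlists=WORDLISTS):
    """Return the language code whose word list yields the fewest errors."""
    # [^\W\d_]+ matches runs of letters, including accented ones.
    words = re.findall(r"[^\W\d_]+", text.lower())
    # sorted() makes ties deterministic: the first code alphabetically wins.
    return min(sorted(wordlists),
               key=lambda lang: count_errors(words, wordlists[lang]))
```

With only two languages the comparison is cheap, but very short
entries can easily tie, so a real filter might leave those unlabeled.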

I have taken a look at the filter mechanism and all of this seems
pretty easy to implement (BTW, I am wondering if there is already a
generic XSLT filter). The only thing that is still somewhat mysterious
to me is the options mechanism, but I can probably find out by myself
how it works...

The difference between these filters and the other similar content
filters that I have found in the list archives is that these ones
would not remove entries but add new metadata based on their findings.
This metadata could then be copied into the XHTML pages and a piece of
JavaScript would hide entries based on user preferences.
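
As a rough sketch of such an annotating step, assuming Python 3 and
assuming the filter receives one Atom entry as XML: the function
below adds atom:category elements instead of dropping the entry. The
scheme URI, the term names and the function name are hypothetical,
and the wiring into the planet's filter mechanism is left out:

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
# Hypothetical scheme URI used to mark categories added by this filter.
SCHEME = "http://example.org/planet-filter"

# Serialize Atom elements without a prefix.
ET.register_namespace("", ATOM)

def annotate(entry_xml, language, on_topic):
    """Return the entry with category elements describing the findings."""
    entry = ET.fromstring(entry_xml)
    terms = ["lang-" + language, "on-topic" if on_topic else "off-topic"]
    for term in terms:
        # The entry is kept; only metadata is appended.
        ET.SubElement(entry, "{%s}category" % ATOM,
                      {"scheme": SCHEME, "term": term})
    return ET.tostring(entry, encoding="unicode")
```

A template could then copy these terms into class attributes in the
XHTML output, which is what a script hiding entries would look at.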

This seems quite simple and obvious to implement, but I am wondering if
this has already been done and, if not, whether you have any advice for
me.

Thanks,

Eric

-- 
GPG-PGP: 2A528005
Freelance consulting and training.
                                            http://dyomedea.com/english/
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
(ISO) RELAX NG   ISBN:0-596-00421-4 http://oreilly.com/catalog/relax
(W3C) XML Schema ISBN:0-596-00252-1 http://oreilly.com/catalog/xmlschema
------------------------------------------------------------------------