
Re: [curn-users] Translating character encodings.



Hi Brian,

Thanks very much for your help. Here's my config file:

[curn]
CacheFile:${system:user.home}/k2/contentfetch/curn/data/store/curn.cache
AllowEmbeddedHTML:false
CommonXMLFixups:true
DaysToCache:60
GzipDownload:true
MaxSummarySize:65535
MaxThreads:4
ReplaceEmptySummaryWith:title
ShowDates:true

[Feed.Publico.Geral]
TitleOverride:Publico,Geral
URL:http://www.publico.clix.pt/rss.asp?idCanal=10
SaveAs:${system:user.home}/k2/contentfetch/curn/data/store/publico.geral.xml
SaveAsEncoding:utf-8

[Feed.DN]
TitleOverride:Diario de Noticias
URL:http://rss.sapo.pt/dn/
SaveAs:${system:user.home}/k2/contentfetch/curn/data/store/dn.xml
SaveAsEncoding:utf-8

[OutputHandler]
Class:org.clapper.curn.output.freemarker.FreeMarkerOutputHandler
Disabled:false
TemplateFile:file ${system:user.home}/k2/contentfetch/curn/data/etc/rssout.ftl
SaveAs:${system:user.home}/k2/contentfetch/curn/data/store/forindexing.xml
SaveAsEncoding:utf-8
MimeType:application/xml

You will notice that the "Publico.Geral" feed is an ISO-8859-1 feed, while the DN feed is UTF-8. I have set SaveAsEncoding on both feed sections and on the OutputHandler (a FreeMarker template), yet what I get is:

* publico.geral.xml seems to be written in the original encoding,
* dn.xml seems to be written in the original encoding,
* forindexing.xml claims to be UTF-8 ('$ file forindexing.xml') but it has ISO-8859-1 characters (or what seem to be) in the actual file.
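To check the symptoms above independently of what `file` reports, here is a minimal Python sketch that tests whether a saved file's bytes really decode as UTF-8, plus a stop-gap that re-encodes ISO-8859-1 bytes as UTF-8. It is not part of curn; the function names `is_valid_utf8` and `transcode_to_utf8` are made up for illustration, and the source encoding is assumed to be ISO-8859-1.

```python
import sys


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def transcode_to_utf8(data: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Decode bytes from source_encoding (assumed) and re-encode them as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    # Usage: python check_utf8.py publico.geral.xml dn.xml forindexing.xml
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            raw = f.read()
        status = "valid UTF-8" if is_valid_utf8(raw) else "NOT valid UTF-8"
        print(f"{path}: {status}")
```

A file containing only ASCII passes this check whatever its declared encoding, so the check is only informative for files with accented characters. If a file is re-encoded this way, the encoding named in its XML declaration would also need updating by hand.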

Any help will be very much appreciated.

Regards.

--Nuno


On 26 Jun 2007, at 04:25, Brian Clapper wrote:

On 06/25/07 21:43, Nuno Leitao wrote:
Hi,
I've been scratching my head trying to find a way to translate RSS character encodings within curn without having to write my own plugin: when getting an RSS feed in, say, iso-xxxx-xx, writing it out as utf-8 instead. The application which will process these feeds only understands utf-8, and some feeds might use a different character set. I know about the SaveAsEncoding and Encoding conf options, but these don't seem to actually perform character translation.

Really? The SaveAsEncoding option is designed for exactly that purpose. Can you send me your config file (and point out the offending feeds) so I can test it here?
--
-Brian

Brian Clapper, http://www.clapper.org/bmc/

---
*** Posted to the curn-users mailing list (curn-users@xxxxxxxxxxx).


