Re: [curn-users] Translating character encodings.
Hi Brian,
Thanks very much for your help. Here's my config file:
[curn]
CacheFile:${system:user.home}/k2/contentfetch/curn/data/store/curn.cache
AllowEmbeddedHTML:false
CommonXMLFixups:true
DaysToCache:60
GzipDownload:true
MaxSummarySize:65535
MaxThreads:4
ReplaceEmptySummaryWith:title
ShowDates:true
[Feed.Publico.Geral]
TitleOverride:Publico,Geral
URL:http://www.publico.clix.pt/rss.asp?idCanal=10
SaveAs:${system:user.home}/k2/contentfetch/curn/data/store/publico.geral.xml
SaveAsEncoding:utf-8
[Feed.DN]
TitleOverride:Diario de Noticias
URL:http://rss.sapo.pt/dn/
SaveAs:${system:user.home}/k2/contentfetch/curn/data/store/dn.xml
SaveAsEncoding:utf-8
[OutputHandler]
Class:org.clapper.curn.output.freemarker.FreeMarkerOutputHandler
Disabled:false
TemplateFile:file ${system:user.home}/k2/contentfetch/curn/data/etc/rssout.ftl
SaveAs:${system:user.home}/k2/contentfetch/curn/data/store/forindexing.xml
SaveAsEncoding:utf-8
MimeType:application/xml
You will notice that the "Publico.Geral" feed is an ISO-8859-1 feed,
while the DN feed is UTF-8. I have set SaveAsEncoding to utf-8 on both
feed sections and on the OutputHandler (which uses a FreeMarker
template), yet what I get is:
* publico.geral.xml seems to be written in the original encoding,
* dn.xml seems to be written in the original encoding,
* forindexing.xml claims to be UTF-8 ('$ file forindexing.xml') but
the actual file contains what seem to be ISO-8859-1 characters.
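One way to check the symptoms listed above, independent of what `file` reports, is to compare the encoding named in each file's XML declaration with whether its bytes decode as strict UTF-8. The following is only a minimal Python sketch; the helper name `check_encoding` is hypothetical and is not part of curn.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper, not part of curn: compares the encoding an XML
# document declares with whether its bytes are actually valid UTF-8.
_DECL = re.compile(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']')

def check_encoding(data: bytes) -> Tuple[Optional[str], bool]:
    """Return (declared encoding or None, True if data is valid UTF-8)."""
    match = _DECL.search(data[:200])  # the declaration sits at the very start
    declared = match.group(1).decode("ascii").lower() if match else None
    try:
        data.decode("utf-8")  # strict decoding raises on invalid sequences
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return declared, valid_utf8

# A document that declares utf-8 but carries ISO-8859-1 bytes:
sample = '<?xml version="1.0" encoding="utf-8"?><t>Público</t>'.encode("iso-8859-1")
print(check_encoding(sample))
```

A result whose declared encoding is utf-8 but whose second element is False would match the forindexing.xml symptom: the label says UTF-8 while the bytes were never translated.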
Any help will be very much appreciated.
Regards.
--Nuno
On 26 Jun 2007, at 04:25, Brian Clapper wrote:
On 06/25/07 21:43, Nuno Leitao wrote:
Hi,
I've been scratching my head trying to find a way to translate
RSS character encodings within curn without having to write my
own plugin: basically, taking an RSS feed that arrives in, say,
iso-xxxx-xx and writing it out as utf-8 instead. The application
that will process these feeds only understands utf-8, and some feeds
might use a different character set.
I know about the SaveAsEncoding and Encoding conf options, but
these don't actually seem to perform character translation.
Really? The SaveAsEncoding option is designed for exactly that
purpose. Can you send me your config file (and point out the
offending feeds) so I can test it here?
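For reference, the translation being asked about amounts to decoding the feed's bytes with the source character set, re-encoding them as UTF-8, and updating the XML declaration to match. The sketch below illustrates that idea in Python outside curn. It is an assumption-laden example, not how curn implements SaveAsEncoding, and the function name `to_utf8` is hypothetical.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration, keeping the
# text before and after the encoding name in groups 1 and 3.
_DECL_ENC = re.compile(r'(<\?xml[^>]*encoding=["\'])([^"\']+)(["\'])')

def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Transcode an XML document to UTF-8 (hypothetical helper, not curn's)."""
    text = data.decode(source_encoding)  # bytes -> Unicode text
    # Relabel the declaration so it agrees with the new byte encoding.
    text = _DECL_ENC.sub(r"\g<1>utf-8\g<3>", text, count=1)
    return text.encode("utf-8")  # Unicode text -> UTF-8 bytes

feed = '<?xml version="1.0" encoding="iso-8859-1"?><t>Diário</t>'.encode("iso-8859-1")
converted = to_utf8(feed, "iso-8859-1")
print(converted.decode("utf-8"))
```

Both steps matter: changing only the declaration, or only the bytes, leaves a file whose label and contents disagree.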
--
-Brian
Brian Clapper, http://www.clapper.org/bmc/
---
*** Posted to the curn-users mailing list (curn-users@xxxxxxxxxxx).