Re: [curn-users] Translating character encodings.
On 6/26/07 5:56 AM, Nuno Leitao wrote:
> Hi Brian,
>
> Thanks very much for your help. Here's my config file:
>
> [curn]
> CacheFile:${system:user.home}/k2/contentfetch/curn/data/store/curn.cache
> AllowEmbeddedHTML:false
> CommonXMLFixups:true
> DaysToCache:60
> GzipDownload:true
> MaxSummarySize:65535
> MaxThreads:4
> ReplaceEmptySummaryWith:title
> ShowDates:true
>
> [Feed.Publico.Geral]
> TitleOverride:Publico,Geral
> URL:http://www.publico.clix.pt/rss.asp?idCanal=10
> SaveAs:${system:user.home}/k2/contentfetch/curn/data/store/publico.geral.xml
>
> SaveAsEncoding:utf-8
>
> [Feed.DN]
> TitleOverride:Diario de Noticias
> URL:http://rss.sapo.pt/dn/
> SaveAs:${system:user.home}/k2/contentfetch/curn/data/store/dn.xml
> SaveAsEncoding:utf-8
>
> [OutputHandler]
> Class:org.clapper.curn.output.freemarker.FreeMarkerOutputHandler
> Disabled:false
> TemplateFile:file ${system:user.home}/k2/contentfetch/curn/data/etc/rssout.ftl
> SaveAs:${system:user.home}/k2/contentfetch/curn/data/store/forindexing.xml
> SaveAsEncoding:utf-8
> MimeType:application/xml
>
> You will notice that the "Publico.Geral" feed is an ISO-8859-1 feed,
> while the DN feed is UTF-8. I have set SaveAsEncoding on both the feed
> config and the OutputHandler (a FreeMarker template), yet what I get is:
>
> * publico.geral.xml seems to be written in the original encoding,
> * dn.xml seems to be written in the original encoding,
> * forindexing.xml claims to be UTF-8 ('$ file forindexing.xml') but it
> has ISO-8859-1 characters (or what seem to be) in the actual file.
>
> Any help will be very much appreciated.
Nuno,
I didn't get to this yesterday. Perhaps tonight. And I do want to know
what's going on. However, this might also interest you. In the next version
of curn, I have a per-feed plug-in that can save the new contents of a feed
in any RSS format. It's deliberately implemented as a plug-in, rather than
an output handler, because output handlers consolidate the new items from
ALL feeds into one output file. A per-feed plug-in, by contrast, can save a
file of new items per feed.
I can supply you with an alpha version of that release if you want to play
with that feature.
In the meantime, I'll try to figure out why your encodings seem to be off.
If there's a bug in there, it ought to be fixed regardless.
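One way to check the saved files independently of curn (and of what 'file' reports) is to attempt a strict UTF-8 decode of the raw bytes. The sketch below is plain Python, not part of curn, and the function names are made up for illustration. It assumes that any file failing the check is ISO-8859-1, which matches the feeds described above but is not something the script can verify on its own.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (strict mode)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def transcode_latin1_to_utf8(data: bytes) -> bytes:
    """Interpret the bytes as ISO-8859-1 text and re-encode them as UTF-8.

    Assumption: the input really is ISO-8859-1. Every byte value is legal
    in that encoding, so this never fails, even on the wrong input.
    """
    return data.decode("iso-8859-1").encode("utf-8")


def check_file(path: str) -> bool:
    """Report whether the file at the given path holds valid UTF-8."""
    return is_valid_utf8(Path(path).read_bytes())
```

Calling check_file() on publico.geral.xml, dn.xml and forindexing.xml in turn would show which of them actually contain UTF-8. A file that is pure ASCII passes the check under either encoding, so the result only means something for files that contain accented characters.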
--
-Brian
Brian Clapper, http://www.clapper.org/bmc/
If I had a hammer, I'd use it on Peter, Paul and Mary.
-- Howard Rosenberg
---
*** Posted to the curn-users mailing list (curn-users@xxxxxxxxxxx).