RE: [curn-users] issue with IgnoreDuplicateTitles plug-in
> Are you expecting curn to remove duplicates as part of the SaveAs
> processing?
> Is that what you're trying to do?
Well, that's what I was looking for, Brian. Thanks :) This alpha version
is looking good to me.
But I need a clarification. In your previous mail (yesterday), you said curn
was able to identify duplicate articles. If we do not use "SaveAs", how are
we going to see any output? (Forgive me if this is a silly question.)
Thanks again for all the help! I will bug you again ;)
- Bharath
-----Original Message-----
From: Brian Clapper [mailto:bmc@xxxxxxxxxxx]
Sent: Monday, August 06, 2007 1:41 PM
To: Bharath Prathipati
Cc: curn-users@xxxxxxxxxxx
Subject: Re: [curn-users] issue with IgnoreDuplicateTitles plug-in
On 8/6/07 2:53 AM, Bharath Prathipati wrote:
> I was testing the duplicate article issue on my server, and not on
> Slashdot feeds. I should have made it clear beforehand. And I'm
> attaching the output file created from my rss feed using curn, which has
> duplicate articles. If you search for "cure for oil" you'll find 2
> articles.
Okay, I think I may understand what's happening here.
Are you expecting curn to remove duplicates as part of the SaveAs
processing? If so, that's not going to work. SaveAs saves the raw XML,
downloaded from the remote site, BEFORE the file is parsed. curn does no
processing of the XML feed before saving it to the file specified by
SaveAs, so if there are duplicates in the feed, there will be duplicates in
the file. (That's what I meant by the term "raw" in the description of the
SaveAs parameter.)
Is that what you're trying to do?
The IgnoreDuplicateTitles plug-in operates on the parsed feed data, which
means it logically runs AFTER the SaveAs plug-in. Thus, by the time the
IgnoreDuplicateTitles plug-in runs, SaveAs has already done its work.
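To make the ordering concrete, here is a rough sketch in Python (not curn's actual code, which is Java). The function name, the sample feed, and the exact-match, first-one-wins rule for titles are all assumptions made for illustration. It only shows why a file written before parsing still contains duplicates, while anything produced after the duplicate filter does not.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed with a duplicated title, mirroring the
# "cure for oil" case described in this thread.
RAW_FEED = """<rss version="2.0"><channel><title>demo</title>
<item><title>cure for oil</title><link>http://example.com/a</link></item>
<item><title>cure for oil</title><link>http://example.com/b</link></item>
<item><title>other story</title><link>http://example.com/c</link></item>
</channel></rss>"""


def process_feed(raw_xml):
    """Return (raw_copy, deduplicated_items) for one feed download."""
    # Step 1: the SaveAs-style step sees only the raw text, before
    # any parsing, so duplicates in the feed survive in this copy.
    raw_copy = raw_xml

    # Step 2: parse the feed into (title, link) pairs.
    root = ET.fromstring(raw_xml)
    items = [(item.findtext("title"), item.findtext("link"))
             for item in root.iter("item")]

    # Step 3: a duplicate-title filter works on the parsed items.
    # Assumption: titles are compared exactly and the first one wins.
    seen = set()
    kept = []
    for title, link in items:
        if title not in seen:
            seen.add(title)
            kept.append((title, link))

    # Step 4: anything written from `kept` (as SaveAsRSS would do)
    # reflects the filtering; `raw_copy` does not.
    return raw_copy, kept


raw_copy, kept = process_feed(RAW_FEED)
print(raw_copy.count("<item>"), "items in the raw copy")
print(len(kept), "items after the duplicate filter")
```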
In curn 3.2 (which is not yet released, but see below), there's a new
SaveAsRSS parameter. That parameter instructs curn to save the PARSED
(i.e., not raw) feed data to an RSS feed format. Since the new SaveAsRSS
plug-in runs AFTER the data is parsed, it can (and does) honor what the
IgnoreDuplicateTitles plug-in does.
Even though curn 3.2 isn't released, you can play with an alpha release.
It's located here:
http://www.clapper.org/software/java/curn/download/tmp/install-curn-3.2-alpha-9.jar
The SaveAsRSS parameter is described in the User's Guide in there, but
here's a brief run-down. It's honored within a Feed section.
----------
SaveAsRSS: If set, this parameter specifies that the parsed feed data
should be rewritten in the specified RSS format and saved to
the specified file. This configuration item takes a command
line-style value:
[--backups total_backups] [--type rsstype] [--encoding enc] path
or
[-b total_backups] [-t rsstype] [-e enc] path
where:
- <total_backups> specifies how many backups (i.e., previous
versions) of the generated RSS file to keep. For instance, a
value of 5 means "keep 5 previous versions of the file, plus
the one from the current run." This is the best way to keep
RSS files from previous curn runs. The backup files have
version numbers preceding their extensions. For instance, if
the output file is foo.xml, and total_backups is 2, curn
will keep foo.0.xml and foo.1.xml. The file with the largest
version number is the oldest one. If not specified, this
parameter defaults to 0, which means "no backups".
- <rsstype> is the type of RSS output to generate. Currently,
"rss1", "rss2" and "atom" are the supported values.
- <enc> is optional and specifies the desired encoding of
the file. It defaults to "utf-8".
- <path> is the path to the file where the RSS output should
be written.
Note that only the new data in the feed is converted to RSS.
EXAMPLES:
SaveAsRSS: -b 1 -t rss2 -e Cp1252 ${system:user.home}/feed-rss2.xml
SaveAsRSS: -t atom -e UTF8 ${system:user.home}/feed-atom.xml
OPTIONAL. Default: none
----------
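For anyone who wants to sanity-check a SaveAsRSS value before putting it in a config file, the command-line-style syntax and the backup naming described above can be modelled roughly as follows. This is a sketch in Python, not how curn itself parses the value; the function names are hypothetical, and leaving the type unset when -t is omitted is an assumption, since the excerpt does not state a default for it.

```python
import argparse
import os
import shlex


def parse_save_as_rss(value):
    """Split a SaveAsRSS value into its options and output path."""
    parser = argparse.ArgumentParser(prog="SaveAsRSS", add_help=False)
    # Defaults for backups (0) and encoding (utf-8) come from the
    # description above; the default for the type is an assumption.
    parser.add_argument("-b", "--backups", type=int, default=0)
    parser.add_argument("-t", "--type", dest="rss_type",
                        choices=["rss1", "rss2", "atom"], default=None)
    parser.add_argument("-e", "--encoding", default="utf-8")
    parser.add_argument("path")
    return parser.parse_args(shlex.split(value))


def backup_names(path, total_backups):
    """List backup file names, with the version number placed
    before the extension (foo.xml -> foo.0.xml, foo.1.xml, ...)."""
    root, ext = os.path.splitext(path)
    return [f"{root}.{n}{ext}" for n in range(total_backups)]


opts = parse_save_as_rss(
    "-b 1 -t rss2 -e Cp1252 ${system:user.home}/feed-rss2.xml")
print(opts.backups, opts.rss_type, opts.encoding, opts.path)
print(backup_names("foo.xml", 2))
```

Note that the sketch leaves ${system:user.home} unexpanded; substituting that variable is curn's job and is not modelled here.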
Regards,
-Brian
Brian Clapper, http://www.clapper.org/bmc/
Why is it that there are so many more horses' asses than there are horses?
-- G. Gordon Liddy
---
*** Posted to the curn-users mailing list (curn-users@xxxxxxxxxxx).