Thanks, Brian. That was a prompt reply!

Here's my config file:

    [var]
    # "feedDir" dumps to a directory that's accessible internally via URL
    feedDir: .
    # curnDir: where this file and the cache live
    curnDir: .

    [curn]
    CacheFile: ${var:curnDir}/common.cache
    MaxThreads: 15
    ParserClass: org.clapper.curn.parser.rome.RSSParserAdapter
    GzipDownload: true

    [Feed_slashdot]
    # Slashdot
    URL: http://rss.slashdot.org/Slashdot/slashdot
    SaveAs: ${var:feedDir}/slashdot.xml
    IgnoreDuplicateTitles: true

And the command line:

(to enable logging)

    set CURN_JAVA_VM_ARGS=-Djava.util.logging.config.file=./logging.properties

(to invoke curn)

    curn --logging -C curn.cfg

And this is the "logging.properties" (in case you want to have a look at it):

    log4j.rootLogger=debug, File
    log4j.appender.File=org.apache.log4j.FileAppender
    log4j.appender.File.layout=org.apache.log4j.PatternLayout
    log4j.appender.File.file=./log.out
    # Overwrite the file each time
    log4j.appender.File.append=false
    # Print the date in ISO 8601 format
    log4j.appender.File.layout.ConversionPattern=%d %-5p (%c{1}): %m%n
    log4j.logger.org.clapper.curn=debug

The issue here is not to enable logging, though :) but to get the duplicate plugin working.

I also have a question about IgnoreDuplicateArticlesPlugIn.java. I suppose this is the underlying class that's called when IgnoreDuplicateTitles is set to true. I forced some articles to have duplicate titles to see how they're handled. They have exactly the same title, but curn couldn't suppress them. I was hoping to modify that plugin to make it more sophisticated, if this works well.

Thanks again for taking time out on a Sunday evening!

- Bharath

-----Original Message-----

On 8/5/07 8:34 PM, Bharath Prathipati wrote:

> hi there!
>
> I was trying out the Windows version of curn and was trying this
> IgnoreDuplicateTitles plugin, but could never get that to work.
> And even my attempt to enable logging was not successful.
>
> For the plugin, I used "IgnoreDuplicateTitles: true" for each feed. Is
> there something else to be done?
>
> Maybe my question/request is too abstract! Please let me know if
> anything else is to be provided.

Bharath,

Logging does work, but getting it configured properly can be a challenge the first time you try it. (That has a lot to do with how the underlying logging APIs work.)

Please send me:

a) Your curn configuration file.
b) The command line you used to invoke curn.

Note that the IgnoreDuplicateArticles plug-in is rather simplistic. It simply compares the article titles in each feed to see if there are duplicates. It attempts to normalize the titles slightly, but only slightly:

- It converts all adjacent white space into a single space.
- It converts the title to lower case.

Thus, these two titles will compare as equal (and the second one will be suppressed):

    Dog drags owner from well
    Dog drags      Owner from well

The first one will be converted to "dog drags owner from well" and saved. When curn sees the second title, it will remove the extra spaces and convert it to lower case; the result will match the first title, and the second article will be suppressed.

It doesn't do anything fancier than that, though.

Send me your config file. I'll take a look.

--
-Brian

Brian Clapper, http://www.clapper.org/bmc/
A day without sunshine is like night.
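
The two normalization steps Brian describes (collapse adjacent white space, then lower-case) can be illustrated with a small sketch. This is written in Python for brevity and is not curn's Java source. The function names `normalize_title` and `suppress_duplicate_titles` are hypothetical, and the sketch assumes that the first article with a given normalized title is kept and later ones are dropped, as in the example above.

```python
import re


def normalize_title(title: str) -> str:
    """Collapse each run of white space to one space, then lower-case.

    Assumption: only these two steps are applied, as described in the
    message above; nothing else (such as trimming) is done.
    """
    return re.sub(r"\s+", " ", title).lower()


def suppress_duplicate_titles(titles):
    """Keep the first title for each normalized form; drop later matches."""
    seen = set()
    kept = []
    for title in titles:
        key = normalize_title(title)
        if key in seen:
            continue  # a title with the same normalized form was already kept
        seen.add(key)
        kept.append(title)
    return kept


if __name__ == "__main__":
    titles = [
        "Dog drags owner from well",
        "Dog drags      Owner from well",
    ]
    print(suppress_duplicate_titles(titles))
```

With the two example titles, only the first survives. A more sophisticated variant, of the kind Bharath mentions wanting, would most likely change only `normalize_title`, for instance by also stripping punctuation.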