curn: Customizable Utilitarian RSS Notifier
User's Guide

Brian M. Clapper
bmc @ clapper . org
$Id: index.html 7024 2007-08-25 02:25:05Z bmc $


Introduction

curn is an RSS reader. It scans a configured set of URLs, each one representing an RSS feed, and summarizes the results. By default, curn keeps track of individual items within each RSS feed, using an on-disk cache; when using the cache, it will suppress displaying information for items it has already processed (though that behavior can be disabled).

Unlike many RSS readers, curn does not use a graphical user interface. It is a command-line utility, intended to be run periodically in the background by a command scheduler such as cron(8) (on UNIX-like systems) or the Windows Scheduler Service (on Windows).

curn can read RSS feeds from any URL that's supported by Java's runtime. When querying HTTP sites, curn uses the HTTP If-Modified-Since and Last-Modified headers to suppress retrieving and processing feeds that haven't changed (though a Force Feed Download plug-in, such as the Retain Articles plug-in, can override that capability). By default, it also requests that the remote HTTP server gzip the XML before sending it. (Some HTTP servers honor the request; some don't.) These measures both minimize network bandwidth and ensure that curn is as kind as possible to the remote RSS servers. (There are some additional steps you can take to be more bandwidth-friendly.)
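
The conditional-download behavior described above can be sketched in Python with the standard library. This is only an illustration of the two HTTP headers involved, not curn's actual (Java) implementation; the function name build_conditional_request and the example URL are hypothetical.

```python
import email.utils
import urllib.request


def build_conditional_request(url, last_modified_epoch=None, want_gzip=True):
    """Build (but do not send) an HTTP request that is polite to the feed server.

    last_modified_epoch is the Last-Modified time (seconds since the epoch)
    remembered from a previous download, or None for a first download.
    """
    request = urllib.request.Request(url)
    if last_modified_epoch is not None:
        # Ask the server to answer "304 Not Modified" if nothing has changed.
        request.add_header(
            "If-Modified-Since",
            email.utils.formatdate(last_modified_epoch, usegmt=True),
        )
    if want_gzip:
        # The server is free to ignore this and send uncompressed XML.
        request.add_header("Accept-Encoding", "gzip")
    return request


# Hypothetical feed URL, used only for illustration.
req = build_conditional_request("http://example.org/feed.rss", last_modified_epoch=0)
```

A server that honors the first header replies with status 304 and no body when the feed is unchanged, so nothing needs to be parsed.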

curn comes with a built-in adapter for the ROME feed parser, but it can easily be extended to use any RSS parser. (curn uses ROME by default.) See the ParserClass configuration item for information on how to specify which parser curn should use. See the section entitled Using an Unsupported RSS Parser for more details on adapting curn to use other RSS parsers.

curn supports several output formats; you can configure one or more output handlers in curn's configuration file. In addition, someone conversant with Java programming or comfortable with a scripting language, such as Python or Ruby, can easily extend curn to handle a new output format. See the section entitled Writing Your Own Output Handler for more details. Finally, as of version 2.6, curn has a built-in template-driven output handler, the FreeMarkerOutputHandler, based on the FreeMarker template engine. This handler uses a text template to generate output, so anyone conversant with FreeMarker can easily write a template to generate custom output. See the section describing the FreeMarkerOutputHandler for more details.

curn's predefined output handlers can generate:

In addition, curn supports emailing its output. If email addresses are specified in the configuration file, then curn creates a MIME multipart/alternative email message [1], using the output of each output handler as one of the alternative attachments. (As of version 3.2, curn can also send individual email messages for each article; see the MailIndividualArticles parameter.)
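
As a rough illustration of what a MIME multipart/alternative message looks like, the following Python sketch combines a plain-text and an HTML rendering of the same content into one message. It is not how curn builds its mail (curn is written in Java); the function name, the addresses, and the sample bodies are hypothetical.

```python
from email.message import EmailMessage


def build_alternative_message(sender, recipients, subject, text_body, html_body):
    """Combine two renderings of the same content into one multipart/alternative message."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    message["Subject"] = subject
    # The first part is the plain-text rendering.
    message.set_content(text_body)
    # Adding an alternative converts the message to multipart/alternative.
    message.add_alternative(html_body, subtype="html")
    return message


# Hypothetical addresses and bodies, used only for illustration.
msg = build_alternative_message(
    "joe@example.org",
    ["reader@example.org"],
    "curn output",
    "No new items.",
    "<p>No new items.</p>",
)
```

A mail reader picks whichever alternative it can display best, which is why each part should carry the same information.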

Terminology

Throughout this document, the following terms are used:

curn Command Line Syntax

curn is invoked from the command line as follows:

curn [options] config

The curn graphical installer automatically creates a Unix shell script (called curn) or a Windows command file (curn.bat) in the bin directory beneath the curn installation directory. You must put the curn bin directory in your path.

Note: While it is possible to invoke curn via the java command, it's not recommended. For curn's plug-ins to work properly, curn must do some fancy class loader footwork. Basically, curn uses a special bootstrap class to find all plug-ins and create a special class loader that can load everything—plug-ins, core code, etc. If you don't invoke curn via the bootstrap class, the plug-ins don't load properly. The curn shell script and command file handle invoking curn so that plug-ins will work properly.

curn's command line uses a UNIX-like syntax. If you invoke curn without any parameters, you get the following usage display.

Usage: curn [options] config

OPTIONS:

-B, --build-info  Show full build information, then exit. This option shows a
                  bit more information than the --version option. This option
                  can be combined with the --plug-ins option to show the
                  loaded plug-ins.
-C, --no-cache    Don't use a cache file at all.
--logging         Enable logging via Jakarta Commons Logging.
-p, --plug-ins    Show the list of located plug-ins and output handlers, then
                  exit. This option can be combined with either --build-info
                  or --version to show version information, as well.
-t, --time <time> For the purposes of cache expiration, pretend the current
                  time is <time>. <time> may be in one of the following
                  formats.
                  2006/06/08 02:59:36 PM
                  2006/06/08 02:59:36
                  2006/06/08 02:59 PM
                  2006/06/08 02:59
                  2006/06/08 2:59 PM
                  2006/06/08 2:59
                  2006/06/08 02 PM
                  2006/06/08 2 PM
                  2006/06/08 14:59:36
                  2006/06/08 14:59
                  2006/06/08
                  06/06/08
                  02:59:36 PM
                  02:59:36
                  02:59 PM
                  02:59
                  2:59 PM
                  2:59
                  02 PM
                  2 PM
                  14:59:36 PM
                  14:59:36
                  14:59 PM
                  14:59
-u, --no-update   Read the cache, but don't update it.
-v, --version     Show version information, then exit. This option can be
                  combined with the --plug-ins option to show the loaded
                  plug-ins.

PARAMETERS:

config  Path to configuration file

Many of curn's command-line options simply override settings in the curn configuration file. Each option and argument is discussed in more detail, below.

Command Line Options

OPTIONS
Short Option Long Option Explanation
-B --build-info Display detailed information about how and when curn was built, then exit without doing anything. Useful primarily when debugging or submitting problem reports. For instance, the command
curn -B
produces output similar to the following:
curn, version 3.0 (build 20060608.185936.321)

Build:          20060608.185936.321
Build date:     2006/06/08 14:59:36 EDT
Built by:       bmc on sunball.inside.clapper.org
Built on:       Linux 2.6.16-1.2122_FC5smp (i386)
Build Java VM:  Java HotSpot(TM) Client VM 1.5.0_07-b03 (Sun Microsystems Inc.)
Build compiler: javac
Ant version:    Apache Ant version 1.6.5 compiled on June 2 2005

For a simple one-line version display, use the --version option.

-C --no-cache Run without a cache. Each RSS item curn encounters will appear to be new and will be passed to the output handlers. Also see the CacheFile configuration directive.
  --logging Enable logging via the java.util.logging API. You will also have to specify a logging configuration file via a -Djava.util.logging.config.file system property. For instance,
java -Djava.util.logging.config.file=/tmp/logging.properties org.clapper.curn.Tool --logging ...
See the section entitled Logging for more details on specifying logging parameters.
-t <time> --time <time>

For the purposes of cache expiration, pretend the current time is <time>, instead of the wall clock time. <time> may be specified in one of the following formats:

2004/07/22 09:37:29 AM
2004/07/22 09:37:29
2004/07/22 09:37 AM
2004/07/22 09:37
2004/07/22 9:37 AM
2004/07/22 9:37
2004/07/22 09 AM
2004/07/22 9 AM
2004/07/22
04/07/22
09:37:29 AM
09:37:29
09:37 AM
09:37
9:37 AM
9:37
09 AM
9 AM

This option is useful primarily for debugging. Before reading the RSS feeds, curn first loads its cache and prunes any cache entries that are out of date. When pruning its cache of out-of-date items, or when loading cache items, curn will behave as if the current time is the specified time.

-u --no-update Load (and prune) the cache file before processing the RSS feeds, but do not save the modified in-memory cache back to disk. Useful primarily for debugging.
-v --version Show just the one-line version information, then exit. For more detailed curn build and version information, use the --build-info option.
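
The idea of accepting several date and time formats for --time can be sketched as "try each candidate format in turn". The Python sketch below covers only a subset of the formats listed above, using strptime approximations of them; the subset, the ordering, and the function name parse_time_option are assumptions, and curn's own parser may behave differently (for instance, for time-only values).

```python
from datetime import datetime

# Python strptime approximations of a few of the documented formats,
# tried in order. This subset and its ordering are assumptions.
CANDIDATE_FORMATS = [
    "%Y/%m/%d %I:%M:%S %p",  # 2004/07/22 09:37:29 AM
    "%Y/%m/%d %H:%M:%S",     # 2004/07/22 09:37:29
    "%Y/%m/%d %I:%M %p",     # 2004/07/22 9:37 AM
    "%Y/%m/%d %H:%M",        # 2004/07/22 09:37
    "%Y/%m/%d %I %p",        # 2004/07/22 9 AM
    "%Y/%m/%d",              # 2004/07/22
    "%y/%m/%d",              # 04/07/22
]


def parse_time_option(text):
    """Return a datetime for the first candidate format that matches text."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError(f"unrecognized time: {text!r}")
```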

Command Line Parameters

A list of curn's positional parameters follows.

PARAMETERS
Positional Parameter Explanation
config The path or URL to the curn configuration file. This parameter is required.

The curn Configuration File

curn's configuration file controls all aspects of curn's behavior. The configuration file contains parameters that control curn's behavior, the output handlers, and the individual RSS feed sites. This section first describes the overall configuration file syntax, and then describes each curn configuration item in detail.

You can view a sample curn configuration file by following this link.

Configuration File Syntax

curn's configuration file is a simple text file. It resembles a standard Java properties file, but it is broken into individual sections, each of which has its own variable namespace. At a glance, the configuration file is reminiscent of a Windows .INI file, but there are quite a few differences [2].

As in a .INI file, each section in the configuration file begins with a name surrounded by brackets. Each section contains variable assignments; the variable assignment syntax is similar to that of a Java properties file. For example:

[curn]
CacheFile: /home/bmc/.curn/cache
DaysToCache: NoLimit
ParserClass: org.clapper.curn.parser.rome.RSSParserAdapter
...

Section Name Syntax

There can be any amount of whitespace before and after the brackets in a section name; the whitespace is ignored. That is, "[curn]", "[ curn]" and "[ curn ]" all specify a section named "curn".
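
The whitespace rule for section names can be expressed as a single regular expression. This Python sketch is illustrative only; the pattern and the function name section_name are assumptions, not curn's parser.

```python
import re

# Optional whitespace is allowed around the brackets and around the name.
SECTION_HEADER = re.compile(r"^\s*\[\s*(?P<name>[^\]]*?)\s*\]\s*$")


def section_name(line):
    """Return the section name if line is a section header, else None."""
    match = SECTION_HEADER.match(line)
    return match.group("name") if match else None
```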

Variable Name Syntax

Each section contains zero or more variable settings. Similar to a Java properties file, the variables are specified as name/value pairs, separated by an equals sign ("=") or a colon (":"). Variable names are case-sensitive and may contain any printable character (including white space), other than '$', '{', and '}'. Variable values may contain anything at all. The parser ignores whitespace on either side of the "=" or ":"; that is, leading whitespace in the value is skipped. The way to include leading whitespace in a value is to escape the whitespace characters with backslashes. (See below.)

Variable definitions may span multiple lines; each line to be continued must end with a backslash ("\") character, which escapes the meaning of the newline, causing it to be treated like a space character. The following line is treated as a logical continuation of the first line; however, any leading whitespace is removed from continued lines. For example, the following four variable assignments all have the same value:

[test]
a: one two three
b:            one two three
c: one two \
three
d:        one \
two \
three

Because leading whitespace is skipped, all four variables have the value "one two three".
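
The continuation and whitespace rules can be checked with a small Python sketch. It assumes that joining simply drops the trailing backslash and newline along with the next line's leading whitespace, which reproduces the values documented above; it does not handle an escaped backslash at the end of a line. The function names are hypothetical.

```python
def join_continued_lines(lines):
    """Merge physical lines that end in a backslash into logical lines."""
    logical, pending = [], None
    for line in lines:
        # Continuation lines lose their leading whitespace.
        text = line if pending is None else pending + line.lstrip()
        if text.endswith("\\"):
            pending = text[:-1]  # drop the backslash and keep accumulating
        else:
            logical.append(text)
            pending = None
    if pending is not None:
        logical.append(pending)
    return logical


def parse_assignment(logical_line):
    """Split 'name: value' or 'name = value' at the first separator."""
    positions = [i for i in (logical_line.find(":"), logical_line.find("=")) if i >= 0]
    if not positions:
        raise ValueError(f"not a variable assignment: {logical_line!r}")
    split_at = min(positions)
    # Whitespace around the separator is ignored.
    return logical_line[:split_at].strip(), logical_line[split_at + 1:].lstrip()


example = [
    "a: one two three",
    "b:            one two three",
    "c: one two \\",
    "three",
    "d:        one \\",
    "two \\",
    "three",
]
values = dict(parse_assignment(line) for line in join_continued_lines(example))
```

All four entries in values end up as "one two three".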

Only variable definition lines may be continued. Section header lines, comment lines (see below) and include directives (see below) cannot span multiple lines.

Expansions of Variable Values

The configuration parser preprocesses each variable's value, expanding embedded metacharacter sequences and substituting variable references. (See below.) You can use backslashes to escape the special characters that the parser uses to recognize metacharacter and variable sequences; you can also use single quotes. See Suppressing Metacharacter Expansion and Variable Substitution, below, for more details.

Metacharacter Expansion

Within a variable's value, Java-style ASCII escape sequences \t, \n, \r, \\, \", \', "\ " (a backslash and a space), and \uxxxx are recognized and converted to single characters. Note that metacharacter expansion is performed before variable substitution.
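
A minimal Python sketch of this kind of escape expansion follows. Leaving unrecognized escape sequences untouched is an assumption (curn may treat them differently), and the function name expand_metacharacters is hypothetical.

```python
import re

# Escape letter (or character) -> the single character it stands for.
_SIMPLE_ESCAPES = {"t": "\t", "n": "\n", "r": "\r", "\\": "\\", '"': '"', "'": "'", " ": " "}

# A backslash followed by either uXXXX (four hex digits) or any one character.
_ESCAPE = re.compile(r"\\(u[0-9a-fA-F]{4}|.)", re.DOTALL)


def expand_metacharacters(value):
    """Convert the documented escape sequences in value to single characters."""
    def replace(match):
        token = match.group(1)
        if len(token) == 5:  # a complete \uXXXX sequence
            return chr(int(token[1:], 16))
        # Unknown escapes are left as-is (an assumption of this sketch).
        return _SIMPLE_ESCAPES.get(token, match.group(0))

    return _ESCAPE.sub(replace, value)
```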

Variable Substitution

A variable's value can interpolate the values of other variables, using a variable substitution syntax reminiscent of the Unix shell. (The syntax is also similar to the ant variable substitution syntax.) The general form of a variable reference is ${sectionName:varName}. sectionName is the name of the section containing the variable to substitute; if omitted, it defaults to the current section. varName is the name of the variable to substitute. If the variable has an empty value, an empty string is substituted. If the variable (or the referenced section) does not exist, curn will abort. If a variable reference specifies a section name, the referenced section must precede the current section. It is not possible to substitute the value of a variable in a section that occurs later in the file.

The section names "system", "env", and "program" are reserved for special "pseudosections."

The "system" pseudosection is used to interpolate values from Java's system properties (the values returned by System.getProperties()). For instance, ${system:user.home} substitutes the value of the user.home system property (typically, the home directory of the user running curn). Similarly, ${system:user.name} substitutes the user's name.

The "env" pseudosection is used to interpolate values from the environment. On UNIX systems, for instance, ${env:HOME} substitutes the user's home directory (and is, therefore, a synonym for ${system:user.home}). On some versions of Windows, ${env:USERNAME} will substitute the name of the user running curn. Note: On UNIX systems, environment variable names are typically case-sensitive; for instance, ${env:USER} and ${env:user} refer to different environment variables. On Windows systems, environment variable names are typically case-insensitive; ${env:USERNAME} and ${env:username} are equivalent.

The "program" pseudosection is a placeholder for various special variables provided by the Configuration class at runtime. Those variables are:

"program" Section Variable Explanation
cwd The program's current working directory. Thus, ${program:cwd} will substitute the current working directory, with an appropriate path separator for the host operating system (e.g., "\" for Windows, "/" for UNIX.)
cwd.url The program's current working directory, as a file URL, without the trailing "/". Useful when you need to create a URL reference to something relative to the current directory. This is especially helpful on Windows, where
file://${program:cwd}/something.txt
produces an invalid URL, with a mixture of backslashes and forward slashes. By contrast,
${program:cwd.url}/something.txt
always produces a valid URL, regardless of the underlying host operating system.
now The current time, formatted by calling java.util.Date.toString() with the default locale. For example, ${program:now} would produce something like "Fri Aug 20 15:18:56 EDT 2004" on a machine with a default English locale.
now delim fmt [delim lang delim country] The current date/time, formatted with the specified java.text.SimpleDateFormat format string. If specified, the given language and country code will be used; otherwise, the default system locale will be used. lang is a Java language code, such as "en", "fr", etc. country is a 2-letter country code, e.g., "UK", "US", "CA", etc. delim is a user-chosen delimiter that separates the variable name ("now") from the format and the optional locale fields. The delimiter can be anything that doesn't appear in the format string, the variable name, or the locale. For example:
${program:now|yyyy.MM.dd 'at' hh:mm:ss z} 2004.08.20 at 03:26:27 EDT
${program:now/yyyy.MM.dd 'at' HH:mm:ss z/en/US} 2004.08.20 at 15:28:37 EDT
${program:now|dd MMM, yyyy HH:mm:ss z|fr|FR} 20 août, 2004 03:30:29 EDT

Note: SimpleDateFormat requires that literal strings (i.e., strings that should not be processed as part of the format) be enclosed in quotes. For instance:

yyyy.MM.dd 'at' hh:mm:ss z

Because single quotes are special characters in configuration files, it's important to escape them if you use them inside date formats. So, to include the above string in a configuration file's ${program:now} reference, use the following:

${program:now/yyyy.MM.dd \'at\' hh:mm:ss z}

See Suppressing Metacharacter Expansion and Variable Substitution, below, for more details.


For example:

Variable Reference Explanation Sample
${system:user.home} Substitutes the value of the system property "user.home" (usually set to the current user's home directory).
[curn]
myCurnDir = ${system:user.home}/.curn
${curn:myCurnDir} Substitutes the value of variable "myCurnDir" from the [curn] section.
[Feed_Wired]
URL: http://www.wired.com/news_drop/netcenter/netcenter.rdf
SaveAs: ${curn:myCurnDir}/feeds/wired.rdf
${myCurnDir} Substitutes the value of variable "myCurnDir" from the current section.
[curn]
myCurnDir = ${system:user.home}/.curn
CacheFile = ${myCurnDir}/cache

The configuration file also supports simple conditional-substitution logic, which allows you to specify a default value to be substituted if a variable is empty or does not have a value. The general form of a conditional substitution is:

${var?some default value}
If ${var} does not have a value, or has an empty string as its value, the string "some default value" will be substituted.
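
The substitution rules described in this section can be sketched in Python as follows. This is an illustrative sketch, not curn's Configuration class: the regular expression, the ConfigError exception, and the function name substitute are assumptions, and the "program" pseudosection (with its delimiter syntax) is not handled. The sketch assumes that a missing variable is an error unless a "?" default is given.

```python
import re

# ${section:name}, ${name}, or either form followed by ?default
VAR_REF = re.compile(
    r"\$\{(?:(?P<section>[^:}?]+):)?(?P<name>[^}?]+)(?:\?(?P<default>[^}]*))?\}"
)


class ConfigError(Exception):
    """Raised where curn would abort (hypothetical name)."""


def substitute(value, current_section, sections):
    """Expand variable references in value.

    sections maps section names to dicts of already-parsed variables. Only
    sections that precede the current one (plus the current one so far)
    should be present, which models the "must precede" rule.
    """
    def lookup(match):
        section = match.group("section") or current_section
        variables = sections.get(section)
        found = None if variables is None else variables.get(match.group("name"))
        if found:
            return found
        default = match.group("default")
        if default is not None:
            return default  # variable is empty or missing, so use the default
        if found is None:
            raise ConfigError(f"cannot resolve {match.group(0)}")
        return ""  # defined, but empty

    return VAR_REF.sub(lookup, value)


sections = {"curn": {"myCurnDir": "/home/bmc/.curn", "empty": ""}}
save_as = substitute("${curn:myCurnDir}/feeds/wired.rdf", "Feed_Wired", sections)
```

Pseudosections such as "env" could be modeled in this sketch by adding an entry like sections["env"] = dict(os.environ) before substituting.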

Suppressing Metacharacter Expansion and Variable Substitution

To prevent the parser from interpreting metacharacter sequences, variable substitutions and other special characters, enclose part or all of the value in single quotes. (See [3] for additional comments.) For example, suppose you want to set variable "prompt" to the literal value "Enter value. To specify a newline, use \n." The following configuration file line will do the trick:

prompt: 'Enter value. To specify a newline, use \n'

Similarly, to set variable "abc" to the literal string "${foo}", suppressing the parser's attempts to expand "${foo}" as a variable reference, you could use:

abc: '${foo}'

To include a literal single quote, you must escape it with a backslash.

Path Names

Regardless of the underlying operating system, path names in the curn configuration file can always use Unix-style forward slash ("/") characters. At runtime curn will convert the path names to use the appropriate file separator (e.g., "\" on Windows). This capability provides two benefits:

  1. It enhances the portability of curn configuration files.
  2. It provides a means to avoid using (and, therefore, having to escape) backslash characters in the configuration file.

Includes

A special include directive permits inline inclusion of another configuration file. The include directive takes two forms:

%include "path"
%include "URL"

For example:

%include "/home/bmc/mytools/common.cfg"
%include "file:///home/bmc/mytools/common.cfg"

The included file may contain any content that is valid for this parser. It may contain just variable definitions (i.e., the contents of a section, without the section header), or it may contain a complete configuration file, with individual sections. Since the parser recognizes a variable syntax that is essentially identical to Java's properties file syntax, it's also legal to include a properties file, provided it's included within a valid section.

Attempting to include a file from itself, either directly or indirectly, will cause curn to abort processing.
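
Detecting a file that includes itself, directly or indirectly, amounts to remembering the chain of files currently being expanded. The Python sketch below illustrates that idea with in-memory "files" supplied by a read_lines callable; the function, the exception, and the file names are hypothetical, and URL includes are not modeled.

```python
import re

INCLUDE = re.compile(r'^\s*%include\s+"(?P<target>[^"]+)"\s*$')


class IncludeCycleError(Exception):
    """Raised where curn would abort processing (hypothetical name)."""


def expand_includes(name, read_lines, active=()):
    """Return the lines of name with %include directives expanded inline.

    read_lines(name) returns the lines of the named file; active is the
    chain of files whose expansion is currently in progress.
    """
    if name in active:
        chain = " -> ".join(active + (name,))
        raise IncludeCycleError(f"include cycle: {chain}")
    expanded = []
    for line in read_lines(name):
        match = INCLUDE.match(line)
        if match:
            expanded.extend(
                expand_includes(match.group("target"), read_lines, active + (name,))
            )
        else:
            expanded.append(line)
    return expanded


# Hypothetical in-memory files, used only for illustration.
files = {
    "main.cfg": ["[curn]", '%include "common.cfg"'],
    "common.cfg": ["CacheFile: test.cache"],
    "loop.cfg": ['%include "loop.cfg"'],
}
expanded = expand_includes("main.cfg", files.__getitem__)
```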

Comments and Blank Lines

A comment line is one whose first non-whitespace character is a "#" or a "!". This comment syntax is identical to the one supported by a Java properties file. A blank line is a line containing no content, or one containing only whitespace. Blank lines and comments are ignored. For example:

[curn]

# ---------------------------------------------------------------------------
# CacheFile: The full path to the file in which curn should cache URLs.
#            curn uses the cache file to keep track of which URLs it
#            has already received and displayed, and when it received them.
#            Under normal operation, curn won't display a URL it has
#            already displayed and cached.
#
#            This path may contain the ~ metacharacter, to denote the
#            invoking user's home directory.
#
#            The use of a cache can be disabled by omitting this parameter.
#            Use the "NoCacheUpdate" parameter to tell curn to read,
#            but not update, the cache.
#
# See also: Configuration parameter "NoCacheUpdate"
#           Command line parameter -C, --no-cache
#
# OPTIONAL. Default: None

CacheFile: test.cache
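
The rule for comments and blank lines is small enough to state as a one-line test. This Python sketch is illustrative; the function name is_ignored is hypothetical.

```python
def is_ignored(line):
    """Return True for blank lines and for comment lines starting with # or !."""
    stripped = line.strip()
    return stripped == "" or stripped[0] in "#!"
```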

Overview of curn's Configuration File

curn's configuration file has three kinds of sections:

All other sections in the configuration file are parsed (and subject to syntactic constraints), but otherwise ignored. Thus, it's perfectly legal to have a separate section, e.g., "[var]", where you define variables that exist solely to be substituted into other sections.

Any boolean parameter (i.e., one documented as taking a true or false value) can also take a value of "0" (false), "1" (true), "no" (false) or "yes" (true).
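
The accepted boolean spellings map onto a simple lookup, sketched here in Python. Treating the values case-insensitively is an assumption of this sketch, and the function name parse_boolean is hypothetical.

```python
_TRUE_VALUES = {"true", "1", "yes"}
_FALSE_VALUES = {"false", "0", "no"}


def parse_boolean(raw):
    """Convert a configuration value to a bool, or raise ValueError."""
    token = raw.strip().lower()  # case-insensitivity is an assumption
    if token in _TRUE_VALUES:
        return True
    if token in _FALSE_VALUES:
        return False
    raise ValueError(f"not a boolean value: {raw!r}")
```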

The [curn] Section

This section contains various global parameters. Each is described in detail, below. (Parameters marked with plug-in are handled by one of curn's stock plug-ins, rather than by the core code.)

Variable Argument type Description Required? Default value See also
AllowEmbeddedHTML
plug-in
Boolean Default setting for whether or not to allow embedded HTML in certain RSS feed elements, such as description, author, etc. Some RSS formats permit embedded HTML. Setting this parameter to true preserves any embedded HTML markup within a feed; setting this parameter to false causes embedded HTML to be stripped.

Note that certain output handlers will strip HTML regardless of this setting. An output handler that produces text, for instance, is not required to support embedded HTML. This global parameter can be overridden on a per-feed basis.

Notes:
  • Use this parameter with care. If supported, the raw HTML is copied directly into the resulting output, without modification. With HTML output, malformed embedded HTML can screw up the resulting HTML document.
No false  
CacheFile File name or path name The full path to the file in which curn should cache feed item data. curn uses the cache file to keep track of which feed items it has already received and displayed, and when it received them. Under normal operation, curn won't display a feed item it has already displayed and cached.

The use of a cache can be disabled by omitting this parameter. Use the NoCacheUpdate parameter, or the --no-update command line option, to tell curn to read, but not update, the cache.

The cache file is an XML file. However, since it is generated automatically, you should not edit it.
No None. (If not specified, no cache is used.) NoCacheUpdate
CacheBackup
--no-cache
--no-update
CacheBackup

File name or path name.

The full path to a cache backup file. If this parameter is defined, curn will copy the cache to this backup file before updating the cache on disk.

Warning: This parameter was replaced with TotalCacheBackups in curn version 2.6.

No None. CacheFile
TotalCacheBackups
CommonXMLFixups
plug-in
Boolean Enables or disables the Common XML Fixups plug-in, which attempts to fix common syntax problems in downloaded XML feeds. There is some XML badness that is surprisingly common across feeds, including (but not limited to):
  • Using a "naked" ampersand (&) without escaping it.
  • Use of nonexistent entities (e.g., &ouml;, &nbsp;)
  • Improperly formatted entity escapes
This plug-in attempts to fix those problems.

This global parameter can be overridden on a per-feed basis. This global setting defines the default value for all feeds that don't explicitly set it themselves.
No false The per-feed CommonXMLFixups setting
DaysToCache Positive integer Default maximum number of days to cache an already-read item. This parameter is used when the configuration section for a particular site lacks its own DaysToCache value. Items older than this many days are tossed from the cache when it's read, which means curn forgets that it saw them before. A value of 0 renders the cache essentially useless (i.e., 0 ensures that curn always forgets items that are cached). The special value "NoLimit" causes curn to leave items in the cache forever. No 365 (days) Per-feed DaysToCache parameter
GzipDownload
plug-in
Boolean If set to true, this parameter directs curn to use the "Accept-Encoding: gzip" HTTP header when retrieving an RSS feed from an HTTP server. Since RSS feeds are XML, they typically compress well; retrieving gzipped data, rather than the uncompressed XML, can save a significant amount of time and network bandwidth. (Note, however, that HTTP servers are not obligated to honor a request to gzip the feed.) This parameter can be overridden on a per-feed basis. This global value sets the default value.

For backward compatibility, this parameter can also be specified as GetGzippedFeeds.
No true
IgnoreArticlesOlderThan
plug-in
String Provides a way to ignore articles that are older than a certain interval. Intervals are expressed in a natural language syntax. For instance:
IgnoreArticlesOlderThan: 3 days
IgnoreArticlesOlderThan: 1 week
IgnoreArticlesOlderThan: 365 days
IgnoreArticlesOlderThan: 12 hours, 30 minutes
Valid interval names (in English) are:
  • millisecond, milliseconds, ms
  • second, seconds, sec, secs
  • minute, minutes, min, mins
  • hour, hours, hr, hrs
  • day, days
  • week, weeks
If you're running curn in a Spanish or French locale, the appropriate Spanish or French equivalents are also available, as well as the English versions.

"year" and "month" are not supported, to avoid the irregularity of leap years and different month lengths, respectively.

The actual conversion of the strings is done by the org.clapper.util library's Duration class. See that class for more details.

This global value sets the default value.

NOTE: The plug-in that implements this capability uses the timestamp in the XML to determine "older than", not the cached timestamp, because the intent is to weed old articles from a feed that you haven't processed in a while (or perhaps are processing for the first time.) If the article has no timestamp in the XML, it is assumed to be current, i.e., to have a date/time of "now".
No None (i.e., articles are not ignored based on age) Per-feed IgnoreArticlesOlderThan parameter
MailOutputTo
plug-in
String One or more comma-separated email addresses to receive the output. This parameter is optional. If any email addresses are specified, then curn sends its generated output to those addresses. Depending on the setting of the MailIndividualArticles parameter, curn either sends a single MIME multipart/alternative email with all the output, or it sends one message per article found in the feeds. See MailIndividualArticles for details. No Output is not emailed. SMTPHost
MailFrom
MailSubject
MailFrom
plug-in
String The email address to use as the sender, when mailing output. The address can be a full RFC 2822-compliant address (e.g., "Joe Blow <joe@example.org>") or just a simple address (e.g., "joe@example.org"). This parameter is only honored when at least one email address is specified via the MailOutputTo configuration parameter. No curn constructs its own "from" address from the user name associated with the running process and the current host name. SMTPHost
MailSubject
MailOutputTo
MailSubject
plug-in
String The subject line to use when mailing output. This parameter is only honored when at least one email address is specified via the MailOutputTo configuration parameter. No curn output SMTPHost
MailFrom
MailOutputTo
MailIndividualArticles
plug-in
Boolean If set to true, this parameter instructs curn to send an email per article; that is, instead of a single email containing the output from all output handlers, curn will send one individual email for each article. If curn finds 20 unread articles, it'll send 20 email messages, each with a single article; if there are 100 unread articles, curn will send 100 separate email messages. If there are multiple output handlers that actually produce output, then each article email will be a MIME multipart/alternative email containing separate attachments from each output handler for that article.

If this parameter is false or absent, curn will send one email containing the generated output for all feeds and items. If there are multiple output handlers that actually produce output, curn will combine all the outputs into a single MIME multipart/alternative email. Each output handler's output will be a separate multipart/alternative attachment. (curn assumes that each output handler is generating an alternate form of the same information.)

Output handlers that don't generate output are skipped. If none of the configured output handlers generate any output, then curn doesn't send an email message.

This parameter is ignored if no email addresses are specified by the MailOutputTo parameter.

WARNINGS:
  • Obviously, if this parameter is true, and there are lots of new articles, curn will send lots of small email messages. Use with caution.
  • If the output handler supports a SaveOnly parameter (e.g., the FreeMarkerOutputHandler), and you've set the SaveOnly parameter, the output handler won't generate emailable output. Any output handler that's derived from curn's FileOutputHandler automatically supports SaveOnly.
No Output is not emailed. SMTPHost
MailFrom
MailSubject
MaxArticlesToShow
plug-in
Integer Sets an upper limit on the number of articles displayed for the feed. This maximum is applied after the articles are sorted (see SortBy) and after the ShowArticlesFor and IgnoreArticlesOlderThan policies are applied. This parameter can be overridden on a per-feed basis. This global parameter sets the default value. No None (i.e., no maximum)
MaxSummarySize
plug-in
Positive integer If an article has a summary, you can optionally set a maximum size for the summary. If a summary exceeds the maximum size, curn will truncate it and add a trailing ellipsis ("...") to indicate the truncation. A value of 0 effectively disables this option. This parameter can be overridden on a per-feed basis. This global parameter sets the default value. No 0 (i.e., no limit on summary size) ReplaceEmptySummaryWith
MaxThreads Positive integer Defines the number of concurrent download threads. If this value is greater than 1, then curn will spawn that many worker threads to handle the downloading and parsing of the RSS feeds concurrently. If this value is 1, curn will process the feeds sequentially. If this value is greater than 1, but less than the total number of feeds, some of the worker threads will end up processing more than one feed (sequentially). Values less than 1 are illegal. No false
NoCacheUpdate Boolean If set to true (and if a cache file is specified), this parameter tells curn to read the cache file and honor its contents, but not to save the modified in-memory cache back to disk. No false CacheFile
--no-update
ParserClass String The full name of the underlying RSS parser class to be used. This class must implement the org.clapper.curn.parser.RSSParser interface. It can be a first-class parser of its own, or it can be nothing more than an adapter for a third party RSS parser class.

curn comes bundled with one parser:
org.clapper.curn.parser.rome.RSSParserAdapter
An adapter class that makes the Rome RSS parser work with curn. (The Rome adapter is only available if the appropriate Rome jar files are in curn's class path. Note also that Rome requires version 1.0 of the JDOM library.)

Any class that implements org.clapper.curn.parser.RSSParser may be used as a value for ParserClass.

No org.clapper.curn.parser.rome.RSSParserAdapter  
Quiet Boolean Normally, if an RSS feed contains no new items, most curn output handlers display the site's name and URL, followed by something like "No new items." Similarly, if curn can't contact a feed site, or if the site's XML is unparseable, curn displays an error message. This option tells curn to silently ignore sites with no data or bad XML. Setting Quiet to true tells curn to suppress both of the above displays. No false --quiet
--no-quiet
ReplaceEmptySummaryWith
plug-in
String Tells curn what to do when the summary for a feed article is missing. Legal values:
  • nothing: Leave the summary blank. This is the default.
  • content: Replace the summary with the article's content, if there is any content.
  • title: Replace the summary with the article's title.
This parameter can be overridden on a per-feed basis. This global value sets the default value.
No nothing Per-feed ReplaceEmptySummaryWith parameter
ShowArticlesFor String How long to keep showing articles from feeds. If specified, this parameter is only used when individual feeds don't specify a ShowArticlesFor parameter of their own. The value is a time interval, expressed using the same natural language strings supported by the IgnoreArticlesOlderThan parameter. For instance:
ShowArticlesFor: 3 days
ShowArticlesFor: 1 week
ShowArticlesFor: 365 days
ShowArticlesFor: 12 hours, 30 minutes
Valid interval names (in English) are:
  • millisecond, milliseconds, ms
  • second, seconds, sec, secs
  • minute, minutes, min, mins
  • hour, hours, hr, hrs
  • day, days
  • week, weeks
If this parameter is not specified, then the default value is to show an article one time only.

NOTE: The plug-in that implements this capability uses the timestamp in the curn cache when aging an article, not the timestamp in the feed's XML. That's because the intent of this configuration parameter is to permit you to keep showing an article for a certain amount of time after the article was first displayed. The article timestamp in the XML is the time that the article was published, not the time that curn first displayed it. The time in the curn cache represents the time that curn first saw (and presumably displayed) the article.

WARNINGS:
  • Specifying this parameter forces feeds to be downloaded, even if they haven't changed. curn does not keep cached copies of feed data; the only way it can redisplay an article is to download and re-parse the feed. Also, if the article is no longer in the feed, curn can't redisplay the article, even if the ShowArticlesFor interval hasn't yet elapsed.
  • Beware of interactions with the IgnoreArticlesOlderThan parameter. Here's a simple example. Assume the configuration settings are:
    IgnoreArticlesOlderThan: 5 days
    ShowArticlesFor: 2 days
    In this case, any article in the feed that's older than 5 days will be discarded by the Ignore Old Articles plug-in, which will run first. Now, assume there are 4 articles:
    • Article 1 has never been processed by curn (i.e., isn't in the cache), and it has an XML timestamp of 3 days ago.
    • Article 2 has been processed by curn, 3 days ago. It also has an XML timestamp of 3 days ago.
    • Article 3 has been processed by curn, 1 hour ago. It has no XML timestamp.
    • Article 4 has been processed by curn, 3 days ago. It has an XML timestamp of 6 days ago.
    Now let's see what happens when the plug-ins run. The Ignore Old Articles plug-in runs first. (It sorts higher in the plug-in list. You'll have to take my word for that, or look at the source code.)
    • The Ignore Old Articles plug-in keeps Article 1, because its XML timestamp is 3 days ago, which is newer than the 5-day cutoff.
    • Ditto for Article 2.
    • Article 3 has no XML timestamp, so Ignore Old Articles assumes that it's current and keeps it.
    • Article 4 has an XML timestamp of 6 days ago, which is past the 5-day cut-off, so Ignore Old Articles discards it.
    At this point, there are three articles left. The Retain Articles plug-in runs. (That's the plug-in that handles the ShowArticlesFor parameter.)
    • The Retain Articles plug-in keeps Article 1, because curn has never seen Article 1 before (i.e., it isn't in the cache).
    • The Retain Articles plug-in discards Article 2, because curn first displayed the article 3 days ago, which is past the ShowArticlesFor cut-off of 2 days.
    • The Retain Articles plug-in keeps Article 3, because curn first processed it an hour ago, so it's under the 2-day threshold.
    In the end, two articles are left.
No 1 millisecond (i.e., show each article once) Per-feed ShowArticlesFor parameter
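The four-article walk-through above can be replayed with a short Python sketch. It is only a model of the two plug-ins as the text describes them, not curn's actual code; the `Article` class and both function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class Article:
    name: str
    published: Optional[datetime]   # timestamp from the feed's XML, if any
    first_seen: Optional[datetime]  # timestamp from the curn cache, if cached

def ignore_old_articles(articles, now, max_age):
    # Keep articles with no XML timestamp, or one newer than the cutoff.
    return [a for a in articles
            if a.published is None or now - a.published <= max_age]

def retain_articles(articles, now, show_for):
    # Keep never-seen articles, or ones first seen within the interval.
    return [a for a in articles
            if a.first_seen is None or now - a.first_seen <= show_for]

now = datetime(2007, 8, 25, 12, 0, 0)
articles = [
    Article("Article 1", now - timedelta(days=3), None),
    Article("Article 2", now - timedelta(days=3), now - timedelta(days=3)),
    Article("Article 3", None, now - timedelta(hours=1)),
    Article("Article 4", now - timedelta(days=6), now - timedelta(days=3)),
]

# IgnoreArticlesOlderThan: 5 days runs first, then ShowArticlesFor: 2 days.
after_ignore = ignore_old_articles(articles, now, timedelta(days=5))
kept = retain_articles(after_ignore, now, timedelta(days=2))
names = [a.name for a in kept]
print(names)  # ['Article 1', 'Article 3']
```

Note that the first filter looks only at the XML timestamp and the second only at the cache timestamp, which is why the two settings can interact in surprising ways.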
ShowAuthors
plug-in
Boolean If set to true, this configuration item instructs curn to display author information for each feed item, if available. This global value can be overridden on a per-feed basis. No false
ShowDates
plug-in
Boolean Some RSS feeds or the individual items within each feed contain dates (usually corresponding to the publication dates for the feed or item). If this option is set to true, then curn will display the date for each item that provides a date. This global value can be overridden on a per-feed basis. No false
ShowRSSVersion Boolean Display the RSS version for each feed. No false
SummaryOnly
plug-in
Boolean Some RSS feeds provide a description for each item, in addition to the (brief) title. Setting SummaryOnly to true suppresses display of the description. This parameter can be overridden on a per-feed basis. This global value sets the default value.

WARNING: This parameter is deprecated. Use the ReplaceEmptySummaryWith parameter, instead.
No false ReplaceEmptySummaryWith
SMTPHost
plug-in
String The SMTP host to use when mailing output. This parameter is only honored when at least one email address is specified via the MailOutputTo configuration parameter. No localhost
SortBy
plug-in
String Default method to use to sort items within each feed. This parameter is used when the configuration section for a particular site lacks its own SortBy value. Legal values:
  • time: Sort by timestamp, if present. Current time is assumed for items that don't have timestamps.
  • title: Sort by item title, if present. Any item without a title is sorted as if its title were the empty string ("").
  • none: Don't sort (i.e., leave items in the order they appear in the XML).
No none Per-feed SortBy parameter
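As a rough illustration of the three SortBy values, here is a Python sketch. The item dictionaries and `sort_items` are hypothetical, and ascending order is an assumption; the text above does not say which direction curn sorts in.

```python
from datetime import datetime

def sort_items(items, sort_by, now=None):
    """Order feed items according to a SortBy value (time, title or none)."""
    now = now or datetime.now()
    if sort_by == "time":
        # Items without a timestamp are treated as if stamped "now".
        return sorted(items, key=lambda item: item.get("timestamp") or now)
    if sort_by == "title":
        # Items without a title sort as if the title were "".
        return sorted(items, key=lambda item: item.get("title") or "")
    if sort_by == "none":
        return list(items)  # leave items in feed order
    raise ValueError("unknown SortBy value: " + sort_by)

items = [
    {"title": "Zebra", "timestamp": datetime(2007, 8, 2)},
    {"title": None, "timestamp": datetime(2007, 8, 1)},
    {"title": "Apple", "timestamp": None},
]
now = datetime(2007, 8, 25)
by_time = [i["title"] for i in sort_items(items, "time", now)]
by_title = [i["title"] for i in sort_items(items, "title", now)]
unsorted = [i["title"] for i in sort_items(items, "none", now)]
```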
TotalCacheBackups Positive integer The total number of cache backup copies to keep. If this parameter is greater than 0, then curn will keep that many numbered backups of the cache. If the cache exists when curn attempts to update it, curn will copy the existing cache to cacheFile.0. If cacheFile.0 exists, it will be moved to cacheFile.1 first, and so on down the line, until the maximum number of cache backup files exists. The newest cache is always the one without a numeric extension. The oldest file is the one with the largest numeric extension. This parameter is useful if you want to roll back to a previous cache.

If this parameter is not specified, or is 0, then no cache backups are made.
No 0 CacheFile
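The backup rotation described for TotalCacheBackups can be sketched in Python as follows. This is a minimal model under the naming scheme given above (cacheFile.0, cacheFile.1, ...), not curn's own code; `rotate_backups` and the file name `curn.cache` are hypothetical.

```python
import os
import shutil
import tempfile

def rotate_backups(cache_path, total_backups):
    """Shift cacheFile.N backups up by one, then copy the cache to cacheFile.0."""
    if total_backups <= 0 or not os.path.exists(cache_path):
        return  # backups disabled, or nothing to back up yet
    oldest = f"{cache_path}.{total_backups - 1}"
    if os.path.exists(oldest):
        os.remove(oldest)  # the oldest backup falls off the end
    for i in range(total_backups - 2, -1, -1):
        src = f"{cache_path}.{i}"
        if os.path.exists(src):
            os.replace(src, f"{cache_path}.{i + 1}")
    shutil.copy2(cache_path, f"{cache_path}.0")

def read_text(path):
    with open(path) as f:
        return f.read()

with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "curn.cache")
    for version in ("v1", "v2", "v3", "v4"):
        rotate_backups(cache, 2)      # back up the existing cache, if any
        with open(cache, "w") as f:   # then write the updated cache
            f.write(version)
    names = sorted(os.listdir(tmp))
    contents = {name: read_text(os.path.join(tmp, name)) for name in names}
```

After four updates with two backups, the cache holds v4, the .0 file holds v3, and the .1 file holds v2; v1 has been discarded.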
UserAgent
plug-in
String Specifies the default HTTP User-Agent header to use. This configuration parameter permits you to have curn masquerade as a known browser, for sites that refuse access to robots and spiders and other unknown web clients. This global value is used when the section for a particular feed does not supply its own UserAgent value. No A string that identifies curn as the user agent. Per-feed UserAgent parameter.
ZipOutputTo
plug-in
String Path to a zip file to receive all output generated by output handlers. No None  

Configuring the RSS Feeds

The curn configuration file also contains a list of RSS feeds to be polled. Each feed must be specified in its own section in the configuration file. The name of the section must start with the string "Feed". If more than one feed is present, then each section name must also have additional characters, to make the section name unique. The following section names are all valid for RSS feed sections.

Each feed section supports the following parameters. (Parameters marked with plug-in are handled by one of curn's stock plug-ins, rather than by the core code.)

Variable Argument type Description Required? Default Value
AllowEmbeddedHTML
plug-in
Boolean Whether or not to allow embedded HTML in certain RSS feed elements, such as description, author, etc., for this feed. Some RSS formats permit embedded HTML; setting this parameter to true tells curn output handlers that they should preserve such embedded HTML markup, if possible. If this parameter is false, any embedded HTML is stripped.

Note that certain output handlers will strip HTML regardless of this setting. An output handler that produces text, for instance, is not required to support embedded HTML.

Notes:

  • Use this parameter with care. If supported, the raw HTML is copied directly into the resulting output, without modification. With HTML output, malformed embedded HTML can screw up the resulting HTML document.
  • This parameter overrides the AllowEmbeddedHTML setting in the main configuration section.
No false
ArticleFilter
plug-in
Strings Specifies a set of filters to discard feed item (article) content, based on regular expressions.

The filtering syntax is (shamelessly) adapted from the rawdog RSS reader's article-filter plug-in. A feed filter is configured by adding an ArticleFilter property to the feed's configuration section. The property's value consists of one or more filter command sequences, separated by ";" characters. (The ";" must be surrounded by white space; see below.) Each filter command sequence is of this form:
show|hide [field 'regexp' [field 'regexp' ...]]
field can be one of:
  • author: search the author field
  • title: search the title field
  • summary: search the summary, or description, field
  • text: search the full content, if available
  • category: search the article's category (or categories)
  • any: search all fields
Each regular expression must be enclosed in single quotes. For example:
hide author 'Raymond Luxury-yacht' ; \
show author 'Arthur +.Two-sheds. +Jackson'
      
If the command is "hide", then the entry will be hidden if the specified field matches the regular expression. If the command is "show", then the entry will be shown if the field matches the regular expression. If there are no fields or regular expressions, then the command is a wildcard match. That is:
hide
is equivalent to:
hide any '.*'
and:
show
is equivalent to:
show any '.*'
Wildcard matches are useful in situations where you want to hide or show "everything but ...". See the examples, below, for details.

All filtering commands are processed, and the end result is what defines whether a given entry is suppressed or not. Regular expressions are matched in a case-blind fashion. The match logic also:
  • ignores any embedded newlines in article contents
  • (temporarily) strips all HTML from the article text before matching
You can use multiple ArticleFilter parameters per feed, as long as they have unique suffixes (e.g., ArticleFilter1, ArticleFilter2, etc.). All filters are applied to each article to determine whether the article should be filtered out or not.

Examples

Some examples will help clarify the syntax.

For example, the following filter hides all articles containing the phrase "mash-up" (because mash-ups bore me):
ArticleFilter: hide any 'mash[- \t]?up'
The following, more complicated, entry hides everything by author "Joe Blow", unless the title has the word "rant" in it ('cause his rants are hilarious):
ArticleFilter: hide author '^joe *blow$' ; \
               show author '^joe *blow$' title 'rant'
Finally, this example hides everything except articles by Moe Howard:
ArticleFilter: hide ; show author '^moe *howard$'
No Articles are not filtered
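To make the ArticleFilter semantics more concrete, here is a Python sketch of one possible reading of the rules above. It makes two assumptions that the text does not spell out: all field/regexp pairs in a single command must match for that command to apply, and the last matching command decides the outcome. It also skips the HTML stripping and newline handling. The function names are hypothetical, and this is not curn's parser.

```python
import re
import shlex

FIELDS = ("author", "title", "summary", "text", "category")

def parse_filters(spec):
    """Parse "show|hide [field 'regexp' ...] ; ..." into (command, pairs) tuples."""
    commands = []
    for chunk in re.split(r"\s+;\s+", spec.strip()):
        tokens = shlex.split(chunk)  # honors the single-quoted regexps
        command, rest = tokens[0].lower(), tokens[1:]
        if command not in ("show", "hide") or len(rest) % 2 != 0:
            raise ValueError("bad filter command: " + chunk)
        pairs = [(rest[i].lower(), re.compile(rest[i + 1], re.IGNORECASE))
                 for i in range(0, len(rest), 2)]
        commands.append((command, pairs))
    return commands

def field_matches(article, field, regex):
    names = FIELDS if field == "any" else (field,)
    return any(regex.search(article.get(n) or "") for n in names)

def is_shown(article, commands):
    shown = True  # articles are shown unless a filter hides them
    for command, pairs in commands:
        # A command with no pairs is a wildcard: all() of nothing is True.
        if all(field_matches(article, f, r) for f, r in pairs):
            shown = (command == "show")  # assumption: last match wins
    return shown

joe = parse_filters("hide author '^joe *blow$' ; show author '^joe *blow$' title 'rant'")
moe_only = parse_filters("hide ; show author '^moe *howard$'")

joe_rant = {"author": "Joe Blow", "title": "A rant about feeds"}
joe_other = {"author": "Joe Blow", "title": "Weather report"}
moe = {"author": "Moe Howard", "title": "Nyuk"}
```

Under these assumptions, the first filter hides Joe Blow's weather report but shows his rant, and the second filter shows only the Moe Howard article.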
CommonXMLFixups
plug-in
Boolean Enables or disables the Common XML Fixups plug-in, which attempts to fix common syntax problems in downloaded XML feeds. Among the corrections this plug-in makes:
  • Conversion of unescaped ampersand ("&") characters
  • Conversion of certain commonly seen, but nonexistent, XML entities, such as &mdash; and &ouml;
  • Conversion of illegal character entities (which are usually leaked unescaped from embedded HTML text)
  • "Demoronizing" text by converting Microsoft Windows-specific characters (such as smart quotes) to something that will display in any browser. (The term "demoronize" is borrowed from John Walker's demoroniser command-line Unix tool.)
This per-feed setting overrides the global default value.
No The value of the global CommonXMLFixups parameter in the [curn] section, or false, if that value is not set.
DaysToCache Positive integer Maximum number of days to cache an already-read item for this feed. This value locally overrides the global DaysToCache default in the [curn] section. Items older than this many days are tossed from the cache when it's read, which means curn forgets that it saw them before. A value of 0 renders the cache essentially useless for this feed (i.e., 0 ensures that curn always forgets items that are cached for this feed). The special value "NoLimit" causes curn to leave items in the cache forever. No The value of the global DaysToCache parameter in the [curn] section or 365 if that value is not set.
Disabled
plug-in
Boolean If true, then the feed is skipped. If false, the feed is processed. This variable provides a simple way to disable a feed without having to comment its entire section out. No false
EditFeedURL
EditItemURL
plug-in
String Apply the specified regular expression edit to the site's feed URL (EditFeedURL) or to each of the site's RSS item URLs (EditItemURL). The value for this option consists of a Perl 5-style substitution applied to the URL. For example:

Remove all the parameters from the URL:

's/\?.*$//'

(The PruneURLs parameter provides a simpler mechanism for this common operation.)

Remove a "redirect" CGI from a site whose URLs look like: http://www.example.com/cgi-bin/redir.cgi?http://...

's+http://www.example.com/cgi-bin/redir.cgi\?++'

The substitution syntax supports Perl's $1, $2, etc., grouping syntax. However, because the "$" character also introduces a configuration file variable reference, you must escape the "$" to use it in a regular expression. For instance, use either:

s/^([a-z]+)foo(.*)\$/\$1bar\$2/

or

's/^([a-z]+)foo(.*)$/$1bar$2/'

If there are backslashes in the string, you must escape them, as well, preferably by single-quoting the value. See Suppressing Metacharacter Expansion and Variable Substitution for more details. To get the equivalent of the Perl 5 expression

s/^\*.*$//

you must specify

's/^\*.*$//'

This substitution syntax supports the following Perl-like modifiers, which are appended to the end of the substitution command:

  • g: Substitute for all occurrences of the regular expression, not just the first one.
  • i: Do case-insensitive pattern matching. Case-sensitive pattern matching is the default.
  • m: Treat the string as consisting of multiple lines. This modifier changes the meaning of "^" and "$" so that they match just after or just before, respectively, a line terminator or the end of the input sequence. By default these expressions only match at the beginning and the end of the entire input sequence.

The modifiers can be concatenated. Thus,

's/abc/xyz/ig'

will match and replace all occurrences of the string "abc", whether upper-, lower- or mixed-case.

Hint: When logging is enabled, curn will log the parsed expression at the "debug" log level.

No None
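For readers who want to experiment with substitutions like the EditFeedURL and EditItemURL examples above before putting them in a configuration file, here is a rough Python approximation. It is only a sketch: `apply_edit` is a hypothetical helper, it takes the command as it looks after configuration-file quoting has been removed, it assumes the delimiter never appears inside the pattern or the replacement, and Python's regular expression dialect differs in places from the one curn uses.

```python
import re

def apply_edit(command, text):
    """Apply a Perl 5-style s<d>regex<d>replacement<d>[gim] edit to text."""
    if len(command) < 4 or command[0] != "s":
        raise ValueError("not a substitution command")
    delim = command[1]
    parts = command[2:].split(delim)
    if len(parts) != 3:
        raise ValueError("expected s<d>regex<d>replacement<d>[modifiers]")
    pattern, replacement, modifiers = parts
    flags = 0
    if "i" in modifiers:
        flags |= re.IGNORECASE
    if "m" in modifiers:
        flags |= re.MULTILINE
    count = 0 if "g" in modifiers else 1  # for re.sub, 0 means "replace all"
    # Convert Perl's $1, $2 group references to Python's \g<1>, \g<2>.
    py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    return re.sub(pattern, py_replacement, text, count=count, flags=flags)

pruned = apply_edit(r"s/\?.*$//", "http://www.example.com/feed.rss?a=1&b=2")
unredirected = apply_edit(
    r"s+http://www.example.com/cgi-bin/redir.cgi\?++",
    "http://www.example.com/cgi-bin/redir.cgi?http://news.example.org/story")
grouped = apply_edit("s/^([a-z]+)foo(.*)$/$1bar$2/", "xfoobaz")
all_cases = apply_edit("s/abc/xyz/ig", "abc ABC aBc")
first_only = apply_edit("s/abc/xyz/", "abc abc")
```

The last two calls show the effect of the g and i modifiers: without g, only the first occurrence is replaced.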
ForceEncoding String Force curn to ignore the character set encoding advertised by the remote server (if any), and use the character set specified by this configuration item, instead. This is useful in the following cases:
  • the remote HTTP server doesn't supply an HTTP Content-Encoding header, and the local (Java) default encoding doesn't match the document's encoding
  • the remote HTTP server supplies the wrong encoding
  • the feed is coming from a file or an FTP server, and the default encoding (see below) isn't correct

This value should be a character set encoding that is recognized by the Java runtime environment.

ForceCharacterEncoding is a synonym for this parameter, retained for backward compatibility.

No
  • For http and https URLs, the encoding comes from the HTTP Content-Encoding header; if that header isn't present, then the Java VM's default encoding (usually "ISO-8859-1" on UNIX, and "Cp1252" on Windows) is used.
  • For file URLs, the default encoding is "utf-8", the same as the default value for the SaveAsEncoding parameter.
  • For all other URL types, the Java VM's default encoding is used.
GzipDownload
plug-in
Boolean If set to true, this parameter directs curn to use the "Accept-Encoding: gzip" HTTP header when retrieving this RSS feed from an HTTP server. Since RSS feeds are XML, they typically compress well; retrieving gzipped data, rather than the uncompressed XML, can save a significant amount of time and network bandwidth. (Note, however, that HTTP servers are not obligated to honor a request to gzip the feed.) This parameter overrides the global GzipDownload. No true
IgnoreArticlesOlderThan
plug-in
String Provides a way to ignore articles that are older than a certain interval. Intervals are expressed in a natural language syntax. Please see the documentation for the global IgnoreArticlesOlderThan parameter for a more complete description of this parameter. No The default, as defined by the global IgnoreArticlesOlderThan parameter. If no global IgnoreArticlesOlderThan value is set, then articles aren't ignored based on their age.
IgnoreDuplicateTitles
plug-in
Boolean If true, curn will ignore any item whose title matches the title of another item in the feed. (It only compares titles within the feed itself; it does not compare against titles of cached items.) Titles are compared without regard to upper or lower case.

This feature (hack, really) is useful for sites whose feeds often contain duplicate items (with the same titles) that have different IDs and different URLs, and thus appear to be unique. (Yahoo! News feeds sometimes exhibit this trait.)

No false
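The IgnoreDuplicateTitles comparison can be sketched in a few lines of Python. Which of the duplicate items curn keeps is not stated above; this sketch assumes the first one in feed order survives. `drop_duplicate_titles` is a hypothetical name.

```python
def drop_duplicate_titles(items):
    """Keep only the first item for each title, ignoring case."""
    seen = set()
    unique = []
    for item in items:
        key = item["title"].casefold()  # compare without regard to case
        if key in seen:
            continue
        seen.add(key)
        unique.append(item)
    return unique

items = [
    {"title": "Storm Warning", "url": "http://news.example.com/1"},
    {"title": "STORM WARNING", "url": "http://news.example.com/2"},
    {"title": "Sports Roundup", "url": "http://news.example.com/3"},
]
kept_urls = [item["url"] for item in drop_duplicate_titles(items)]
```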
MaxArticlesToShow
plug-in
Integer Sets an upper limit on the number of articles displayed for the feed. This maximum is applied after the articles are sorted (see SortBy) and after the ShowArticlesFor and IgnoreArticlesOlderThan policies are applied. No The default, as defined by the global MaxArticlesToShow parameter. If no global MaxArticlesToShow value is set, then there is no maximum.
MaxSummarySize
plug-in
Positive integer If an article has a summary, you can optionally set a maximum size for the summary. If a summary exceeds the maximum size, curn will truncate it and add a trailing ellipsis ("...") to indicate the truncation. A value of 0 effectively disables this option. This parameter overrides the global MaxSummarySize parameter. No 0 (i.e., no limit on summary size)
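A minimal Python sketch of the MaxSummarySize truncation rule follows. It assumes the size is counted in characters and that the ellipsis is appended after the cut; neither detail is specified above, and `truncate_summary` is a hypothetical name.

```python
def truncate_summary(summary, max_size):
    """Truncate summary to max_size characters; 0 means no limit."""
    if max_size <= 0 or len(summary) <= max_size:
        return summary
    return summary[:max_size] + "..."

short = truncate_summary("A long article summary", 6)
unlimited = truncate_summary("A long article summary", 0)
```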
PreparseEditsuffix
plug-in
String A parameter in a Feed section that starts with PreparseEdit (e.g., PreparseEdit1, PreparseEditFoo, etc.) defines a substitution to be applied to the downloaded XML file before it is parsed. As with the EditItemURL and EditFeedURL options, the value for this option consists of a Perl 5-style substitution.

This capability is rarely needed, but it's sometimes useful for sites that serve unparseable, but easily fixed, XML. (Though the CommonXMLFixups capability covers a lot of these errors with less configuration.)

For instance, one news site I read has an RSS channel whose title always contains an unescaped "&". The XML parser will not parse that feed; however, a simple preparse edit command of:

's/ & / \&amp; /g'

fixes the problem. (Again, this is one of the common XML syntax errors that CommonXMLFixups will correct.)

Another use for PreparseEdit is fixing incorrectly formatted links in the RSS feed. Consider the following <link> element, for fictitious site news.example.com:

<link>http://news.example.com&article=12573</link>

This is a perfectly parseable URL, but it happens to be wrong. It's missing a "/" between ".com" and "&". It really ought to be:

<link>http://news.example.com/&article=12573</link>

A quick PreparseEdit rule can fix it, though:

PreparseEdit: 's|(news.example.com)([^/]+)|$1/$2|'

Note the use of a different delimiter in the edit command ("|", instead of "/"). Any non-alphabetic character will work.

Multiple instances of this parameter are permitted, as long as each instance's name begins with the string "PreparseEdit" and contains a unique suffix.

The substitution syntax supports Perl-style $1, $2, etc., grouping syntax. However, because the "$" character also introduces a configuration file variable reference, you must escape the "$" to use it in a regular expression. For instance, use either:

s/^([a-z]+)foo(.*)\$/\$1bar\$2/

or

's/^([a-z]+)foo(.*)$/$1bar$2/'

If there are backslashes in the string, you must escape them, as well, preferably by single-quoting the value. See Suppressing Metacharacter Expansion and Variable Substitution for more details. To get the equivalent of the Perl 5 expression

s/^\*.*$//

you must specify

's/^\*.*$//'

This substitution syntax supports the following Perl-like modifiers, which are appended to the end of the substitution command:

  • g: Substitute for all occurrences of the regular expression, not just the first one.
  • i: Do case-insensitive pattern matching. Case-sensitive pattern matching is the default.
  • m: Treat the string as consisting of multiple lines. This modifier changes the meaning of "^" and "$" so that they match just after or just before, respectively, a line terminator or the end of the input sequence. By default these expressions only match at the beginning and the end of the entire input sequence.

The modifiers can be concatenated. Thus,

's/abc/xyz/ig'

will match and replace all occurrences of the string "abc", whether upper-, lower- or mixed-case.

Hint: When logging is enabled, curn will log the parsed expression at the "debug" log level.

No None
PruneURLs
plug-in
Boolean Specifies that all URLs should be pruned of their HTTP parameters. This action can also be accomplished with EditItemURL and EditFeedURL directives; PruneURLs is convenient shorthand for a common operation. No None
ReplaceEmptySummaryWith
plug-in
String Tells curn what to do when the summary for a feed article is missing. Legal values:
  • nothing: Leave the summary blank. This is the default.
  • content: Replace the summary with the article's content, if there is any content.
  • title: Replace the summary with the article's title.
This per-feed setting overrides the global setting.
No nothing
SaveAs
plug-in
[options] Path If set, this parameter specifies the path to a file where curn should save the raw XML contents of the feed, whenever it downloads the feed. This can be useful if you have a master version of curn that downloads a bunch of feeds, with multiple slave versions of curn that then run against the downloaded files. (See Being Bandwidth Friendly for a more detailed discussion of this tactic.)

This configuration item takes a command line-style value:
SaveAs: [--backups total_backups] [--encoding encoding] path
or
SaveAs: [-b total_backups] [-e encoding] path
The parameters have the following meanings:
  • total_backups specifies how many backups (i.e., previous versions) of the generated RSS file to keep. For instance, a value of 5 means "keep 5 previous versions of the file, plus the one from the current run." This is the best way to keep RSS files from previous curn runs. The backup files have version numbers preceding their extensions. For instance, if the output file is foo.xml, and total_backups is 2, curn will keep foo.0.xml and foo.1.xml. The file with the largest version number is the oldest one. If not specified, this parameter defaults to 0, which means "no backups".

  • encoding is optional and specifies the desired encoding of the file. It defaults to "utf-8".

  • path is the path to the file where the raw RSS data should be written.
Note: Often, curn can't tell whether there's any new data in a feed without downloading it. (This is true, for instance, if the remote HTTP server doesn't supply a valid Last-Modified header, or if it doesn't honor the If-Modified-Since header.) If curn decides it has to download a feed, and the feed has a configured SaveAs value, the feed will be saved even if curn later decides there's no new data in the feed.
No None
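The command line-style SaveAs value and its backup naming scheme can be modeled in Python as shown below. This only illustrates the syntax documented above; it is not how curn parses the value, and `parse_save_as` and `backup_names` are hypothetical helpers.

```python
import argparse
import os
import shlex

def parse_save_as(value):
    """Parse '[--backups N] [--encoding ENC] path' into its three parts."""
    parser = argparse.ArgumentParser(prog="SaveAs", add_help=False)
    parser.add_argument("-b", "--backups", type=int, default=0)
    parser.add_argument("-e", "--encoding", default="utf-8")
    parser.add_argument("path")
    return parser.parse_args(shlex.split(value))

def backup_names(path, total_backups):
    """Backup files carry a version number just before the extension."""
    root, ext = os.path.splitext(path)
    return [f"{root}.{i}{ext}" for i in range(total_backups)]

opts = parse_save_as("--backups 2 -e iso-8859-1 /tmp/feeds/foo.xml")
backups = backup_names(opts.path, opts.backups)
defaults = parse_save_as("foo.xml")
```

With two backups of foo.xml, the names are foo.0.xml and foo.1.xml, and the file with the largest number is the oldest.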
SaveAsEncoding
plug-in
String If set, and if the SaveAs parameter is also set, then this parameter specifies the character encoding to use when saving the feed to the file. If SaveAs is not set for the feed, then any SaveAsEncoding parameter is ignored.

WARNING: This parameter is deprecated. Use the --encoding option to the SaveAs parameter, instead.
No "utf-8". Note that this default value is the same as the default value of the ForceEncoding parameter for file URLs. This makes it easy to have one instance of curn save RSS feeds for other instances to parse.
SaveOnly
plug-in
Boolean If set, and if SaveAs is also set, then the feed will be downloaded and saved, but not parsed and not included in the generated output. This parameter can be useful when Being Bandwidth Friendly. No false
SaveAsRSS
plug-in
[options] Path If set, this parameter specifies that the feed should be rewritten in the specified RSS format and saved to the specified file. This configuration item takes a command line-style value:
SaveAsRSS: [--backups total_backups] [--type rsstype] [--encoding encoding] path
or
SaveAsRSS: [-b total_backups] [-t rsstype] [-e encoding] path
The parameters have the following meanings:
  • total_backups specifies how many backups (i.e., previous versions) of the generated RSS file to keep. For instance, a value of 5 means "keep 5 previous versions of the file, plus the one from the current run." This is the best way to keep RSS files from previous curn runs. The backup files have version numbers preceding their extensions. For instance, if the output file is foo.xml, and total_backups is 2, curn will keep foo.0.xml and foo.1.xml. The file with the largest version number is the oldest one. If not specified, this parameter defaults to 0, which means "no backups".

  • rsstype is the type of RSS output to generate. Currently, "rss1", "rss2" and "atom" are the supported values.

  • encoding is optional and specifies the desired encoding of the file. It defaults to "utf-8".

  • path is the path to the file where the RSS output should be written.
Note that only the new data in the feed is converted to RSS.
No None
SaveRSSOnly
plug-in
Boolean If set, and if SaveAsRSS is also set, then the feed will be downloaded and parsed, and the RSS output will be generated, but the feed will not be passed to any output handlers (or, for that matter, any other plug-ins). No false
SavedBackups Positive integer Number of saved backups to keep. If this value is non-zero, the handler will back the SaveAs file up before overwriting it. Up to SavedBackups total backed-up files will be kept. A value of 0 disables the feature. No 0
ShowArticlesFor
plug-in
String How long to keep showing articles from the feed. The value is a time interval, expressed using the same natural language strings supported by the IgnoreArticlesOlderThan parameter. Please see the documentation for the global ShowArticlesFor parameter for a more complete description of this parameter.

This value overrides the global ShowArticlesFor parameter.
No The value of the global ShowArticlesFor parameter.
ShowAuthors
plug-in
Boolean If set to true, this configuration item instructs curn to display author information for this feed, if available. This value overrides the global ShowAuthors parameter. No The value of the global ShowAuthors parameter.
ShowDates
plug-in
Boolean If set to true, this configuration item instructs curn to display any dates associated with this feed, if available. This value overrides the global ShowDates parameter. No The value of the global ShowDates parameter.
SortBy
plug-in
String How to sort items in this feed. This value locally overrides the global SortBy parameter in the [curn] section. Legal values:
  • time: Sort by timestamp, if present. Current time is assumed for items that don't have timestamps.
  • title: Sort by item title, if present. Any item without a title is sorted as if its title were the empty string ("").
  • none: Don't sort (i.e., leave items in the order they appear in the XML).
No The value of the global SortBy parameter in the [curn] section.
SummaryOnly
plug-in
Boolean Some RSS feeds provide a description for each item, in addition to the (brief) title. Setting SummaryOnly to true suppresses display of the description. This parameter overrides the global SummaryOnly parameter.

WARNING: This parameter is deprecated. Use the ReplaceEmptySummaryWith parameter, instead.
No The value of the global SummaryOnly parameter.
TitleOverride
plug-in
String Specifies a string to be used as the site's title, instead of the title supplied in the RSS XML. Useful when the real site-supplied title is not suitable. No None
URL String The fully-qualified URL for the feed. For local files, use a "file:" URL. Yes None
UserAgent
plug-in
String Specifies the HTTP User-Agent header to use when retrieving this feed. This local value overrides the global UserAgent parameter in the [curn] section. This configuration parameter permits you to have curn masquerade as a known browser, and it's useful for sites that refuse access to robots and spiders and other unknown web clients. No The value of the global UserAgent parameter in the [curn] section.

Configuring Output Handlers

Output Handler Sections

As curn processes each RSS feed, it parses the XML and loads the new items into internal data structures. When it has finished processing the XML, it hands the parsed data structures to one or more output handlers. Output handlers are so called because they generally produce output that's to be displayed or emailed to the user. Generally, but not always: an output handler may choose to save its output to a file, but not send the output back to curn; each of the built-in output handlers does exactly that if its SaveAs configuration parameter is set and its SaveOnly configuration parameter is true. Alternatively, the output handler may choose to convert the internal data structures to output that it publishes somewhere (e.g., via a network connection to an HTTP server).

Each output handler is specified in its own section in the configuration file. The name of the section must start with the string "OutputHandler". If more than one output handler is present, then each section name must also have additional characters, to make the section name unique. The following section names are all valid for output handler sections.

If no OutputHandler sections are present in the configuration file, curn skips the RSS XML parsing phase. (There's no reason to parse the XML if there are no output handlers to process the parsed feed data.) If there are no output handlers, curn may or may not download individual feeds. If a given feed has no SaveAs setting, and there are no output handlers, then curn skips the feed entirely. After all, there's no sense wasting time downloading the feed, if the feed isn't being parsed or saved. However, if the feed does have a SaveAs setting, curn will download and save the XML (assuming it has changed) even if XML parsing is disabled.

All output handler sections take two variables. In addition, individual output handlers can require configuration items of their own. The two variables common to all output handlers are described below.

Variable Argument type Description Required? Default Value
Class String Identifies the Java class that implements the output handler. (The class must implement the org.clapper.curn.OutputHandler interface. See Writing Your Own Output Handler for details.) Yes  
Disabled Boolean If true, the output handler is skipped. If false, the output handler is processed. This variable provides a simple way to disable an output handler without having to comment its entire section out. No false

There are some output handler examples following the next section.

Predefined Output Handlers

curn comes bundled with the following built-in output handlers.

FreeMarkerOutputHandler

Class
org.clapper.curn.output.freemarker.FreeMarkerOutputHandler
Purpose
Uses the FreeMarker template engine and a configured template to generate output.
Using the FreeMarker output handler

The FreeMarkerOutputHandler, introduced in curn version 2.6, is both simple and flexible. It uses the FreeMarker template engine to convert a template to an output file. FreeMarker templates can be used to generate nearly any kind of textual output file, from HTML and XML to simple text. In fact, the HTMLOutputHandler, TextOutputHandler, and SimpleSummaryOutputHandler have been reimplemented to use the FreeMarkerOutputHandler in conjunction with built-in templates that produce the appropriate kind of output.
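
As a rough sketch of how such a handler might be configured, the section below selects the built-in HTML template and saves the result to a file instead of displaying it. It uses the configuration items described under Additional Configuration Items. The section name and the SaveAs path are hypothetical, and the "Name: value" line form and # comments are assumptions about the configuration file syntax.

```
# Hypothetical section name.
[OutputHandlerHTML]
Class: org.clapper.curn.output.freemarker.FreeMarkerOutputHandler
# Type "builtin", identifier "html", then the optional MIME type.
TemplateFile: builtin html text/html
Encoding: utf-8
# Hypothetical path; SaveOnly suppresses display and email of the output.
SaveAs: /tmp/curn-output.html
SaveOnly: true
```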

Additional Configuration Items
Variable | Argument type | Explanation | Required? | Default value
AllowEmbeddedHTML | Boolean | Whether or not the specified template supports embedded HTML. If embedded HTML is found within an RSS item, it will be included in the generated output only if (a) this parameter is true, and (b) the AllowEmbeddedHTML parameter for the feed is also true. Otherwise, embedded HTML will be stripped from the item. | No | false
Encoding | String | Specifies the character encoding to use when writing the output file. | No | "utf-8"
SaveAs | File name or path name | Save a copy of the generated output to the specified file. The argument is the path to the file. WARNING: The syntax of this parameter is different from the syntax of the SaveAs parameter for a feed. | No | None (i.e., no copy is saved)
SaveOnly | Boolean | If true and if SaveAs is defined, then save a copy of the generated output, but don't make it available to the user (i.e., don't display it on standard output, and don't email it). | No | false
ShowCurnInfo | Boolean | Whether or not to display the curn version, curn configuration file path, and other curn-related information at the bottom of the generated output. | No | true
TemplateFile | Two or three strings | Specifies the location of the FreeMarker template file. The location is specified with three parameters, the last of which is optional:
  • the type, which may be file, classpath, url or builtin
  • an identifier string
  • a MIME type for the generated output. This parameter, if omitted, defaults to "text/plain"

The form of the identifier string depends on the type value.

builtin The identifier specifies one of the built-in curn FreeMarker templates that are bundled in the curn jar file. There are three legal values:
  • html: the HTML template, which generates HTML output
  • summary: a plain text template that generates a simple text summary
  • text: a template that generates output containing the same information as the HTML template, but in plain text form
Examples: