curn: Customizable Utilitarian RSS Notifier
User's Guide
curn is an RSS reader. It scans a configured set of URLs, each one representing an RSS feed, and summarizes the results. By default, curn keeps track of individual items within each RSS feed, using an on-disk cache; when using the cache, it will suppress displaying information for items it has already processed (though that behavior can be disabled).
Unlike many RSS readers, curn does not use a graphical user interface. It is a command-line utility, intended to be run periodically in the background by a command scheduler such as cron(8) (on UNIX-like systems) or the Windows Scheduler Service (on Windows).
curn can read RSS feeds from any URL that's supported by Java's runtime. When querying HTTP sites, curn uses the HTTP If-Modified-Since and Last-Modified headers to suppress retrieving and processing feeds that haven't changed (though a plug-in that forces feed downloads, such as the Retain Articles plug-in, can override that behavior). By default, it also requests that the remote HTTP server gzip the XML before sending it. (Some HTTP servers honor the request; some don't.) These measures both minimize network bandwidth and ensure that curn is as kind as possible to the remote RSS servers. (There are some additional steps you can take to be more bandwidth-friendly.)
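The conditional, compressed download described above can be sketched with the JDK's HttpURLConnection. This is a minimal illustration, not curn's code: the class name PoliteFeedFetcher is hypothetical, and a real caller would need to store the server's Last-Modified value (available via conn.getLastModified()) between runs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper illustrating a polite feed download; not part of curn. */
final class PoliteFeedFetcher {
    /** Prepares (but does not yet open) a conditional, gzip-friendly request. */
    static HttpURLConnection prepare(URL feedUrl, long lastModifiedMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) feedUrl.openConnection();
        // Ask the server to compress the XML; it is free to ignore the request.
        conn.setRequestProperty("Accept-Encoding", "gzip");
        // A positive value makes the JDK send an If-Modified-Since header.
        if (lastModifiedMillis > 0) {
            conn.setIfModifiedSince(lastModifiedMillis);
        }
        return conn;
    }

    /** Returns the feed body, or null if the server reports it unchanged. */
    static InputStream open(HttpURLConnection conn) throws IOException {
        if (conn.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
            return null; // 304: skip both the download and the parse
        }
        InputStream in = conn.getInputStream();
        // Only unwrap gzip if the server actually honored the request.
        return "gzip".equalsIgnoreCase(conn.getContentEncoding()) ? new GZIPInputStream(in) : in;
    }
}
```

Calling prepare() makes no network traffic; the request is sent only when open() asks for the response code.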
curn comes with a built-in adapter for the ROME feed parser, but it can easily be extended to use any RSS parser. (curn uses ROME by default.) See the ParserClass configuration item for information on how to specify which parser curn should use. See the section entitled Using an Unsupported RSS Parser for more details on adapting curn to use other RSS parsers.
curn supports several output formats; you can configure one or more output handlers in curn's configuration file. In addition, someone conversant with Java programming, or comfortable with a scripting language such as Python or Ruby, can easily extend curn to handle a new output format. See the section entitled Writing Your Own Output Handler for more details. Finally, as of version 2.6, curn has a built-in template-driven output handler, the FreeMarkerOutputHandler, based on the FreeMarker template engine. This handler uses a text template to generate output, so anyone conversant with FreeMarker can easily write a custom template. See the section describing the FreeMarkerOutputHandler for more details.
curn's predefined output handlers can generate:
In addition, curn supports emailing its output. If email addresses are specified in the configuration file, then curn creates a MIME multipart/alternative email message [1], using the output of each output handler as one of the alternative attachments. (As of version 3.2, curn can also send individual email messages for each article; see the MailIndividualArticles parameter.)
Throughout this document, the following terms are used:
curn is invoked from the command line as follows:
curn
The curn graphical installer automatically creates a Unix shell script (called curn) or a Windows command file (curn.bat) in the bin directory beneath the curn installation directory. You must put the curn bin directory in your path.
Note: While it is possible to invoke curn via the java command, it's not recommended. For curn's plug-ins to work properly, curn must do some fancy class loader footwork. Basically, curn uses a special bootstrap class to find all plug-ins and create a special class loader that can load everything—plug-ins, core code, etc. If you don't invoke curn via the bootstrap class, the plug-ins don't load properly. The curn shell script and command file handle invoking curn so that plug-ins will work properly.
curn's command line uses a UNIX-like syntax. If you invoke curn without any parameters, you get the following usage display.
Many of curn's command-line options simply override settings in the curn configuration file. Each option and argument is discussed in more detail, below.
| Short Option | Long Option | Explanation |
|---|---|---|
| -B | --build-info | Display detailed information about how and when curn was built, then exit without doing anything. Useful primarily when debugging or submitting problem reports. For instance, the command "curn -B" produces output similar to the following: curn, version 3.0 (build 20060608.185936.321) Build: 20060608.185936.321 Build date: 2006/06/08 14:59:36 EDT Built by: bmc on sunball.inside.clapper.org Built on: Linux 2.6.16-1.2122_FC5smp (i386) Build Java VM: Java HotSpot(TM) Client VM 1.5.0_07-b03 (Sun Microsystems Inc.) Build compiler: javac Ant version: Apache Ant version 1.6.5 compiled on June 2 2005. For a simple one-line version display, use the --version option. |
| -C | --no-cache | Run without a cache. Each RSS item curn encounters will appear to be new and will be passed to the output handlers. Also see the CacheFile configuration directive. |
| | --logging | Enable logging via the java.util.logging API. You will also have to specify a logging configuration file via a -Djava.util.logging.config.file system property. For instance: java -Djava.util.logging.config.file=/tmp/logging.properties org.clapper.curn.Tool --logging ... See the section entitled Logging for more details on specifying logging parameters. |
| -t <time> | --time <time> | For the purposes of cache expiration, pretend the current time is <time>, instead of the wall clock time. <time> may be specified in one of the following formats: |
| -u | --no-update | Load (and prune) the cache file before processing the RSS feeds, but do not save the modified in-memory cache back to disk. Useful primarily for debugging. |
| -v | --version | Show just the one-line version information, then exit. For more detailed curn build and version information, use the --build-info option. |
A list of curn's positional parameters follows.
| Positional Parameter | Explanation |
|---|---|
| config | The path or URL to the curn configuration file. This parameter is required. |
curn's configuration file controls all aspects of curn's behavior. It contains the global parameters, the output handler definitions, and the individual RSS feed sites. This section first describes the overall configuration file syntax, and then describes each curn configuration item in detail.
You can view a sample curn configuration file by following this link.
curn's configuration file is a simple text file. It resembles a standard Java properties file, but it is broken into individual sections, each of which has its own variable namespace. At a glance, the configuration file is reminiscent of a Windows .INI file, but there are quite a few differences [2].
Like a .INI file, each section in the configuration file consists of a name surrounded by brackets. Each section contains variable assignments; the variable assignment syntax is similar to that of a Java properties file. For example:
[curn]
CacheFile: /home/bmc/.curn/cache
DaysToCache: NoLimit
ParserClass: org.clapper.curn.parser.rome.RSSParserAdapter
...
There can be any amount of whitespace before and after the brackets in a section name; the whitespace is ignored. That is, "[curn]", "[ curn]" and "[ curn ]" all specify a section named "curn".
Each section contains zero or more variable settings. Similar to a Java properties file, the variables are specified as name/value pairs, separated by an equals sign ("=") or a colon (":"). Variable names are case-sensitive and may contain any printable character (including white space), other than '$', '{', and '}'. Variable values may contain anything at all. The parser ignores whitespace on either side of the "=" or ":"; that is, leading whitespace in the value is skipped. To include leading whitespace in a value, escape the whitespace characters with backslashes. (See below.)
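A minimal sketch of the section-header and assignment rules just described. It assumes a simplified parser that splits at the first "=" or ":", so it ignores backslash-escaped whitespace and names that themselves contain a separator. ConfigLineSyntax is a hypothetical name, not part of curn.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the line syntax described above; not curn's parser. */
final class ConfigLineSyntax {
    // "[ name ]" with optional whitespace around and inside the brackets.
    private static final Pattern SECTION = Pattern.compile("^\\s*\\[\\s*([^\\]]*?)\\s*\\]\\s*$");

    /** Returns the section name, or null if the line is not a section header. */
    static String sectionName(String line) {
        Matcher m = SECTION.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    /** Splits "name = value" or "name: value" at the first '=' or ':'. */
    static Map.Entry<String, String> assignment(String line) {
        int eq = line.indexOf('=');
        int colon = line.indexOf(':');
        int sep = (eq < 0) ? colon : (colon < 0 ? eq : Math.min(eq, colon));
        if (sep < 0) {
            return null; // not an assignment
        }
        String name = line.substring(0, sep).trim();
        // Whitespace after the separator is ignored, i.e. leading blanks in the value are dropped.
        String value = line.substring(sep + 1).replaceFirst("^\\s+", "");
        return new AbstractMap.SimpleEntry<>(name, value);
    }
}
```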
Variable definitions may span multiple lines; each line to be continued must end with a backslash ("\") character, which escapes the meaning of the newline, causing it to be treated like a space character. The following line is treated as a logical continuation of the first line; however, any leading whitespace is removed from continued lines. For example, the following four variable assignments all have the same value:
[test]
a: one two three
b: one two three
c: one two \
    three
d: one \
    two \
    three
Because leading whitespace is skipped, all four variables have the value "one two three".
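The continuation rule can be illustrated with a short sketch that joins physical lines into logical ones. It assumes that the trailing backslash, plus any blanks before it, collapses into a single space, which reproduces the "one two three" result above. It does not handle a line that ends in an escaped backslash. ContinuationJoiner is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of backslash line continuation; not curn's parser. */
final class ContinuationJoiner {
    static List<String> join(List<String> physicalLines) {
        List<String> logical = new ArrayList<>();
        StringBuilder pending = null; // non-null while a continued line is being built
        for (String raw : physicalLines) {
            // Continued lines lose their leading whitespace.
            String line = (pending == null) ? raw : raw.replaceFirst("^\\s+", "");
            boolean continues = line.endsWith("\\");
            if (continues) {
                // Drop the backslash and trailing blanks; the newline becomes one space.
                line = line.substring(0, line.length() - 1).replaceFirst("\\s+$", "") + " ";
            }
            if (pending == null) {
                pending = new StringBuilder(line);
            } else {
                pending.append(line);
            }
            if (!continues) {
                logical.add(pending.toString());
                pending = null;
            }
        }
        if (pending != null) {
            logical.add(pending.toString().trim()); // file ended in mid-continuation
        }
        return logical;
    }
}
```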
Only variable definition lines may be continued. Section header lines, comment lines (see below) and include directives (see below) cannot span multiple lines.
The configuration parser preprocesses each variable's value, expanding embedded metacharacter sequences and substituting variable references. (See below.) You can use backslashes to escape the special characters that the parser uses to recognize metacharacter and variable sequences; you can also use single quotes. See Suppressing Metacharacter Expansion and Variable Substitution, below, for more details.
Within a variable's value, Java-style ASCII escape sequences \t, \n, \r, \\, \", \', \ (a backslash and a space), and \uxxxx are recognized and converted to single characters. Note that metacharacter expansion is performed before variable substitution.
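As an illustration only, the escape processing listed above might look like this in Java. EscapeExpander is a hypothetical class, and keeping unrecognized escapes verbatim is an assumption (java.util.Properties, by contrast, silently drops the backslash).

```java
/** Hypothetical sketch of configuration-value escape processing; not curn's code. */
final class EscapeExpander {
    static String expand(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 == s.length()) {
                out.append(c); // ordinary character, or a lone trailing backslash
                continue;
            }
            char next = s.charAt(++i);
            switch (next) {
                case 't': out.append('\t'); break;
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case '\\': case '"': case '\'': case ' ':
                    out.append(next); // escaped literal character
                    break;
                case 'u':
                    // Unicode escape: backslash, 'u', then exactly four hex digits.
                    if (i + 4 >= s.length()) {
                        throw new IllegalArgumentException("Truncated Unicode escape in: " + s);
                    }
                    out.append((char) Integer.parseInt(s.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default:
                    out.append('\\').append(next); // unknown escape: kept as-is (assumption)
            }
        }
        return out.toString();
    }
}
```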
A variable's value can interpolate the values of other variables, using a variable substitution syntax reminiscent of the Unix shell. (The syntax is also similar to Ant's variable substitution syntax.) The general form of a variable reference is ${sectionName:varName}. sectionName is the name of the section containing the variable to substitute; if omitted, it defaults to the current section. varName is the name of the variable to substitute. If the variable has an empty value, an empty string is substituted. If the variable (or the referenced section) does not exist, curn will abort. If a variable reference specifies a section name, the referenced section must precede the current section; it is not possible to substitute the value of a variable in a section that occurs later in the file.
The section names "system", "env", and "program" are reserved for special "pseudosections."
The "system" pseudosection is used to interpolate values from Java's System.properties class. For instance, ${system:user.home} substitutes the value of the user.home system property (typically, the home directory of the user running curn). Similarly, ${system:user.name} substitutes the user's name.
The "env" pseudosection is used to interpolate values from the environment. On UNIX systems, for instance, ${env:HOME} substitutes user's home directory (and is, therefore, a synonym for ${system:user.home}. On some versions of Windows, ${env:USERNAME} will substitute the name of the user running curn. Note: On UNIX systems, environment variable names are typically case-sensitive; for instance, ${env:USER} and ${env:user} refer to different environment variables. On Windows systems, environment variable names are typically case-insensitive; ${env:USERNAME} and ${env:username} are equivalent.
The "program" pseudosection is a placeholder for various special variables provided by the Configuration class at runtime. Those variables are:
| "program" Section Variable | Explanation | ||||||
|---|---|---|---|---|---|---|---|
| cwd | The program's current working directory. Thus, ${program:cwd} will substitute the current working directory, with an appropriate path separator for the host operating system (e.g., "\" for Windows, "/" for UNIX.) | ||||||
| cwd.url | The program's current working directory, as a file URL, without the trailing "/". Useful when you need to create a URL reference to something relative to the current directory. This is especially helpful on Windows, where building a file URL from ${program:cwd} produces an invalid URL, with a mixture of backslashes and forward slashes. By contrast, a URL built from ${program:cwd.url} is always valid, regardless of the underlying host operating system. |
| now | The current time, formatted by calling java.util.Date.toString() with the default locale. For example, ${program:now} would produce something like "Fri Aug 20 15:18:56 EDT 2004" on a machine with a default English locale. |
| now delim fmt [delim lang [delim country]] | The current date/time, formatted with the specified java.text.SimpleDateFormat format string. If specified, the given language and country codes will be used; otherwise, the default system locale will be used. lang is a Java language code, such as "en", "fr", etc. country is a 2-letter country code, e.g., "GB", "US", "CA", etc. delim is a user-chosen delimiter that separates the variable name ("now") from the format and the optional locale fields. The delimiter can be anything that doesn't appear in the format string, the variable name, or the locale. For example: Note: SimpleDateFormat requires that literal strings (i.e., strings that should not be processed as part of the format) be enclosed in single quotes. For instance: yyyy.MM.dd 'at' hh:mm:ss z. Because single quotes are special characters in configuration files, it's important to escape them if you use them inside date formats. So, to include the above string in a configuration file's ${program:now} reference, escape each single quote with a backslash: yyyy.MM.dd \'at\' hh:mm:ss z. See Suppressing Metacharacter Expansion and Variable Substitution, below, for more details. |
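The quoting rule for SimpleDateFormat is easy to check directly in Java, where the pattern needs no extra escaping. The helper NowFormatter is hypothetical; only SimpleDateFormat, Locale, and TimeZone are real JDK classes. The sketch formats the epoch in UTC with the en_GB locale (Locale.UK) and shows that an unquoted "at" is rejected, because 't' is not a pattern letter.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper; shows how SimpleDateFormat treats quoted literals. */
final class NowFormatter {
    static String format(Date when, String pattern, Locale locale, TimeZone zone) {
        // Throws IllegalArgumentException if the pattern contains an unknown letter.
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, locale);
        fmt.setTimeZone(zone); // fixed zone so the result is reproducible
        return fmt.format(when);
    }
}
```

With the pattern yyyy.MM.dd 'at' hh:mm:ss z, the epoch formats as "1970.01.01 at 12:00:00" followed by the zone name; without the quotes, the constructor throws.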
For example:
| Variable Reference | Explanation | Sample |
|---|---|---|
| ${system:user.home} | Substitutes the value of the system property "user.home" (usually set to the current user's home directory). |
[curn]
myCurnDir = ${system:user.home}/.curn
|
| ${curn:myCurnDir} | Substitutes the value of variable "myCurnDir" from the [curn] section. | [Feed_Wired]
URL: http://www.wired.com/news_drop/netcenter/netcenter.rdf
SaveAs: ${curn:myCurnDir}/feeds/wired.rdf
|
| ${myCurnDir} | Substitutes the value of variable "myCurnDir" from the current section. | [curn]
myCurnDir = ${system:user.home}/.curn
CacheFile = ${myCurnDir}/cache
|
The configuration file also supports a simple conditional-substitution logic, which allows you to specify a default value to be substituted if a variable is empty or does not have a value. The general form of a conditional substitution is:
${var?some default value}
If ${var} does not have a value, or has an empty string as its value,
the string "some default value" will be substituted.
To prevent the parser from interpreting metacharacter sequences, variable substitutions and other special characters, enclose part or all of the value in single quotes. (See [3] for additional comments.) For example, suppose you want to set variable "prompt" to the literal value "Enter value. To specify a newline, use \n." The following configuration file line will do the trick:
prompt: 'Enter value. To specify a newline, use \n.'
Similarly, to set variable "abc" to the literal string "${foo}" suppressing the parser's attempts to expand "${foo}" as a variable reference, you could use:
abc: '${foo}'
To include a literal single quote, you must escape it with a backslash.
Regardless of the underlying operating system, path names in the curn configuration file can always use Unix-style forward slash ("/") characters. At runtime, curn will convert the path names to use the appropriate file separator (e.g., "\" on Windows). This capability provides two benefits:
A special include directive permits inline inclusion of another configuration file. The include directive takes two forms:
%include "path" %include "URL"
For example:
%include "/home/bmc/mytools/common.cfg" %include "file:///home/bmc/mytools/common.cfg"
The included file may contain any content that is valid for this parser. It may contain just variable definitions (i.e., the contents of a section, without the section header), or it may contain a complete configuration file, with individual sections. Since the parser recognizes a variable syntax that is essentially identical to Java's properties file syntax, it's also legal to include a properties file, provided it's included within a valid section.
Attempting to include a file from itself, either directly or indirectly, will cause curn to abort processing.
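One way to implement the include directive and its recursion check is to keep a stack of the files currently being expanded. In this sketch (IncludeExpander is hypothetical), an in-memory map stands in for reading paths or URLs; a real implementation would also normalize paths before comparing them. Including the same file twice from different places is allowed; only nested self-inclusion fails.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of %include expansion with cycle detection. */
final class IncludeExpander {
    private static final Pattern INCLUDE = Pattern.compile("^\\s*%include\\s+\"([^\"]+)\"\\s*$");

    private final Map<String, List<String>> sources; // stand-in for files and URLs

    IncludeExpander(Map<String, List<String>> sources) {
        this.sources = sources;
    }

    List<String> expand(String path) {
        List<String> out = new ArrayList<>();
        expand(path, new ArrayDeque<>(), out);
        return out;
    }

    private void expand(String path, Deque<String> stack, List<String> out) {
        if (stack.contains(path)) {
            // path is already being expanded further up: direct or indirect self-inclusion.
            throw new IllegalStateException("Recursive include: " + stack + " -> " + path);
        }
        List<String> lines = sources.get(path);
        if (lines == null) {
            throw new IllegalStateException("Cannot read " + path);
        }
        stack.push(path);
        for (String line : lines) {
            Matcher m = INCLUDE.matcher(line);
            if (m.matches()) {
                expand(m.group(1), stack, out); // inline the included file's lines
            } else {
                out.add(line);
            }
        }
        stack.pop();
    }
}
```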
A comment line is one whose first non-whitespace character is a "#" or a "!". This comment syntax is identical to the one supported by a Java properties file. A blank line is a line containing no content, or one containing only whitespace. Blank lines and comments are ignored. For example:
[curn]
# ---------------------------------------------------------------------------
# CacheFile: The full path to the file in which curn should cache URLs.
#            curn uses the cache file to keep track of which URLs it
#            has already received and displayed, and when it received them.
#            Under normal operation, curn won't display a URL it has
#            already displayed and cached.
#
#            This path may contain the ~ metacharacter, to denote the
#            invoking user's home directory.
#
#            The use of a cache can be disabled by omitting this parameter.
#            Use the "NoCacheUpdate" parameter to tell curn to read,
#            but not update, the cache.
#
# See also:  Configuration parameter "NoCacheUpdate"
#            Command line parameter -C, --no-cache
#
# OPTIONAL. Default: None
CacheFile: test.cache
curn's configuration file has three kinds of sections:
All other sections in the configuration file are parsed (and subject to syntactic constraints), but otherwise ignored. Thus, it's perfectly legal to have a separate section, e.g., "[var]", where you define variables that exist solely to be substituted into other sections.
Any boolean parameter (i.e., one documented as taking a true or false value) can also take a value of "0" (false), "1" (true), "no" (false) or "yes" (true).
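A minimal sketch of the boolean rule above. Treating the words case-insensitively and trimming surrounding whitespace are assumptions, and ConfigBoolean is a hypothetical name.

```java
import java.util.Locale;

/** Hypothetical sketch of curn-style boolean values; not curn's code. */
final class ConfigBoolean {
    static boolean parse(String raw) {
        // Case-insensitive matching is an assumption of this sketch.
        switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "true": case "yes": case "1":
                return true;
            case "false": case "no": case "0":
                return false;
            default:
                throw new IllegalArgumentException("Not a boolean value: " + raw);
        }
    }
}
```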
This section contains curn's global parameters. Each is described in detail, below. (Parameters marked with plug-in are handled by one of curn's stock plug-ins, rather than by the core code.)
| Variable | Argument type | Description | Required? | Default value | See also |
|---|---|---|---|---|---|
| AllowEmbeddedHTML plug-in | Boolean | Default setting for whether or not to allow embedded HTML in certain RSS feed elements, such as description, author, etc. Some RSS formats permit embedded HTML. Setting this parameter to true preserves any embedded HTML markup within a feed; setting this parameter to false causes embedded HTML to be stripped. Note that certain output handlers will strip HTML regardless of this setting. An output handler that produces text, for instance, is not required to support embedded HTML. This global parameter can be overridden on a per-feed basis. Notes: | No | false | |
| CacheFile | File name or path name | The full path to the file in which curn should cache feed item data. curn uses the cache file to keep track of which feed items it has already received and displayed, and when it received them. Under normal operation, curn won't display a feed item it has already displayed and cached. The use of a cache can be disabled by omitting this parameter. Use the NoCacheUpdate parameter, or the --no-update command line option, to tell curn to read, but not update, the cache. The cache file is an XML file; however, since it is generated automatically, you should not edit it. | No | None. (If not specified, no cache is used.) | NoCacheUpdate CacheBackup --no-cache --no-update |
| CacheBackup | File name or path name | The full path to a cache backup file. If this parameter is defined, curn will copy the cache to this backup file before updating the cache on disk. Warning: This parameter was replaced with TotalCacheBackups in curn version 2.6. | No | None | CacheFile TotalCacheBackups |
| CommonXMLFixups plug-in | Boolean | Enables or disables the Common XML Fixups plug-in, which attempts to fix common syntax problems in downloaded XML feeds. There is some XML badness that is surprisingly common across feeds, including (but not limited to): This global parameter can be overridden on a per-feed basis; it defines the default value for all feeds that don't explicitly set it themselves. | No | false | The per-feed CommonXMLFixups setting |
| DaysToCache | Positive integer | Default maximum number of days to cache an already-read item. This parameter is used when the configuration section for a particular site lacks its own DaysToCache value. Items older than this many days are tossed from the cache when it's read, which means curn forgets that it saw them before. A value of 0 renders the cache essentially useless (i.e., 0 ensures that curn always forgets items that are cached). The special value "NoLimit" causes curn to leave items in the cache forever. | No | 365 (days) | Per-feed DaysToCache parameter |
| GzipDownload plug-in | Boolean | If set to true, this parameter directs curn to use the "Accept-Encoding: gzip" HTTP header when retrieving an RSS feed from an HTTP server. Since RSS feeds are XML, they typically compress well; retrieving gzipped data, rather than the uncompressed XML, can save a significant amount of time and network bandwidth. (Note, however, that HTTP servers are not obligated to honor a request to gzip the feed.) This parameter can be overridden on a per-feed basis; this global value sets the default. For backward compatibility, this parameter can also be specified as GetGzippedFeeds. | No | true | |
| IgnoreArticlesOlderThan plug-in | String | Provides a way to ignore articles that are older than a certain interval. Intervals are expressed in a natural language syntax. For instance, "IgnoreArticlesOlderThan: 3 days", "IgnoreArticlesOlderThan: 1 week", "IgnoreArticlesOlderThan: 365 days", or "IgnoreArticlesOlderThan: 12 hours, 30 minutes". Valid interval names (in English) are: "year" and "month" are not supported, to avoid the irregularity of leap years and different month lengths, respectively. The actual conversion of the strings is done by the org.clapper.util library's Duration class. See that class for more details. This global value sets the default. NOTE: The plug-in that implements this capability uses the timestamp in the XML to determine "older than", not the cached timestamp, because the intent is to weed old articles from a feed that you haven't processed in a while (or perhaps are processing for the first time). If the article has no timestamp in the XML, it is assumed to be current, i.e., to have a date/time of "now". | No | None (i.e., articles are not ignored based on age) | Per-feed IgnoreArticlesOlderThan parameter |
| MailOutputTo plug-in | String | One or more comma-separated email addresses to receive the output. This parameter is optional. If any email addresses are specified, then curn sends its generated output to those addresses. Depending on the setting of the MailIndividualArticles parameter, curn either sends a single MIME multipart/alternative email with all the output, or it sends one message per article found in the feeds. See MailIndividualArticles for details. | No | Output is not emailed. | SMTPHost MailFrom MailSubject |
| MailFrom plug-in | String | The email address to use as the sender, when mailing output. The address can be a full RFC 2822-compliant address (e.g., "Joe Blow <joe@example.org>") or just a simple address (e.g., "joe@example.org"). This parameter is only honored when at least one email address is specified via the MailOutputTo configuration parameter. | No | curn constructs its own "from" address from the user name associated with the running process and the current host name. | SMTPHost MailSubject MailOutputTo |
| MailSubject plug-in | String | The subject line to use when mailing output. This parameter is only honored when at least one email address is specified via the MailOutputTo configuration parameter. | No | curn output | SMTPHost MailFrom MailOutputTo |
| MailIndividualArticles plug-in | Boolean | If set to true, this parameter instructs curn to send an email per article; that is, instead of a single email containing the output from all output handlers, curn will send one individual email for each article. If curn finds 20 unread articles, it'll send 20 email messages, each with a single article; if there are 100 unread articles, curn will send 100 separate email messages. If there are multiple output handlers that actually produce output, then each article email will be a MIME multipart/alternative email containing separate attachments from each output handler for that article. If this parameter is false or absent, curn will send one email containing the generated output for all feeds and items. If there are multiple output handlers that actually produce output, curn will combine all the outputs into a single MIME multipart/alternative email. Each output handler's output will be a separate multipart/alternative attachment. (curn assumes that each output handler is generating an alternate form of the same information.) Output handlers that don't generate output are skipped. If none of the configured output handlers generate any output, then curn doesn't send an email message. This parameter is ignored if no email addresses are specified by the MailOutputTo parameter. WARNINGS: | No | false | SMTPHost MailFrom MailSubject |
| MaxArticlesToShow plug-in | Integer | Sets an upper limit on the number of articles displayed for the feed. This maximum is applied after the articles are sorted (see SortBy) and after the ShowArticlesFor and IgnoreArticlesOlderThan policies are applied. This parameter can be overridden on a per-feed basis. This global parameter sets the default value. | No | None (i.e., no maximum) | |
| MaxSummarySize plug-in | Positive integer | If an article has a summary, you can optionally set a maximum size for the summary. If a summary exceeds the maximum size, curn will truncate it and add a trailing ellipsis ("...") to indicate the truncation. A value of 0 effectively disables this option. This parameter can be overridden on a per-feed basis. This global parameter sets the default value. | No | 0 (i.e., no limit on summary size) | ReplaceEmptySummaryWith |
| MaxThreads | Positive integer | Defines the number of concurrent download threads. If this value is greater than 1, then curn will spawn that many worker threads to handle the downloading and parsing of the RSS feeds concurrently. If this value is 1, curn will process the feeds sequentially. If this value is greater than 1, but less than the total number of feeds, some of the worker threads will end up processing more than one feed (sequentially). Values less than 1 are illegal. | No | false | |
| NoCacheUpdate | Boolean | If set to true (and if a cache file is specified), this parameter tells curn to read the cache file and honor its contents, but not to save the modified in-memory cache back to disk. | No | false | CacheFile --no-update |
| ParserClass | String | The full name of the underlying RSS parser class to be used. This class must implement the org.clapper.curn.parser.RSSParser interface. It can be a first-class parser of its own, or it can be nothing more than an adapter for a third-party RSS parser class. curn comes bundled with one parser: Any class that implements org.clapper.curn.parser.RSSParser may be used as a value for ParserClass. | No | org.clapper.curn.parser.rome.RSSParserAdapter | |
| Quiet | Boolean | Normally, if an RSS feed contains no new items, most curn output handlers display the site's name and URL, followed by something like "No new items." Similarly, if curn can't contact a feed site, or if the site's XML is unparseable, curn displays an error message. Setting Quiet to true tells curn to suppress both of these displays, silently ignoring sites with no new data or bad XML. | No | false | --quiet --no-quiet |
| ReplaceEmptySummaryWith plug-in | String | Tells curn what to do when the summary for a feed article is missing. Legal values: | No | nothing | Per-feed ReplaceEmptySummaryWith parameter |
| ShowArticlesFor | String | How long to show articles from feeds. If specified, this parameter is only used when individual feeds don't specify a ShowArticlesFor parameter of their own. The value is a time interval, expressed using the same natural language strings supported by the IgnoreArticlesOlderThan parameter. For instance, "ShowArticlesFor: 3 days", "ShowArticlesFor: 1 week", "ShowArticlesFor: 365 days", or "ShowArticlesFor: 12 hours, 30 minutes". Valid interval names (in English) are: NOTE: The plug-in that implements this capability uses the timestamp in the curn cache when aging an article, not the timestamp in the feed's XML. That's because the intent of this configuration parameter is to permit you to keep showing an article for a certain amount of time after the article was first displayed. The article timestamp in the XML is the time that the article was published, not the time that curn first displayed it. The time in the curn cache represents the time that curn first saw (and presumably displayed) the article. WARNINGS: | No | 1 millisecond (i.e., show each article once) | Per-feed ShowArticlesFor parameter |
| ShowAuthors plug-in | Boolean | If set to true, this configuration item instructs curn to display author information for each feed item, if available. This global value can be overridden on a per-feed basis. | No | false | |
| ShowDates plug-in | Boolean | Some RSS feeds, or the individual items within each feed, contain dates (usually corresponding to the publication dates for the feed or item). If this option is set to true, then curn will display the date for each item that provides a date. This global value can be overridden on a per-feed basis. | No | false | |
| ShowRSSVersion | Boolean | Display the RSS version for each feed. | No | false | |
| SummaryOnly plug-in | Boolean | Some RSS feeds provide a description for each item, in addition to the (brief) title. Setting SummaryOnly to true suppresses display of the description. This parameter can be overridden on a per-feed basis; this global value sets the default. WARNING: This parameter is deprecated. Use the ReplaceEmptySummaryWith parameter, instead. | No | false | ReplaceEmptySummaryWith |
| SMTPHost plug-in | String | The SMTP host to use when mailing output. This parameter is only honored when at least one email address is specified via the MailOutputTo configuration parameter. | No | localhost | MailOutputTo MailFrom MailSubject |
| SortBy plug-in | String | Default method to use to sort items within each feed. This parameter is used when the configuration section for a particular site lacks its own SortBy value. Legal values: | No | none | Per-feed SortBy parameter |
| TotalCacheBackups | Non-negative integer | The total number of cache backup copies to keep. If this parameter is greater than 0, then curn will keep that many numbered backups of the cache. If the cache exists when curn attempts to update it, curn will copy the existing cache to cacheFile.0. If cacheFile.0 exists, it will be moved to cacheFile.1 first, and so on down the line, until the maximum number of cache backup files exists. The newest cache is always the one without a numeric extension; the oldest file is the one with the largest numeric extension. This parameter is useful if you want to roll back to a previous cache. If this parameter is not specified, or is 0, then no cache backups are made. | No | 0 | CacheFile |
| UserAgent plug-in | String | Specifies the default HTTP User-Agent header to use. This configuration parameter permits you to have curn masquerade as a known browser, for sites that refuse access to robots and spiders and other unknown web clients. This global value is used when the section for a particular feed does not supply its own UserAgent value. | No | A string that identifies curn as the user agent. | Per-feed UserAgent parameter. |
| ZipOutputTo plug-in | String | Path to a zip file to receive all output generated by output handlers. | No | None | |
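The numbered-backup scheme described under TotalCacheBackups can be sketched with java.nio.file. CacheBackupRotator is hypothetical, not curn's code. It assumes it runs just before the cache is rewritten, and it overwrites the oldest backup so that at most TotalCacheBackups copies exist.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch of numbered cache backups; not curn's code. */
final class CacheBackupRotator {
    static void rotate(Path cacheFile, int totalBackups) throws IOException {
        if (totalBackups <= 0 || !Files.exists(cacheFile)) {
            return; // backups disabled, or nothing to back up yet
        }
        // Shift cacheFile.(n-2) to cacheFile.(n-1), ..., cacheFile.0 to cacheFile.1.
        // Moving onto the highest number discards the oldest backup.
        for (int i = totalBackups - 2; i >= 0; i--) {
            Path src = backup(cacheFile, i);
            if (Files.exists(src)) {
                Files.move(src, backup(cacheFile, i + 1), StandardCopyOption.REPLACE_EXISTING);
            }
        }
        // Copy (not move) the current cache: the caller rewrites it afterwards.
        Files.copy(cacheFile, backup(cacheFile, 0), StandardCopyOption.REPLACE_EXISTING);
    }

    private static Path backup(Path cacheFile, int n) {
        return cacheFile.resolveSibling(cacheFile.getFileName() + "." + n);
    }
}
```

With TotalCacheBackups set to 2, three successive updates leave the newest backup in cache.0, the previous one in cache.1, and no cache.2.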
The curn configuration file also contains a list of RSS feeds to be polled. Each feed must be specified in its own section in the configuration file. The name of the section must start with the string "Feed". If more than one feed is present, then each section name must also have additional characters, to make the section name unique. The following section names are all valid for RSS feed sections.
Each feed section supports the following parameters. (Parameters marked with plug-in are handled by one of curn's stock plug-ins, rather than by the core code.)
| Variable | Argument type | Description | Required? | Default Value |
|---|---|---|---|---|
| AllowEmbeddedHTML plug-in |
Boolean | Whether or not to allow
embedded HTML in certain RSS feed elements, such as description,
author, etc., for this feed. Some RSS formats permit embedded HTML; setting this
parameter to true tells curn
output handlers that they should
preserve such embedded HTML markup, if possible. If this parameter
is false, any embedded HTML is stripped.
Note that certain output handlers will strip HTML regardless of this setting. An output handler that produces text, for instance, is not required to support embedded HTML. Notes:
|
No | false |
| ArticleFilter plug-in |
Strings | Specifies a set of filters to discard feed item (article)
content, based on regular expressions.
The filtering syntax is (shamelessly) adapted from the rawdog RSS reader's article-filter plug-in. A feed filter is configured by adding an ArticleFilter property to the feed's configuration section. The property's value consists of one or more filter command sequences, separated by ";" characters. (The ";" must be surrounded by white space; see below.) Each filter command sequence is of this form:
show|hide [field 'regexp' [field 'regexp' ...]]
where field can be one of:
If the command is "hide", then the entry will be hidden if the
specified field matches the regular expression. If the command is
"show", then the entry will be shown if the field matches the regular
expression. If there are no fields or regular expressions, then the
command is a wildcard match. That is, hide is equivalent to:
hide any '.*'
and show is equivalent to:
show any '.*'
Wildcard matches are useful in situations where you want to hide or show "everything but ...". See the examples, below, for details. All filtering commands are processed, and the end result determines whether a given entry is suppressed. Regular expressions are matched in a case-blind fashion. The match logic also:
Examples: Some examples will help clarify the syntax. The following filter hides all articles with the phrase "mash-up" (because mash-ups bore me):
ArticleFilter: hide any 'mash[- \t]?up'
The following, more complicated, entry hides everything by author "Joe Blow", unless the title has the word "rant" in it ('cause his rants are hilarious):
Finally, this example hides everything except articles by Moe Howard:
ArticleFilter: hide ; show author '^moe *howard$' |
No | Articles are not filtered |
| CommonXMLFixups plug-in |
Boolean |
Enables or disables the Common XML Fixups plug-in, which attempts
to fix common syntax problems in downloaded XML feeds. Among the
corrections this plug-in makes:
|
No | The value of the global CommonXMLFixups parameter in the [curn] section, or false, if that value is not set. |
| DaysToCache | Positive integer | Maximum number of days to cache an already-read item for this feed. This value locally overrides the global DaysToCache default in the [curn] section. Items older than this many days are tossed from the cache when it's read, which means curn forgets that it saw them before. A value of 0 renders the cache essentially useless for this feed (i.e., 0 ensures that curn always forgets items that are cached for this feed). The special value "NoLimit" causes curn to leave items in the cache forever. | No | The value of the global DaysToCache parameter in the [curn] section, or 365 if that value is not set. |
| Disabled plug-in |
Boolean | If true, then the feed is skipped. If false, the feed is processed. This variable provides a simple way to disable a feed without having to comment its entire section out. | No | false |
| EditFeedURL EditItemURL plug-in |
String |
Apply the specified regular expression edit to the site's
feed URL (EditFeedURL) or to each of the
site's RSS item URLs (EditItemURL).
The value for this option consists of
a Perl 5-style substitution applied to the URL. For example:
Remove all the parameters from the URL:
's/\?.*$//'
(The PruneURLs parameter provides a simpler mechanism for this common operation.)
Remove a "redirect" CGI from a site whose URLs look like http://www.example.com/cgi-bin/redir.cgi?http://...:
's+http://www.example.com/cgi-bin/redir.cgi\?++'
The substitution syntax supports Perl's $1, $2, etc., grouping syntax. However, because the "$" character also introduces a configuration file variable reference, you must escape the "$" to use it in a regular expression. For instance, use either:
s/^([a-z]+)foo(.*)\$/\$1bar\$2/
or
's/^([a-z]+)foo(.*)$/$1bar$2/'
If there are backslashes in the string, you must escape them as well, preferably by single-quoting the value. See Suppressing Metacharacter Expansion and Variable Substitution for more details. To get the equivalent of the Perl 5 expression
s/^\*.*$//
you must specify
's/^\*.*$//'
This substitution syntax supports the following Perl-like modifiers, which are appended to the end of the substitution command:
The modifiers can be concatenated. Thus, 's/abc/xyz/ig' will match and replace all occurrences of the string "abc", whether upper-, lower- or mixed-case. Hint: When logging is enabled, curn will log the parsed expression at the "debug" log level. |
No | None |
| ForceEncoding | String | Force curn to ignore the character set
encoding advertised by the remote server (if any), and use the
character set specified by this configuration item, instead.
This is useful in the following cases:
This value should be a character set encoding that is recognized by the Java runtime environment. ForceCharacterEncoding is a synonym for this parameter, retained for backward compatibility. |
No |
|
| GzipDownload plug-in |
Boolean | If set to true, this parameter directs curn to use the "Accept-Encoding: gzip" HTTP header when retrieving this RSS feed from an HTTP server. Since RSS feeds are XML, they typically compress well; retrieving gzipped data, rather than the uncompressed XML, can save a significant amount of time and network bandwidth. (Note, however, that HTTP servers are not obligated to honor a request to gzip the feed.) This parameter overrides the global GzipDownload parameter. | No | true |
| IgnoreArticlesOlderThan plug-in |
String | Provides a way to ignore articles that are older than a certain interval. Intervals are expressed in a natural language syntax. Please see the documentation for the global IgnoreArticlesOlderThan parameter for a more complete description of this parameter. | No | The default, as defined by the global IgnoreArticlesOlderThan parameter. If no global IgnoreArticlesOlderThan value is set, then articles aren't ignored based on their age. |
| IgnoreDuplicateTitles plug-in |
Boolean |
If true, curn will ignore any item
whose title matches the title of another item in the feed. (It
only compares titles within the feed itself; it does not
compare against titles of cached items.) Titles are compared
without regard to upper or lower case.
This feature (hack, really) is useful for sites whose feeds often contain duplicate items (with the same titles) that have different IDs and different URLs, and thus appear to be unique. (Yahoo! News feeds sometimes exhibit this trait.) |
No | false |
| MaxArticlesToShow
plug-in |
Integer | Sets an upper limit on the number of articles displayed for the feed. This maximum is applied after the articles are sorted (see SortBy) and after the ShowArticlesFor and IgnoreArticlesOlderThan policies are applied. | No | The default, as defined by the global MaxArticlesToShow parameter. If no global MaxArticlesToShow value is set, then there is no maximum. |
| MaxSummarySize plug-in |
Positive integer | If an article has a summary, you can optionally set a maximum size for the summary. If a summary exceeds the maximum size, curn will truncate it and add a trailing ellipsis ("...") to indicate the truncation. A value of 0 effectively disables this option. This parameter overrides the global MaxSummarySize parameter. | No | 0 (i.e., no limit on summary size) |
| PreparseEditsuffix plug-in |
String |
A parameter in a Feed section that starts with
PreparseEdit (e.g.,
PreparseEdit1,
PreparseEditFoo, etc.)
defines a substitution to be applied to the downloaded XML
file before it is parsed. As with the
EditItemURL and
EditFeedURL
options, the value for this option consists of a
Perl 5-style substitution.
This capability is rarely needed, but it's sometimes useful for sites that serve unparseable, but easily fixed, XML. (Though the CommonXMLFixups capability covers a lot of these errors with less configuration.) For instance, one news site I read has an RSS channel whose title always contains an unescaped "&". The XML parser will not parse that feed; however, a simple preparse edit command of:
's/ & / \&amp; /g'
fixes the problem. (Again, this is one of the common XML syntax errors that CommonXMLFixups will correct.)
Another use for PreparseEdit is fixing incorrectly formatted links in the RSS feed. Consider the following <link> element, for the fictitious site news.example.com:
<link>http://news.example.com&article=12573</link>
This is a perfectly parseable URL, but it happens to be wrong. It's missing a "/" between ".com" and "&". It really ought to be:
<link>http://news.example.com/&article=12573</link>
A quick PreparseEdit rule can fix it, though:
PreparseEdit: 's|(news.example.com)([^/]+)|$1/$2|'
Note the use of a different delimiter in the edit command ("|", instead of "/"). Any non-alphabetic character will work. Multiple instances of this parameter are permitted, as long as each instance's name begins with the string "PreparseEdit" and contains a unique suffix.
The substitution syntax supports Perl-style $1, $2, etc., grouping syntax. However, because the "$" character also introduces a configuration file variable reference, you must escape the "$" to use it in a regular expression. For instance, use either:
s/^([a-z]+)foo(.*)\$/\$1bar\$2/
or
's/^([a-z]+)foo(.*)$/$1bar$2/'
If there are backslashes in the string, you must escape them as well, preferably by single-quoting the value. See Suppressing Metacharacter Expansion and Variable Substitution for more details. To get the equivalent of the Perl 5 expression
s/^\*.*$//
you must specify
's/^\*.*$//'
This substitution syntax supports the following Perl-like modifiers, which are appended to the end of the substitution command:
The modifiers can be concatenated. Thus, 's/abc/xyz/ig' will match and replace all occurrences of the string "abc", whether upper-, lower- or mixed-case. Hint: When logging is enabled, curn will log the parsed expression at the "debug" log level. |
No | None |
| PruneURLs plug-in |
Boolean | Specifies that all URLs should be pruned of their HTTP parameters. This action can also be accomplished with EditItemURL and EditFeedURL directives; PruneURLs is convenient shorthand for a common operation. | No | None |
| ReplaceEmptySummaryWith plug-in |
String |
Tells curn what to do when the summary for a feed
article is missing. Legal values:
|
No | nothing |
| SaveAs plug-in |
[options] Path | If set, this parameter specifies the path to
a file where curn should save the raw XML contents of
the feed, whenever it downloads the feed. This can be useful
if you have a master version of curn that downloads
a bunch of feeds, with multiple slave versions of curn
that then run against the downloaded files. (See
Being Bandwidth Friendly for a more
detailed discussion of this tactic.)
This configuration item takes a command line-style value:
SaveAs: [--backups total_backups] [--encoding encoding] path
or
SaveAs: [-b total_backups] [-e encoding] path
The parameters have the following meanings:
|
No | None |
| SaveAsEncoding plug-in |
String | If set, and if
SaveAs parameter is also
set, then this parameter specifies the character
encoding to use when saving the feed to the file.
If SaveAs is not set for the feed,
then any SaveAsEncoding parameter is
ignored. WARNING: This parameter is deprecated. Use the --encoding option to the SaveAs parameter, instead. | No | "utf-8". Note that this default value is the same as the default value of the ForceEncoding parameter for file URLs. This makes it easy to have one instance of curn save RSS feeds for other instances to parse. |
| SaveOnly plug-in |
Boolean | If set, and if SaveAs is also set, then the feed will be downloaded and saved, but not parsed and not included in the generated output. This parameter can be useful when Being Bandwidth Friendly. | No | false |
| SaveAsRSS plug-in |
[options] Path | If set, this parameter specifies that the feed should be rewritten in the specified
RSS format and saved to the specified file. This configuration item takes a command line-style value:
SaveAsRSS: [--backups total_backups] [--type rsstype] [--encoding encoding] path
or
SaveAsRSS: [-b total_backups] [-t rsstype] [-e encoding] path
The parameters have the following meanings:
| No | None |
| SaveRSSOnly plug-in |
Boolean | If set, and if SaveAsRSS is also set, then the feed will be downloaded and parsed, and the RSS output will be generated, but the feed will not be passed to any output handlers (or, for that matter, any other plug-ins). | No | false |
| SavedBackups | Positive integer | Number of saved backups to keep. If this value is non-zero, the handler will back the SaveAs file up before overwriting it. Up to SavedBackups total backed-up files will be kept. A value of 0 disables the feature. | No | 0 |
| ShowArticlesFor
plug-in |
String |
How long to show articles from the feed. The value is a time
interval, expressed using the same natural language strings supported
by the IgnoreArticlesOlderThan
parameter. Please see the documentation for the global
ShowArticlesFor parameter for a more
complete description of this parameter.
This value overrides the global ShowArticlesFor parameter. |
No | The value of the global ShowArticlesFor parameter. |
| ShowAuthors
plug-in |
Boolean | If set to true, this configuration item instructs curn to display author information for this feed, if available. This value overrides the global ShowAuthors parameter. | No | The value of the global ShowAuthors parameter. |
| ShowDates
plug-in |
Boolean | If set to true, this configuration item instructs curn to display any dates associated with this feed, if available. This value overrides the global ShowDates parameter. | No | The value of the global ShowDates parameter. |
| SortBy plug-in |
String |
How to sort items in this feed. This value locally overrides the
global SortBy parameter
in the [curn] section.
Legal values:
|
No | The value of the global SortBy parameter in the [curn] section. |
| SummaryOnly plug-in |
Boolean |
Some RSS feeds provide a description for each item, in addition
to the (brief) title. Setting SummaryOnly
to true suppresses display of the description. This parameter
overrides the global
SummaryOnly parameter.
WARNING: This parameter is deprecated. Use the ReplaceEmptySummaryWith parameter, instead. |
No | The value of the global SummaryOnly parameter. |
| TitleOverride plug-in |
String | Specifies a string to be used as the site's title, instead of the title supplied in the RSS XML. Useful when the real site-supplied title is not suitable. | No | None |
| URL | String | The fully-qualified URL for the feed. For local files, use a "file:" URL. | Yes | None |
| UserAgent plug-in |
String | Specifies the HTTP User-Agent header to use when retrieving this feed. This local value overrides the global UserAgent parameter in the [curn] section. This configuration parameter permits you to have curn masquerade as a known browser, and it's useful for sites that refuse access to robots and spiders and other unknown web clients. | No | The value of the global UserAgent parameter in the [curn] section. |
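The ArticleFilter rules (commands processed in order, wildcard commands, case-blind regular expressions) can be sketched in Java. This is a hypothetical illustration, not curn's implementation. The class name ArticleFilterSketch, the representation of an article as a map from lower-case field names to values, substring matching via find(), the reading of "the end result" as "the last matching command wins", and the rule that several field/regexp pairs in one command must all match are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;
import java.util.regex.Pattern;

/** Hypothetical sketch of ArticleFilter evaluation; not curn's implementation. */
final class ArticleFilterSketch {
    private ArticleFilterSketch() {}

    /** Splits one command into words, keeping each 'quoted regexp' as a single token. */
    static List<String> tokenize(String command) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuote = false;
        for (char c : command.toCharArray()) {
            if (c == '\'') {
                if (inQuote) {                 // closing quote: emit the regexp, even if empty
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                inQuote = !inQuote;
            } else if (Character.isWhitespace(c) && !inQuote) {
                if (current.length() > 0) {    // end of an unquoted word
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Returns true if the article (lower-case field name to value) should be shown. */
    static boolean isShown(String filter, Map<String, String> article) {
        boolean shown = true;                  // nothing is hidden until a command says so
        // Commands are separated by a ";" with white space on both sides.
        for (String command : filter.trim().split("\\s+;\\s+")) {
            List<String> tokens = tokenize(command);
            if (tokens.isEmpty()) {
                continue;
            }
            String verb = tokens.get(0).toLowerCase(Locale.ROOT);
            if (!verb.equals("show") && !verb.equals("hide")) {
                throw new IllegalArgumentException("Unknown filter command: " + verb);
            }
            boolean matches = true;            // no field/regexp pairs: a wildcard match
            for (int i = 1; i + 1 < tokens.size(); i += 2) {
                // Assumption: every field/regexp pair in one command must match.
                matches &= fieldMatches(tokens.get(i), tokens.get(i + 1), article);
            }
            if (matches) {
                shown = verb.equals("show");   // the last matching command decides
            }
        }
        return shown;
    }

    private static boolean fieldMatches(String field, String regexp, Map<String, String> article) {
        Pattern pattern = Pattern.compile(regexp, Pattern.CASE_INSENSITIVE);   // case-blind
        if (field.equalsIgnoreCase("any")) {
            return article.values().stream()
                    .filter(Objects::nonNull)
                    .anyMatch(value -> pattern.matcher(value).find());
        }
        String value = article.get(field.toLowerCase(Locale.ROOT));
        return value != null && pattern.matcher(value).find();
    }
}
```

Under these assumptions, a filter such as "hide author 'joe blow' ; show title 'rant'" (an illustrative example, not taken from curn's documentation) hides an ordinary Joe Blow post but shows one whose title mentions a rant.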
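EditFeedURL, EditItemURL and PreparseEdit all take a Perl 5-style substitution. The Java sketch below shows one way such a command could be parsed and applied with java.util.regex; it is an assumption-laden illustration, not curn's parser. The class name PerlSubstitution is hypothetical, only the i and g modifiers are handled (the two used in the 's/abc/xyz/ig' example), escaped delimiters inside the pattern are not supported, and the replacement is passed straight to Java's Matcher, whose $1, $2 group references behave like Perl's.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of a Perl 5-style "s/regex/replacement/modifiers" edit.
 * Any non-alphabetic delimiter works; only the i and g modifiers are supported.
 */
final class PerlSubstitution {
    private final Pattern pattern;
    private final String replacement;
    private final boolean global;

    PerlSubstitution(String command) {
        if (command.length() < 4 || command.charAt(0) != 's') {
            throw new IllegalArgumentException("Not an s/// command: " + command);
        }
        char delimiter = command.charAt(1);              // e.g. '/', '|' or '+'
        if (Character.isLetter(delimiter)) {
            throw new IllegalArgumentException("Delimiter must be non-alphabetic: " + command);
        }
        // Split into regex, replacement and modifiers; -1 keeps trailing empty parts.
        String[] parts = command.substring(2).split(Pattern.quote(String.valueOf(delimiter)), -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed substitution: " + command);
        }
        int flags = 0;
        boolean g = false;
        for (char modifier : parts[2].toCharArray()) {
            switch (modifier) {
                case 'i': flags |= Pattern.CASE_INSENSITIVE; break;   // case-blind match
                case 'g': g = true; break;                            // replace every match
                default: throw new IllegalArgumentException("Unsupported modifier: " + modifier);
            }
        }
        this.pattern = Pattern.compile(parts[0], flags);
        this.replacement = parts[1];                     // $1, $2, ... refer to groups
        this.global = g;
    }

    /** Applies the edit to one URL or to a whole downloaded XML document. */
    String apply(String input) {
        Matcher matcher = pattern.matcher(input);
        return global ? matcher.replaceAll(replacement) : matcher.replaceFirst(replacement);
    }
}
```

For example, new PerlSubstitution("s|(news.example.com)([^/]+)|$1/$2|").apply(xml) inserts the missing "/" into the news.example.com link from the PreparseEdit description.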
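IgnoreDuplicateTitles boils down to tracking a case-blind set of titles already seen within a single feed. A minimal Java sketch follows; the names are hypothetical, and it assumes that the first item with a given title is the one kept and that items without a title are never treated as duplicates.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch of IgnoreDuplicateTitles for the items of a single feed. */
final class DuplicateTitles {
    private DuplicateTitles() {}

    /** Keeps the first item for each title, comparing titles case-blind. */
    static <T> List<T> dropDuplicates(List<T> items, Function<T, String> titleOf) {
        Set<String> seenTitles = new HashSet<>();   // this feed's items only, not the cache
        List<T> kept = new ArrayList<>();
        for (T item : items) {
            String title = titleOf.apply(item);
            // Items without a title are always kept; Set.add returns false for a repeat.
            if (title == null || seenTitles.add(title.toLowerCase(Locale.ROOT))) {
                kept.add(item);
            }
        }
        return kept;
    }
}
```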
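The MaxSummarySize rule can be sketched in a few lines of Java. The class name is hypothetical, and the sketch assumes (the description does not say) that the "..." is appended after the first maxSize characters rather than counted within the limit.

```java
/** Hypothetical sketch of MaxSummarySize truncation. */
final class SummaryTruncation {
    private SummaryTruncation() {}

    static String truncate(String summary, int maxSize) {
        // A limit of 0 disables truncation, as does a summary that already fits.
        if (summary == null || maxSize <= 0 || summary.length() <= maxSize) {
            return summary;
        }
        // Assumption: the ellipsis is added after maxSize characters, not counted in them.
        return summary.substring(0, maxSize) + "...";
    }
}
```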
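The GzipDownload behavior corresponds roughly to the following Java sketch, which uses only the standard HttpURLConnection and GZIPInputStream classes. It is not curn's download code, the class and method names are hypothetical, and fetch assumes an http or https URL. The key point is that the response is decompressed only when the server's Content-Encoding header says it really sent gzip, since servers are free to ignore the request.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

/** Hypothetical sketch of a gzip-aware feed download; not curn's download code. */
final class GzipFetch {
    private GzipFetch() {}

    /** Decompresses the body only if the server says it actually sent gzip. */
    static InputStream decode(InputStream body, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(body);
        }
        return body;   // the server ignored the request (or gzip was never requested)
    }

    /** Downloads a feed, asking for gzip when gzipDownload is true (cf. GzipDownload). */
    static byte[] fetch(URL url, boolean gzipDownload) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        if (gzipDownload) {
            connection.setRequestProperty("Accept-Encoding", "gzip");
        }
        try (InputStream in = decode(connection.getInputStream(), connection.getContentEncoding())) {
            return in.readAllBytes();
        } finally {
            connection.disconnect();
        }
    }
}
```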
As curn processes each RSS feed, it parses the XML and loads the new items into internal data structures. When it has finished processing the XML, it hands the parsed data structures to one or more output handlers. Output handlers are so called because they generally produce output that's to be displayed or emailed to the user. Generally, but not always: an output handler may choose to save its output to a file, but not send the output back to curn; each of the built-in output handlers does exactly that if its SaveAs configuration parameter is set and its SaveOnly configuration parameter is true. Alternatively, the output handler may choose to convert the internal data structures to output that it publishes somewhere (e.g., via a network connection to an HTTP server).
Each output handler is specified in its own section in the configuration file. The name of the section must start with the string "OutputHandler". If more than one output handler is present, then each section name must also have additional characters, to make the section name unique. The following section names are all valid for output handler sections.
If no OutputHandler sections are present in the configuration file, curn skips the RSS XML parsing phase. (There's no reason to parse the XML if there are no output handlers to process the parsed feed data.) If there are no output handlers, curn may or may not download individual feeds. If a given feed has no SaveAs setting, and there are no output handlers, then curn skips the feed entirely. After all, there's no sense wasting time downloading the feed if the feed isn't being parsed or saved. However, if the feed does have a SaveAs setting, curn will download and save the XML (assuming it has changed) even if XML parsing is disabled.
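The download and parse decisions described in the previous paragraph reduce to two small predicates. This Java sketch is a hypothetical restatement of that paragraph only; the class and method names are invented, and it deliberately ignores other settings (such as Disabled, SaveOnly, or the If-Modified-Since check) that also affect what curn does with a feed.

```java
/** Hypothetical restatement of when a feed is downloaded and parsed. */
final class FeedPlan {
    private FeedPlan() {}

    /** Parsing only makes sense when some output handler will consume the result. */
    static boolean shouldParse(boolean haveOutputHandlers) {
        return haveOutputHandlers;
    }

    /** With no output handlers, a feed is downloaded only if it has a SaveAs setting. */
    static boolean shouldDownload(boolean haveOutputHandlers, boolean feedHasSaveAs) {
        return haveOutputHandlers || feedHasSaveAs;
    }
}
```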
All output handler sections take two variables. In addition, individual output handlers can require configuration items of their own. The two variables common to all output handlers are described below.
| Variable | Argument type | Description | Required? | Default Value |
|---|---|---|---|---|
| Class | String | Identifies the Java class that implements the output handler. (The class must implement the org.clapper.curn.OutputHandler interface. See Writing Your Own Output Handler for details.) | Yes | |
| Disabled | Boolean | If true, the output handler is skipped. If false, the output handler is processed. This variable provides a simple way to disable an output handler without having to comment its entire section out. | No | false |
There are some output handler examples following the next section.
curn comes bundled with the following built-in output handlers.
The FreeMarkerOutputHandler, introduced in curn version 2.6, is both simple and flexible. It uses the FreeMarker template engine to convert a template to an output file. FreeMarker templates can be used to generate nearly any kind of textual output file, from HTML and XML to simple text. In fact, the HTMLOutputHandler, TextOutputHandler, and SimpleSummaryOutputHandler have been reimplemented to use the FreeMarkerOutputHandler in conjunction with built-in templates that produce the appropriate kind of output.
| Additional Configuration Items | ||||
|---|---|---|---|---|
| Variable | Argument type | Explanation | Required? | Default value |
| AllowEmbeddedHTML | Boolean | Whether or not the specified template supports embedded HTML. If embedded HTML is found within an RSS item, it will be included in the generated output only if (a) this parameter is true, and (b) the AllowEmbeddedHTML parameter for the feed is also true. Otherwise, embedded HTML will be stripped from the item. | No | false |
| Encoding | String | Specify the character encoding to use when writing the output file. | No | "utf-8" |
| SaveAs | File name or path name | Save a copy of the generated HTML to the specified file. The argument is the path to the file. WARNING: The syntax of this parameter is different from the syntax of the SaveAs parameter for a feed. | No | None (i.e., no copy is saved) |
| SaveOnly | Boolean | If true and if SaveAs is defined, then save a copy of the generated HTML, but don't make it available to the user. (i.e., Don't display it on standard output, and don't email it.) | No | false |
| ShowCurnInfo | Boolean | Whether or not to display the curn version, curn configuration file path, and other curn-related information at the bottom of the generated HTML. | No | true |
| TemplateFile | Two strings | Specifies the location of the FreeMarker template file.
The location is specified with three parameters:
The form of the identifier string depends on the type value.
| ||