org.clapper.util.regex
Class RegexUtil

java.lang.Object
  extended by org.clapper.util.regex.RegexUtil

public class RegexUtil
extends java.lang.Object

This is a utility class implementing some common regular expression-based operations, using the java.util.regex classes. The various operations are briefly described here; see the individual methods for full details.

Substitution

The substitute(java.lang.String, java.lang.String) method implements Perl-like regular expression substitution. It takes an edit string representing the substitution, and a string to be edited. It returns the possibly edited string. The substitution syntax is similar to Perl:

s/regex/replacement/[g][i][m][o][u][x]

The regular expressions are compiled once, and the compiled versions are cached in an internal LRU buffer. The buffer's size is fixed at the time of instantiation.

See the documentation for the substitute(java.lang.String, java.lang.String) method for full details.


Field Summary
static int DEFAULT_LRU_BUFFER_SIZE
          Default size of the internal LRU buffer that is used to hold compiled regular expressions.
 
Constructor Summary
RegexUtil()
          Allocate a new RegexUtil object.
 
Method Summary
 java.lang.String substitute(java.lang.String substitutionCommand, java.lang.String s)
          This method implements Perl-like regular expression substitution.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_LRU_BUFFER_SIZE

public static final int DEFAULT_LRU_BUFFER_SIZE
Default size of the internal LRU buffer that is used to hold compiled regular expressions. The buffer will never contain any more than this many compiled regular expressions. Once the buffer is full, any newly compiled regular expression (e.g., as a result of a substitute(java.lang.String, java.lang.String) call made with a new expression) will replace the oldest (least recently used) item in the buffer.
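
The eviction behavior described above can be illustrated with a minimal sketch built on java.util.LinkedHashMap in access order. This is not the library's actual implementation: the class name PatternLruCache, its methods, and the choice of cache key are all hypothetical, and the sketch is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of org.clapper.util: a bounded LRU cache
// of compiled patterns, sketched with a LinkedHashMap in access order.
final class PatternLruCache {
    private final Map<String, Pattern> cache;

    PatternLruCache(final int maxSize) {
        // accessOrder = true keeps entries ordered from least to most
        // recently used, so the "eldest" entry is the LRU one.
        this.cache = new LinkedHashMap<String, Pattern>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Pattern> eldest) {
                // Evict the least recently used entry once the cap is exceeded.
                return size() > maxSize;
            }
        };
    }

    // Assumption: the key combines flags and regex, because the same regex
    // compiled with different flags yields a different Pattern.
    private static String key(String regex, int flags) {
        return flags + ":" + regex;
    }

    // Returns the cached Pattern, compiling and caching it on a miss.
    Pattern get(String regex, int flags) {
        String k = key(regex, flags);
        Pattern p = cache.get(k);
        if (p == null) {
            p = Pattern.compile(regex, flags);
            cache.put(k, p);
        }
        return p;
    }

    // containsKey does not count as an access, so this does not reorder entries.
    boolean isCached(String regex, int flags) {
        return cache.containsKey(key(regex, flags));
    }

    int cachedCount() {
        return cache.size();
    }

    public static void main(String[] args) {
        PatternLruCache c = new PatternLruCache(2);
        c.get("a+", 0);
        c.get("b+", 0);
        c.get("a+", 0);   // "a+" becomes the most recently used entry
        c.get("c+", 0);   // cache is full, so "b+" is evicted
        System.out.println(c.isCached("b+", 0)); // prints "false"
    }
}
```

Looking an expression up again moves it to the most recently used position, so frequently reused expressions tend to stay cached while one-off expressions are evicted first.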

See Also:
Constant Field Values
Constructor Detail

RegexUtil

public RegexUtil()
Allocate a new RegexUtil object. The object's internal LRU buffer will cache up to 100 substitution regular expressions.

Method Detail

substitute

public java.lang.String substitute(java.lang.String substitutionCommand,
                                   java.lang.String s)
                            throws RegexException
This method implements Perl-like regular expression substitution. It takes an edit string representing the substitution, and a string to be edited. It returns the possibly edited string. The substitution syntax is similar to Perl:

s/regex/replacement/[g][i][m][o][x]

The regular expressions are compiled once, and the compiled versions are cached in an internal LRU buffer. The buffer's size is fixed at the time of instantiation.

Any non-alphabetic, printing character may be used in place of the slashes. The modifiers generally have the same meanings as in Perl, though some are accepted only for syntactic compatibility and have no effect.
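
As a rough illustration of the delimiter rule, the sketch below splits a substitution command into its regex, replacement, and modifier parts using whatever delimiter follows the leading "s". The class SubstitutionCommandParser is hypothetical and is not how RegexUtil necessarily parses commands. It assumes that whitespace does not count as a printing character, it does not handle escaped delimiters inside the regex or replacement, and it throws IllegalArgumentException rather than RegexException.

```java
// Hypothetical helper, NOT part of org.clapper.util: splits an
// "s<delim>regex<delim>replacement<delim>modifiers" command into parts.
final class SubstitutionCommandParser {

    // Returns a three-element array: { regex, replacement, modifiers }.
    static String[] parse(String command) {
        if (command.length() < 4 || command.charAt(0) != 's') {
            throw new IllegalArgumentException("Not a substitution command: " + command);
        }

        // The character after "s" is the delimiter. Assumption: letters,
        // whitespace, and control characters are rejected.
        char delim = command.charAt(1);
        if (Character.isLetter(delim)
                || Character.isWhitespace(delim)
                || Character.isISOControl(delim)) {
            throw new IllegalArgumentException("Bad delimiter: '" + delim + "'");
        }

        // Locate the second and third delimiters (no escape handling).
        int second = command.indexOf(delim, 2);
        int third = (second < 0) ? -1 : command.indexOf(delim, second + 1);
        if (third < 0) {
            throw new IllegalArgumentException("Missing delimiter in: " + command);
        }

        return new String[] {
            command.substring(2, second),          // regex
            command.substring(second + 1, third),  // replacement
            command.substring(third + 1)           // modifiers (may be empty)
        };
    }

    public static void main(String[] args) {
        // Using '#' as the delimiter avoids escaping slashes in paths.
        String[] parts = parse("s#/usr#/opt#g");
        System.out.println(parts[0] + " " + parts[1] + " " + parts[2]); // prints "/usr /opt g"
    }
}
```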

Modifier Meaning
g Substitute for all occurrences of the regular expression, not just the first one.
i Do case-insensitive pattern matching. This modifier corresponds to the java.util.regex.Pattern.CASE_INSENSITIVE flag.
m Treat the string as consisting of multiple lines. This modifier corresponds to the java.util.regex.Pattern.MULTILINE flag. It changes the meaning of "^" and "$" so that they match just after or just before, respectively, a line terminator or the end of the input sequence. By default these expressions only match at the beginning and the end of the entire input sequence.
o Compile once. This modifier is ignored, since regular expressions are always compiled once and stored in the internal LRU buffer.
u Enables Unicode-aware case folding. This modifier corresponds to the java.util.regex.Pattern.UNICODE_CASE flag. When this modifier is specified, case-insensitive matching, when enabled by the CASE_INSENSITIVE flag, is done in a manner consistent with the Unicode Standard. By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched. Specifying this flag may impose a performance penalty.
x Permits whitespace and comments in a pattern. This modifier corresponds to the java.util.regex.Pattern.COMMENTS flag. When this mode is active, whitespace is ignored, and embedded comments starting with # are ignored until the end of a line.
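
The modifier table can be read as a mapping onto java.util.regex.Pattern flags. The sketch below shows one plausible way to express that mapping using only java.util.regex; the class ModifierFlags and its methods are hypothetical and do not reflect RegexUtil's internals. It assumes the replacement string uses java.util.regex replacement syntax (for example, $1 for a group reference), which the documentation above does not specify, and it rejects unknown modifiers with IllegalArgumentException rather than RegexException.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of org.clapper.util: maps the documented
// modifiers onto java.util.regex.Pattern flags and applies a substitution.
final class ModifierFlags {

    // Translates a modifier string such as "gi" into Pattern flags.
    static int toPatternFlags(String modifiers) {
        int flags = 0;
        for (char c : modifiers.toCharArray()) {
            switch (c) {
                case 'i': flags |= Pattern.CASE_INSENSITIVE; break;
                case 'm': flags |= Pattern.MULTILINE; break;
                case 'u': flags |= Pattern.UNICODE_CASE; break;
                case 'x': flags |= Pattern.COMMENTS; break;
                case 'g': // handled at substitution time, not a compile flag
                case 'o': // accepted but ignored
                    break;
                default:
                    throw new IllegalArgumentException("Unknown modifier: " + c);
            }
        }
        return flags;
    }

    // Replaces the first match, or every match when 'g' is present.
    static String apply(String regex, String replacement, String modifiers, String s) {
        Matcher m = Pattern.compile(regex, toPatternFlags(modifiers)).matcher(s);
        return modifiers.indexOf('g') >= 0
            ? m.replaceAll(replacement)
            : m.replaceFirst(replacement);
    }

    public static void main(String[] args) {
        System.out.println(apply("o", "0", "", "foo boo"));  // prints "f0o boo"
        System.out.println(apply("o", "0", "g", "foo boo")); // prints "f00 b00"
    }
}
```

In this sketch only i, m, u, and x affect compilation, while g selects between replacing the first match and replacing all matches.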

Parameters:
substitutionCommand - the "s///" substitution command
s - string to edit
Returns:
the possibly edited string
Throws:
RegexException - malformed substitution command, invalid regular expression, etc.


Copyright © 2004-2007 Brian M. Clapper. All Rights Reserved.