org.clapper.util.misc
Class FileHashMap<K,V>

java.lang.Object
  extended by java.util.AbstractMap<K,V>
      extended by org.clapper.util.misc.FileHashMap<K,V>
All Implemented Interfaces:
java.util.Map<K,V>

public class FileHashMap<K,V>
extends java.util.AbstractMap<K,V>

FileHashMap implements a java.util.Map that keeps the keys in memory, but stores the values as serialized objects in a random access disk file. When an application attempts to access the value associated with a given key, the FileHashMap object determines the location of the serialized value object, seeks to that location in the random access file, and reads and deserializes the value object. In a sense, a FileHashMap is akin to a simple, classic indexed sequential file. This approach gives a FileHashMap object the following characteristics:

File Name Conventions

A FileHashMap is instantiated with a base file name (i.e., a file name without an extension). This base file name specifies the path to the file(s) that used to store the map's data; the FileHashMap tacks the following extensions onto the prefix to arrive at the actual file names.

Extension Meaning
.ix The saved in-memory index. This file is created only if the FileHashMap is not marked as transient. (See below.) The INDEX_FILE_SUFFIX constant defines this string.
.db The on-disk data (value) file, where the serialized objects are stored. The DATA_FILE_SUFFIX constant defines this string.

For instance, if you create a FileHashMap with the statement

 Map map = new FileHashMap("/tmp/mymap");
 

the serialized value objects will be stored in file "/tmp/mymap.db", and the index (if saved) will be stored in "/tmp/mymap.ix".

Transient versus Persistent Maps

A FileHashMap is persistent by default. When a FileHashMap object is finalized or explicitly closed (using the close() method), its in-memory index is saved to disk. You can reopen the saved map by instantiating another FileHashMap object, and specifying the same file prefix. The new FileHashMap object will load its initial in-memory index from the saved index; any modifications to the new object will be written back to the on-disk index with the new object finalized or closed.

A FileHashMap can be marked as non-persistent, or transient, by passing the TRANSIENT flag to the constructor. A transient map is not saved; its disk files are removed when the map is finalized or manually closed. You cannot create a transient map using a file prefix that specifies an existing, saved map. That is, the disk files for a transient map must not exist at the time the map is created.

Optimizations

The FileHashMap class attempts to optimize access to the disk-resident values and to conserve memory use. These optimizations include the following specific measures.

Sequential access to values

The iterators attempt to minimize file seeking while looping over the stored values. The values() method returns a Set whose iterator loops through the values in the order they were written to the data file. Traversing the value set via the iterator will access the FileHashMap's data file sequentially, from top to bottom. For example, the following code fragment loops through a FileHashMap's values; because the iterator returns the values in the order they appear in the data file, the code fragment accesses the data file sequentially:

 FileHashMap map = new FileHashMap("myfile");

 for (Iterator values = map.values().iterator(); values.hasNext(); )
 {
     Object value = it.next();
     System.out.println(value);
 }
 

Similarly, the Set objects returned by the keySet() and entrySet() methods use iterators that return the keys in the order that the associated values appear in the disk file. For example, the following code fragment also accesses the data file sequentially:

 FileHashMap map = new FileHashMap("myfile");

 for (Iterator keys = map.keySet().iterator(); keys.hasNext(); )
 {
     Object key   = it.next();
     Object value = map.get(key);
     System.out.println("key=" + key + ", value=" + value);
 }
 

Note that this optimization strategy can be foiled in a number of ways. For instance, the sequential access behavior can be thwarted if a second thread is accessing the map while the first thread is iterating over it, or if the thread that's iterating over the values is simultaneously inserting values in the table.

Accessing the keys, without reading the values, does not result in any file access at all, because the keys are cached in memory. See the next section for more details.

Memory Conservation

The values stored in the map are serialized and written to a data file; they are not cached in memory at all. Therefore, you can store a lot more objects in a FileHashMap before running out of memory (assuming you have enough disk space available).

The keys are stored in memory, however. Each key is associated with a special internal object that keeps track of the location and length of the associated value in the disk file. As you place more items in a FileHashMap, the index will grow, consuming memory. But it will consume far less memory than if you use an in-memory Map, such as a java.util.HashMap, which stores the keys and the values in memory.

The Iterator and Set objects returned by the entrySet() and values() methods contain proxy objects, not real values. That is, they do not actually contain any values. Instead, they contain proxies for the values, objects that specify the locations of the values in the disk file. A value is loaded from disk only when you actually attempt to retrieve it from the Iterator or Set.

Reclaiming Gaps in the File

Normally, when you remove an object from the map, the space where the object was stored in the data file is not reclaimed. This strategy allows for faster insertions, since new objects are always added to the end of the disk file. However, for long-lived FileHashMap objects that are periodically modified, this strategy may not be appropriate. For that reason, you can pass a special RECLAIM_FILE_GAPS flag to the constructor. If specified, the flag tells the object to keep track of "gaps" in the file, and reuse them if possible. When a new object is inserted into the map, and RECLAIM_FILE_GAPS is enabled, the object will attempt to find the smallest unused area in the file to accommodate the new object. It will only add the new object to the end of the file (enlarging the file) if it cannot find a suitable gap.

This mode is not the default, because it can add time to processing. However, it does not access the file at all; the file gap maintenance logic uses in-memory data only. So, while it adds a small amount of computational overhead, the difference between running with RECLAIM_FILE_GAPS enabled and running with it disabled should not be dramatic.

Restrictions

This class currently has the following restrictions and unimplemented behavior.

Version:
$Revision: 6984 $
Author:
Copyright © 2004-2007 Brian M. Clapper

Nested Class Summary
 
Nested classes/interfaces inherited from class java.util.AbstractMap
java.util.AbstractMap.SimpleEntry<K,V>, java.util.AbstractMap.SimpleImmutableEntry<K,V>
 
Nested classes/interfaces inherited from interface java.util.Map
java.util.Map.Entry<K,V>
 
Field Summary
static java.lang.String DATA_FILE_SUFFIX
          Data file suffix.
static int FORCE_OVERWRITE
          Constructor flag value: Ignored unless TRANSIENT is also specified, FORCE_OVERWRITE tells the constructor to overwrite the on-disk files if they already exist, instead of throwing an exception.
static java.lang.String INDEX_FILE_SUFFIX
          Index file suffix.
static int NO_CREATE
          Constructor flag value: If specified, the disk files will not be created if they don't exist; instead, the constructor will throw an exception.
static int RECLAIM_FILE_GAPS
          Constructor flag value: Tells the object to reclaim unused space in the file, whenever possible.
static int TRANSIENT
          Constructor flag value: If specified, the on-disk hash table is considered to be transient and will be removed when this object is closed or finalized.
 
Constructor Summary
FileHashMap()
          Create a new transient FileHashMap object.
FileHashMap(java.lang.String tempFilePrefix)
          Create a new transient FileHashMap object.
FileHashMap(java.lang.String pathPrefix, int flags)
          Create a new FileHashMap object that will read its data from and/or store its data in files derived from the specified prefix.
 
Method Summary
 void clear()
          Removes all mappings from this map.
 void close()
          Close this map.
 boolean containsKey(java.lang.Object key)
          Returns true if this map contains a mapping for the specified key.
 boolean containsValue(java.lang.Object value)
          Returns true if this map maps one or more keys that are mapped to the specified value.
 void delete()
          Deletes the files backing this FileHashMap.
 java.util.Set<java.util.Map.Entry<K,V>> entrySet()
          Returns a "thin" set view of the mappings contained in this map.
 boolean equals(java.lang.Object o)
          Compares the specified object with this map for equality.
protected  void finalize()
          Finalizer
 V get(java.lang.Object key)
          Returns the value associated with the the specified key.
 int hashCode()
          Returns the hash code value for this map.
 boolean isValid()
          Determine whether the FileHashMap is valid or not.
 java.util.Set<K> keySet()
          Returns a Set containing all the keys in this map.
 V put(K key, V value)
          Associates the specified value with the specified key in this map.
 V remove(java.lang.Object key)
          Removes the mapping for this key from this map, if present.
 void save()
          Save any in-memory index changes to disk without closing the map.
 int size()
          Returns the number of key-value mappings in this map.
 java.util.Collection<V> values()
          Returns a collection view of the values contained in this map.
 
Methods inherited from class java.util.AbstractMap
clone, isEmpty, putAll, toString
 
Methods inherited from class java.lang.Object
getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

INDEX_FILE_SUFFIX

public static final java.lang.String INDEX_FILE_SUFFIX
Index file suffix.

See Also:
Constant Field Values

DATA_FILE_SUFFIX

public static final java.lang.String DATA_FILE_SUFFIX
Data file suffix.

See Also:
Constant Field Values

NO_CREATE

public static final int NO_CREATE
Constructor flag value: If specified, the disk files will not be created if they don't exist; instead, the constructor will throw an exception. By default, if the files don't exist, the constructor creates them.

See Also:
Constant Field Values

TRANSIENT

public static final int TRANSIENT
Constructor flag value: If specified, the on-disk hash table is considered to be transient and will be removed when this object is closed or finalized. Note that this flag cannot be combined with NO_CREATE. Also, if TRANSIENT is specified to the constructor, but the on-disk files already exist, the constructor will throw an exception unless FORCE_OVERWRITE is also specified.

See Also:
Constant Field Values

FORCE_OVERWRITE

public static final int FORCE_OVERWRITE
Constructor flag value: Ignored unless TRANSIENT is also specified, FORCE_OVERWRITE tells the constructor to overwrite the on-disk files if they already exist, instead of throwing an exception.

See Also:
Constant Field Values

RECLAIM_FILE_GAPS

public static final int RECLAIM_FILE_GAPS
Constructor flag value: Tells the object to reclaim unused space in the file, whenever possible. If this flag is not specified, then the space occupied by objects removed from the hash table is not reclaimed. This flag is not persistent. It's possible to create a persistent FileHashMap without using this flag, and later reopen the on-disk map via a new FileHashMap object that does have this flag set. Whenever the flag is set, the in-memory FileHashMap object attempts to reuse gaps in the file. When the flag is not set, the in-memory FileHashMap object ignores gaps in the file.

See Also:
Constant Field Values
Constructor Detail

FileHashMap

public FileHashMap()
            throws java.io.IOException

Create a new transient FileHashMap object. The transient disk files will be in the normal temporary directory and will have the prefix "FileHashMap". The names of the associated disk files are guaranteed to be unique, and will go away when the object is finalized or when the Java VM goes away. Call this constructor is identical to calling FileHashMap(String) with a null filePrefix parameter.

Note that this constructor implicitly sets the TRANSIENT flag.

Throws:
java.io.IOException - Unable to create temp file
See Also:
FileHashMap(String,int), FileHashMap(String)

FileHashMap

public FileHashMap(java.lang.String tempFilePrefix)
            throws java.io.IOException

Create a new transient FileHashMap object. The transient disk files will be in the normal temporary directory and will have the specified temporary file prefix. If no temporary file prefix is specified (i.e., the tempFilePrefix parameter is null), then "fhm" will be used. The names of the associated disk files are guaranteed to be unique, and they will go away when the object is finalized or when the Java VM goes away.

Note that this constructor implicitly sets the TRANSIENT flag.

Parameters:
tempFilePrefix - the prefix to use with the temporary file, or null for default prefix "fhm"
Throws:
java.io.IOException - Unable to create temp file
See Also:
FileHashMap(String,int), FileHashMap()

FileHashMap

public FileHashMap(java.lang.String pathPrefix,
                   int flags)
            throws java.io.FileNotFoundException,
                   ObjectExistsException,
                   java.lang.ClassNotFoundException,
                   VersionMismatchException,
                   java.io.IOException

Create a new FileHashMap object that will read its data from and/or store its data in files derived from the specified prefix. The prefix is a file prefix onto which data and index suffixes will be appended.

Values for the flag parameter

The flags parameter consists of a set of logically OR'd constants.

If (flags & NO_CREATE) is zero, the constructor will attempt to create the index files if they don't exist. Otherwise, it expects to find a saved hash map, and it will throw an exception if the files are not present.

If (flags & TRANSIENT) is non-zero, the constructor will attempt to create a transient map. A transient map is not saved; its disk files are removed when the map is finalized or manually closed. You cannot create a transient map using a file prefix that specifies an existing, saved map. Specifying the TRANSIENT flag causes any NO_CREATE flag to be ignored.

For example, the following statement creates a transient FileHashMap:

 Map map = new FileHashMap ("/my/temp/dir", FileHashMap.TRANSIENT);
 

whereas this statements opens a persistent FileHashMap, creating it if it doesn't already exist:

 Map map = new FileHashMap ("/my/map/dir", FileHashMap.CREATE);
 

Parameters:
pathPrefix - The pathname prefix to the files to be used
flags - Flags that control the disposition of the files. A value of 0 means no flags are set. See above for details.
Throws:
java.io.FileNotFoundException - The specified hash files do not exist, and the NO_CREATE flag was specified.
java.lang.ClassNotFoundException - Failed to deserialize an object
VersionMismatchException - Bad or unsupported version stamp in FileHashMap index file
ObjectExistsException - One or both of the files already exist, but the TRANSIENT flag was set and the FORCE_OVERWRITE flag was not set.
java.io.IOException - Other errors
See Also:
NO_CREATE, TRANSIENT, FileHashMap(), FileHashMap(String)
Method Detail

finalize

protected void finalize()
                 throws java.lang.Throwable
Finalizer

Overrides:
finalize in class java.lang.Object
Throws:
throwable - on error
java.lang.Throwable

clear

public void clear()

Removes all mappings from this map. The data file is cleared by closing it, deleting it, and reopening it. If an I/O error occurs at any point, this object will be closed and marked invalid.

Specified by:
clear in interface java.util.Map<K,V>
Overrides:
clear in class java.util.AbstractMap<K,V>

close

public void close()
           throws java.io.NotSerializableException,
                  java.io.IOException

Close this map. If the map is not marked as transient, this method saves the index to disk. Otherwise, it removes both the index file and data file.

Throws:
java.io.NotSerializableException - Can't save index (bug)
java.io.IOException - Error writing index
See Also:
save()

containsKey

public boolean containsKey(java.lang.Object key)

Returns true if this map contains a mapping for the specified key. Since the keys are cached in an in-memory index, this method does not have to access the data file, and is fairly cheap.

Specified by:
containsKey in interface java.util.Map<K,V>
Overrides:
containsKey in class java.util.AbstractMap<K,V>
Parameters:
key - key whose presence in this map is to be tested
Returns:
true if this map contains a mapping for the specified key, false otherwise.

containsValue

public boolean containsValue(java.lang.Object value)

Returns true if this map maps one or more keys that are mapped to the specified value. Because it must iterate through some portion of the on-disk value set, this method can be slow.

Specified by:
containsValue in interface java.util.Map<K,V>
Overrides:
containsValue in class java.util.AbstractMap<K,V>
Parameters:
value - value whose presence in this map is to be tested.
Returns:
true if this map maps one or more keys to the specified value, false otherwise.

delete

public void delete()
Deletes the files backing this FileHashMap. This method implicitly calls close().


entrySet

public java.util.Set<java.util.Map.Entry<K,V>> entrySet()

Returns a "thin" set view of the mappings contained in this map. Each element in the returned set is a Map.Entry; each Map.Entry contains a key and a proxy for the associated value. The map's values themselves are not loaded into memory until requested via a call to Map.Entry.getValue().

The set is backed by the map, so changes to the map are reflected in the set. If the map is modified while an iteration over the set is in progress, the results of the iteration are undefined. Neither the set nor its associated iterator supports any of the set-modification methods (e.g., add(), remove(), etc). If you attempt to call any of those methods, the called method will throw an UnsupportedOperationException.

NOTE: Calling this method on an invalid or closed FileHashMap will result in an exception.

Specified by:
entrySet in interface java.util.Map<K,V>
Specified by:
entrySet in class java.util.AbstractMap<K,V>
Returns:
a set view of the mappings contained in this map.
See Also:
keySet(), values(), UnsupportedOperationException

equals

public boolean equals(java.lang.Object o)

Compares the specified object with this map for equality. Returns true if the given object is also a map and the two Maps represent the same mappings. More formally, two maps t1 and t2 represent the same mappings if t1.entrySet().equals(t2.entrySet()). This ensures that the equals method works properly across different implementations of the Map interface.

Warning:: Because this method must compare the actual values stored in the map, and the values in a file, this method can be quite slow.

Specified by:
equals in interface java.util.Map<K,V>
Overrides:
equals in class java.util.AbstractMap<K,V>
Parameters:
o - object to be compared for equality with this map.
Returns:
true if the specified object is equal to this map.

get

public V get(java.lang.Object key)

Returns the value associated with the the specified key. Returns null if the map contains no mapping for this key.

Specified by:
get in interface java.util.Map<K,V>
Overrides:
get in class java.util.AbstractMap<K,V>
Parameters:
key - key whose associated value is to be returned.
Returns:
the value to which this map maps the specified key, or null if the map contains no mapping for this key.
Throws:
java.lang.ClassCastException - if the key is of an inappropriate type for this map.
java.lang.NullPointerException - key is null
See Also:
containsKey(Object)

hashCode

public int hashCode()

Returns the hash code value for this map. The hash code of a map is defined to be the sum of the hash codes of each entry in the map's entrySet() view. This ensures that t1.equals(t2) implies that t1.hashCode()==t2.hashCode() for any two maps t1 and t2, as required by the general contract of Object.hashCode.

Warning:: The recommended hash code for each entry in the entrySet() view is a combination of hash codes for the entry's key and the entry's value. (See java.util.Map.Entry for details.) Because the values in a FileHashMap object are stored in a file, this method can be quite slow.

Specified by:
hashCode in interface java.util.Map<K,V>
Overrides:
hashCode in class java.util.AbstractMap<K,V>
Returns:
the hash code value for this map.
See Also:
Object.hashCode(), Object.equals(Object), equals(Object)

isValid

public boolean isValid()
Determine whether the FileHashMap is valid or not. Once a FileHashMap has been closed, it is invalid and can no longer be used.

Returns:
true if this object is valid, false if not

keySet

public java.util.Set<K> keySet()

Returns a Set containing all the keys in this map. Since the keys are cached in memory, this method is relatively efficient. The keys are returned in an order that optimizes sequential access to the data file; see the Optimizations section in the main class documentation for details.

The set is backed by the map, so changes to the map are reflected in the set. If the map is modified while an iteration over the set is in progress, the results of the iteration are undefined. Neither the set nor its associated iterator supports any of the set-modification methods (e.g., add(), remove(), etc). If you attempt to call any of those methods, the called method will throw an UnsupportedOperationException.

Specified by:
keySet in interface java.util.Map<K,V>
Overrides:
keySet in class java.util.AbstractMap<K,V>
Returns:
a set view of the keys contained in this map.

put

public V put(K key,
             V value)
      throws java.lang.ClassCastException,
             java.lang.IllegalArgumentException,
             java.lang.NullPointerException

Associates the specified value with the specified key in this map. If the map previously contained a mapping for this key, the old value is replaced. This map class does not permit a null value to be stored.

Specified by:
put in interface java.util.Map<K,V>
Overrides:
put in class java.util.AbstractMap<K,V>
Parameters:
key - key with which the specified value is to be associated.
value - value to be associated with the specified key.
Returns:
previous value associated with specified key, or null if there was no mapping for key.
Throws:
java.lang.ClassCastException - if the class of the specified key or value prevents it from being stored in this map.
java.lang.IllegalArgumentException - Value not serializable, or I/O error while attempting to serialize value.
java.lang.NullPointerException - the specified key or value is null.

remove

public V remove(java.lang.Object key)

Removes the mapping for this key from this map, if present. Note: The space occupied by the serialized value in the data file is not coalesced at this point.

Specified by:
remove in interface java.util.Map<K,V>
Overrides:
remove in class java.util.AbstractMap<K,V>
Parameters:
key - key whose mapping is to be removed from the map.
Returns:
previous value associated with specified key, or null if there was no mapping for key.

save

public void save()
          throws java.io.IOException,
                 java.io.NotSerializableException

Save any in-memory index changes to disk without closing the map. You can call this method even if the map is marked as temporary; on a temporary map, save() simply returns without doing anything.

Throws:
java.io.IOException - Error saving changes to disk.
java.io.NotSerializableException - Can't save index because it contains one or more objects that cannot be serialized.
See Also:
close()

size

public int size()

Returns the number of key-value mappings in this map. If the map contains more than Integer.MAX_VALUE elements, returns Integer.MAX_VALUE. This method queries the in-memory key index, so it is relatively efficient.

Specified by:
size in interface java.util.Map<K,V>
Overrides:
size in class java.util.AbstractMap<K,V>
Returns:
the number of key-value mappings in this map.

values

public java.util.Collection<V> values()

Returns a collection view of the values contained in this map. The returned collection is a "thin" view of the values contained in this map. The collection contains proxies for the actual disk-resident values; the values themselves are not loaded until a Collection method such as contains() is called.

The collection is backed by the map, so changes to the map are reflected in the set. If the map is modified while an iteration over the set is in progress, the results of the iteration are undefined.

The set does not support any of the add() methods.

Warning:: The toArray() methods can be dangerous, since they will attempt to load every value from the data file into an in-memory array.

Specified by:
values in interface java.util.Map<K,V>
Overrides:
values in class java.util.AbstractMap<K,V>
Returns:
a collection view of the values contained in this map.
See Also:
keySet(), values()


Copyright © 2004-2007 Brian M. Clapper. All Rights Reserved.