152 lines
6.5 KiB
Plaintext
152 lines
6.5 KiB
Plaintext
The LHASH library.
|
|
|
|
I wrote this library in 1991 and have since forgotten why I called it lhash.
|
|
It implements a hash table from an article I read at the
|
|
time from 'Communications of the ACM'. What makes this hash
|
|
table different is that as the table fills, the hash table is
|
|
increased (or decreased) in size via realloc().
|
|
When a 'resize' is done, instead of all hashes being redistributed over
|
|
twice as many 'buckets', one bucket is split. So when an 'expand' is done,
|
|
there is only a minimal cost to redistribute some values. Subsequent
|
|
inserts will cause more single 'bucket' redistributions but there will
|
|
never be a sudden large cost due to redistributing all the 'buckets'.
|
|
|
|
The state for a particular hash table is kept in the LHASH structure.
|
|
The LHASH structure also records statistics about most aspects of accessing
|
|
the hash table. This is mostly a legacy of my writing this library for
|
|
the reasons of implementing what looked like a nice algorithm rather than
|
|
for a particular software product.
|
|
|
|
Internal stuff you probably don't want to know about.
|
|
The decision to increase or decrease the hash table size is made depending
|
|
on the 'load' of the hash table. The load is the number of items in the
|
|
hash table divided by the size of the hash table. The default values are
|
|
as follows. If (hash->up_load < load) => expand.
|
|
if (hash->down_load > load) => contract. The 'up_load' has a default value of
|
|
1 and 'down_load' has a default value of 2. These numbers can be modified
|
|
by the application by just playing with the 'up_load' and 'down_load'
|
|
variables. The 'load' is kept in a form which is multiplied by 256. So
|
|
hash->up_load=8*256; will cause a load of 8 to be set.
|
|
|
|
If you are interested in performance the field to watch is
|
|
num_comp_calls. The hash library keeps track of the 'hash' value for
|
|
each item so when a lookup is done, the 'hashes' are compared, if
|
|
there is a match, then a full compare is done, and
|
|
hash->num_comp_calls is incremented. If num_comp_calls is not equal
|
|
to num_delete plus num_retrieve it means that your hash function is
|
|
generating hashes that are the same for different values. It is
|
|
probably worth changing your hash function if this is the case because
|
|
even if your hash table has 10 items in a 'bucked', it can be searched
|
|
with 10 'unsigned long' compares and 10 linked list traverses. This
|
|
will be much less expensive that 10 calls to you compare function.
|
|
|
|
LHASH *lh_new(
|
|
unsigned long (*hash)(),
|
|
int (*cmp)());
|
|
This function is used to create a new LHASH structure. It is passed
|
|
function pointers that are used to store and retrieve values passed
|
|
into the hash table. The 'hash'
|
|
function is a hashing function that will return a hashed value of
|
|
it's passed structure. 'cmp' is passed 2 parameters, it returns 0
|
|
is they are equal, otherwise, non zero.
|
|
If there are any problems (usually malloc failures), NULL is
|
|
returned, otherwise a new LHASH structure is returned. The
|
|
hash value is normally truncated to a power of 2, so make sure
|
|
that your hash function returns well mixed low order bits.
|
|
|
|
void lh_free(
|
|
LHASH *lh);
|
|
This function free()s a LHASH structure. If there is malloced
|
|
data in the hash table, it will not be freed. Consider using the
|
|
lh_doall function to deallocate any remaining entries in the hash
|
|
table.
|
|
|
|
char *lh_insert(
|
|
LHASH *lh,
|
|
char *data);
|
|
This function inserts the data pointed to by data into the lh hash
|
|
table. If there is already and entry in the hash table entry, the
|
|
value being replaced is returned. A NULL is returned if the new
|
|
entry does not clash with an entry already in the table (the normal
|
|
case) or on a malloc() failure (perhaps I should change this....).
|
|
The 'char *data' is exactly what is passed to the hash and
|
|
comparison functions specified in lh_new().
|
|
|
|
char *lh_delete(
|
|
LHASH *lh,
|
|
char *data);
|
|
This routine deletes an entry from the hash table. The value being
|
|
deleted is returned. NULL is returned if there is no such value in
|
|
the hash table.
|
|
|
|
char *lh_retrieve(
|
|
LHASH *lh,
|
|
char *data);
|
|
If 'data' is in the hash table it is returned, else NULL is
|
|
returned. The way these routines would normally be uses is that a
|
|
dummy structure would have key fields populated and then
|
|
ret=lh_retrieve(hash,&dummy);. Ret would now be a pointer to a fully
|
|
populated structure.
|
|
|
|
void lh_doall(
|
|
LHASH *lh,
|
|
void (*func)(char *a));
|
|
This function will, for every entry in the hash table, call function
|
|
'func' with the data item as parameters.
|
|
This function can be quite useful when used as follows.
|
|
void cleanup(STUFF *a)
|
|
{ STUFF_free(a); }
|
|
lh_doall(hash,cleanup);
|
|
lh_free(hash);
|
|
This can be used to free all the entries, lh_free() then
|
|
cleans up the 'buckets' that point to nothing. Be careful
|
|
when doing this. If you delete entries from the hash table,
|
|
in the call back function, the table may decrease in size,
|
|
moving item that you are
|
|
currently on down lower in the hash table. This could cause
|
|
some entries to be skipped. The best solution to this problem
|
|
is to set lh->down_load=0 before you start. This will stop
|
|
the hash table ever being decreased in size.
|
|
|
|
void lh_doall_arg(
|
|
LHASH *lh;
|
|
void(*func)(char *a,char *arg));
|
|
char *arg;
|
|
This function is the same as lh_doall except that the function
|
|
called will be passed 'arg' as the second argument.
|
|
|
|
unsigned long lh_strhash(
|
|
char *c);
|
|
This function is a demo string hashing function. Since the LHASH
|
|
routines would normally be passed structures, this routine would
|
|
not normally be passed to lh_new(), rather it would be used in the
|
|
function passed to lh_new().
|
|
|
|
The next three routines print out various statistics about the state of the
|
|
passed hash table. These numbers are all kept in the lhash structure.
|
|
|
|
void lh_stats(
|
|
LHASH *lh,
|
|
FILE *out);
|
|
This function prints out statistics on the size of the hash table,
|
|
how many entries are in it, and the number and result of calls to
|
|
the routines in this library.
|
|
|
|
void lh_node_stats(
|
|
LHASH *lh,
|
|
FILE *out);
|
|
For each 'bucket' in the hash table, the number of entries is
|
|
printed.
|
|
|
|
void lh_node_usage_stats(
|
|
LHASH *lh,
|
|
FILE *out);
|
|
This function prints out a short summary of the state of the hash
|
|
table. It prints what I call the 'load' and the 'actual load'.
|
|
The load is the average number of data items per 'bucket' in the
|
|
hash table. The 'actual load' is the average number of items per
|
|
'bucket', but only for buckets which contain entries. So the
|
|
'actual load' is the average number of searches that will need to
|
|
find an item in the hash table, while the 'load' is the average number
|
|
that will be done to record a miss.
|