Hash Functions
A hash function maps a key to an address. Ideally it should be very fast,
and produce well-scattered results for both random input data *and* for
non-random input data. See p.614-615 for a lengthy discussion of what
constitutes a good hash function.
Back to table of contents
Resolving Collisions
We have seen two collision resolution schemes:
- Open Addressing: if a data item needs to be inserted in a location
already taken, another location is found in the hash table for the item,
using linear/quadratic probing, double hashing, increasing the size of the
table, or another scheme.
- Use buckets: Instead of having just one data item at each location,
implement the table by having each location of the array point at a linked
list (or other variable-size data structure -- trees work well here), and
when collisions occur, just throw the new data item into the corresponding
list/tree.
Back to table of contents
Effeciency
Seperate chaining comes out on top in almost every respect when comparing
the different methods of collision resolution and their effeciency.
However, one must remember that there is storage overhead in maintaining the
lists/trees that is otherwise negligible for arrays. Make sure you
understand the "alpha" analysis of hashing/collision resolution schemes on
p.611-614.
Back to table of contents
Hash table Traversal: Just say no?
Any function that is going to need an operation of the type "give me the
next item in the table" is going to have very bad performance under a
hash-table implementation. This is because the keys used to hash the values
are not linearly stored in the array, and there is no way to recover this
information quickly. As a result, traversal, range extract,
smallest/largest value find, and other operations like these make using hash
tables much less attractive than they would otherwise be.
Back to table of contents