jlfcdb

jlfcdb is an re-implemenation of D.J. Bernstein's Constant Database (CDB).

You can check out the API here

Download

C++ version:
Latest stable version: v0.2 (sha1: 109954ed484e3ab39d32b9072e2eb9475e0c9334)
Git repository: git://git.fjellstad.org/jlfcdb.git

Java version:
Latest stable version: v0.2 (sha1: b508dd55b22a128920c822795825cd116dddce77)
Git repository: git://git.fjellstad.org/jlfcdb4java.git

Data structure

A cdb date structure is an associate array that maps strings (keys) to strings (value).

The whole structure is stored in one file, split into three parts. The first part contains the pointers to the hash arrays. The second part contains all the records. The third part contains the hash arrays.

It is laid out like this:

       +-------+---------+-------+
       |  ptrs | records | hashs |
       +-------+---------+-------+
	

There are 256 pointers, each pointing to a hash array table or nothing. The pointers are built up of a position of the hash table and the length of the hash table (number of slots).

pointers:

       +----+----+----+----+----+----+------+------+
       | p0 | n0 | p1 | n1 | .. | .. | p255 | n255 |
       +----+----+----+----+----+----+------+------+
	

Each record contains the size of the key, the size of the value, the key string and then the value string without alignments.

records:

       +---------+-----------+------+--------+----+----+----+----+
       | keylen0 | valuelen0 | key0 | value0 | .. | .. | .. | .. |
       +---------+-----------+------+--------+----+----+----+----+
	

Each hash table contains a number of slots. The number of slots are twice the number of records with the hash value of the given table. Each slot has the hash value and position of the given record for this key.

hashtables

       +------------+------------+-----+
       | hashtable0 | hashtable1 | ... |
       +------------+------------+-----+
	

hash table:

       +----------+------+---+---+----------+------+---+---+
       | keyhash0 | pos0 | 0 | 0 | keyhash1 | pos1 | 0 | 0 |
       +----------+------+---+---+----------+------+---+---+
	

Pointers, slot size, hash values, and string lengths are all 32-bits quantities.

To find a record do the following. Hash the key. The hash modulo 256 is the hash table. Read the position of the hash table from the pointer table. If the number of slots is 0, then there are no hash tables for that key. The key hash divided by 256 modulo the slot size is the slot number. Check this slot if the hashes match, if not go to next slot until you find the hash that matches or an empty slot is found. Read the offset of the record.

The hash function is given as "h = ((h << 5) + h) ^ c" with a starting has of 5381.