I Thought Redis Was Just a HashMap
4 June 2026
when using an in-memory db like redis or memcached i used to think — isn't this just a global hash table with set/get/del operations? why don't people just use their language's built-in map, host it in the cloud, and call it a day?
so i thought, let's actually explore the internals and attempt to build a bare-bones version from scratch. i chose golang because i was exploring golang and its concurrency model.
naive approach
started with go's native map with basic set/get/del over tcp. added goroutines per connection for concurrent requests. then the obvious problem — multiple goroutines reading and writing the same map. added a single mutex. works, but now every operation locks the entire table. not scalable if you want any real throughput.
also heard about go's gc overhead on heap allocations, so wanted to avoid native map, so decided to implement a custom hash table.
hash table 101
the basic idea is, hash the key, get an integer, use it as an array index. same key always hashes to the same index so avg-case lookups are O(1). but two different keys can hash to the same index, so we use chaining to store multiple values at the same index. like you can create object/struct to store key/value pairs with next pointer to next struct/object in chain. so now same index can store multiple key/value pairs via chaining (linked list). for collision resolution,there are other techniques as well like open addressing etc.
i went with a byte array approach instead. which will have two arrays:
bucket_array [index → offset]
data_array [keyLen | valLen | nextOffset | key | value | ...]
bucket[42] -> offset 128
data_array: [5|5|256|hello|world]
in this structure, we hash the key, get the bucket index, bucket stores an offset of the data in the data array. data array stores the actual key/value pairs with metadata. each entry stores the data with next offset to next entry in chain.
sharding
the whole hash table still shares a single lock , every operation locks the entire table. we can split the table into multiple arrays, each entry in the array will have its own bucket array and data array. this way, we can lock only the array index we are working on, not the entire table.
type shard struct {
mu sync.RWMutex
bucketArray []int
dataArray []byte
}
type DB struct {
shards [NumShards]shard
}
func (db *DB) getShard(hash uint32) *shard {
return &db.shards[hash%NumShards]
}
each shard is pre-allocated with a fixed size at startup. when a shard fills up , a background worker (goroutine) handles growing via channel queue so the whole operation does not block.
delete and compaction
the structure is append only in data array. so delete wont remove specific entry from chain. so i mark deleted entries with a tombstone value like valLen = 0. on get tombstone entries are skipped.
tombstone entries gonna accumulate and waste the memory space. added one compaction function which rebuilds the shard by skipping tombstone entries and this compaction is triggered when memory usage crosses a threshold. but it also locks the shard so it's a trade off.
conclusion
the single mutex was the biggest bottleneck — not the hash function or collision handling. sharding reduced contention more than anything else. built this purely for learning — but exploring memory management, concurrency, and go internals all at once was quite fun.
what's missing
- no persistent storage, data is lost on restart
- no eviction policy, memory usage grows without bound
- no crash recovery
- compaction pauses the shard while rebuilding
- no replication
- edges case handling
source code: https://github.com/mukul0x9/memdb
references: