🧾 Goal
- Simulate how keys are mapped to partitions (nodes) using Consistent Hashing.
- Then simulate what happens when you add a new partition -> how only some keys are remapped (not all of them, unlike a naive hash % N).
🌶️ Context
- Problem: We have N nodes and an amount of data. How can we store this data across the N nodes?
- Initial thinking: With a basic hash function f(x) = x % N (where x represents the data's key), adding or removing a node changes N, so almost every key in the system must be recomputed and moved.
- Idea: Define a fixed range (min_value, max_value), called the hash ring, that is independent of the N nodes. We compute the hash f(x) of each piece of data to distribute it onto the hash ring.
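To make the cost of the naive approach concrete, here is a minimal sketch (using the integers 0–99 as stand-in hash values, an assumption for illustration) that counts how many keys land on a different node when N grows from 3 to 4 under f(x) = x % N:

```python
# Naive modulo placement: node = hash(key) % N.
# Count how many of 100 sample hash values map to a different
# node index when the cluster grows from 3 to 4 nodes.
hashes = range(100)  # stand-ins for hash(key) values

moved = sum(1 for h in hashes if h % 3 != h % 4)
print(f"{moved}/100 keys moved")  # prints: 75/100 keys moved
```

Roughly three quarters of the keys move even though only one node was added; consistent hashing is designed to avoid exactly this.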
⚙️ Setup
- Hash Ring Range: 0–99 (for simplicity)
- (Real systems may use a space like 0 to 2³² or 2⁶⁴, but 0–99 makes it easier to follow.)
🧱 Initial Partitions (Nodes)
| Node | Hash Position |
|---|---|
| A | 10 |
| B | 40 |
| C | 70 |
🧍 Sample Keys (Users)
| Key | Hash(key) (simulated) |
|---|---|
| alice | 15 |
| bob | 43 |
| charlie | 67 |
| dave | 84 |
| eve | 5 |
✅ Step 1: Place Keys on Ring and Assign to Nodes
Rules
- Keys go to the next node clockwise.
- If a key's hash exceeds the highest node position, it wraps around to the node at the lowest position.
📌 Assignment
| Key | Hash | Assigned Node | Reason |
|---|---|---|---|
| eve | 5 | A (10) | 5 → next ≥ 10 |
| alice | 15 | B (40) | 15 → next ≥ 40 |
| bob | 43 | C (70) | 43 → next ≥ 70 |
| charlie | 67 | C (70) | 67 → next ≥ 70 |
| dave | 84 | A (10, wrap) | 84 → wrap → next ≥ 10 |
📊 Initial Mapping Summary
| Node | Keys Assigned |
|---|---|
| A | eve, dave |
| B | alice |
| C | bob, charlie |
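The clockwise rule above can be implemented with a sorted list of node positions plus a binary search. Here is a minimal sketch (node positions and key hashes taken from the tables above) that reproduces this mapping:

```python
import bisect

# (position, node) pairs on the 0-99 ring, kept sorted by position.
ring = [(10, "A"), (40, "B"), (70, "C")]
positions = [pos for pos, _ in ring]

def lookup(key_hash):
    """Return the first node clockwise from key_hash, wrapping past 99 to 0."""
    i = bisect.bisect_left(positions, key_hash)
    if i == len(ring):  # past the last node: wrap to the lowest position
        i = 0
    return ring[i][1]

keys = {"eve": 5, "alice": 15, "bob": 43, "charlie": 67, "dave": 84}
print({name: lookup(h) for name, h in keys.items()})
# {'eve': 'A', 'alice': 'B', 'bob': 'C', 'charlie': 'C', 'dave': 'A'}
```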
➕ Step 2: Add New Node D at Hash Position 50
🆕 Updated Nodes
| Node | Hash Position |
|---|---|
| A | 10 |
| B | 40 |
| D | 50 |
| C | 70 |
🔁 Recalculate Assignments
| Key | Hash | New Assigned Node | Previously |
|---|---|---|---|
| eve | 5 | A (10) | A |
| alice | 15 | B (40) | B |
| bob | 43 | D (50) | C ✅ moved |
| charlie | 67 | C (70) | C |
| dave | 84 | A (10, wrap) | A |
📉 Impact of Adding Node D
| Key | Moved? |
|---|---|
| bob | ✅ Yes (C → D) |
| Others | ❌ No |
Only one key (bob) was remapped → this demonstrates the efficiency of consistent hashing.
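Continuing the sketch from Step 1, inserting D at position 50 and re-running the lookups confirms that only bob moves:

```python
import bisect

ring = [(10, "A"), (40, "B"), (70, "C")]

def lookup(ring, key_hash):
    positions = [pos for pos, _ in ring]
    i = bisect.bisect_left(positions, key_hash)
    return ring[0][1] if i == len(ring) else ring[i][1]  # wrap past the end

keys = {"eve": 5, "alice": 15, "bob": 43, "charlie": 67, "dave": 84}
before = {k: lookup(ring, h) for k, h in keys.items()}

bisect.insort(ring, (50, "D"))  # add node D at position 50
after = {k: lookup(ring, h) for k, h in keys.items()}

print({k for k in keys if before[k] != after[k]})  # {'bob'}
```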
🌀 Visual Ring Representation (Simplified)

```
  0          25          50          75          99
  +-----------+-----------+-----------+-----------+
      |            |       |      |
   [A-10]      [B-40]  [D-50] [C-70]        (wrap)
      ^            ^       ^      ^
      |            |       |      |
  eve, dave     alice     bob  charlie
```

(dave at 84 passes the highest node and wraps around to A at 10.)
🧮 How Do We Determine Node D's Hash Position?
✅ Short Answer
The node's hash position is calculated using the same hash function used for keys, applied to the node's unique identifier (e.g., "NodeD").
📝 Step-by-Step
1. Choose an Identifier
   - "NodeD", or something like "192.168.0.4:8080" or "server-004"
2. Apply the Hash Function

```python
# Simplified example
hash("NodeD") % 100  # → 50
```

This gives Node D a position of 50 on the ring.
3. Use a Consistent Hash Function

```python
import hashlib

def get_ring_position(node_id, ring_size=100):
    # Hash the node's identifier with a deterministic digest,
    # then reduce it onto the ring.
    hash_bytes = hashlib.md5(node_id.encode()).digest()
    hash_int = int.from_bytes(hash_bytes, byteorder='big')
    return hash_int % ring_size

get_ring_position("NodeD")  # → e.g., 50
```
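One design note: Python's built-in hash() is randomized per process (see PYTHONHASHSEED), so it would give different ring positions on different machines; a deterministic digest like MD5 ensures every client computes the same position for "NodeD".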
🧠 Why Consistent Hashing is Great
| Feature | Benefit |
|---|---|
| Key hash independent of N | Stable unless key changes |
| Only a subset of keys move | Scales easily |
| Balanced load (with vnodes) | More even distribution |
🌍 Real-World Uses of Consistent Hashing
Consistent hashing is widely used in modern distributed systems for scalability, fault tolerance, and load balancing.
✅ Distributed Databases & Caches
🔹 Amazon DynamoDB
- Uses consistent hashing to partition data and support horizontal scaling.
🔹 Apache Cassandra
- Each node owns a range on the ring.
- New nodes join by taking over part of an existing range, so key movement is minimal.
🔹 Memcached
- In multi-node deployments, consistent hashing ensures that only a small subset of keys are remapped when a node is added/removed.
🌐 Content Delivery Networks (CDNs)
🔹 Akamai, Cloudflare
- Use hashing to assign content or requests to edge servers.
- Minimal rebalance on server change.
⚖️ Load Balancers
🔹 NGINX, Envoy
- Hash-based load balancing (e.g., hash by IP) supports sticky sessions.
- Adding or removing a backend only affects the clients whose hashes mapped to it.
🗄️ Object Storage Systems
🔹 Ceph, OpenStack Swift
- Consistent hashing determines where objects are stored across many disks/nodes.
🔀 Service Mesh / Routing
🔹 Istio, Linkerd
- Traffic routing based on consistent hashing of keys (like user IDs) to maintain session stickiness or shard requests.
💬 Pub/Sub and Messaging Systems
🔹 Apache Kafka
- Partitions are often chosen by key-based hashing.
- Consumers scale predictably with minimal rebalance.
🔧 DNS Load Balancing
🔹 Netflix, Google
- Use hash-based routing to send users to the same edge nodes or POPs based on IP or region.
🧠 Why It's So Popular
| Feature | Benefit |
|---|---|
| Minimal remapping | Only affected keys move |
| Scalable | Add/remove nodes easily |
| Predictable | Same key → same node |
| Fast | Hashing is computationally cheap |
| Stateless | No central coordinator needed |
✅ Final Summary
- Each key is hashed to a number (0–99).
- It's then assigned to the next node clockwise.
- When you add a new node, only the keys that fall between the previous node and the new node are remapped.
🎯 Bonus: Why Use Virtual Nodes?
| Node | Virtual Node Positions |
|---|---|
| Node D | 50, 73, 91 |
- Using virtual nodes (multiple positions per node) helps avoid uneven load.
- This is especially useful if nodes end up clustered in the hash space.
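As a rough sketch of the idea (the replica count and the "NodeD#i" naming scheme are illustrative assumptions, so the resulting positions will not match the table above exactly), each physical node is hashed several times, once per virtual node:

```python
import hashlib

def ring_position(node_id, ring_size=100):
    # Same deterministic hashing as before.
    digest = hashlib.md5(node_id.encode()).digest()
    return int.from_bytes(digest, byteorder="big") % ring_size

def virtual_positions(node_id, replicas=3, ring_size=100):
    """Hash 'NodeD#0', 'NodeD#1', ... so one physical node
    occupies several spread-out spots on the ring."""
    return sorted(ring_position(f"{node_id}#{i}", ring_size)
                  for i in range(replicas))

print(virtual_positions("NodeD"))  # e.g. three positions such as [50, 73, 91]
```

Lookups work exactly as before; the only change is that several ring positions now map back to the same physical node, which smooths out the load.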
➡️ Next: a closer look at the load-imbalance problem and the solution called virtual nodes.