
Consistent Hashing In Distributed System (Part 1)

🧾 Goal

  • Simulate how keys are mapped to partitions (nodes) using Consistent Hashing.
  • Then simulate what happens when you add a new partition: only some keys are remapped (not all of them, unlike the naive hash % N approach).

🕶️ Context

  • Problem: We have N nodes and a set of data. How should we distribute this data across the N nodes?
  • Initial thinking: With a basic hash formula f(x) = x % N (where x represents the data), adding a node changes N, so the placement of every key in the system must be recalculated. Idea: fix a range (min_value, max_value), called the hash ring, that is independent of the N nodes, and compute the hash f(x) of each piece of data to distribute it onto the ring.
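
To see why the naive approach is costly, here is a small simulation (the key names and hash values are the same illustrative ones used throughout this post):

```python
# Naive modulo placement: the node index depends directly on N.
keys = {"eve": 5, "alice": 15, "bob": 43, "charlie": 67, "dave": 84}

def naive_node(key_hash, n_nodes):
    return key_hash % n_nodes

before = {k: naive_node(h, 3) for k, h in keys.items()}
after = {k: naive_node(h, 4) for k, h in keys.items()}  # one node added

moved = [k for k in keys if before[k] != after[k]]
print(moved)  # ['eve', 'alice', 'bob', 'charlie'] -- most keys move
```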

⚙️ Setup

  • Hash Ring Range: 0 – 99 (for simplicity)
    (Real systems may use a space like 0 to 2³² or 2⁶⁴, but 0–99 makes it easier to follow)

🧱 Initial Partitions (Nodes)

| Node | Hash Position |
|------|---------------|
| A    | 10            |
| B    | 40            |
| C    | 70            |

🧑 Sample Keys (Users)

| Key     | Hash(key) (simulated) |
|---------|-----------------------|
| alice   | 15                    |
| bob     | 43                    |
| charlie | 67                    |
| dave    | 84                    |
| eve     | 5                     |

✅ Step 1: Place Keys on Ring and Assign to Nodes

Rules

  • Each key is assigned to the next node clockwise, i.e., the first node whose position is ≥ the key's hash.
  • If a key's hash exceeds the highest node position, it wraps around to the node at the lowest position.
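
Here is a minimal Python sketch of this lookup rule, using the simulated node positions and key hashes from the tables in this post:

```python
import bisect

# Sorted node positions on the ring, mapped back to node names.
ring = {10: "A", 40: "B", 70: "C"}
positions = sorted(ring)

def lookup(key_hash):
    # First node position >= key_hash; wrap to the lowest position if none.
    i = bisect.bisect_left(positions, key_hash)
    return ring[positions[i % len(positions)]]

for name, h in {"eve": 5, "alice": 15, "bob": 43,
                "charlie": 67, "dave": 84}.items():
    print(name, "->", lookup(h))
# eve -> A, alice -> B, bob -> C, charlie -> C, dave -> A (wrap)
```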

🔗 Assignment

| Key     | Hash | Assigned Node | Reason                |
|---------|------|---------------|-----------------------|
| eve     | 5    | A (10)        | 5 → next ≥ 10         |
| alice   | 15   | B (40)        | 15 → next ≥ 40        |
| bob     | 43   | C (70)        | 43 → next ≥ 70        |
| charlie | 67   | C (70)        | 67 → next ≥ 70        |
| dave    | 84   | A (10, wrap)  | 84 → wrap → next ≥ 10 |

📊 Initial Mapping Summary

| Node | Keys Assigned |
|------|---------------|
| A    | eve, dave     |
| B    | alice         |
| C    | bob, charlie  |

➕ Step 2: Add New Node D at Hash Position 50

🆕 Updated Nodes

  • A (10)
  • B (40)
  • D (50)
  • C (70)

🔄 Recalculate Assignments

| Key     | Hash | New Assigned Node | Previously |
|---------|------|-------------------|------------|
| eve     | 5    | A (10)            | A          |
| alice   | 15   | B (40)            | B          |
| bob     | 43   | D (50)            | C ✅ moved |
| charlie | 67   | C (70)            | C          |
| dave    | 84   | A (10, wrap)      | A          |

📉 Impact of Adding Node D

| Key    | Moved?         |
|--------|----------------|
| bob    | ✅ Yes (C → D) |
| Others | ❌ No          |

Only one key (bob) was remapped, which demonstrates the efficiency of consistent hashing.
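
The same lookup rule can reproduce this result in code: rebuild the ring with D at 50, recompute every key's owner, and diff against the old mapping (hashes are the simulated values from above):

```python
import bisect

keys = {"eve": 5, "alice": 15, "bob": 43, "charlie": 67, "dave": 84}

def lookup(ring, key_hash):
    # First node position >= key_hash, wrapping around if needed.
    positions = sorted(ring)
    i = bisect.bisect_left(positions, key_hash)
    return ring[positions[i % len(positions)]]

old_ring = {10: "A", 40: "B", 70: "C"}
new_ring = {**old_ring, 50: "D"}  # add node D at position 50

moved = {k: (lookup(old_ring, h), lookup(new_ring, h))
         for k, h in keys.items()
         if lookup(old_ring, h) != lookup(new_ring, h)}
print(moved)  # {'bob': ('C', 'D')} -- only bob moves
```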

📌 Visual Ring Representation (Simplified)

```text
  0           25           50           75          99
   +-----------+------------+------------+-----------+
   |           |            |            |           |
 [A-10]     [B-40]      [D-50]       [C-70]        wrap
   ^           ^            ^            ^
   |           |            |            |
 eve,dave    alice         bob        charlie
```

🧮 How Do We Determine Node D's Hash Position?

✅ Short Answer

The node's hash position is calculated using the same hash function used for keys, applied to the node's unique identifier (e.g., "NodeD").

🔍 Step-by-Step

1. Choose an Identifier

  • "NodeD"
  • Or something like "192.168.0.4:8080" or "server-004"

2. Apply the Hash Function

```python
# Simplified example (the value 50 is illustrative)
hash("NodeD") % 100  # -> 50
```

This gives Node D a position of 50 on the ring.

3. Use a Consistent Hash Function

```python
import hashlib

def get_ring_position(node_id, ring_size=100):
    # Hash the node's identifier with a stable hash function (MD5 here),
    # then map the result onto the ring with a modulo.
    hash_bytes = hashlib.md5(node_id.encode()).digest()
    hash_int = int.from_bytes(hash_bytes, byteorder='big')
    return hash_int % ring_size

get_ring_position("NodeD")  # -> e.g., 50
```
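
Note that a stable hash like MD5 is used here instead of Python's built-in hash(), which is randomized per process, so that every client computes the same ring position for the same node ID. The value 50 is illustrative; the actual MD5-derived position for "NodeD" is whatever the modulo yields.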

🧠 Why Consistent Hashing is Great

| Feature                     | Benefit                       |
|-----------------------------|-------------------------------|
| Key hash independent of N   | Stable unless the key changes |
| Only a subset of keys move  | Scales easily                 |
| Balanced load (with vnodes) | More even distribution        |

🌐 Real-World Uses of Consistent Hashing

Consistent hashing is widely used in modern distributed systems for scalability, fault tolerance, and load balancing.

✅ Distributed Databases & Caches

🔹 Amazon DynamoDB

  • Uses consistent hashing to partition data and support horizontal scaling.

🔹 Apache Cassandra

  • Each node owns a range on the ring.
  • New nodes join by taking over part of an existing range, so key movement is minimal.

🔹 Memcached

  • In multi-node deployments, consistent hashing ensures that only a small subset of keys is remapped when a node is added or removed.

🌍 Content Delivery Networks (CDNs)

🔹 Akamai, Cloudflare

  • Use hashing to assign content or requests to edge servers.
  • Minimal rebalance on server change.

⚖️ Load Balancers

🔹 NGINX, Envoy

  • Hash-based load balancing (e.g., hashing by client IP) supports sticky sessions.
  • Adding or removing backends only affects the clients whose hashes mapped to those backends.

🗄️ Object Storage Systems

🔹 Ceph, OpenStack Swift

  • Consistent hashing determines where objects are stored across many disks/nodes.

🔐 Service Mesh / Routing

🔹 Istio, Linkerd

  • Traffic routing based on consistent hashing of keys (like user IDs) to maintain session stickiness or shard requests.

📬 Pub/Sub and Messaging Systems

🔹 Apache Kafka

  • Partitions are often chosen by key-based hashing.
  • Consumers scale predictably with minimal rebalance.

🧠 DNS Load Balancing

🔹 Netflix, Google

  • Use hash-based routing to send users to the same edge nodes or POPs based on IP or region.

| Feature           | Benefit                          |
|-------------------|----------------------------------|
| Minimal remapping | Only affected keys move          |
| Scalable          | Add/remove nodes easily          |
| Predictable       | Same key → same node             |
| Fast              | Hashing is computationally cheap |
| Stateless         | No central coordinator needed    |

✅ Final Summary

  • Each key is hashed to a number (0–99).
  • It's then assigned to the next node clockwise.
  • When you add a new node, only the keys that fall between the previous node and the new node are remapped.

🎯 Bonus: Why Use Virtual Nodes?

| Node   | Virtual Node Positions |
|--------|------------------------|
| Node D | 50, 73, 91             |

  • Using virtual nodes (multiple positions per node) helps avoid uneven load.
  • This is especially useful if nodes end up clustered in the hash space.
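
Here is a minimal sketch of virtual nodes, assuming each physical node is hashed at several positions by suffixing its ID (the "#i" suffix scheme and the replica count of 3 are illustrative choices, not a fixed convention):

```python
import bisect
import hashlib

def ring_position(node_id, ring_size=100):
    # Stable position derived from the identifier, as in the earlier example.
    h = int.from_bytes(hashlib.md5(node_id.encode()).digest(), "big")
    return h % ring_size

def build_ring(nodes, vnodes=3, ring_size=100):
    # Place each physical node at several virtual positions: "D#0", "D#1", ...
    # (With a tiny 0-99 ring, positions can collide; real systems use 2^32 or more.)
    ring = {}
    for node in nodes:
        for i in range(vnodes):
            ring[ring_position(f"{node}#{i}", ring_size)] = node
    return ring

ring = build_ring(["A", "B", "C", "D"])
positions = sorted(ring)

def lookup(key_hash):
    i = bisect.bisect_left(positions, key_hash)
    return ring[positions[i % len(positions)]]

print(lookup(43))  # owner depends on the MD5-derived virtual positions
```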

➡️ Up next: a look at the problems that remain and the solution called virtual nodes.

This post is licensed under CC BY 4.0 by the author.