DynamoDB Hot Partitions and Write Sharding: A Practical Guide
Each DynamoDB partition has fixed limits: 3,000 RCUs and 1,000 WCUs per second. If your traffic hits one partition key disproportionately, you hit that ceiling and start getting ProvisionedThroughputExceededException errors - even though the table as a whole has plenty of capacity left.
This is the hot partition problem. It’s the most common reason I see DynamoDB tables degrade at scale, and it’s also the most common reason teams blame DynamoDB when the actual problem is their schema.
When hot partitions actually happen
Most schemas don’t have a hot partition problem. The default - keying on <entity>#<id> - distributes evenly across millions of unique IDs. You only get into trouble when you do one of these things:
- Static partition keys:
PK: TENANT_LISTfor admin enumeration,PK: GLOBAL_LEADERBOARDfor a top-N list,PK: COUNTERfor a single global counter. Every write to that entity hits one partition. - Low-cardinality keys:
PK: STATUS#<status>where status has 5 values,PK: COUNTRY#<code>where 80% of users are in two countries,PK: DATE#<today>where everything written today goes to the same key. - One extremely popular item: a celebrity’s user record, a trending product page, the “general” channel in a chat app. Even a perfectly-designed key gets hot if one specific value is 1000x more popular than the rest.
The static-key case is usually a deliberate choice (you wanted a single partition for enumeration) and only becomes a problem at high write volume. The low-cardinality case is usually a schema mistake. The popular-item case is the hardest to fix because the data model doesn’t suggest the problem until production hits it.
How to know you have one
Three signals, in order of how early you’ll notice them:
ConsumedWriteCapacityUnits per partition in CloudWatch. AWS doesn’t show this by default - you have to look at the per-partition metrics. If one partition is consuming 800+ WCUs while the table average is 100, that partition is hot.
Throttling without exceeding table capacity. If WriteThrottleEvents is non-zero but your provisioned WCU is well below your set limit, the throttling is per-partition. Adaptive capacity helps but doesn’t eliminate this.
Latency spikes on specific keys. P99 write latency much higher than P50 is often a hot-partition signature. The hot partition’s writes queue up while everything else is fast.
The clearest test: pick the partition you suspect, and look at WCU consumption for it specifically. If it’s near 1,000 WCU/sec, you have a hot partition.
Production issue
You're likely losing money on this in production.
A wrong partition key or missing GSI is a live cost problem. Get a DynamoDB schema review before your next deploy — async, fixed price, 5 business days.
Write sharding: the technique
The fix is to artificially spread writes across more partition keys by appending a shard suffix:
PK: COUNTER#<shardNumber> // e.g. COUNTER#0, COUNTER#1, ..., COUNTER#19
20 shards = 20 partitions = 20,000 WCU/sec headroom instead of 1,000.
The shard number can be:
- Random (
Math.floor(Math.random() * 20)) - simple, even distribution, but reads must scatter-gather across all shards - Hash of some attribute (e.g.
hash(userId) % 20) - same shard for the same user, useful for read locality - Time-bucketed (e.g.
Math.floor(Date.now() / 1000) % 20) - shards rotate every second, evenly distributes time-correlated writes
Random shards are the most common choice for counters and queues. Hash-based shards are right when you want a specific key to land on a specific partition for read locality.
Read is now scatter-gather
The cost of write sharding is that reads have to query all shards and merge. For a 20-shard counter:
const counts = await Promise.all(
Array.from({ length: 20 }, (_, i) =>
CounterEntity.get({ counterId: "page_views", shard: i }).go()
)
)
const total = counts.reduce((sum, c) => sum + c.data.value, 0)
20 parallel queries, summed in memory. For a high-volume counter, this is the right trade: 20 cheap reads to handle 20,000 WCU/sec of writes.
The number of shards is a knob. Too few = still hot. Too many = expensive scatter-gather on every read. Common defaults:
- Counters with high write, low read: 10-100 shards
- Queues with medium write: 4-10 shards
- Hot leaderboards / popular item caches: 5-20 shards
Don’t pick this number arbitrarily. Estimate peak WCU/sec for the entity, divide by 1,000, multiply by ~1.5 for headroom. That’s your shard count.
Common patterns and their fixes
Static-key admin lists
PK: TENANT_LIST for an admin tenant browser. Fine if you have 10,000 tenants. Hot if you have 10 million and admins refresh constantly.
Shard on the first character of the tenant name, or hash of tenant ID: PK: TENANT_LIST#<shard>. Admin UI scatters across shards (or the shard is fine because admin reads are infrequent).
Time-bucketed feeds
PK: FEED#<date> for a daily public feed. Every event today writes to the same partition.
Add a shard or use a finer time bucket: PK: FEED#<date>#<shard> or PK: FEED#<date>#<hour>. Reads scatter across shards or hours. The analytics events pattern uses time-bucketing as the primary partitioning strategy.
Global counters
PK: COUNTER, SK: <metric> to track global page views. Hot the moment traffic gets serious.
Shard the counter: PK: COUNTER#<shard>, SK: <metric>. Increment a random shard on each write. Read sums all shards.
This is also where DynamoDB’s UpdateItem with ADD shines - atomic increments per shard, parallel across shards, no read-modify-write race conditions.
Popular items
A single product page with 100,000 RPM has a hot partition for reads, not writes. The standard fix is a cache layer (DAX, ElastiCache, CDN). Sharding the read side is possible but ugly - you’d have to write the same item to N shard partitions and pick one randomly on read.
For 99% of “hot read” cases, cache. Sharding is overkill.
Leaderboards
PK: GLOBAL_LEADERBOARD is hot if scores update faster than 1,000/sec. Two options:
- Shard the leaderboard. Each player’s score writes to one shard (e.g. by player ID hash). Top-N reads scatter across shards and merge in memory. Works well for top-100 lists.
- Don’t store the leaderboard in DynamoDB at all. Redis sorted sets are purpose-built for this. The gaming leaderboard pattern covers when each is the right call.
When NOT to shard
Sharding adds permanent complexity. Don’t do it preemptively.
- Don’t shard on suspicion. Measure first. Most “I think this might get hot” partitions never actually do at the volumes most apps see.
- Don’t shard write-once items. A user’s profile is written once at signup. Sharding
USER#<id>does nothing useful. - Don’t shard for evenly-distributed entity keys.
PK: USER#<userId>is already distributed across millions of unique values. Adding a shard does nothing. - Don’t shard read-hot keys. Cache them.
The reflex of “more shards = more scale” is wrong. More shards = more scatter-gather complexity. Use the minimum shard count that solves your specific hotspot.
Adaptive capacity (and why it isn’t enough)
DynamoDB has a feature called adaptive capacity that automatically reallocates throughput from cold partitions to hot ones. This helps with mild hotspots and reduces throttling for spiky traffic.
It does not help with sustained hotspots that exceed 1,000 WCU/sec on a single partition. The per-partition limit is hard. Adaptive capacity buys you headroom up to the limit, not past it.
Use it as a safety net, not a strategy. If your design relies on adaptive capacity to function, you have a hot partition problem you haven’t solved yet.
Detection workflow
When a write throttle alarm fires:
- Check
WriteThrottleEventsper partition in CloudWatch (Contributor Insights for DynamoDB makes this much easier). - Identify the throttled partition key.
- Decide if it’s:
- Static / low-cardinality → shard the key
- One popular item → cache the reads, or write through a queue to smooth writes
- Genuinely uneven traffic → consider whether a different access pattern would distribute better
Don’t just bump the table’s provisioned capacity. That doesn’t help - the bottleneck is per-partition.
Mental model
DynamoDB is a hash table built out of many smaller hash tables. The partition key picks which sub-table you’re hitting. If your key distribution is even, the sub-tables wear roughly equally. If it’s lopsided, one sub-table burns out while the others sit idle.
Sharding is how you turn a one-key access pattern into an N-key access pattern. You pay for it on the read side (scatter-gather), but you buy effectively unlimited write throughput on that access pattern. For high-write entities - counters, queues, leaderboards, time-bucketed feeds - it’s the standard tool, not an exotic one.
The only mistake is using it where you don’t need it.
Hot partitions are usually visible at design time if you know what to look for. Static PKs, low-cardinality keys, and time-bucketed feeds all wave the flag. The pattern library at singletable.dev flags these explicitly - the analytics events pattern and IoT time-series pattern both use sharding by design.