When Order Duplication Is and Isn’t Acceptable in DynamoDB

The e-commerce orders pattern writes each order to two places:

  • ORDER#<orderId> / #METADATA - for direct lookup by order ID
  • CUSTOMER#<customerId> / ORDER#<ulid> - for customer order history

Two writes per order. Two records per order. If you have a million orders, you have two million order records.
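
As a rough illustration of what "two records per order" means, here is a minimal sketch that builds both items from one order. The Order shape, the field names other than the keys, and the sample values are assumptions for illustration; only the pk/sk layout comes from the pattern above, and the order ID is assumed to be the ULID used in the sort key.

```typescript
// Hypothetical order shape; only the key layout is taken from the pattern.
interface Order {
  orderId: string; // assumed to be a ULID, so it sorts by creation time
  customerId: string;
  status: string;
  totalCents: number;
}

interface TableItem extends Order {
  pk: string;
  sk: string;
}

// Build the two records the pattern writes for a single order.
function buildOrderItems(order: Order): [TableItem, TableItem] {
  // Record 1: direct lookup by order ID.
  const byOrderId: TableItem = {
    pk: `ORDER#${order.orderId}`,
    sk: "#METADATA",
    ...order,
  };
  // Record 2: entry in the customer's order history.
  const byCustomer: TableItem = {
    pk: `CUSTOMER#${order.customerId}`,
    sk: `ORDER#${order.orderId}`,
    ...order,
  };
  return [byOrderId, byCustomer];
}

const [lookupItem, historyItem] = buildOrderItems({
  orderId: "01HVNR4Q3R",
  customerId: "c-42",
  status: "pending",
  totalCents: 1999,
});
```

Both items carry the same attributes; only the keys differ, which is why every later update has to touch both.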

The r/webdev thread asked the obvious follow-up: is that always necessary? When is duplication the right call, and when is it just waste?

Production issue

You're likely losing money on this in production.

A wrong partition key or missing GSI is a live cost problem. Get a DynamoDB schema review before your next deploy — async, fixed price, 5 business days.

Why duplication exists in single-table design

DynamoDB doesn’t have joins. If you need to access the same data from two different starting points - by order ID and by customer ID - you have two options:

Option A is to store the data once and require context when fetching. Store orders only under CUSTOMER#<customerId>. To get an order by ID, you need to know the customer ID first.

Option B is to duplicate the data. Store orders in two places so both access patterns are O(1) without additional context.

Option A is cheaper and simpler. Option B is faster and more flexible. The right choice depends on whether you always have the required context.

When single storage is enough

If you always have the customer ID when you need to look up an order, you don’t need the duplicate.

In a typical e-commerce web app, the customer is in the authenticated session, so every request that needs order data already carries the customer ID. The order detail page URL is /orders/<orderId>, but the server knows who’s logged in and can read CUSTOMER#<customerId> / ORDER#<ulid> directly, as long as the order ID in the URL is the same ULID used in the sort key.

If your URL scheme includes the customer context (/customers/<customerId>/orders/<orderId>), you have everything you need from the URL alone.

This is the common case for customer-facing e-commerce. Single storage works, and you save the write cost.

// Single storage: orders only under customer partition
// AP1: Get order by ID - requires customerId
GetItem(pk=CUSTOMER#<customerId>, sk=ORDER#<ulid>)

// AP2: Get all orders for a customer
Query(pk=CUSTOMER#<customerId>, sk begins_with ORDER#, ScanIndexForward=false)

Both access patterns work. You need customerId for AP1, but you have it from the session.
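
In SDK terms, the two access patterns above could look roughly like the following. This is a sketch that only builds plain objects shaped like the inputs to GetCommand and QueryCommand from @aws-sdk/lib-dynamodb; the function names, the MainTable table name, and the default limit are assumptions.

```typescript
const TABLE_NAME = "MainTable"; // assumed table name

// AP1: get one order. The caller supplies customerId, e.g. from the session.
function getOrderInput(customerId: string, orderUlid: string) {
  return {
    TableName: TABLE_NAME,
    Key: { pk: `CUSTOMER#${customerId}`, sk: `ORDER#${orderUlid}` },
  };
}

// AP2: list a customer's orders, newest first.
function listOrdersInput(customerId: string, limit = 20) {
  return {
    TableName: TABLE_NAME,
    KeyConditionExpression: "pk = :pk AND begins_with(sk, :prefix)",
    ExpressionAttributeValues: {
      ":pk": `CUSTOMER#${customerId}`,
      ":prefix": "ORDER#",
    },
    ScanIndexForward: false, // descending sort-key order; ULIDs sort by time
    Limit: limit,
  };
}

const getInput = getOrderInput("c-42", "01HVNR4Q3R");
const listInput = listOrdersInput("c-42");
```

Each object can be passed to the matching command, for example new GetCommand(getInput), and sent with a DynamoDBDocumentClient.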

When you need the duplicate

The duplicate becomes necessary when you need to access an order without knowing the customer. Three real scenarios:

Webhooks from payment processors: Stripe sends a webhook with order_id. No customer context. Your webhook handler needs to fetch the order to update its status. If the order lives only under CUSTOMER#<id>, you have to query a GSI to find the customer first - extra latency, extra cost, extra failure mode.

Admin and customer service tools: support staff look up orders by order ID, not by customer. They don’t necessarily know or have access to the customer’s ID. “Find order 01HVNR4Q3R” needs to work without customer context.

Order confirmation emails and receipts: background jobs that send emails often receive an orderId from a queue message. No customer context in the queue payload unless you include it explicitly.

For all three of these, ORDER#<orderId> / #METADATA makes the lookup O(1). Without it, you need a GSI on orderId - which costs the same in write amplification but adds index management overhead. This is the same class of tradeoff I document in what your schema explicitly doesn’t support: the duplicate record is the cost of making a specific access pattern explicitly supported, without requiring a GSI or additional context from the caller.
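
To make the comparison concrete, here is a sketch of the two request shapes a webhook handler might build from a payload that contains only an order ID. The PaymentEvent type, the MainTable table name, the orderId-index index name, and the orderId attribute are all hypothetical; the objects follow the input shapes of GetCommand and QueryCommand.

```typescript
// Hypothetical webhook payload: an order ID and no customer context.
interface PaymentEvent {
  orderId: string;
}

// With the duplicate record: a single GetItem on the base table.
function directLookupInput(event: PaymentEvent) {
  return {
    TableName: "MainTable",
    Key: { pk: `ORDER#${event.orderId}`, sk: "#METADATA" },
  };
}

// Without the duplicate: a Query against a GSI keyed on orderId.
// The index name and attribute name are assumptions for illustration.
function gsiLookupInput(event: PaymentEvent) {
  return {
    TableName: "MainTable",
    IndexName: "orderId-index",
    KeyConditionExpression: "orderId = :orderId",
    ExpressionAttributeValues: { ":orderId": event.orderId },
    Limit: 1,
  };
}

const direct = directLookupInput({ orderId: "01HVNR4Q3R" });
const viaGsi = gsiLookupInput({ orderId: "01HVNR4Q3R" });
```

One practical difference: GSI queries are eventually consistent only, so a handler that fires right after order creation may not see the order yet, while a GetItem on the base table can request a strongly consistent read.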

The cost of duplication

Whether this matters at scale depends on which cost you mean. Here’s the breakdown:

Storage: DynamoDB charges $0.25 per GB per month. A 500-byte order record costs ~$0.00000012 per month to store. At 1 million orders, duplication adds about $0.12/month in storage. Not a meaningful number.

Write cost: on-demand pricing charges per write request unit, and one unit covers a write of up to 1KB. Duplication doubles your write cost per order creation. At $1.25 per million writes, 1 million orders cost an extra $1.25. Still not meaningful for most applications.
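
The arithmetic behind those two numbers can be sketched as a small function. The prices are the figures quoted above and will differ by region and over time; the decimal GB and the function name are assumptions, and per-item storage overhead is ignored.

```typescript
const STORAGE_USD_PER_GB_MONTH = 0.25; // figure quoted above
const WRITE_USD_PER_MILLION_WRU = 1.25; // figure quoted above
const BYTES_PER_GB = 1e9; // assumption: decimal GB

// Extra cost of keeping one additional copy of every order.
function duplicationCost(orders: number, itemBytes: number) {
  const extraGb = (orders * itemBytes) / BYTES_PER_GB;
  const extraStorageUsdPerMonth = extraGb * STORAGE_USD_PER_GB_MONTH;

  // One write request unit per 1 KB of item size, rounded up.
  const wruPerWrite = Math.ceil(itemBytes / 1024);
  const extraWriteUsd = ((orders * wruPerWrite) / 1e6) * WRITE_USD_PER_MILLION_WRU;

  return { extraStorageUsdPerMonth, extraWriteUsd };
}

const cost = duplicationCost(1e6, 500);
// cost.extraStorageUsdPerMonth is 0.125 and cost.extraWriteUsd is 1.25
```

Plugging in your own order volume and item size is a quick way to confirm that the dollar cost is rarely the deciding factor.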

Consistency: this is the real cost. If order data can be updated (status changes, address corrections), you’re responsible for keeping both records in sync. DynamoDB transactions help:

import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, TransactWriteCommand } from '@aws-sdk/lib-dynamodb';

// Document client, so keys and values can be plain JavaScript values.
const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// orderId and customerId are assumed to be in scope (e.g. function arguments).
// Build the update once so both records get identical values, including updatedAt.
const statusUpdate = {
  TableName: 'MainTable',
  UpdateExpression: 'SET #status = :status, updatedAt = :now',
  ExpressionAttributeNames: { '#status': 'status' },
  ExpressionAttributeValues: { ':status': 'shipped', ':now': new Date().toISOString() },
};

// Update both records atomically
await client.send(new TransactWriteCommand({
  TransactItems: [
    // Direct-lookup record
    { Update: { ...statusUpdate, Key: { pk: `ORDER#${orderId}`, sk: '#METADATA' } } },
    // Customer order-history record
    { Update: { ...statusUpdate, Key: { pk: `CUSTOMER#${customerId}`, sk: `ORDER#${orderId}` } } },
  ],
}));

TransactWriteItems is atomic - both update or neither does. The cost is 2x write units: each transactional write costs 2 WRUs per 1KB, so this two-record update consumes 4 WRUs instead of 2. For mutable data, the consistency guarantee is worth it.
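
A small sketch of that write-unit arithmetic, assuming on-demand mode and the rule that a standard write uses 1 WRU per 1KB (rounded up) while a transactional write uses twice that. The function names are illustrative.

```typescript
// WRUs for a single write of the given item size.
function writeUnits(itemBytes: number, transactional: boolean): number {
  const base = Math.max(1, Math.ceil(itemBytes / 1024));
  return transactional ? base * 2 : base;
}

// Total WRUs for updating every copy of a record in one transaction.
function transactionUnits(itemSizesBytes: number[]): number {
  return itemSizesBytes.reduce((sum, bytes) => sum + writeUnits(bytes, true), 0);
}

// Two 500-byte order records, written independently vs. in one transaction.
const independentWrites = writeUnits(500, false) * 2;
const transactionalWrites = transactionUnits([500, 500]);
// independentWrites is 2, transactionalWrites is 4
```

The transaction doubles the write units for the pair, which is the price of never observing one record updated without the other.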

If you’re not using transactions for updates to duplicated records, you will eventually have inconsistent state. That’s when duplication becomes a liability.

The practical decision

Duplicate when:

  • You need to access the record without knowing the parent entity’s ID (webhooks, admin tools, queue consumers)
  • The data is either immutable or you’ll use transactions for updates
  • The access pattern is genuinely frequent - direct order lookup is a core operation, not an edge case

Don’t duplicate when:

  • You always have the parent entity’s context (session, URL parameter, API design)
  • The data changes frequently and transaction overhead matters
  • The access pattern is rare enough to justify a GSI query instead

A middle path: include the parent entity’s ID in the queue message or webhook payload. If your Stripe webhook handler receives { orderId, customerId } instead of just { orderId }, you avoid the duplicate entirely. Sometimes the schema question is actually an API design question.
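
A minimal sketch of that middle path for a queue consumer, assuming you control the message format. The OrderMessage type and the keyFromMessage function are hypothetical; the point is that a payload carrying both IDs maps straight to the customer-partition key, with no duplicate record and no GSI.

```typescript
// Hypothetical queue payload that carries the parent entity's ID.
interface OrderMessage {
  orderId?: string;
  customerId?: string;
}

// Turn a raw message body into the key of the single stored order record.
function keyFromMessage(raw: string): { pk: string; sk: string } {
  const msg = JSON.parse(raw) as OrderMessage;
  if (!msg.orderId || !msg.customerId) {
    // Fail loudly: without customerId there is no way to address the record.
    throw new Error("message must include both orderId and customerId");
  }
  return { pk: `CUSTOMER#${msg.customerId}`, sk: `ORDER#${msg.orderId}` };
}

const messageKey = keyFromMessage('{"orderId":"01HVNR4Q3R","customerId":"c-42"}');
```

Rejecting messages that lack customerId keeps the contract explicit: producers have to supply the context, or the schema has to supply the duplicate.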

For most e-commerce applications, the duplicate order record pays for itself through simpler webhook handling alone. But it’s a choice you should make deliberately, not a default you accept without thinking about the alternatives.


The e-commerce orders pattern shows the full dual-write schema with sample data illustrating which record writes to the GSI and which doesn’t. I’m building singletable.dev to make duplication tradeoffs like this visible before they’re in production.

Tejovanth N

Tejovanth builds on DynamoDB in production: rasika.life, rekha.app, rrmstays. All single-table with ElectroDB.

LinkedIn codeculturecob.com
