Skip to content

DynamoDB Sparse Indexes: The Highest-Leverage Pattern Nobody Explains

Sparse indexes are the single most underrated tool in DynamoDB, and they’re criminally underexplained in the official docs. Once you get them, you’ll see uses for them everywhere - workflow queues, soft deletes, flagged content, “needs review” dashboards, abandoned carts. They’re cheap, simple, and often replace what would otherwise be a Scan with a FilterExpression.

A GSI only indexes items where the GSI key attributes are set. If you don’t write the GSI key, the item doesn’t appear in the index at all.

That sounds trivial. The implications are not.

How sparse indexes actually work

When you create a GSI on, say, gsi1pk, DynamoDB replicates an item to that GSI only if the item has a value in gsi1pk. Items missing the attribute are skipped entirely - not indexed, not stored, not charged for.

This is the default behavior. You don’t have to enable anything. A “sparse” index is just a GSI where most items in the base table don’t write the GSI key.

The leverage comes from controlling which items write the key. By writing the key only on items you care about, you turn a “find these items in a haystack” query into “list everything in the index.”

The canonical example: workflow queues

You have a table of orders. Most orders are in a terminal state (delivered, cancelled). A small subset are pending and need processing.

The naive approach is a GSI on status:

gsi1pk: STATUS#<status>     // every order writes this
gsi1sk: ORDER#<ulid>

Every order, all 10 million of them, writes a value to gsi1pk. The GSI is a full mirror of the table. Querying “pending orders” works, but you’re paying to index every delivered order forever.

The sparse alternative:

gsi1pk: PENDING_ORDER         // ONLY written when status = "pending"
gsi1sk: ORDER#<ulid>

When an order is created with status = "pending", you write gsi1pk = "PENDING_ORDER". When it transitions to confirmed, you remove gsi1pk entirely (delete the attribute, not set it to null). The order vanishes from the GSI.

The query becomes: Query(GSI1, gsi1pk=PENDING_ORDER) - returns only pending orders. The index contains 200 items instead of 10 million. Read cost is proportional to result size, not table size.

The e-commerce orders pattern uses exactly this for the order processing queue.

The code pattern in ElectroDB

ElectroDB handles this naturally because index keys are computed from attributes. If a required attribute is missing, the key isn’t written.

const OrderEntity = new Entity({
  attributes: {
    orderId: { type: "string", required: true },
    customerId: { type: "string", required: true },
    status: {
      type: ["pending", "confirmed", "shipped", "delivered"],
      required: true,
    },
    // sparse: only set when status === "pending"
    pendingFlag: {
      type: "string",
      watch: ["status"],
      set: (_, { status }) =>
        status === "pending" ? "PENDING" : undefined, // undefined = delete attribute
    },
  },
  indexes: {
    primary: { /* ... */ },
    pendingQueue: {
      index: "GSI1",
      pk: { field: "gsi1pk", composite: ["pendingFlag"], template: "${pendingFlag}_ORDER" },
      sk: { field: "gsi1sk", composite: ["orderId"], template: "ORDER#${orderId}" },
    },
  },
})

Setting pendingFlag to undefined removes the attribute from the item. ElectroDB doesn’t write the GSI key, and the item drops out of the index automatically.

If you’re using the raw SDK, the equivalent is UpdateExpression: "REMOVE gsi1pk, gsi1sk" when the status transitions out of pending.

Five places sparse indexes pay off

1. Workflow / queue items

Anything you process and then forget about. Pending orders, unread notifications, jobs in a queue, items awaiting review. Set the GSI key on creation, remove it on completion. The GSI shrinks to just the active queue.

2. Soft deletes (the “deleted in last 30 days” pattern)

Items aren’t truly removed - they get a deletedAt attribute and (optionally) a TTL for permanent purge. A sparse GSI keyed on DELETED#<deletedAt> lets you list recently-deleted items for restore UIs, without making every live item write a key.

deletedFlag: {
  type: "string",
  watch: ["deletedAt"],
  set: (_, { deletedAt }) =>
    deletedAt ? `DELETED#${deletedAt}` : undefined,
}

For the full TTL angle, see TTL as a soft-delete and expiry pattern.

3. Flagged or “needs attention” items

Comments awaiting moderation, abuse reports, billing failures, pages that need re-publishing. Set a flag attribute when the item enters the bad state, remove it when resolved. A sparse GSI gives you the entire moderation queue in one query.

If only 0.1% of your products are “featured”, you don’t need a GSI that indexes all of them. Write featuredFlag = "FEATURED" only on the featured ones. Listing featured products is one cheap query.

5. Items with optional attributes

If an attribute is only set on some items and you sometimes need to query by it, a sparse GSI on that attribute is essentially free. Examples: products with promotional codes, users with multi-factor auth enabled, articles that have been translated.

Common mistake: setting the attribute to “false” or null

This is the single most common bug. You think you’ve made a sparse index, but you wrote featured: false instead of leaving featured unset.

DynamoDB indexes the item if the attribute exists at all. featured: false is a valid value. Your GSI now has every item in the table.

Fixes:

  • In ElectroDB, return undefined from a set function (not null, not false).
  • With the raw SDK, use UpdateExpression: "REMOVE attrName" rather than SET attrName = :null.
  • The mental model is “absence vs falsy.” A sparse index keys on absence. Falsy is presence.

Combining with overloaded GSIs

Sparse indexes overload beautifully. Multiple sparse access patterns can share a single GSI as long as their PK prefixes don’t collide.

For example, on a single shared GSI1:

Entitygsi1pk (only set when…)gsi1sk
OrderPENDING_ORDER (status = pending)ORDER#<ulid>
CommentFLAGGED_COMMENT (flagged = true)COMMENT#<ulid>
ArticlePUBLISHED#<date> (status = published)ARTICLE#<ulid>

One GSI. Three independent sparse views. None of these access patterns step on each other because their PKs are namespaced. Each entity contributes only its “active subset” to the GSI - the GSI stays small even as the base table grows. This is the trick that lets you scale a single-table design to dozens of access patterns without an explosion of GSIs.

The SaaS multi-tenant pattern walks through GSI overloading in more depth - sparse indexes are the natural next layer on top.

When NOT to use a sparse index

Sparse indexes assume the indexed subset is small relative to the total. If most of your items have the attribute set, you’re not getting any savings - you’ve just built a regular GSI with extra steps.

Rough threshold: if more than ~30% of items would write the GSI key, just build a normal GSI. Sparse pays off in the 0.1% to 5% range.

The other case to avoid: don’t make a sparse index where the “absent” subset is what you query. You can’t query for items missing an attribute on an index - they’re not in the index at all. Sparse indexes are for “the small set I do want”, not “the small set I want to exclude.”

Quick mental check

For any new GSI you’re considering, ask:

  1. What fraction of items will this GSI contain? If small, use a sparse index by writing the key conditionally.
  2. Is there a state transition that takes items out of the queryable set? Use that transition to remove the GSI key.
  3. Could this share a partition prefix with an existing sparse access pattern? Overload them onto one GSI.

Most “queue”, “pending”, “active”, and “flagged” access patterns I see in the wild should be sparse indexes. Most aren’t. Switching is cheap and the savings compound forever.


The full schema patterns at singletable.dev all use sparse indexes where they help - the e-commerce orders pending queue, the content management draft list, the workflow state machine running-executions feed. Hard-to-spot at design time, easy to see in a visual schema designer - which is what I’m building.

Tejovanth N

Tejovanth builds on DynamoDB in production: rasika.life, rekha.app, rrmstays. All single-table with ElectroDB.

LinkedIn codeculturecob.com

Related

Schema review

Want a second pair of eyes before you ship?

Async DynamoDB schema review. PK/SK design, GSI strategy, ElectroDB entity code. Fixed price, 5 business days.