DynamoDB Tenant Isolation: Application-Layer vs IAM LeadingKeys

When I published the SaaS Multi-Tenant schema pattern, the richest discussion in the r/aws thread wasn’t about key design - it was about isolation.

User u/finitepie laid out a full IAM role assumption + dynamodb:LeadingKeys approach with STS credential caching at 55 min TTL on Lambda. The implication: application-layer scoping (what I described in the pattern) isn’t real isolation. A bug in your middleware leaks one tenant’s data to another. IAM LeadingKeys enforces isolation at the database level.

They’re right that IAM LeadingKeys is stronger. They’re also right that it’s more complex. The question is when the complexity is worth it.

Here’s the full breakdown of both approaches, when each makes sense, and how to implement them.

The problem both approaches solve

In single-table DynamoDB with TENANT#<tenantId> as your partition key, all tenant data is naturally co-located but not naturally isolated. There’s nothing in DynamoDB itself that prevents Tenant A’s Lambda from querying Tenant B’s partition - if it constructs the right key.

Your application code is what enforces the boundary. The question is whether you enforce it in your application logic or in AWS IAM policy. A third architectural option - table-per-tenant design - changes this question entirely by physically separating tenant data at the table level, at the cost of operational complexity.

Approach 1: application-layer scoping

The standard approach. Your auth middleware extracts the tenantId from the authenticated session and scopes every DynamoDB call to that tenant’s partition.

Here’s what this looks like in a tRPC context, which is what I use in rasika.life:

// middleware/tenant.ts
import { TRPCError } from '@trpc/server';
import { middleware } from '../trpc';

export const tenantMiddleware = middleware(async ({ ctx, next }) => {
  const { session } = ctx;

  if (!session?.user?.tenantId) {
    throw new TRPCError({ code: 'UNAUTHORIZED' });
  }

  return next({
    ctx: {
      ...ctx,
      tenantId: session.user.tenantId,
      // Every DynamoDB call in downstream resolvers uses this
    },
  });
});

// Usage in a protected procedure
export const tenantProcedure = publicProcedure.use(tenantMiddleware);

Every downstream resolver that touches DynamoDB uses ctx.tenantId to build the partition key:

// routers/projects.ts
export const projectsRouter = router({
  list: tenantProcedure.query(async ({ ctx }) => {
    return ProjectEntity.query.primary({
      tenantId: ctx.tenantId  // enforced by middleware, not by caller
    }).go();
  }),

  get: tenantProcedure
    .input(z.object({ projectId: z.string() }))
    .query(async ({ ctx, input }) => {
      const result = await ProjectEntity.get({
        tenantId: ctx.tenantId,  // caller can't override this
        projectId: input.projectId,
      }).go();

      if (!result.data) throw new TRPCError({ code: 'NOT_FOUND' });
      return result.data;
    }),
});

The tenantId is never user-supplied directly - it always comes from the verified session. A user can’t request data from another tenant’s partition because they can’t construct a request that reaches the DynamoDB layer with a different tenantId.

What this protects against: A malicious user crafting a request to access another tenant’s data through your API.

What this does not protect against: A bug in your middleware that allows tenantId to be spoofed or bypassed. An internal Lambda (like a background job or webhook handler) that accidentally queries without the tenant scope. A developer with direct DynamoDB access in the AWS console.
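
To make the second failure mode concrete, here's a hypothetical maintenance job - the file name and table-wide Scan are illustrative, not part of the pattern - that touches every tenant's partition because nothing forces it through the middleware:

// jobs/reindex.ts - hypothetical background job (illustrative names)
import { DynamoDBClient, ScanCommand } from '@aws-sdk/client-dynamodb';

const ddb = new DynamoDBClient({});

export async function reindexAll() {
  // A table-wide Scan with table-level permissions reads every tenant's items.
  // Application-layer scoping only protects code paths that remember to apply it.
  const result = await ddb.send(new ScanCommand({ TableName: 'MainTable' }));
  return result.Items;
}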

Strengths

  • Simple to implement and review
  • Works with any DynamoDB client - no AWS credential gymnastics
  • Easy to audit: tenant scoping is visible in the application code
  • No performance overhead from STS calls

Weaknesses

  • Isolation is only as strong as your middleware discipline. One misconfigured route leaks data.
  • Doesn’t protect against AWS-level access (console, CLI, other Lambda functions with table-level permissions)
  • No cryptographic proof of isolation for compliance purposes

Approach 2: IAM LeadingKeys

dynamodb:LeadingKeys is an IAM condition key for DynamoDB's fine-grained access control. It restricts which partition key values an IAM principal can access - enforcement at the database level, not the application level.

The mechanism: at request time, each tenant's calls run under temporary credentials from an IAM role scoped to that tenant's TENANT#<tenantId> partition. Every DynamoDB call uses credentials from that role. If a call targets a different partition, IAM rejects it before DynamoDB executes the request.

How it works

Step 1: Create a per-tenant IAM role

You create one IAM role per tenant, or a single shared role you assume with a tenantId session tag, and attach a dynamodb:LeadingKeys condition:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:UpdateItem",
        "dynamodb:DeleteItem",
        "dynamodb:Query"
      ],
      "Resource": [
        "arn:aws:dynamodb:us-east-1:123456789:table/MainTable",
        "arn:aws:dynamodb:us-east-1:123456789:table/MainTable/index/*"
      ],
      "Condition": {
        "ForAllValues:StringLike": {
          "dynamodb:LeadingKeys": ["TENANT#${aws:PrincipalTag/tenantId}*"]
        }
      }
    }
  ]
}

The ${aws:PrincipalTag/tenantId} substitution pulls the tenantId from the principal's session tags, set at the time of role assumption. The condition is evaluated on every API call - there's no way to bypass it from application code.
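
If you go with the single shared role, its trust policy has to let your Lambda execution role both assume the role and attach the tenantId session tag (sts:TagSession). A minimal sketch - ApiLambdaExecutionRole is a placeholder for whatever execution role your API actually runs under:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789:role/ApiLambdaExecutionRole"
      },
      "Action": ["sts:AssumeRole", "sts:TagSession"],
      "Condition": {
        "StringLike": { "aws:RequestTag/tenantId": "*" }
      }
    }
  ]
}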

Step 2: Assume the role per request

When a request comes in, your Lambda assumes the tenant-scoped role via STS:

import { STSClient, AssumeRoleCommand } from '@aws-sdk/client-sts';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';

const sts = new STSClient({});

async function getTenantClient(tenantId: string): Promise<DynamoDBClient> {
  const { Credentials } = await sts.send(new AssumeRoleCommand({
    RoleArn: `arn:aws:iam::123456789:role/TenantRole-${tenantId}`,
    RoleSessionName: `tenant-session-${tenantId}`,
    DurationSeconds: 3600,
    Tags: [{ Key: 'tenantId', Value: tenantId }],
  }));

  return new DynamoDBClient({
    credentials: {
      accessKeyId: Credentials!.AccessKeyId!,
      secretAccessKey: Credentials!.SecretAccessKey!,
      sessionToken: Credentials!.SessionToken!,
    },
  });
}
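
Wired into a handler, the guarantee looks like this - a sketch that assumes the getTenantClient helper above, the MainTable name from the policy, and a partition key attribute named pk:

import { QueryCommand } from '@aws-sdk/client-dynamodb';

export async function listProjects(tenantId: string) {
  const client = await getTenantClient(tenantId);

  // Allowed: the partition key matches the tenantId baked into the assumed role's session.
  return client.send(new QueryCommand({
    TableName: 'MainTable',
    KeyConditionExpression: 'pk = :pk',
    ExpressionAttributeValues: { ':pk': { S: `TENANT#${tenantId}` } },
  }));
}

// If a bug builds a different tenant's key instead, the same call fails with
// AccessDeniedException - the IAM condition rejects it before the query runs.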

Step 3: Cache the credentials on Lambda

STS calls cost ~50ms and have API rate limits. You don’t want one STS call per DynamoDB request. The standard pattern is to cache credentials per tenantId for the duration of the Lambda execution context - typically until credentials expire (~55 min TTL if you set DurationSeconds: 3600).

// Credential cache lives in Lambda's module scope (survives warm invocations)
const credentialCache = new Map<string, {
  client: DynamoDBClient;
  expiresAt: number;
}>();

async function getTenantClient(tenantId: string): Promise<DynamoDBClient> {
  const cached = credentialCache.get(tenantId);
  const now = Date.now();

  // Refresh 5 minutes before expiry
  if (cached && cached.expiresAt - now > 5 * 60 * 1000) {
    return cached.client;
  }

  const { Credentials } = await sts.send(new AssumeRoleCommand({
    RoleArn: `arn:aws:iam::123456789:role/TenantRole-${tenantId}`,
    RoleSessionName: `tenant-${tenantId}`,
    DurationSeconds: 3600,
    Tags: [{ Key: 'tenantId', Value: tenantId }],
  }));

  const client = new DynamoDBClient({
    credentials: {
      accessKeyId: Credentials!.AccessKeyId!,
      secretAccessKey: Credentials!.SecretAccessKey!,
      sessionToken: Credentials!.SessionToken!,
    },
  });

  credentialCache.set(tenantId, {
    client,
    expiresAt: Credentials!.Expiration!.getTime(),
  });

  return client;
}

On a warm Lambda that serves the same tenant repeatedly, you pay the STS cost once per hour. On a cold start or first request for a tenant, you pay ~50ms.

Strengths

  • Isolation is enforced by AWS IAM, not your application code. A bug in middleware can’t bypass it.
  • Works even for direct console/CLI access - the role constraint applies everywhere
  • Auditable in CloudTrail (with DynamoDB data events enabled): every cross-tenant attempt shows up as an access-denied event
  • Meaningful for compliance: SOC 2 auditors recognize AWS-level isolation as a control

Weaknesses

  • Significant operational overhead: one IAM role per tenant, or complex trust policy management
  • STS latency and rate limits add complexity even with caching
  • GSI access requires careful policy tuning - for a GSI query the leading key is the index's partition key, so any index that isn't tenant-prefixed (like an email lookup) needs separate handling
  • Harder to reason about in code reviews - the isolation guarantee is invisible in application code

The decision framework

Condition | Application-Layer | IAM LeadingKeys
Compliance requirement (SOC 2, HIPAA, FedRAMP) | - | ✓
Team has AWS IAM expertise | Either | ✓
Tenant count > 1,000 | Either | ⚠️ Role proliferation risk
Multi-service access to same table | - | ✓
Lambda cold start sensitivity | ✓ | ⚠️ STS adds ~50ms cold
Rapid development / early stage | ✓ | -
Breach blast radius must be cryptographically bounded | - | ✓

Use application-layer scoping when:

  • You’re pre-SOC2 and moving fast
  • Your DynamoDB table is only accessed through one API service
  • Your team can enforce middleware discipline through code review

Use IAM LeadingKeys when:

  • Compliance requires cryptographic proof of isolation
  • Multiple services or AWS principals access the same table
  • A data breach in one tenant being contained to that tenant is a hard requirement, not just a best practice

A pragmatic middle path: Start with application-layer scoping. Add comprehensive integration tests that verify tenant isolation - tests that deliberately attempt cross-tenant access and assert it’s blocked. When you reach compliance requirements, migrate to IAM LeadingKeys. The schema doesn’t change; only the credential strategy does. Compliance requirements that force cryptographic isolation are also one of the signals covered in when NOT to use single-table design - sometimes the right answer is physical table separation.
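
If you take the middle path, those isolation tests are worth sketching early. Here's one using Vitest against the tRPC router above - createCallerForTenant, the seeded IDs, and the entity exposing a tenantId attribute are assumptions about your test harness, not part of the pattern:

import { describe, it, expect } from 'vitest';
// Assumed helper: builds a tRPC caller whose session is pinned to the given tenant.
import { createCallerForTenant } from './helpers';

describe('tenant isolation', () => {
  it("can't read a project that belongs to another tenant", async () => {
    const tenantA = await createCallerForTenant('tenant-a');

    // 'proj-from-tenant-b' was seeded under tenant-b's partition.
    await expect(
      tenantA.projects.get({ projectId: 'proj-from-tenant-b' })
    ).rejects.toMatchObject({ code: 'NOT_FOUND' });
  });

  it("list only ever returns the caller's own partition", async () => {
    const tenantA = await createCallerForTenant('tenant-a');
    const { data } = await tenantA.projects.list();
    expect(data.every((p) => p.tenantId === 'tenant-a')).toBe(true);
  });
});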

One thing IAM LeadingKeys doesn’t cover

Cross-partition GSI queries. If you have a GSI with USER_EMAIL#<email> as the partition key (like the SaaS Multi-Tenant pattern’s login flow), that query doesn’t have a leading key in the traditional sense - it’s crossing tenant boundaries by design (you’re looking up a user without knowing their tenant yet).

You need to either exclude that access pattern from LeadingKeys enforcement (allow the login GSI lookup with a separate policy statement) or handle the login flow in a separate service that holds elevated credentials before scoping down after tenant identification.
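
The carve-out version is a second, narrower statement that only allows Query on that one index - a sketch, with EmailLookupIndex standing in for whatever the pattern's login GSI is actually named:

{
  "Effect": "Allow",
  "Action": ["dynamodb:Query"],
  "Resource": [
    "arn:aws:dynamodb:us-east-1:123456789:table/MainTable/index/EmailLookupIndex"
  ]
}

Attach it to the role used by the unauthenticated login path rather than to the tenant-scoped role, so the broad read stays confined to the one service that genuinely needs it.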

This is the operational nuance that makes IAM LeadingKeys harder than it first appears. The gain is real, but so is the surface area.


The SaaS Multi-Tenant pattern covers the key structure this post assumes - TENANT#<id> partitioning, GSI overloading, and the full entity model. I’m building singletable.dev to make tenant isolation strategies visible at design time - so you see the tradeoff before it’s in production.