Envelope Encryption: Designing Tenant-Level Database Security That Actually Scales

Here's a thought experiment that starts simple and gets uncomfortable fast.

You're building a multi-tenant SaaS. Security matters. You decide every tenant's data should be encrypted at rest, with their own key — so a breach that exposes one tenant's key doesn't compromise everyone else's data. Sensible. Responsible. Correct.

Now: you have 10,000 tenants. Each with their own encryption key. A new row gets written to the database. Before it lands, you need to encrypt it with the right tenant's key. Before it's read, you need to decrypt it. This happens on every single query, across every table, for every tenant, under production load.

Where do you keep 10,000 keys? How do you rotate them without taking down the platform? What happens to query performance when every read requires a decrypt operation? What do you tell the enterprise customer who wants to supply their own key and revoke your access to their data at will?

Per-tenant encryption sounds like a checkbox. At scale, it's an architecture problem. And the solution that actually works — the one AWS, GCP, and every serious security-conscious SaaS eventually lands on — is called envelope encryption.

Why Naive Approaches Break

Before the solution, the failure modes. They're instructive.

Approach 1: One key for everyone. Encrypt all tenant data with a single application-level key stored in an environment variable. Simple, fast, and completely wrong from a security isolation standpoint. One leaked key exposes every tenant's data, and rotating it means re-encrypting every row in the database.

Approach 2: One key per tenant, stored in the database. Store each tenant's encryption key in a tenant_keys table. Use it to encrypt their data. This is the answer a junior engineer gives — and the problem is immediately obvious once you say it out loud: you're protecting data with a key that lives in the same database as the data. A database breach gets you both.

Approach 3: One key per tenant, stored in a secrets manager. Better. Each tenant's key lives in AWS Secrets Manager or HashiCorp Vault. You fetch the key, decrypt the data, done. The key isn't co-located with the data. The problem surfaces under load: every database read now requires a secrets-manager round-trip. At thousands of queries per second, you've put a secrets-manager bottleneck in front of every query. Latency climbs. Costs climb. Rate limits become an operational concern.

Approach 4: Cache the keys in memory. Fetch the key once, cache it in the application. Now performance is fine — until the cache grows to hold 10,000 keys and the application's memory footprint becomes unreasonable, or a cache flush triggers a thundering herd against your KMS, or you can't reason about which keys are in memory at any given moment during an incident.

Each of these approaches solves one problem by creating another. They're all missing the same architectural insight.

The Envelope Insight

Envelope encryption solves this by separating the key hierarchy into two distinct layers with completely different properties.

Layer 1 — The Data Encryption Key (DEK). A symmetric AES-256 key, used with AES-GCM, generated fresh for each tenant. This key encrypts the actual data. It lives close to the data. In fact, it lives in the database, alongside the rows it protects. But here's the critical part: it's stored encrypted.

Layer 2 — The Key Encryption Key (KEK). A master key managed by a KMS (AWS KMS, GCP Cloud KMS, HashiCorp Vault). This key never touches your application directly — you never download it, never store it. You ask the KMS to use it on your behalf. The KEK's only job is to encrypt and decrypt DEKs.

The "envelope" is the DEK wrapped inside a layer of KEK encryption. Each tenant has a DEK. Each DEK is encrypted with the KEK. The encrypted DEK and the data it protects both sit in the database; the KEK does not. Neither is useful without KMS access, and KMS access is fully audited and controllable.

The elegant part: to read a row, you fetch the encrypted DEK, ask KMS to decrypt it, use the plaintext DEK to decrypt the row, then discard the plaintext DEK. The plaintext DEK lives in memory only for the duration of that operation, or is cached briefly, as we'll cover. The KEK never leaves the KMS. An attacker who walks off with the database gets ciphertext and wrapped DEKs, and neither is usable without KMS access.

The Key Lifecycle

Tenant Provisioning

When a new tenant is created, the encryption setup runs as part of onboarding:

Notice what KMS's GenerateDataKey does. Called with KeySpec AES_256, it generates a fresh AES-256 key and encrypts it under your KEK. It then returns both the plaintext and the encrypted copy in a single call. You use the plaintext immediately, store the encrypted copy, and discard the plaintext. KMS never sees your data; it only touches the key.
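Here is one way to sketch the provisioning step in Python. It is a minimal illustration under stated assumptions, not the original implementation:

- `FakeKMS`, `TenantKeyRecord`, and `provision_tenant` are hypothetical names.
- A plain dict stands in for wherever the encrypted DEK is persisted.
- The stub wraps DEKs locally with `AESGCM` from the third-party `cryptography` package.

Against real AWS KMS, the equivalent boto3 call is `kms_client.generate_data_key(KeyId=..., KeySpec="AES_256")`. It returns the same `Plaintext` and `CiphertextBlob` fields.

```python
import os
from dataclasses import dataclass
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


class FakeKMS:
    """Hypothetical in-memory stand-in for a KMS, for illustration only.

    KEK bytes never leave this object, mirroring how a real KMS never hands its
    master key to the application. Method and field names follow boto3's
    generate_data_key / decrypt calls so the shape is familiar.
    """

    def __init__(self) -> None:
        self._keks: dict = {}

    def create_key(self, key_id: str) -> None:
        self._keks[key_id] = AESGCM.generate_key(bit_length=256)

    def generate_data_key(self, KeyId: str, KeySpec: str = "AES_256") -> dict:
        if KeySpec != "AES_256":
            raise ValueError("this stub only supports AES_256")
        dek = AESGCM.generate_key(bit_length=256)  # fresh 256-bit DEK
        nonce = os.urandom(12)
        wrapped = nonce + AESGCM(self._keks[KeyId]).encrypt(nonce, dek, KeyId.encode())
        return {"KeyId": KeyId, "Plaintext": dek, "CiphertextBlob": wrapped}

    def decrypt(self, CiphertextBlob: bytes, KeyId: str) -> dict:
        nonce, body = CiphertextBlob[:12], CiphertextBlob[12:]
        dek = AESGCM(self._keks[KeyId]).decrypt(nonce, body, KeyId.encode())
        return {"KeyId": KeyId, "Plaintext": dek}


@dataclass
class TenantKeyRecord:
    """Hypothetical persisted record: only the *wrapped* DEK is stored."""
    tenant_id: str
    kek_id: str
    encrypted_dek: bytes


def provision_tenant(kms: FakeKMS, tenant_keys: dict, tenant_id: str, kek_id: str) -> bytes:
    """Create a tenant's DEK during onboarding and persist only its encrypted form."""
    resp = kms.generate_data_key(KeyId=kek_id, KeySpec="AES_256")
    tenant_keys[tenant_id] = TenantKeyRecord(tenant_id, kek_id, resp["CiphertextBlob"])
    # The plaintext is returned for immediate use (e.g. encrypting seed data).
    # Python cannot reliably zero memory, so "discarding" it here only means
    # dropping every reference to it once that work is done.
    return resp["Plaintext"]
```

For example, `provision_tenant(kms, tenant_keys, "tenant-42", "platform-kek")` leaves only ciphertext in `tenant_keys`. Recovering the DEK later requires another `kms.decrypt` call.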

Encrypting Data

Every sensitive field write goes through the same pattern:

Important

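Always use a fresh, random IV (nonce) for every AES-256-GCM encryption. Reusing an IV with the same key is catastrophic for two reasons. First, XORing the two ciphertexts cancels the keystream and yields the XOR of the two plaintexts, which often reveals both. Second, it leaks the authentication key, letting an attacker forge ciphertexts that pass verification. Generate 12 random bytes (96 bits, the IV length GCM is designed for) on every encrypt call. This is not optional.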
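Here is a minimal Python sketch of that write path and its matching read path. It assumes `AESGCM` from the third-party `cryptography` package, and three details are choices of this sketch rather than the original:

- The helper names are hypothetical.
- The nonce is stored as a prefix of the ciphertext. This is one common layout, not a requirement.
- The associated data (`aad`) binds each value to its tenant and column. A ciphertext copied into another record's column then fails authentication instead of decrypting.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN = 12  # 96-bit nonce, the size GCM is designed for


def encrypt_field(dek: bytes, plaintext: str, aad: bytes) -> bytes:
    """Encrypt one value; output layout is nonce || ciphertext || 16-byte tag."""
    nonce = os.urandom(NONCE_LEN)  # fresh random nonce on every call, never reused
    return nonce + AESGCM(dek).encrypt(nonce, plaintext.encode("utf-8"), aad)


def decrypt_field(dek: bytes, blob: bytes, aad: bytes) -> str:
    """Reverse of encrypt_field; raises cryptography.exceptions.InvalidTag if the
    blob was tampered with or the aad does not match."""
    nonce, ciphertext = blob[:NONCE_LEN], blob[NONCE_LEN:]
    return AESGCM(dek).decrypt(nonce, ciphertext, aad).decode("utf-8")
```

Encrypting the same value twice produces two different blobs because the nonce differs. That is exactly the property the warning above is protecting.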

The Performance Problem and Its Real Solution

The naive flow — fetch encrypted DEK from DB, call KMS to decrypt, encrypt/decrypt data, done — adds a KMS round-trip to every query. AWS KMS latency is typically 5–20ms. At scale, this is untenable.

The solution is a DEK cache — but a carefully designed one with security properties intact.

The cache is LRU-bounded — at most 500 DEKs in memory at once, evicting the least-recently-used when the limit is reached. A 5-minute TTL means that if a DEK is rotated or a tenant's access is revoked, the application catches up within five minutes without a deployment.
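The cache described above can be sketched with `collections.OrderedDict`, which keeps entries in recency order. The code makes several assumptions:

- `DekCache` and its `load` callback are hypothetical. In production, `load` would fetch the tenant's encrypted DEK and call KMS Decrypt.
- The defaults of 500 entries and a 300-second TTL mirror the figures above.
- The sketch is single-threaded. A concurrent service would need a lock plus per-tenant request coalescing, so that a cold cache doesn't produce the thundering herd described in Approach 4.
- `__repr__` is overridden so that an accidental log line shows a count, never key bytes.

```python
import time
from collections import OrderedDict
from typing import Callable


class DekCache:
    """LRU-bounded, TTL-limited cache of plaintext DEKs, held in process memory only."""

    def __init__(self, load: Callable[[str], bytes], max_entries: int = 500,
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load          # called on a miss, e.g. KMS Decrypt of the wrapped DEK
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: "OrderedDict[str, tuple]" = OrderedDict()  # tenant -> (dek, loaded_at)

    def get(self, tenant_id: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(tenant_id)
        if entry is not None and now - entry[1] < self._ttl:
            self._entries.move_to_end(tenant_id)   # mark as most recently used
            return entry[0]
        dek = self._load(tenant_id)                # miss or expired: one KMS call
        self._entries[tenant_id] = (dek, now)
        self._entries.move_to_end(tenant_id)
        if len(self._entries) > self._max:
            self._entries.popitem(last=False)      # evict the least recently used DEK
        return dek

    def invalidate(self, tenant_id: str) -> None:
        """Drop a tenant's DEK immediately, e.g. after rotation or revocation."""
        self._entries.pop(tenant_id, None)

    def __repr__(self) -> str:
        # Never expose key material through logging or debugging output.
        return f"DekCache(entries={len(self._entries)}, max={self._max})"
```

Injecting `clock` is a testing convenience: the TTL can then be exercised without sleeping.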

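The KMS call only happens on a cache miss. That means the first request for a tenant's data after startup, a request after the TTL expires, or a request after the DEK has been evicted to make room for others. A tenant with active users therefore generates about one KMS call per five minutes per application instance, regardless of query volume.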

Note

The DEK cache holds plaintext keys in application memory. This is a conscious, accepted tradeoff: in-memory data is not encrypted, and a memory dump of the process would expose cached DEKs. There are two mitigations. The LRU bound limits the blast radius to recently active tenants, and the short TTL limits the exposure window. The alternative, no caching at all, buys that memory safety with a KMS call on every query and latency that quickly becomes an operational problem.

Key Rotation: Where Envelope Encryption Pays Its Biggest Dividend

Key rotation is where most encryption schemes become impractical. If you have one key for all tenant data and you need to rotate it, you have to re-encrypt every row in the database. At any meaningful scale, that's a migration project that takes hours, involves downtime risk, and terrifies everyone involved.

Envelope encryption changes this completely. There are two kinds of rotation, and they're independent.

DEK Rotation (Per-Tenant)

Rotating a tenant's DEK means four steps. Generate a new DEK and store its encrypted form. Decrypt the tenant's data with the old DEK. Re-encrypt it with the new one. Discard the old DEK only after no row still depends on it.

This is a tenant-scoped migration — it touches only that tenant's rows. It can be done live, incrementally, without touching any other tenant's data. For a single tenant it might take seconds or minutes, not hours.

KEK Rotation (Platform-Wide, Cheap)

Here's the part that surprises most engineers. Rotating the KEK — the master key in KMS — does not require re-encrypting any tenant data. It only requires re-encrypting the DEKs.

You're re-encrypting 10,000 small blobs of ciphertext — the DEKs — not 10,000 tables of tenant data rows. The operation that takes hours with a single shared key takes minutes with envelope encryption, and the tenant data itself is never touched.
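The re-wrap loop can be sketched locally as below. It uses the `cryptography` package's `AESGCM`, and the names `wrap`, `unwrap`, and `rotate_kek` are hypothetical. With a real KMS the KEK never leaves the service, so the unwrap-and-rewrap happens server-side. On AWS, boto3's `kms_client.re_encrypt(CiphertextBlob=..., DestinationKeyId=...)` does this without exposing the plaintext DEK to your application. Also note that AWS KMS's built-in automatic rotation of a single key retains older key material, so existing wrapped DEKs keep decrypting. An explicit re-wrap like this one matters when you move DEKs to a different KEK.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def wrap(kek: bytes, dek: bytes, tenant_id: str) -> bytes:
    """Encrypt a DEK under a KEK (done inside the KMS in a real deployment)."""
    nonce = os.urandom(12)
    return nonce + AESGCM(kek).encrypt(nonce, dek, tenant_id.encode())


def unwrap(kek: bytes, wrapped: bytes, tenant_id: str) -> bytes:
    return AESGCM(kek).decrypt(wrapped[:12], wrapped[12:], tenant_id.encode())


def rotate_kek(wrapped_deks: dict, old_kek: bytes, new_kek: bytes) -> int:
    """Re-wrap every tenant's DEK under new_kek; tenant data rows are never read or written."""
    for tenant_id, wrapped in wrapped_deks.items():
        dek = unwrap(old_kek, wrapped, tenant_id)
        wrapped_deks[tenant_id] = wrap(new_kek, dek, tenant_id)  # same DEK, new envelope
    return len(wrapped_deks)
```

The DEKs themselves do not change, only their envelopes. That is why every data row stays byte-for-byte identical through the rotation.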

Rotation type	What gets re-encrypted	Scope	Time at 10K tenants
Naive single key	Every row of every tenant	Platform-wide, blocking	Hours
DEK rotation	All rows for one tenant	Per-tenant, live	Seconds–minutes
KEK rotation	Only the encrypted DEKs	Platform-wide, non-blocking	Minutes
Both	DEKs + one tenant's rows	Targeted	Minutes

Bring Your Own Key: The Enterprise Feature That Falls Out for Free

Here's the commercial benefit that makes envelope encryption worth pitching to your CTO as a revenue decision, not just a security one.

Enterprise customers — financial institutions, healthcare companies, government contractors — frequently require BYOK: Bring Your Own Key. They want to supply their own KEK. They want to be able to revoke your access to their data at will, without your involvement, by disabling their key in their own KMS account.

In a naive encryption scheme, this is a custom engineering project for each enterprise customer. In an envelope encryption scheme, it's a configuration option.

When the enterprise customer wants to offboard or revoke your access to their data, they disable their KMS key. Your next DEK decrypt call fails. Once any cached copy of their DEK expires (within the cache TTL), their data becomes inaccessible to your application. This happens cryptographically, without any action required from you. It is the property that enterprise security teams actually want. Envelope encryption delivers it at the cost of a config field and a cross-account grant: the customer's KMS key policy trusting your account, plus matching IAM permissions on your side.
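The "configuration option" can be sketched as a per-tenant field that selects which KEK wraps the tenant's DEK. Everything below is hypothetical: `TenantConfig`, `resolve_kek`, the example ARNs, and a `FakeKMS` stub that models only key state. Its `CiphertextBlob` is an opaque handle, not real encryption. A real KMS also rejects Decrypt on a disabled key; in boto3 that surfaces as a `ClientError` subclass rather than the `KeyDisabledError` used here.

```python
import os
from dataclasses import dataclass
from typing import Optional

# Hypothetical ARN, for illustration only.
PLATFORM_KEK = "arn:aws:kms:us-east-1:111111111111:key/platform-example"


@dataclass(frozen=True)
class TenantConfig:
    tenant_id: str
    byok_kek_arn: Optional[str] = None   # set only for BYOK tenants


def resolve_kek(cfg: TenantConfig) -> str:
    """Use the customer's own KEK if configured, otherwise the platform KEK."""
    return cfg.byok_kek_arn or PLATFORM_KEK


class KeyDisabledError(Exception):
    pass


class FakeKMS:
    """Hypothetical stand-in that models only key state, not real wrapping.

    CiphertextBlob here is an opaque random handle, not actual ciphertext.
    """

    def __init__(self) -> None:
        self._enabled: dict = {}
        self._wrapped: dict = {}

    def add_key(self, arn: str) -> None:
        self._enabled[arn] = True

    def disable_key(self, arn: str) -> None:
        # With BYOK, the customer does this in their own account, not you.
        self._enabled[arn] = False

    def _require_enabled(self, arn: str) -> None:
        if not self._enabled.get(arn, False):
            raise KeyDisabledError(f"{arn} is disabled")

    def generate_data_key(self, KeyId: str) -> dict:
        self._require_enabled(KeyId)
        dek, handle = os.urandom(32), os.urandom(32)
        self._wrapped[(KeyId, handle)] = dek
        return {"Plaintext": dek, "CiphertextBlob": handle}

    def decrypt(self, CiphertextBlob: bytes, KeyId: str) -> dict:
        self._require_enabled(KeyId)
        return {"Plaintext": self._wrapped[(KeyId, CiphertextBlob)]}
```

Platform tenants and BYOK tenants share every code path except `resolve_kek`. Pairing a decrypt failure with an immediate eviction of that tenant's cached DEK would shorten the window between revocation and loss of access.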

Tip

Price BYOK as an enterprise tier feature. The incremental engineering cost is low — you've already built the infrastructure. The compliance and procurement value for enterprise customers is high. This is one of the cleanest examples of architecture translating directly into commercial differentiation.

Field-Level vs. Row-Level vs. Table-Level Encryption

One more architectural decision you'll face: what granularity do you encrypt at?

Table-level encryption (encrypting entire tables or database files) is what your cloud provider's "encryption at rest" gives you. It protects against someone stealing physical disk, not against a compromised application layer. Not what we're talking about here.

Row-level encryption encrypts the entire serialized row as a single blob. Simpler to implement but means you can't query on encrypted fields — you decrypt on read. Works well when you always fetch by tenant ID and don't need to filter by encrypted fields.

Field-level encryption encrypts specific sensitive columns individually, leaving identifiers and non-sensitive metadata unencrypted for queryability.

Field-level is more work but far more flexible in production. You retain the ability to filter by status, sort by timestamp, and index on non-sensitive columns — without touching the sensitive fields. Most real systems end up here.
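Here is a minimal sketch of field-level encryption over a whole record, assuming the `cryptography` package's `AESGCM`. `SENSITIVE_FIELDS`, `encrypt_record`, and `decrypt_record` are hypothetical names, and which columns count as sensitive is a schema decision the sketch simply assumes. Binding the tenant and column name into the associated data is a choice of this sketch: a ciphertext moved into a different column fails authentication instead of decrypting. Sensitive values round-trip as strings.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

SENSITIVE_FIELDS = {"email", "ssn", "notes"}   # hypothetical schema choice


def _aad(tenant_id: str, field_name: str) -> bytes:
    # Tenant + column in the AAD: a blob copied to another column or tenant fails to decrypt.
    return f"{tenant_id}|{field_name}".encode()


def encrypt_record(dek: bytes, tenant_id: str, record: dict) -> dict:
    """Encrypt only the sensitive columns; ids, status, timestamps stay queryable."""
    out = {}
    for name, value in record.items():
        if name in SENSITIVE_FIELDS and value is not None:
            nonce = os.urandom(12)
            out[name] = nonce + AESGCM(dek).encrypt(nonce, str(value).encode(), _aad(tenant_id, name))
        else:
            out[name] = value
    return out


def decrypt_record(dek: bytes, tenant_id: str, stored: dict) -> dict:
    out = dict(stored)
    for name in SENSITIVE_FIELDS:
        blob = stored.get(name)
        if blob is not None:
            out[name] = AESGCM(dek).decrypt(blob[:12], blob[12:], _aad(tenant_id, name)).decode()
    return out
```

A query such as filtering on `status` or sorting on a timestamp column never touches the encrypted values.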

Warning

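Do not encrypt fields you need to query, filter, or sort on unless you're implementing deterministic encryption, where the same plaintext always produces the same ciphertext. Do not get there by fixing the GCM IV. That is exactly the IV reuse the earlier warning calls catastrophic. Use a scheme designed for it, such as AES-SIV, which derives the IV from the plaintext. Deterministic encryption trades some security guarantees for equality matching; range filters and sorting still won't work on the ciphertext. It reveals when two rows have the same plaintext value. Only use it for fields where that leakage is acceptable, never for truly sensitive values like PII.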

Operational Checklist

Envelope encryption done right requires more than code. Before considering this production-ready:

KMS access control — Your application's IAM role should have the minimum KMS permissions: kms:GenerateDataKey, kms:Decrypt, kms:DescribeKey. It should not have kms:DeleteKey, kms:DisableKey, or kms:ScheduleKeyDeletion. A compromised application credential should not be able to destroy keys.

KMS audit logging — Enable CloudTrail logging for all KMS API calls. Every KMS call, including each DEK decrypt on a cache miss, is logged with a timestamp, caller identity, and key ID. This log is your forensic trail for a security incident.

DEK cache security — The plaintext DEK cache should never be serialized to disk, never logged (ensure your logging framework doesn't serialize the cache), and never sent across a network boundary. It lives in application memory and nowhere else.

Key rotation schedule — Define a rotation policy before you go live: DEK rotation cadence (quarterly or on tenant request), KEK rotation schedule (annual minimum, more frequent if required by compliance framework). Automate it — manual rotation schedules get skipped.

Deletion is permanent — Ensure your team understands that deleting a KEK or DEK without first decrypting the data it protects means the data is gone. Permanently. KMS allows a 7–30 day deletion pending window — use it. Make key deletion require multiple approvals.
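The first checklist item, KMS access control, can be written down as a policy and checked mechanically. The policy below is a Python dict in the standard IAM JSON shape. The key ARN is hypothetical, and `destructive_grants` is an illustrative lint, not an AWS tool. IAM action names are case-insensitive and may contain `*` wildcards, so the lint lower-cases each pattern and matches it with `fnmatch`. It ignores `NotAction` and `Condition` blocks.

```python
import json
from fnmatch import fnmatchcase

KEK_ARN = "arn:aws:kms:us-east-1:111111111111:key/example-key-id"   # hypothetical

APP_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "EnvelopeEncryptionOnly",
        "Effect": "Allow",
        "Action": ["kms:GenerateDataKey", "kms:Decrypt", "kms:DescribeKey"],
        "Resource": KEK_ARN,
    }],
}

FORBIDDEN = ("kms:DeleteKey", "kms:DisableKey", "kms:ScheduleKeyDeletion")


def destructive_grants(policy: dict) -> set:
    """Return Allow-ed action patterns that would match any forbidden KMS action."""
    found = set()
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        for pattern in actions:
            if any(fnmatchcase(name.lower(), pattern.lower()) for name in FORBIDDEN):
                found.add(pattern)
    return found


if __name__ == "__main__":
    print(json.dumps(APP_POLICY, indent=2))
    print("destructive grants:", destructive_grants(APP_POLICY) or "none")
```

A wildcard such as `kms:*` or `kms:Delete*` is flagged just like an explicit `kms:DeleteKey`. Running a check like this in CI against the deployed role's policy is one way to keep the rule from regressing.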

Caution

Test your key rotation procedure in staging before you need it in production. Rotation that looks correct in code but has a transaction ordering bug can leave rows encrypted with an old DEK after that DEK has been discarded. Those rows become permanently unreadable. The test: rotate a key, then verify all data decrypts correctly with the new key. Finally, verify that no row still requires the old key: decrypting any row with it should fail. Do this before your first production tenant.

Why This Architecture Ages Well

Envelope encryption doesn't just solve today's security requirements — it creates a foundation that gets stronger as the system grows.

Adding a new tenant doesn't touch the KEK. The new DEK is generated, encrypted, stored. Isolated. Adding a new region means adding a regional KMS key as a KEK option — DEKs can be re-encrypted for whichever regional KEK governs their data residency. A security incident at one tenant requires rotating only their DEK — the platform and every other tenant is unaffected.

The system maps naturally to your business model: free tenants get platform-managed keys, enterprise tenants get BYOK, compliance-sensitive tenants get regional keys. Each tier is the same underlying architecture with a different configuration.

And when your enterprise customer's security team asks — "how do you ensure our data is isolated from other tenants at the cryptographic level?" — you have an answer that isn't a policy document. It's an architecture diagram.

The thought experiment at the beginning of this post has an answer: envelope encryption handles 10,000 tenants the same way it handles 10. The key count grows; the complexity doesn't.
