HASSAN.
Engineering · Apr 2026

The $100 Question: Batching Jobs, Taming 60K Daily API Calls, and the Idempotency Tradeoff

BullMQ · Queue Architecture · Performance · Cost Optimization · Distributed Systems · Node.js · 9 min read

The rotation proxy bill arrived and it said $120.

For a staging environment.

I stared at it for a moment, did the mental math forward — we haven't even launched yet, this volume is going to keep growing, what does this look like in six months? — and decided that before we went live, something about the pipeline had to fundamentally change.

But the $120 wasn't even the first problem. The first problem was quieter, less urgent-looking, and more dangerous: we were heading toward a system that would hammer our database with tens of thousands of individual writes per day. It worked fine at current volume. It would not work fine at the volume we were planning for.

This is the story of a pipeline that processed 30,000 jobs a day, made 60,000 external API calls to do it, and how we restructured it from first principles to be both cheaper and more resilient — without throwing away the properties that made a message queue worth using in the first place.


The Doctrine Every Queue Tutorial Teaches You

Open any serious writing on distributed systems and message queues and you'll find the same principle repeated with the confidence of a natural law:

Jobs should be atomic and idempotent.

One job, one unit of work. If it fails, retry just that job. If it succeeds, it's done — running it again changes nothing. The queue becomes a reliable delivery mechanism for independent, self-contained operations. Beautiful. Clean. Correct.

The principle exists for good reasons. Atomicity means your retries don't partially re-apply work. Idempotency means a duplicated delivery, which at-least-once queues make possible by design, doesn't corrupt your data. When every job is independent, you can scale workers horizontally without coordination. Failure blast radius is a single job, not a batch.

This is sound architecture. I believed it. I still believe it.

And then reality showed up with 30,000 jobs a day and a growing infrastructure bill, and the doctrine needed a conversation.


The Problem With Pure Idempotency at Volume

The worker's job was straightforward: pull a job from the queue, make a couple of HTTP calls to an external source via a rotation proxy, process the result, write to the database.

At 100 jobs a day, this is a non-issue. At 30,000 jobs a day — all processed in a concentrated daily window — we were looking at:

  • 30,000 individual database writes, each a separate round-trip
  • 60,000 external HTTP calls through the rotation proxy (2 calls per job), every one of them billable
  • A queue with 30,000 entries that the worker churned through one at a time

The database concern wasn't a crisis at 30K. Modern Postgres handles this without flinching. But this wasn't a system built for today: the job volume was growing every week, reliably. At some multiple of current volume, one database round-trip per job becomes an I/O problem. Designing around that before it becomes an incident is how you avoid the 2 a.m. call later.

The rotation proxy bill was more immediate. $120 a month in staging — before full production load — was a loud signal that the economics weren't going to work.

Something needed to change.


The Decision: Break the Doctrine, Deliberately

After mapping out the problem, I made the call to move to batched jobs.

Instead of one job = one unit of work, the new model became: one job = a batch of 100 sub-jobs.

Queue depth: 30,000 → 300. DB writes: 30,000 individual → 300 bulk upserts. The total external API call count doesn't change — 60,000 HTTP calls still happen — but now they're organized, controllable, and easier to optimize.
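
The queue-depth arithmetic can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual code: the `SubJob` shape and the `chunk` helper are assumptions, and only the 100-per-batch figure comes from the text above.

```typescript
// Hypothetical sub-job shape; the real payload is not shown in the article.
interface SubJob {
  id: string;
}

const BATCH_SIZE = 100;

// Split a flat list into fixed-size batches; the last batch may be shorter.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// 30,000 sub-jobs at 100 per batch gives 300 queue entries.
const subJobs: SubJob[] = Array.from({ length: 30_000 }, (_, i) => ({ id: `resource-${i}` }));
const batches = chunk(subJobs, BATCH_SIZE);
console.log(batches.length); // 300
```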

The tradeoff being made here deserves honesty: a batch job is less purely idempotent than the one-job-one-unit model. It's a larger unit with internal state. If it crashes midway, you have to manage which sub-jobs completed and which didn't. You're taking on more complexity inside the processor in exchange for fewer round-trips, lower infrastructure cost, and better throughput characteristics.

That's a real cost. It was worth paying. Paying it with eyes open is different from paying it accidentally.


Building Sub-Job Retry Logic Inside the Processor

The reason pure idempotency is emphasized in queue systems is simple: the queue handles retries for you. A failed job gets retried automatically, in isolation, without affecting anything else. The moment you batch, you give that up — and you have to build the equivalent yourself.

So that's what I did. The retry logic was engineered in three tiers, each one a safety net for the tier above it.

The first tier was in-code retries — each sub-job was attempted immediately on transient failures before anything else escalated. If that didn't resolve it, the sub-job moved to the second tier: a dedicated retry queue, where it would be picked up on the next processing cycle with its own isolated retry budget. If it exhausted that too, it graduated to the dead letter queue — preserved with its full payload for manual inspection, replay, or investigation without blocking anything else.
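
One way the three tiers could fit together is sketched below. Everything here is an assumption for illustration: the `Tier` labels, the `RetryPolicy` fields, and the exponential backoff are not taken from the original pipeline, and the sketch retries every error rather than classifying transient versus permanent failures.

```typescript
// Hypothetical names throughout; none of these come from BullMQ or the article.
type Tier = "done" | "retry-queue" | "dead-letter";

interface SubJob {
  id: string;
  queueAttempts: number; // how many times the retry queue has already replayed this sub-job
}

interface RetryPolicy {
  inCodeAttempts: number; // tier 1 budget
  maxQueueAttempts: number; // tier 2 budget
  baseDelayMs: number; // first backoff delay; doubles on each attempt
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runSubJob(
  job: SubJob,
  work: (job: SubJob) => Promise<void>,
  policy: RetryPolicy,
): Promise<Tier> {
  // Tier 1: immediate in-code retries with exponential backoff.
  for (let attempt = 1; attempt <= policy.inCodeAttempts; attempt++) {
    try {
      await work(job);
      return "done";
    } catch {
      if (attempt < policy.inCodeAttempts) {
        await sleep(policy.baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  // Tier 2: hand off to the retry queue while its budget lasts.
  // Tier 3: otherwise park the payload in the dead letter queue.
  return job.queueAttempts < policy.maxQueueAttempts ? "retry-queue" : "dead-letter";
}
```

The function only decides where a sub-job goes next. The caller would act on the returned tier, which keeps the routing decision easy to test without a queue.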

The result: a single failing sub-job has no leverage over the 99 that succeeded. Blast radius is sub-job, not batch. The bulk write still happens for everything that completed cleanly, and the failures are visible, addressable, and recoverable — not silently dropped or holding up the queue.

Concurrency across this whole flow was explicit and bounded. Sub-jobs within a batch ran in parallel, but with a hard concurrency ceiling derived from the rotation proxy's per-IP rate limit and validated in staging. Throughput stayed high; the proxy saw steady, predictable load instead of spikes.
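
A bounded-concurrency runner of this kind might look like the sketch below. The `mapWithConcurrency` helper and the `Settled` result type are illustrative names, not the original implementation. Each "lane" pulls the next item only when its previous one has finished, so at most `limit` calls are in flight, and one failure never cancels its siblings.

```typescript
type Settled<R> = { ok: true; value: R } | { ok: false; error: unknown };

async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<Settled<R>[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: Settled<R>[] = new Array(items.length);
  let next = 0;

  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++; // no await between the check and the increment, so no index is taken twice
      try {
        results[i] = { ok: true, value: await fn(items[i], i) };
      } catch (error) {
        results[i] = { ok: false, error }; // record the failure and keep the lane going
      }
    }
  }

  // Start at most `limit` lanes; each works through the shared index until items run out.
  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}
```

Called as `mapWithConcurrency(batch, 10, processSubJob)`, where `processSubJob` is whatever per-item function the processor uses, this keeps a batch of 100 at ten concurrent calls.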

Note

When batching, the question isn't just "does this complete faster?" — it's "when something goes wrong, what's the smallest unit that fails?" Design your retry tiers before you write the batch logic. The failure model should be the first decision, not the last.


The 60K API Call Problem

Restructuring into batches gave us better DB characteristics. It did not reduce the rotation proxy bill. We were still making 60,000 external HTTP calls per day — just now in organized groups instead of one-at-a-time jobs.

The $120 bill was still coming.

To understand where the cost was really sitting, I had to understand what those HTTP calls per sub-job were actually doing.

The optimization came from questioning what we actually needed from each call.

Some initial calls were fetching a full resource object when we only needed a few fields. The external API supported field projection, which lets the caller specify exactly which fields to return in the response. We hadn't been using it.

This alone didn't eliminate calls — but it had a less obvious second-order effect. With smaller, faster first-call responses, we could cheaply inspect the result before deciding whether the second call was necessary. And a meaningful portion of our sub-jobs were resources that hadn't changed since they were last processed.

The result: the second HTTP call — the more expensive one — was eliminated for every sub-job where the resource hadn't changed. On a pipeline where a large portion of the daily job set is re-processing existing resources, this cut deep.
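
The skip logic can be sketched as follows. The article does not say how "unchanged" was detected, so this assumes the projected first response carries some change marker (here a hypothetical `version` field) that can be compared with the last value stored. The `Client` interface and its method names are likewise illustrative.

```typescript
// Hypothetical projected response: just enough to decide whether to go further.
interface Projection {
  id: string;
  version: string; // assumed change marker, e.g. a revision or timestamp
}

interface Client<TDetail> {
  fetchProjection(id: string): Promise<Projection>; // call 1: small, projected response
  fetchDetail(id: string): Promise<TDetail>; // call 2: the expensive one
}

type Outcome<TDetail> =
  | { kind: "unchanged"; id: string }
  | { kind: "changed"; id: string; version: string; detail: TDetail };

async function processResource<TDetail>(
  id: string,
  lastSeenVersion: string | undefined,
  client: Client<TDetail>,
): Promise<Outcome<TDetail>> {
  const head = await client.fetchProjection(id);
  if (head.version === lastSeenVersion) {
    return { kind: "unchanged", id }; // second call skipped entirely
  }
  const detail = await client.fetchDetail(id);
  return { kind: "changed", id, version: head.version, detail };
}
```

A resource seen for the first time has no stored marker, so `lastSeenVersion` is `undefined` and the second call always runs.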

| Metric | Before | After |
| --- | --- | --- |
| Daily external API calls | ~60,000 | ~22,000 |
| Rotation proxy cost / month | $120 | ~$20 |
| DB writes per day | ~30,000 rows | ~300 bulk upserts |
| Queue depth at run time | 30,000 jobs | 300 batch jobs |
| Sub-job concurrency | 1 at a time | 10 per batch |

A 63% reduction in proxy calls. An 83% drop in cost. The same pipeline, the same external source, the same output — just not fetching data we didn't need, and not fetching it at all when nothing had changed.


Concurrency: The Dial You Actually Control

One thing the batch model gave us that the individual-job model never had: a meaningful concurrency dial that doesn't require adding more workers.

In the original model, the only lever was "how many worker processes do we run in parallel?" More workers meant more parallelism but also more simultaneous outbound connections, more load on Redis, more pressure on the proxy's IP pool.

In the batch model, the concurrency knob lives inside the processor. We can run one batch job at a time while still executing 10 HTTP calls in parallel inside it. Throughput stays high. Total simultaneous connections stay predictable. The proxy sees steady, controlled traffic rather than spikes proportional to however many worker instances someone spun up.

Tip

When using a rotation proxy, your effective rate limit often isn't global — it's per exit IP. Uncontrolled concurrency can exhaust an IP's quota faster than you expect, triggering bans that then force the proxy to burn through its pool. Explicit concurrency limits inside your processor give you predictable proxy utilization that you can tune against your billing tier without touching your deployment config.


What the Idempotency Principle Is Actually Protecting

The principle of idempotent, atomic queue jobs exists to protect two things: partial state corruption on retry, and duplicate delivery causing unintended side effects.

In the batch model we built, both are still protected — just by different mechanisms:

Partial state on retry is handled by the bulk upsert being idempotent. Re-running a batch that partially completed writes the same records again. Postgres ON CONFLICT DO UPDATE means re-inserting a row that already exists is safe and silent.
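
A bulk upsert of this shape could be built as in the sketch below. The `resources` table, its columns, and the `buildBulkUpsert` helper are assumptions for illustration. The function only builds a parameterized statement; the resulting `text` and `values` are in the form a Postgres client such as node-postgres accepts.

```typescript
// Hypothetical row shape; the real schema is not shown in the article.
interface ResourceRow {
  id: string;
  payload: string;
  version: string;
}

function buildBulkUpsert(rows: readonly ResourceRow[]): { text: string; values: string[] } {
  if (rows.length === 0) throw new Error("nothing to upsert");
  const values: string[] = [];
  const tuples = rows.map((row, i) => {
    values.push(row.id, row.payload, row.version);
    const base = i * 3; // three placeholders per row: $1..$3, $4..$6, ...
    return `($${base + 1}, $${base + 2}, $${base + 3})`;
  });
  const text =
    `INSERT INTO resources (id, payload, version) VALUES ${tuples.join(", ")} ` +
    `ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload, version = EXCLUDED.version`;
  return { text, values };
}
```

One caveat with this approach: Postgres rejects a single `INSERT ... ON CONFLICT DO UPDATE` that would touch the same row twice, so a batch should be de-duplicated by `id` before the statement is built.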

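Duplicate delivery is handled by BullMQ's job deduplication on job ID. Adding a job whose ID already exists in the queue is ignored, so the same batch can't be enqueued twice for as long as the earlier job is retained. If a stalled batch is ever processed a second time, the idempotent upsert above absorbs it.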
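
A minimal sketch of how a deterministic batch ID might be derived so that re-enqueueing the same batch is a no-op. `Queue.add(name, data, { jobId })` is the BullMQ call this relies on. The `QueueLike` interface, the `InMemoryQueue` stand-in, and the ID format are assumptions for illustration, and the stand-in only imitates the ignore-duplicate behavior so the example runs without Redis.

```typescript
// Minimal stand-in for the part of a queue client used here.
interface QueueLike<T> {
  add(name: string, data: T, opts: { jobId: string }): Promise<unknown>;
}

// Same run date and batch index always produce the same ID.
function batchJobId(runDate: string, batchIndex: number): string {
  const padded = ("0000" + batchIndex).slice(-4);
  return `run-${runDate}-batch-${padded}`;
}

// In-memory imitation: a second add with an existing ID is ignored.
class InMemoryQueue<T> implements QueueLike<T> {
  readonly jobs = new Map<string, T>();

  async add(_name: string, data: T, opts: { jobId: string }): Promise<boolean> {
    if (this.jobs.has(opts.jobId)) return false;
    this.jobs.set(opts.jobId, data);
    return true;
  }
}

async function enqueueBatch<T>(
  queue: QueueLike<T>,
  runDate: string,
  batchIndex: number,
  data: T,
): Promise<unknown> {
  return queue.add("process-batch", data, { jobId: batchJobId(runDate, batchIndex) });
}
```

Because the ID depends only on the run date and the batch's position, a scheduler that fires twice produces the same 300 IDs both times.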

The retry and dead letter routing inside the processor restores per-sub-job retry granularity that the queue would have given for free in the pure model. It's more code. It's also code that's explicitly tested, fully observable (every batch logs its own aggregate outcome), and under your control.

Warning

Don't batch for aesthetic reasons or just to reduce queue depth numbers. Batch when the cost of individual operations is measurable and the compensating sub-job failure handling is properly built and tested. Batching without proper retry granularity trades the queue's biggest strength — independent, automatic retries — for complexity you now own entirely.


The Observability Bonus Nobody Plans For

Something unexpected happened when we moved to batches: the pipeline got more observable, not less.

In the original model, understanding how the daily run went meant aggregating 30,000 individual job completion logs and calculating your own throughput metrics. In the batch model, every job completion carries a built-in summary:
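
Such a summary could take a shape like the following. This is a minimal sketch: the field names, the outcome labels, and the `summarize` helper are assumptions for illustration, not the pipeline's actual log format.

```typescript
type SubJobOutcome = "written" | "unchanged" | "retry-queue" | "dead-letter";

interface SubJobResult {
  outcome: SubJobOutcome;
  proxyCalls: number; // external calls this sub-job actually made
}

interface BatchSummary {
  batchId: string;
  total: number;
  written: number;
  unchanged: number;
  retried: number;
  deadLettered: number;
  proxyCalls: number;
  durationMs: number;
}

// Collapse one batch's sub-job results into a single loggable record.
function summarize(
  batchId: string,
  results: readonly SubJobResult[],
  durationMs: number,
): BatchSummary {
  const count = (o: SubJobOutcome) => results.filter((r) => r.outcome === o).length;
  return {
    batchId,
    total: results.length,
    written: count("written"),
    unchanged: count("unchanged"),
    retried: count("retry-queue"),
    deadLettered: count("dead-letter"),
    proxyCalls: results.reduce((sum, r) => sum + r.proxyCalls, 0),
    durationMs,
  };
}
```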

300 of these per daily run gives you a real-time throughput dashboard for free. Success rate per batch, skip rate (how many sub-jobs were unchanged), proxy call efficiency — all of it visible without a separate analytics pipeline.

Alerting had been missing entirely until the zombie worker incident taught us to add it. The batch model made run-completion alerting trivial: if the daily run has completed fewer than 250 of its 300 expected batches by a certain time, fire an alert. A simple threshold on a number we now had naturally.
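
The threshold check itself is small enough to sketch. The `shouldAlert` name and its parameters are hypothetical; only the 250-of-300 figure comes from the text above, and how the alert is delivered is left out.

```typescript
// Returns true once the deadline has passed and too few batches have completed.
function shouldAlert(
  completedBatches: number,
  now: Date,
  deadline: Date,
  minCompleted = 250,
): boolean {
  return now.getTime() >= deadline.getTime() && completedBatches < minCompleted;
}

// Illustrative timestamps: 180 batches done ten minutes after the deadline.
const deadline = new Date("2026-04-01T06:00:00Z");
console.log(shouldAlert(180, new Date("2026-04-01T06:10:00Z"), deadline)); // true
```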


The Lesson Underneath the Lesson

Architectural principles in distributed systems aren't laws of physics. They're tools designed to solve specific problems in specific contexts. When your context diverges enough from that original context, applying them without adaptation creates new problems.

The lesson isn't "idempotent jobs are wrong." The lesson is: understand what the principle is protecting, then decide whether your architecture protects the same things by other means.

We broke the letter of the idempotency doctrine and preserved its spirit. The sub-jobs have independent retry paths. Duplicate delivery is handled. Partial writes are safe to re-run. The queue still does what message queues are for — reliable, durable, distributed work coordination.

It just does it 100 units at a time instead of one.

The $120 staging bill was the best money we never spent. It forced a conversation about the pipeline economics before production made that conversation much more expensive.
