Performance

ArdiQ’s edge isn’t a single number — it’s the balance. Because the worker loop and every Redis round-trip run in Rust, off the GIL, ArdiQ delivers near-top throughput at the lowest memory of any fast queue, which gives it the best throughput-to-memory ratio in the field.

The numbers below come from an apples-to-apples suite that runs six Redis-backed Python queues through the same scenarios on the same machine. It’s open and reproducible: python-task-queue-benchmarks.

The metric that captures the whole trade-off is how much work a queue does per megabyte of memory it holds. ArdiQ leads it.

| Queue | I/O (tasks/s per MB) | CPU (tasks/s per MB) |
|---|---|---|
| ArdiQ 🦀 | 2.9 | 11.2 |
| arq | 2.9 | 10.5 |
| Streaq | 1.9 | 7.2 |
| Taskiq | 1.1 | 4.1 |
| Celery | 1.4 | 0.3 |
| Dramatiq | 1.7 | 0.2 |

ArdiQ does the most work per megabyte of any queue tested — roughly 2.7× Taskiq’s I/O efficiency. (arq matches it on I/O efficiency, but only by running ~10% slower; ArdiQ stays this light while sitting near the throughput ceiling.)
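As a rough check, the I/O column can be recomputed from the rounded throughput and memory figures in the io_task results table further down. This is only a sketch: the dictionary and function names are made up for illustration, and because the inputs are rounded, the last digit may differ slightly from ratios computed on raw samples.

```python
# Rounded io_task figures from the results table: (tasks/s, memory in MB).
IO_RESULTS = {
    "ArdiQ": (98.6, 34),
    "arq": (87.7, 30),
    "Streaq": (93.4, 48),
    "Taskiq": (97.9, 92),
    "Celery": (71.7, 51),
    "Dramatiq": (93.5, 56),
}


def tasks_per_mb(throughput: float, memory_mb: float) -> float:
    """Throughput-to-memory ratio: tasks per second per megabyte held."""
    return throughput / memory_mb


efficiency = {name: tasks_per_mb(*row) for name, row in IO_RESULTS.items()}

# Print queues from most to least efficient.
for name, value in sorted(efficiency.items(), key=lambda kv: -kv[1]):
    print(f"{name:<9}{value:4.1f}")

# How many times more work per MB ArdiQ does than Taskiq on I/O.
ratio = efficiency["ArdiQ"] / efficiency["Taskiq"]
print(f"ArdiQ vs Taskiq: {ratio:.1f}x")
```

The same function applies to the CPU column; only the input figures change.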

  • 1,000 tasks, 1 worker process, 10 concurrent tasks, 3 iterations — metrics reported as mean ± std.
  • Two scenarios:
    • io_task — a 100 ms sleep (asyncio.sleep for async libs, time.sleep for sync).
    • cpu_task — 1,000 SHA-256 hashes over 1 KiB inputs per task.
  • Machine: 8-core / 16-thread x86-64, 15 GB RAM, CPython 3.13, Redis 7.4.
  • Versions: ArdiQ 0.1.1, arq 0.28, Taskiq 0.12.4, Streaq 6.5.0, Celery 5.5.3, Dramatiq 2.1.0.
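For orientation, the two task bodies might look roughly like the sketch below. This is not the benchmark's actual source: the function names mirror the scenario names, but the payload (a fixed 1 KiB buffer hashed repeatedly) and the absence of any queue decorator are assumptions. Each library would wrap these bodies in its own task registration.

```python
import asyncio
import hashlib
import time

SLEEP_SECONDS = 0.1      # 100 ms, per the io_task description
HASHES_PER_TASK = 1_000  # per the cpu_task description
INPUT_BYTES = 1024       # 1 KiB


async def io_task_async() -> None:
    """io_task for async libraries: yields to the event loop while waiting."""
    await asyncio.sleep(SLEEP_SECONDS)


def io_task_sync() -> None:
    """io_task for sync libraries: blocks the calling worker thread."""
    time.sleep(SLEEP_SECONDS)


def cpu_task(data: bytes = b"\x00" * INPUT_BYTES) -> str:
    """cpu_task: 1,000 SHA-256 digests over a 1 KiB input (payload assumed)."""
    digest = b""
    for _ in range(HASHES_PER_TASK):
        digest = hashlib.sha256(data).digest()
    return digest.hex()
```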

The io_task scenario is the realistic one for these libraries: async-native queues multiplex the 10 sleeps on one event loop. With 10 tasks in flight at a time and each one sleeping 100 ms, the theoretical ceiling is 10 / 0.1 s = 100 tasks/s, so anything near it is essentially network-bound.

| Queue | Throughput (tasks/s) | Memory |
|---|---|---|
| ArdiQ 🦀 | 98.6 | 34 MB 🪶 |
| Taskiq | 97.9 | 92 MB |
| Dramatiq | 93.5 | 56 MB |
| Streaq | 93.4 | 48 MB |
| arq | 87.7 | 30 MB |
| Celery | 71.7 | 51 MB |

ArdiQ tops the table, with Taskiq within ~1% of it, and practically hits the network ceiling, at roughly a third of Taskiq’s memory (34 MB vs 92 MB). It’s the lightest of every queue that clears 90% of the ceiling.
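The 100 tasks/s ceiling quoted for io_task follows from dividing the number of in-flight tasks by the time each one sleeps. The sketch below computes it and then checks the idea with a plain asyncio semaphore standing in for a worker. No queue or Redis is involved, and the helper names are illustrative only.

```python
import asyncio
import time


def throughput_ceiling(concurrency: int, task_seconds: float) -> float:
    """Upper bound on tasks/s when each task only waits for task_seconds."""
    return concurrency / task_seconds


async def run_sleep_tasks(total: int, concurrency: int, task_seconds: float) -> float:
    """Run `total` sleeps with at most `concurrency` in flight; return tasks/s."""
    gate = asyncio.Semaphore(concurrency)

    async def one_task() -> None:
        async with gate:
            await asyncio.sleep(task_seconds)

    start = time.perf_counter()
    await asyncio.gather(*(one_task() for _ in range(total)))
    return total / (time.perf_counter() - start)


if __name__ == "__main__":
    # Benchmark settings: 10 concurrent tasks, 100 ms each.
    print(throughput_ceiling(10, 0.1))
    # A shorter run than the benchmark's 1,000 tasks, to keep the demo quick.
    measured = asyncio.run(run_sleep_tasks(total=100, concurrency=10, task_seconds=0.1))
    print(f"measured {measured:.1f} tasks/s with no broker in the loop")
```

Any real queue adds dequeue, serialization, and acknowledgement work on top of this, which is why measured throughput lands below the ceiling.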

The cpu_task scenario hashes under the GIL, so for every single-process queue the task body is serial on one core. What this measures is really per-task framing overhead (serialization, broker round-trips, bookkeeping) on top of the constant hashing cost.

| Queue | Throughput (tasks/s) | Memory |
|---|---|---|
| ArdiQ 🦀 | 389.3 | 34 MB 🪶 |
| Taskiq | 388.1 | 94 MB |
| Streaq | 353.8 | 49 MB |
| arq | 317.6 | 30 MB |
| Celery | 13.8 | 52 MB |
| Dramatiq | 13.8 | 56 MB |

Again ArdiQ and Taskiq are effectively tied for the lead on throughput (389.3 vs 388.1 tasks/s), with ArdiQ at roughly a third of Taskiq’s memory (34 MB vs 94 MB). (Celery and Dramatiq sit far lower here because their thread pools serialize on the GIL for this workload; see the caveats.)
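One way to read the CPU table is to invert throughput into wall-clock time per task. Since every queue runs the same hashing body on one core, the gap between two queues' per-task times is a rough estimate of the difference in their overhead. The sketch below assumes tasks are processed effectively serially, as described for this scenario, and uses the rounded table figures; the names are illustrative.

```python
# Rounded cpu_task throughput from the results table, in tasks/s.
CPU_THROUGHPUT = {
    "ArdiQ": 389.3,
    "Taskiq": 388.1,
    "Streaq": 353.8,
    "arq": 317.6,
    "Celery": 13.8,
    "Dramatiq": 13.8,
}


def ms_per_task(tasks_per_second: float) -> float:
    """Average wall-clock milliseconds spent per task at a given throughput."""
    return 1000.0 / tasks_per_second


baseline = ms_per_task(CPU_THROUGHPUT["ArdiQ"])

for name, tps in CPU_THROUGHPUT.items():
    per_task = ms_per_task(tps)
    # The extra column is the per-task gap relative to ArdiQ.
    print(f"{name:<9}{per_task:7.2f} ms/task  (+{per_task - baseline:.2f} ms vs ArdiQ)")
```

For Celery and Dramatiq the gap also includes the GIL contention mentioned above, so it should not be read as pure framing overhead.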

  • Best throughput-to-memory ratio — ArdiQ does the most work per megabyte of any queue in the suite.
  • 🪶 Lightest of the fast queues — ~34 MB, the lowest footprint of anything at its performance level. (arq is marginally lighter in absolute terms but meaningfully slower.)
  • 🏆 Among the fastest: first on both workloads in this suite, with Taskiq within ~1% each time.
  • 📈 Near the theoretical ceiling on I/O work: practically network-bound at 98.6 of a possible 100 tasks/s, with very little lost to scheduling.
  • 🎯 Rock-steady — negligible variance run to run (low std).
  • CPU parallelism isn’t measured here. All libraries run one worker; to scale CPU work you’d run multiple worker processes (Celery’s prefork, Dramatiq’s --processes N, or several async workers). This suite measures per-task overhead, not multi-core scaling.
  • Each queue uses its idiomatic dispatch path and the same Redis instance, one at a time. Latency, raw per-iteration samples, and the full methodology — including how tail-latency is measured — live in the benchmark repo.