A single razor-check costs you about 367 milliseconds of wall-clock time. dccproc costs four. Now picture those calls running one after another inside your rspamd worker, the same single-threaded event loop that’s supposed to be scanning the next thousand messages. You just turned a 4-millisecond spam check into a third of a second of dead air, and every message behind it in the queue waited for the privilege. Do that under real load and you don’t have a mail scanner anymore. You have a very expensive way to make Postfix time out.
This is the trap nobody warns the new sysadmin about. The collaborative-filtering networks that catch the spam your Bayes classifier misses, DCC, Razor and Pyzor, all ship as command-line tools that block while they talk to a remote server. rspamd is asynchronous to its bones. Bolt one onto the other naively and you get the worst of both: a fast scanner held hostage by a slow shell-out. So I built a small thing to fix it, and this is the story of why it looks the way it does.
The thing is rspamd-dcc-razor-pyzor: a standalone Docker backend that runs all three networks in-process in one Go binary and answers rspamd over one HTTP endpoint, plus the Lua plugin that talks to it. The image runs no rspamd of its own. Yours stays exactly where it is.
Why rspamd needs a sidecar for Razor and Pyzor
rspamd has a built-in DCC module. Good. It has nothing for Razor, and nothing for Pyzor. That’s the gap.
You could close it the obvious way. Write a tiny Lua rule that shells out to razor-check when a message arrives. It even works in testing, which is exactly how it lures you in. Then you ship it, traffic ramps, and rspamd’s worker process, the one event loop doing all the scanning, parks itself on a blocking read() waiting for a Razor server in another country to answer. While it waits it does nothing else. Not your message, not anyone else’s. The OS scheduler isn’t going to save you here; the worker chose to block, and blocking is blocking.
rspamd’s whole design is the opposite of that. It fires DNS lookups, RBL queries, and module checks concurrently and stitches the answers back together as they land. The moment you introduce one synchronous shell-out, you’ve punched a hole straight through that model. The fix is old and boring: don’t do slow work in the hot path. Move the CLIs to their own process, put an HTTP boundary between them and the scanner, and let the plugin make a normal non-blocking request like every other rspamd module already does.
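To make the difference concrete, here is a minimal sketch using Python's asyncio as a stand-in for rspamd's event loop. It is an analogy only: rspamd is not Python, and the names heartbeat, blocking_check and async_check are invented for the illustration. time.sleep plays the part of a shell-out waiting on a remote server, and the heartbeat plays the part of every other message in the queue.

```python
import asyncio
import time


async def heartbeat(counter, interval=0.01):
    """Stand-in for all the other work: ticks whenever the loop is free."""
    while True:
        counter["ticks"] += 1
        await asyncio.sleep(interval)


async def blocking_check():
    """Hypothetical shell-out: time.sleep blocks the thread, loop included."""
    time.sleep(0.2)


async def async_check():
    """Hypothetical non-blocking request: awaiting hands control back to the loop."""
    await asyncio.sleep(0.2)


async def ticks_during(check):
    """Run one check and count how often other work got a turn meanwhile."""
    counter = {"ticks": 0}
    task = asyncio.create_task(heartbeat(counter))
    await asyncio.sleep(0)  # let the heartbeat start before measuring
    before = counter["ticks"]
    await check()
    progressed = counter["ticks"] - before
    task.cancel()
    return progressed


def measure(check):
    """Convenience wrapper: run the experiment on a fresh event loop."""
    return asyncio.run(ticks_during(check))
```

Calling measure(blocking_check) reports zero heartbeat ticks for the whole 200 ms, while measure(async_check) lets the heartbeat keep running. That is the entire argument for the HTTP boundary in one toy.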
That’s the entire reason this project exists. One request from the plugin, three networks queried behind it, and the event loop never stops turning. rspamd logs it as a “slow asynchronous rule,” which sounds alarming and is actually the system telling you it’s working: slow, yes, but asynchronous, so nothing waits on it.
DCC, Razor and Pyzor, and what each one actually knows
If you’ve never met these three, here’s the short version. They’re collaborative filtering networks, and they all answer a different flavour of the same question: “have a lot of other people seen this exact message too?” Spam is bulk by nature. The thing that makes it profitable, sending one message to ten million inboxes, is also the thing that gives it away.
DCC (Distributed Checksum Clearinghouse) counts. It hashes the fuzzy “bulk” body of a message and asks the network how many times that checksum has been reported. A personal email between two humans has a count of one. A newsletter blasted to a million people has a count in the millions. DCC doesn’t say “this is spam,” it says “this is bulk,” and bulk plus other signals is what gets you a verdict. The protocol is lean, a single UDP round-trip, so it is typically among the quicker of the three when a server is close.
Razor (Vipul’s Razor) is signatures. It computes a fuzzy fingerprint of the message and checks it against a collaboratively maintained catalogue of known spam. People report spam, the signature lands in the database, and the next copy gets caught. It’s the slowest of the three in my benchmarks, because a Razor check is a multi-step conversation: server discovery, greeting, server state, then the signature lookup.
Pyzor is the same idea wearing different clothes: a digest of the message, a public server that counts sightings and whitelist hits — and in practice the quickest of the three to answer. Where Razor leans on signature matching, Pyzor leans on a simpler digest and a community count.
None of these replaces your Bayes classifier, your neural net, or your RBLs. They’re a separate layer of evidence, and they catch a specific thing the statistical filters are weakest on: brand-new spam that’s textually clean but going out in enormous volume. If you want the full picture of where these sit in a modern stack, I wrote the whole map up in Rspamd Explained. This post is about wiring three of those layers in without setting your scanner on fire.
One Go binary, three protocols, zero forks

Open the box and it’s deliberately unexciting inside — and it got a lot smaller. The first version was a Python shim, spamcheck_shim.py, that forked the perl razor-check, the python pyzor and the C dccproc once per message. It worked, but it dragged a perl + python + dcc toolchain and an s6 supervisor into the image, and every check paid an interpreter start (plus a set-UID dccproc).
So all three clients were rewritten from scratch in Go and linked straight into the backend, which is now a single static binary called gozer:
- gdcc: a clean-room Go DCC client; computes the message checksums byte-for-byte identically to dccproc.
- gazor: a Go Razor2 client; speaks the discovery/signature protocol of the perl razor-agents.
- gyzor: a Go Pyzor client; reproduces the SHA1 digest of the python client.
Each speaks its wire protocol byte-for-byte compatibly with the reference perl/python/C client — every one is gated by parity tests against the real razor, pyzor and dccproc in its own CI — so the servers see identical fingerprints and the switch is invisible on the wire. gozer imports all three, listens on :8077 as a non-root user, and on each /check runs the three networks concurrently in-process: no subprocess, no fork, no stdin pipe. A single /check still costs you roughly the slowest backend, not the sum. Razor sets the pace; DCC and Pyzor finish in its shadow.
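The "slowest backend, not the sum" property is easy to see in a toy. The sketch below is Python asyncio rather than gozer's Go, and the latencies are arbitrary simulated values, not measurements; query and check_all are names made up for the example.

```python
import asyncio
import time

# Simulated latencies in seconds. Arbitrary stand-ins, NOT benchmark figures.
SIMULATED_LATENCY = {"pyzor": 0.05, "dcc": 0.10, "razor": 0.20}


async def query(name):
    """Pretend to query one network; a real client would do network I/O here."""
    await asyncio.sleep(SIMULATED_LATENCY[name])
    return name, {"simulated": True}


async def check_all():
    """Start all three lookups at once and wait for every one to finish."""
    started = time.monotonic()
    results = await asyncio.gather(*(query(n) for n in SIMULATED_LATENCY))
    elapsed = time.monotonic() - started
    return dict(results), elapsed


def run_check():
    """Convenience wrapper: run one fan-out on a fresh event loop."""
    return asyncio.run(check_all())
```

Run sequentially, these three would take about 0.35 s. Fanned out, the elapsed time lands near 0.20 s, the simulated Razor figure, because the two faster lookups finish while the slow one is still waiting.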
What disappeared is the entire scaffolding. No s6 supervisor, no shell, no Perl or Python runtime, no dcc package and no set-UID dccproc, no dccifd daemon (gozer’s DCC client talks to the servers directly). The image is FROM a distroless/static base with gozer as the entrypoint, and it dropped from about 268 MB to roughly 19 MB — with no interpreter left in the message-parsing path, and nothing running as root over attacker-controlled bytes.
The design rule I care most about here is best-effort. Every backend can fail on its own without taking the others, or the container, down with it. A dead network degrades your scoring; it doesn’t degrade your availability. If Razor’s servers fall off the internet at 3 a.m., gozer shrugs, returns whatever DCC and Pyzor had, and the container’s healthcheck stays green because it only depends on gozer’s own /health (probed by gozer health, since the distroless image ships no shell or curl). Your mail keeps flowing. You find out about the Razor outage from your dashboards in the morning, not from a pager and an inbox full of bounces.
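Best-effort is mostly a matter of where you put the try/except. Here is a hedged sketch of the pattern in Python asyncio, not a copy of gozer's Go code: each lookup gets its own timeout and its own error handling, so one failure cannot cancel its siblings. Using None for a backend that did not answer is my placeholder; how gozer actually represents a missing verdict is not something this post specifies.

```python
import asyncio


async def healthy(verdict):
    """Simulated backend that answers immediately."""
    return verdict


async def unreachable():
    """Simulated backend whose servers are off the internet."""
    raise ConnectionError("simulated outage")


async def hangs():
    """Simulated backend that never answers in time."""
    await asyncio.sleep(60)


async def best_effort(lookups, timeout=0.1):
    """Run every lookup concurrently; a failure or timeout becomes None."""

    async def guarded(coro):
        try:
            return await asyncio.wait_for(coro, timeout)
        except (asyncio.TimeoutError, OSError):
            # This backend contributes nothing; the others are unaffected.
            return None

    names = list(lookups)
    values = await asyncio.gather(*(guarded(lookups[n]) for n in names))
    return dict(zip(names, values))
```

Feed it one healthy lookup, one that raises and one that hangs, and you still get a complete dictionary back within the timeout. The caller scores on whatever is present and moves on.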
One more thing the architecture buys you. Because the backend is a separate container, your rspamd image stays clean — and so does this one. No Perl runtime for Razor, no Python for Pyzor, no set-UID DCC binaries living anywhere near your scanner: all three are just Go code linked into one static binary. The only thing crossing the boundary is an HTTP request you control.
The message never touches disk
This is the part I’d actually argue with you about if you tried to talk me out of it. gozer keeps the message in memory and computes every checksum in-process. Nothing gets written to a temp file. Not in /tmp, not in a tmpfs, not anywhere.
People assume mail-scanning tools spool to disk because that’s how it was done back when procmail ran the world, and a lot of them still do. Every spool file is a tiny liability: a window where the plaintext of someone’s email sits on a filesystem, waiting for a backup job to copy it somewhere, a log to mention it, or a forensic tool to find it after you thought it was deleted. The cleanest way to never leak a temp file is to never write one.
The cache follows the same rule. It stores sha256(body) → verdict and nothing else. The hash, not the body. So a cache hit tells you “I’ve seen this exact message and it scored bulk” without keeping a single byte of the message around to prove it. Point the cache at Redis instead of memory and that property holds: the shared cache stores hashes too, never content.
Here’s the bit people get wrong, so let me say it flat. A tmpfs overlay would do absolutely nothing for performance here. There is no per-message disk write to accelerate, because there is no per-message disk write. The latency is network round-trips to DCC, Razor and Pyzor, full stop. If someone tells you to “put the spool on tmpfs to speed it up,” they’re optimising a write that doesn’t exist.
The only thing that ever leaves the container is what collaborative filtering fundamentally needs: content fingerprints. DCC checksums, Razor signatures, Pyzor digests. Not the message. The networks get a hash of what your mail looks like, never the mail itself, and on a spam report they get a submission, which is the entire point of reporting.
Hardening, because the defaults will get you fired
Every default in container-land is tuned for “does the demo work,” not “will this survive a hostile internet.” So the bundled compose ships the opposite of the defaults, and you should keep it that way.
gozer runs non-root. It has bounded concurrency (GOZER_MAX_CONCURRENT defaults to 8) so a flood of requests can’t swamp the box with an unbounded pile of simultaneous network lookups. Every POST is token-authenticated: send the secret as Authorization: Bearer <token> or X-DRP-Token: <token>, and get a 401 if it’s wrong. Here’s the detail I’m proud of: if you forget to set a token at all, gozer doesn’t fail open. It returns 503 to every POST and refuses to do anything. A spam backend that accepts unauthenticated “report this as spam” calls from anywhere on your network is a poisoning vector with a bow on it. This one won’t even start being useful until you’ve given it a secret.
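The fail-closed rule fits in a dozen lines. This is a sketch of the decision logic in Python, not gozer's Go source. The function name authorize is invented, header names are assumed to arrive exactly as spelled here, and treating a missing header the same as a wrong token (401) is my assumption.

```python
import hmac


def authorize(headers, configured_token):
    """Return the HTTP status a POST would get: 200, 401 or 503.

    headers is a plain dict; a real server would normalise header case first.
    """
    if not configured_token:
        # Fail closed: with no secret configured, refuse every POST.
        return 503
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        presented = auth[len("Bearer "):]
    else:
        presented = headers.get("X-DRP-Token", "")
    # compare_digest avoids leaking the secret through comparison timing.
    if presented and hmac.compare_digest(presented.encode(), configured_token.encode()):
        return 200
    return 401
```

The order of the checks is the point: the "no token configured" branch comes first, so there is no code path where an empty secret matches an empty header.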
The compose file runs the container read-only, with cap_drop: ALL, no-new-privileges, and no published host port. Read that last one twice. The backend is reachable only by containers on the same Docker network, by service name. It is not bound to a host interface, which means it is not one iptables mistake away from the public internet. If you’ve ever found a Redis or an Elasticsearch open to the world because someone published a port “just to test,” you know exactly why this matters. The test port is forever.
None of this is exotic. It’s the baseline every container that touches untrusted input should run with. The short version: the defaults are a loaded gun pointed at your job, and turning the safety on costs you four lines of YAML.
Performance and the cache that earns its keep
Let’s talk numbers, because “it’s fast” is a thing salespeople say. Measured from the build host against the public servers (anonymous, your mileage varies with network distance): Pyzor about 50 ms, DCC about 170 ms, Razor about one second, because Razor’s multi-step discovery handshake dominates. gozer queries all three concurrently, in-process, so a cold /check costs you roughly the Razor figure. As an asynchronous rspamd rule, that latency never blocks the scanner; it just means the symbol arrives about a second into the scan instead of instantly.
But here’s where it gets good, and it gets good precisely because spam is bulk. The same message hits your server over and over: the same campaign, the same body, a thousand recipients in an hour. gozer caches verdicts keyed on sha256(body), with a 300-second TTL (GOZER_CACHE_TTL) and a 4096-entry LRU (GOZER_CACHE_SIZE) by default. The first copy of a bulk body pays the full cold cost (about a second here, Razor-bound). Every copy after it, within the TTL, comes back in about 0.4 milliseconds. No fork, no CLI, no network round-trip, no concurrency slot consumed. You can watch it happen: a cache hit comes back with X-DRP-Cache: hit in the response headers.
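For readers who want to see the shape of such a cache, here is a minimal sketch in Python. It is not gozer's implementation: the class name VerdictCache, the injectable clock and the expire-on-read behaviour are my assumptions. The defaults mirror the 300-second TTL and 4096-entry limit described above, and the key is the sha256 of the body, so the body itself is never stored.

```python
import hashlib
import time
from collections import OrderedDict


class VerdictCache:
    """TTL + LRU cache keyed on sha256(body); stores verdicts, never bodies."""

    def __init__(self, ttl=300.0, size=4096, clock=time.monotonic):
        self._ttl = ttl
        self._size = size
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._entries = OrderedDict()  # hex digest -> (expires_at, verdict)

    @staticmethod
    def key(body):
        """Hash the raw message bytes; only this digest is ever kept."""
        return hashlib.sha256(body).hexdigest()

    def get(self, body):
        """Return the cached verdict, or None on a miss or an expired entry."""
        k = self.key(body)
        entry = self._entries.get(k)
        if entry is None:
            return None
        expires_at, verdict = entry
        if self._clock() >= expires_at:
            del self._entries[k]  # expired: drop it and report a miss
            return None
        self._entries.move_to_end(k)  # mark as most recently used
        return verdict

    def put(self, body, verdict):
        """Store a verdict and evict the least recently used entries if full."""
        k = self.key(body)
        self._entries[k] = (self._clock() + self._ttl, verdict)
        self._entries.move_to_end(k)
        while len(self._entries) > self._size:
            self._entries.popitem(last=False)
```

A second copy of the same body hashes to the same key and returns the stored verdict without touching the network. A different body, or the same body after the TTL has passed, falls through to a cold check.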
That’s a roughly 2000-fold speedup on exactly the traffic you get the most of. Bulk mail is repetitive by definition, which is the same property that makes it spam and the same property that makes it cacheable. The thing that makes the spammer’s life cheap makes your scanner’s life cheap too. I find that pleasing.
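The arithmetic is worth doing once. Using the rounded figures quoted above (about 1000 ms cold, about 0.4 ms on a hit), the ratio comes out in the same ballpark as the speedup just mentioned. The sketch below also shows how the average cost per check falls as the hit rate rises. The hit rate you plug in is hypothetical; I am not claiming any particular value for real traffic.

```python
COLD_MS = 1000.0  # rounded cold /check cost, Razor-bound
HIT_MS = 0.4      # rounded cache-hit cost


def speedup(cold_ms=COLD_MS, hit_ms=HIT_MS):
    """How many times faster a cache hit is than a cold check."""
    return cold_ms / hit_ms


def mean_latency_ms(hit_rate, cold_ms=COLD_MS, hit_ms=HIT_MS):
    """Average cost per check for a given cache hit rate between 0 and 1."""
    return hit_rate * hit_ms + (1.0 - hit_rate) * cold_ms
```

With these inputs speedup() is about 2500, and a hypothetical 90 percent hit rate brings the mean down to roughly 100 ms per check. Almost all of the remaining cost is the misses, which is why the TTL and cache size are the knobs worth tuning.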
Running more than one scanner? Point GOZER_REDIS_URL at a shared Redis or Valkey (something like redis://valkey:6379/5) and every scanner shares one cache. The first box to see a campaign warms the cache for all of them. The /report and /revoke calls are never cached, for the obvious reason: reporting is a side effect, and you don’t want to silently swallow the second report of a message because you “already saw” the first.
Wiring it up: backend, plugin, and Dovecot feedback
Three pieces. The backend container, the rspamd plugin, and the optional Dovecot reporting glue. Take them in order.
The backend
gozer refuses every POST until it has a token, and it isn’t published to the host, so the only sane way to run it is with the bundled compose:
cd docker
mkdir -p secrets && openssl rand -hex 32 > secrets/drp_token.txt
docker compose up -d
Containers on the same Docker network now reach it at http://rspamd-drp:8077. Image’s on Docker Hub as eilandert/rspamd-dcc-razor-pyzor if you’d rather pull than build.
Test it from another container on the network (or the host, if you published the port) by POSTing a raw message. --data-binary keeps the bytes intact, since the fingerprints are computed over them:
TOKEN=$(cat docker/secrets/drp_token.txt)
# scan a message
curl -s --data-binary @message.eml \
-H "Authorization: Bearer $TOKEN" http://rspamd-drp:8077/check
# {"dcc":{"action":"unknown","bulk":null},"razor":{"hit":false},"pyzor":{"count":42,"wl":0}}
# user feedback — X-DRP-Token works in place of the Bearer header
curl -s --data-binary @spam.eml -H "X-DRP-Token: $TOKEN" http://rspamd-drp:8077/report
curl -s --data-binary @ham.eml -H "X-DRP-Token: $TOKEN" http://rspamd-drp:8077/revoke
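If you would rather script the same call than type curl, here is a rough Python equivalent using only the standard library. The function names are mine. The URL, header and endpoint are the ones shown above, and the raw bytes are passed through untouched for the same reason curl needs --data-binary.

```python
import json
import urllib.request


def build_check_request(base_url, token, raw_message):
    """Build (but do not send) a POST /check carrying the raw message bytes."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/check",
        data=raw_message,  # bytes, unmodified: the fingerprints depend on them
        method="POST",
        headers={"Authorization": "Bearer " + token},
    )


def check(base_url, token, raw_message, timeout=10.0):
    """Send the request; return the parsed verdict and the X-DRP-Cache header."""
    req = build_check_request(base_url, token, raw_message)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp), resp.headers.get("X-DRP-Cache")
```

Open the .eml file in binary mode ("rb") before handing it over. Calling check twice with the same message should show the cache header flip to hit on the second call, assuming the first one succeeded inside the TTL.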
The plugin
The Lua plugin is not baked into the backend image, on purpose: it belongs in your rspamd, not in this one. Drop it in and tell it where the backend lives:
cp rspamd/plugins/dcc_razor_pyzor.lua /etc/rspamd/plugins/
cp rspamd/local.d/dcc_razor_pyzor.conf /etc/rspamd/local.d/
cp rspamd/local.d/groups.conf /etc/rspamd/local.d/
echo 'dofile("/etc/rspamd/plugins/dcc_razor_pyzor.lua")' >> /etc/rspamd/rspamd.local.lua
Then set the backend URL and the same token in local.d/dcc_razor_pyzor.conf:
url = "http://rspamd-drp:8077/check";
token = "the-shared-secret";
Restart rspamd and you get three new symbols, scored in groups.conf so you can tune them to taste: DRP_DCC_BULK when DCC says the body is bulk, DRP_RAZOR on a Razor signature match, and DRP_PYZOR when Pyzor sightings clear the threshold.
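The mapping from the backend's JSON to those three symbols can be pictured like this. It is a Python sketch of the idea, not the shipped Lua plugin. The field names come from the sample /check response earlier in this post; the threshold value of 5, the reading of a truthy "bulk" field as "DCC says bulk", and ignoring the Pyzor whitelist count are all assumptions made for the example.

```python
def symbols_for(verdict, pyzor_threshold=5):
    """Map a /check JSON verdict to the symbol names described above.

    pyzor_threshold is a hypothetical value, not the plugin's default.
    """
    symbols = []
    dcc = verdict.get("dcc") or {}
    razor = verdict.get("razor") or {}
    pyzor = verdict.get("pyzor") or {}
    if dcc.get("bulk"):  # assumption: truthy "bulk" means DCC called it bulk
        symbols.append("DRP_DCC_BULK")
    if razor.get("hit"):
        symbols.append("DRP_RAZOR")
    if (pyzor.get("count") or 0) >= pyzor_threshold:
        symbols.append("DRP_PYZOR")
    return symbols
```

Fed the sample response shown earlier (DCC bulk null, no Razor hit, Pyzor count 42), this returns only DRP_PYZOR. A backend that is missing from the response simply contributes no symbol.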
One gotcha that will eat your afternoon if you skip it. rspamd resolves URLs through its own configured resolver, not the system one. If you’ve pointed rspamd at an RBL-only unbound that can’t see Docker service names, then rspamd-drp won’t resolve and you’ll get nothing, with a confusing silence rather than a clean error. The fix is to put the backend’s IP address in url instead of the service name. It’s always DNS. It’s genuinely always DNS.
Dovecot feedback
/check is for scanning. The other two endpoints, /report and /revoke, are for human feedback: someone drags a message into Junk (that’s spam, report it) or rescues one back out (that’s ham, revoke it). This is how you train the networks on your users’ actual decisions, and it’s worth wiring up because real human corrections are the best signal you’ll ever get.
Sieve can’t speak HTTP, so a little wrapper called drp-report bridges the gap, triggered by imapsieve. The eilandert/dovecot image already bakes this in. On any other Dovecot host you copy three files, compile two sieve scripts, and drop the URL and token into an env file (sieve scrubs the environment, so it has to come from a file, not a shell variable). Move into Junk fires POST /report; move out of Junk fires POST /revoke. And drp-report always exits 0, on purpose, so a reporting hiccup never bounces a message or blocks the IMAP move. The day your DCC report fails, the worst that happens is one un-reported spam. Not a stuck mailbox. That tradeoff is the whole philosophy of the project in one design decision.
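The "always exit 0" rule looks like this in miniature. This is a Python sketch of the idea, not the real drp-report script. The env-file keys DRP_URL and DRP_TOKEN and the function names are hypothetical, and the HTTP call is injectable so the failure path can be exercised without a network.

```python
import urllib.request


def parse_env_file(text):
    """Parse simple KEY=value lines; blank lines and # comments are skipped."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def http_post(url, token, raw_message, timeout=10.0):
    """POST the raw message with the token header; raises on any failure."""
    req = urllib.request.Request(
        url, data=raw_message, method="POST", headers={"X-DRP-Token": token}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()


def report(action, raw_message, env, post=http_post):
    """Send raw_message to /report or /revoke; the exit status is always 0."""
    try:
        if action not in ("report", "revoke"):
            raise ValueError("unknown action: " + action)
        url = env["DRP_URL"].rstrip("/") + "/" + action
        post(url, env["DRP_TOKEN"], raw_message)
    except Exception:
        # Swallow everything: a failed report must never block the IMAP move.
        pass
    return 0
```

A wrapper script would read the message from stdin, call report, and pass the result to sys.exit. Whether the backend is down, the token is missing or the env file is empty, the status that reaches Dovecot is the same.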
The packages underneath
The dcc, razor and pyzor binaries come from my own Debian packages on deb.myguard.nl; the apt repo and signing key are already baked into the eilandert/debian-base image, so the build just installs them. DCC isn’t in Debian proper for licence reasons (its redistribution terms don’t fit the DFSG), which is exactly why a self-built package that ships dccifd, dccproc and cdcc earns its place.
Where this fits with the rest of the spam stack
Collaborative filtering is one layer. It is not the layer. It answers “is this bulk” and “has the crowd flagged this,” and it’s brilliant at catching high-volume campaigns the instant they start. It’s nearly useless against a targeted message sent to you and only you, because there’s no crowd to have seen it yet.
That’s why it pairs so well with rule-based scoring, which catches the textual tells a single message gives off regardless of volume. If you’re running rspamd seriously, you probably want both, and the other half of my setup is rspamd-kam-rules, a native-Lua converter that brings 3,200-odd SpamAssassin KAM.cf rules into rspamd without dragging the Perl spamassassin module along for the ride. I wrote that one up too, in KAM.cf in Rspamd. Rules catch the content signature, DCC/Razor/Pyzor catch the volume, Bayes catches the statistical drift, and between them very little gets through. No single layer does it alone, and anyone selling you a one-trick spam filter is selling you a future outage.
Everything’s open: source on GitHub, image on Docker Hub, both tracked in the dockerized monorepo. The full list of repos and images lives on the where to find us page.
Frequently asked questions
Do I need to run rspamd inside this container?
No, and that’s the whole point. The image runs no rspamd of its own. Your rspamd stays in its own container or on its own host; you drop the shipped Lua plugin into it and point the plugin’s url at this backend over HTTP. The backend is a sidecar that does one job, scoring mail against DCC, Razor and Pyzor, and answers your existing scanner over port 8077.
Will this slow down my mail scanning?
Not in a way that blocks anything. The plugin makes a non-blocking asynchronous request, so the rspamd event loop never waits on it. A cold check costs roughly the slowest backend (Razor, about a second, because its discovery handshake dominates) since all three run concurrently in-process, but that latency runs alongside the rest of the scan rather than holding it up. And because bulk mail repeats, the verdict cache turns most checks into a lookup of about 0.4 ms, marked with an X-DRP-Cache: hit header.
Is my email content sent to these networks or written to disk?
Neither. The raw message never leaves the container and never touches disk. gozer holds the message in memory and computes every checksum in-process (Go, no subprocess), so there is no temp file. The only things that leave are content fingerprints: DCC checksums, Razor signatures and Pyzor digests, which is exactly what collaborative filtering needs to work. The cache stores only a sha256 hash of the body, never the body, whether it is in memory or Redis.
What happens if DCC, Razor or Pyzor is down?
Each backend is best-effort and independent. If one network is unreachable it simply doesn’t contribute a score, and the other two still answer. The container’s healthcheck depends only on gozer’s own /health endpoint, so a dead upstream degrades your scoring without affecting availability. Your mail keeps flowing; you just lose one source of evidence until the network comes back.
How do I report spam and ham back from my users?
Use the /report and /revoke endpoints, wired through Dovecot with imapsieve. When a user moves a message into Junk, a sieve script calls POST /report to flag it as spam across all three networks; moving it back out calls POST /revoke to mark it ham (Razor and Pyzor support un-reporting; DCC does not). The eilandert/dovecot image bakes the glue in, and the drp-report wrapper always exits 0 so a failed report never blocks a mailbox move.
Why not just use the rspamd built-in DCC module and skip this?
If you only want DCC, you can. rspamd ships a DCC module and it’s fine. The reason this project exists is Razor and Pyzor, which rspamd has no native support for, and which you cannot safely shell out to from inside the worker without blocking the event loop. This backend runs all three out-of-process behind one async HTTP call, so you get the two missing networks without trading away the responsiveness that makes rspamd worth running.
Can multiple rspamd instances share one backend?
Yes. Point each scanner’s plugin at the same backend URL, and set GOZER_REDIS_URL on the backend to a shared Redis or Valkey so all of them share one verdict cache. The first instance to see a campaign warms the cache for the rest, which matters a lot when you’re scanning the same bulk run across several nodes.