KAM.cf in Rspamd: 3,200 SpamAssassin Rules, Native Lua, No Perl

KAM.cf is roughly 6,500 lines of SpamAssassin rules, and on a busy mail server SpamAssassin will happily burn 2 to 4 CPU cores chewing through them in Perl while a message sits in the queue going nowhere. Rspamd does the same work in C and Lua and barely notices. So the obvious move is to run KAM.cf on Rspamd. The catch nobody tells the juniors: the obvious way to do that is also the slow, leaky way.

This is the story of rspamd-kam-rules, a small converter that takes Kevin McGrail’s famous SpamAssassin ruleset and turns it into a single native Rspamd Lua plugin. No Perl. No compatibility shim parsing 6,500 lines on every reload. Just 3,248 rules that actually resolve against the symbols your Rspamd already has, with the dead weight stripped out before it ever hits production.

Two spam fighters, one ruleset

SpamAssassin is the elder, and KAM.cf is one of the best community rulesets ever written for it: 3,000-plus patterns hunting phishing, malware droppers, and the endless tide of “RE: your invoice” that turns out to be an XLSM with a macro that wants to be friends with your domain controller. It’s maintained by Kevin A. McGrail with help from Joe Quinn, Karsten Bräckelmann, Bill Cole, and Giovanni Bechis. It’s genuinely good. It’s also written in SpamAssassin’s dialect, which means it expects SpamAssassin to run it.

Rspamd is the younger, faster animal. Event-driven C core, Lua for the logic, and a regexp engine that compiles every pattern once at startup and, on builds with Hyperscan support, scans each message in a single pass. A SpamAssassin box doing KAM.cf might average 800 ms per message under load; the same logic on Rspamd lands closer to 40 ms. That’s the whole reason people want KAM.cf on Rspamd. If you want the full tour of how Rspamd decides what’s spam, I covered that in how modern spam filtering works. This piece is about feeding it KAM.cf without doing something you’ll regret.

The trap: just point Rspamd at the .cf file

Rspamd ships a spamassassin module. Drop your .cf into the config, point the module at it, reload. And it works. Sort of. Here’s the part the tutorials skip.

It parses the entire ruleset on every config load, all 6,500 lines, including the hundreds of rules it can’t run: eval: calls into SpamAssassin plugins Rspamd never implemented, askdns lookups it handles its own way, metas referencing symbols from plugins you don’t have. But the one that pages you later is subtler: symbol names don’t match. SpamAssassin calls a passing SPF check SPF_PASS; Rspamd calls it R_SPF_ALLOW. SpamAssassin says DKIM_VALID; Rspamd says R_DKIM_ALLOW. A KAM.cf meta like (FREEMAIL_FROM && SPF_PASS && BITCOIN_ADDR) references a symbol your Rspamd never raises under that name, so the meta silently never fires. It compiles fine. It costs you CPU at startup. It catches nothing. No error. Just a rule that looks active and is functionally dead, which is the worst kind, because you’ll trust it.

You find out the hard way, of course. A spam wave gets through, you grep the logs, and the rule you were counting on never raised a single hit in three months.

The fix: transpile, don’t interpret

rspamd-kam-rules takes the other road. It reads KAM.cf with a proper parser, works out which rules can genuinely run on your Rspamd, rewrites the SA symbol names to their Rspamd equivalents, throws away everything that can never fire, and emits one self-contained kam.lua. That file is the only thing that touches your Rspamd. It carries a SHA-256 of the exact KAM.cf it came from, and it registers its regexes straight into Rspamd’s native regexp cache.

The current run converts 3,248 rules: 1,179 body, 1,116 header, 690 meta, 156 URI, 67 rawbody, 38 MIME header, and 2 full-message. Another 179 meta rules get dropped on the floor, on purpose, because they depend on symbols that don’t exist on the target. More on that fight in a minute. It’s the most interesting part.

Figure: the conversion pipeline. KAM.cf goes in, the converter maps symbols and prunes dead metas, and one native kam.lua comes out.

Parsing KAM.cf properly means handling the bits that bite. Conditional blocks: it tracks the ifplugin / if !plugin(...) / else / endif stack and only emits rules whose entire chain is satisfiable, while an unbalanced endif raises an error instead of silently corrupting the set. Regex extraction: SpamAssassin patterns come as /pattern/flags or m{pattern}flags with arbitrary delimiters, so the extractor walks the string tracking escapes rather than naively splitting on a slash that a character class would laugh at. And replace_tag / replace_rules: KAM.cf writes /foo<LETTER>/ and defines <LETTER> once, so the converter resolves those tags to a fixpoint and expands them inline. It also preserves tflags multiple maxhits=N, so a rule that hits five times scores five times, exactly like the original.
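To make two of those steps concrete, here is a minimal Python sketch of delimiter-aware regex extraction and replace_tag expansion. It is not the converter's actual code: the function names split_regex and expand_tags are hypothetical, and the tag handling ignores SpamAssassin's pre/inter/post modifiers entirely. It only shows the shape of the technique: walk the string tracking escapes, and expand tags until nothing changes.

```python
import re

# Paired delimiters nest; any other delimiter closes with the same character.
PAIRS = {"{": "}", "(": ")", "[": "]", "<": ">"}


def split_regex(text):
    """Split '/pat/flags' or 'm{pat}flags' into (pattern, flags)."""
    text = text.strip()
    if text.startswith("/"):
        open_delim, start = "/", 1
    elif (len(text) > 1 and text[0] == "m"
          and not text[1].isalnum() and not text[1].isspace() and text[1] != "_"):
        open_delim, start = text[1], 2
    else:
        raise ValueError("not a regex literal: %r" % text)
    close_delim = PAIRS.get(open_delim, open_delim)
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the escaped character, whatever it is
            continue
        if ch == open_delim and close_delim != open_delim:
            depth += 1  # nested opener of a paired delimiter
        elif ch == close_delim:
            if depth == 0:
                return text[start:i], text[i + 1:]
            depth -= 1
        i += 1
    raise ValueError("unterminated regex: %r" % text)


TAG = re.compile(r"<([A-Z0-9_]+)>")


def expand_tags(pattern, tags, max_rounds=20):
    """Expand <TAG> references until a pass changes nothing (a fixpoint).

    Unknown tags are left untouched; a cycle trips the round limit.
    """
    for _ in range(max_rounds):
        expanded = TAG.sub(lambda m: tags.get(m.group(1), m.group(0)), pattern)
        if expanded == pattern:
            return pattern
        pattern = expanded
    raise ValueError("tag expansion did not settle; cyclic tag definitions?")
```

A naive split on "/" would cut the first test pattern below in half at the escaped slashes; the escape-aware walk does not.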

The symbol map: speaking Rspamd’s language

Here’s where the converter earns its keep. It carries a translation table from SpamAssassin symbol names to Rspamd native ones: SPF_PASS to R_SPF_ALLOW, DKIM_VALID to R_DKIM_ALLOW, the whole URIBL_* family onto Rspamd’s SURBL and DBL symbols. A meta rule is only as good as the symbols it references. Map SPF_FAIL to R_SPF_FAIL and the meta fires; leave it as SPF_FAIL and it sits there forever, compiled and useless. The map is what turns a pile of inert metas into rules that actually vote.
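A rough Python sketch of that rewrite, using only the three mappings named above. The table and function names are illustrative, not the converter's real ones, and the real map is far larger.

```python
import re

# Illustrative subset of the translation table; only pairs named in the text.
SA_TO_RSPAMD = {
    "SPF_PASS": "R_SPF_ALLOW",
    "SPF_FAIL": "R_SPF_FAIL",
    "DKIM_VALID": "R_DKIM_ALLOW",
}

SYMBOL = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def rewrite_meta(expr, table=SA_TO_RSPAMD):
    """Replace every SA symbol in a meta expression with its Rspamd name."""
    return SYMBOL.sub(lambda m: table.get(m.group(0), m.group(0)), expr)


def referenced(expr):
    """Return the set of symbol names a meta expression mentions."""
    return set(SYMBOL.findall(expr))


original = "(FREEMAIL_FROM && SPF_PASS && BITCOIN_ADDR)"
rewritten = rewrite_meta(original)

# Symbols a hypothetical Rspamd raised for one message.
raised = {"FREEMAIL_FROM", "R_SPF_ALLOW", "BITCOIN_ADDR"}

dead = referenced(original) <= raised    # False: SPF_PASS is never raised
live = referenced(rewritten) <= raised   # True: every dependency can be met
```

Same message, same checks, same meta. The only difference between the dead rule and the live one is a lookup in a dictionary.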

At runtime the generated Lua keeps that table too. When a meta references a symbol that isn’t a KAM rule itself, the plugin calls task:has_symbol() on the mapped Rspamd name, reading the result your existing modules already computed. The converter isn’t reimplementing SPF. It’s wiring KAM’s logic into the checks Rspamd already runs.

Pruning the dead: meta resolution as a fixpoint

This is the bit I’d put on the whiteboard. A meta depends on other symbols, which might be regex rules, external Rspamd symbols, or other metas that depend on yet more symbols. You can’t judge a meta in isolation. You have to know whether everything it transitively depends on is reachable.

So the converter iterates to a fixpoint. Start with the known-good set: every regex rule that parsed, plus the external Rspamd symbols you fed it. Walk every meta; if all its dependencies are in the good set, mark it good and add it. Repeat until a pass adds nothing. Anything still unresolved depends on a symbol nobody can provide, and out it goes, with its missing dependencies recorded in the report so you can see exactly why. That’s the 179 dropped metas: not bugs, just rules referencing symbols this Rspamd doesn’t have. Shipping them would mean compiling expression trees that can never be true.
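The loop is short enough to sketch in Python. This is an illustration of the fixpoint described above, not the converter's source; resolve_metas and the sample rule names are made up, and symbols are pulled out of expressions with a simple identifier regex rather than a real expression parser.

```python
import re

TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def resolve_metas(metas, known):
    """Iterate to a fixpoint over meta dependencies.

    metas: dict of meta name -> expression string
    known: symbols that certainly exist (parsed regex rules + external symbols)
    Returns (kept, dropped), where dropped maps name -> sorted missing deps.
    """
    good = set(known)
    pending = {name: set(TOKEN.findall(expr)) for name, expr in metas.items()}
    changed = True
    while changed:  # stop once a full pass adds nothing
        changed = False
        for name in list(pending):
            if pending[name] <= good:  # every dependency is reachable
                good.add(name)
                del pending[name]
                changed = True
    dropped = {name: sorted(deps - good) for name, deps in pending.items()}
    kept = [name for name in metas if name not in dropped]
    return kept, dropped


# Hypothetical rules; KAM_B is listed first so it needs a second pass.
metas = {
    "KAM_B": "KAM_A || BODY_Y",
    "KAM_A": "BODY_X && R_SPF_ALLOW",
    "KAM_C": "KAM_B && MISSING_PLUGIN_SYM",
    "KAM_D": "KAM_C",
}
kept, dropped = resolve_metas(metas, {"BODY_X", "BODY_Y", "R_SPF_ALLOW"})
```

Here KAM_C is dropped because MISSING_PLUGIN_SYM is unavailable, and KAM_D falls with it because its only dependency never became good. The dropped dict is the raw material for the report.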

You feed the target’s symbol set in via two text files. external-symbols.txt is dumped straight from your production Rspamd’s /symbols endpoint, the real list of everything your instance can raise. unavailable-symbols.txt is the explicit blocklist of KAM symbols you know aren’t registered on your stack. Change stacks, regenerate the dump, rebuild. The output adapts to the box it’s actually going to run on, which is more than the SA module ever does for you.
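As a sketch of how those two inputs might combine, assuming things the text above does not spell out: that the /symbols response is a JSON list of groups each holding a "rules" array with a "symbol" field, that the text files hold one symbol per line with "#" comments, and that the blocklist is simply subtracted from the dump. All function names here are hypothetical.

```python
import json


def symbols_from_dump(payload):
    """Flatten a /symbols JSON dump into a set of names (assumed shape)."""
    names = set()
    for group in json.loads(payload):
        for rule in group.get("rules", []):
            names.add(rule["symbol"])
    return names


def read_symbol_list(text):
    """Parse one-symbol-per-line text, ignoring blanks and '#' comments."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            names.add(line)
    return names


def target_symbols(external_text, unavailable_text):
    """Symbols the target can raise: the dump minus the explicit blocklist."""
    return read_symbol_list(external_text) - read_symbol_list(unavailable_text)
```

Check the dump format against your own instance before trusting the first function; the point is only that the known-good set comes from the box, not from a guess.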

Why the Lua is fast: register once, scan once

At config load, the plugin registers each pattern with rspamd_config:register_regexp, tagged by type: sabody for body, sarawbody for rawbody, message for full, url for URI, and the header variants for the rest. That tag tells Rspamd which slice of the message the pattern wants to see. Then the regexp cache does its thing: every pattern of a given type gets compiled into one combined Hyperscan database, and a message is scanned once to find every match in a single pass. Compare that to SpamAssassin’s per-rule, per-message Perl loop that pins your cores. Same patterns, completely different cost model. That’s why the converter bothers to register properly instead of calling string.match in a loop like a tutorial would.

Metas compile through rspamd_expression.create into proper expression trees that short-circuit. Results are cached per-task, so a symbol referenced by ten metas is evaluated once. There’s even a recursion guard: the cache entry is set to zero before evaluation starts, so a pathological meta cycle returns zero instead of blowing the Lua stack. Small detail, big difference between a bad rule scoring nothing and a bad rule taking down your scanner.
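The caching and the recursion guard are easier to see in a toy model than in prose. This Python sketch is not the generated Lua; it stands in for it, with metas written as plain functions, a set playing the role of the symbols other modules already raised, and every name invented for the example.

```python
def make_evaluator(metas, raised):
    """Build a per-task lookup with caching and a recursion guard.

    metas:  dict of meta name -> function(lookup) returning 0 or 1
    raised: set of symbols other checks already raised for this message
    """
    cache = {}

    def lookup(name):
        if name in cache:
            return cache[name]
        if name not in metas:
            # Not one of our metas: ask what the rest of the scanner found.
            cache[name] = 1 if name in raised else 0
            return cache[name]
        cache[name] = 0  # guard: a cycle reads this zero instead of recursing
        cache[name] = metas[name](lookup)
        return cache[name]

    return lookup


calls = {"base": 0}


def kam_base(sym):
    calls["base"] += 1  # count evaluations to show the cache working
    return 1 if sym("R_SPF_ALLOW") and sym("BITCOIN_ADDR") else 0


METAS = {
    "KAM_BASE": kam_base,
    "KAM_ONE": lambda sym: sym("KAM_BASE"),
    "KAM_TWO": lambda sym: 1 if sym("KAM_BASE") and sym("FREEMAIL_FROM") else 0,
    "LOOP_A": lambda sym: sym("LOOP_B"),  # pathological cycle
    "LOOP_B": lambda sym: sym("LOOP_A"),
}

lookup = make_evaluator(METAS, {"R_SPF_ALLOW", "BITCOIN_ADDR", "FREEMAIL_FROM"})
results = [lookup("KAM_ONE"), lookup("KAM_TWO"), lookup("LOOP_A")]
```

KAM_BASE is referenced by two metas but its body runs once, and the LOOP_A/LOOP_B cycle settles on zero rather than recursing forever.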

Installing it, and the auto-update that won’t wake you up

The build runs in GitHub Actions daily at 3 a.m. UTC. It checks the Last-Modified header on KAM.cf and does nothing if upstream hasn’t changed: no rebuild, no commit, no churn. When it has changed, it runs the tests, regenerates kam.lua, and commits the new file with its fresh SHA-256. Deploying is three boring lines:

sudo wget -O /etc/rspamd/plugins.d/kam.lua \
  https://raw.githubusercontent.com/eilandert/rspamd-kam-rules/main/dist/kam.lua

sudo rspamadm configtest && sudo systemctl restart rspamd

Run rspamadm configtest first. Always. I’ll say it twice because someone reading this is going to skip it and restart straight into a syntax error at 2 a.m. with the queue backing up. The generated Lua compiles cleanly, but you still validate before you reload, because the one time you don’t is the one time the download truncated on a full disk.
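For the curious, the daily "has upstream changed?" check described above boils down to something like the following Python sketch. It is an assumption-laden illustration, not the actual workflow: the function names are made up, the URL is left as a parameter, and how the previous Last-Modified value is stored between runs is not covered.

```python
import hashlib
from urllib.request import Request, urlopen


def upstream_last_modified(url):
    """HEAD the upstream file and return its Last-Modified header, if any."""
    with urlopen(Request(url, method="HEAD"), timeout=30) as resp:
        return resp.headers.get("Last-Modified")


def needs_rebuild(upstream_value, recorded_value):
    """Rebuild only when the header differs from the last recorded value.

    A missing header (None) counts as 'changed' to stay on the safe side.
    """
    return upstream_value is None or upstream_value != recorded_value


def source_digest(kam_cf_bytes):
    """SHA-256 hex digest of the exact KAM.cf bytes a build came from."""
    return hashlib.sha256(kam_cf_bytes).hexdigest()
```

The same digest function is what lets you check, after the fact, which KAM.cf a given kam.lua was built from.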

One honest caveat. The big “800 ms versus 40 ms” number compares SpamAssassin-the-Perl-daemon against Rspamd. It is not a benchmark of Rspamd’s own SA compatibility module against this converter; both run inside the same regexp cache, so they’re in the same ballpark on raw scan speed. What the converter buys you over the SA module is correctness and hygiene: symbols that resolve, dead metas pruned, one reviewable file pinned to a known KAM.cf hash, and no 6,500-line parse on every reload. Anyone selling you a 20x speedup from the converter specifically is selling you the wrong number, and you should always check who’s holding the stopwatch.

The converter is MIT-licensed. The generated kam.lua is a derivative work of KAM.cf and inherits its Apache-2.0 license and authorship, so the credit stays with McGrail and the SpamAssassin crew where it belongs.

Frequently asked questions

What is KAM.cf?

KAM.cf is a large community SpamAssassin ruleset maintained by Kevin A. McGrail, with contributions from Joe Quinn, Karsten Bräckelmann, Bill Cole, and Giovanni Bechis. It contains over 3,000 patterns for catching phishing, malware, and scam email, and is distributed under Apache-2.0 from mcgrail.com. It is one of the most widely used add-on rulesets for SpamAssassin.

Why not just load KAM.cf with Rspamd’s spamassassin module?

You can, and it works, but Rspamd parses the whole 6,500-line file on every config load, carries hundreds of rules it cannot run, and never remaps SpamAssassin symbol names like SPF_PASS to Rspamd’s R_SPF_ALLOW. Meta rules that reference unmapped symbols compile but never fire. The converter pre-resolves all of that, maps symbols, drops dead rules, and emits one tight Lua file instead.

How many rules does the converter actually produce?

The current run converts 3,248 rules out of roughly 6,500 lines: 1,179 body, 1,116 header, 690 meta, 156 URI, 67 rawbody, 38 MIME header, and 2 full-message rules. A further 179 meta rules are deliberately dropped because they depend on symbols the target Rspamd does not provide.

How do I install the generated kam.lua?

Download dist/kam.lua from the GitHub repo into /etc/rspamd/plugins.d/kam.lua, run rspamadm configtest to validate, then restart Rspamd with systemctl restart rspamd. Always run configtest before reloading a production mail filter.

Does it update automatically when KAM.cf changes?

Yes. A GitHub Actions workflow runs daily at 3 a.m. UTC, checks the Last-Modified header on upstream KAM.cf, and only rebuilds and commits a new kam.lua when the source has actually changed. Each generated file carries the SHA-256 of the exact KAM.cf it was built from.

Is the converter faster than running SpamAssassin?

Running KAM.cf inside Rspamd is far faster than running SpamAssassin’s Perl daemon: roughly 40 ms per message versus something like 800 ms under load. The converter’s specific advantage over Rspamd’s built-in SA compatibility module is correctness and hygiene rather than raw speed: mapped symbols, pruned dead metas, and a single auditable, version-pinned plugin.

Anyway. Go dump your Rspamd’s /symbols endpoint into external-symbols.txt before you build anything, because a converter that doesn’t know what your server can do will cheerfully drop half of KAM.cf and never tell you it mattered.