KAM.cf In Rspamd: 3,668 Rules As Native Lua

KAM.cf is north of 10,000 lines of SpamAssassin rules, and on a busy mail server SpamAssassin will happily burn 2 to 4 CPU cores chewing through them in Perl while a message sits in the queue going nowhere and a user files a ticket about it. Rspamd does the same work in C and Lua and barely notices. So the obvious move is to run KAM.cf on Rspamd. The catch nobody tells the juniors: the obvious way to do that is also the slow, leaky way, and you won’t find out until the quarter’s most interesting phishing wave sails through.

This is the story of rspamd-kam-rules, a converter that takes Kevin McGrail’s famous SpamAssassin ruleset and turns it into a native Rspamd Lua plugin. No Perl. No compatibility shim re-parsing 10,000-plus lines of ruleset on every reload. Just 3,668 rules that actually resolve against the symbols your Rspamd already has, with the dead weight stripped out before it ever hits production. The output is two files: a tiny kam.lua runtime (about 18 KB) that Rspamd loads once, and a kam_rules.map data file (about 800 KB) it compiles into a single native Hyperscan database at config load.


Two spam fighters, one ruleset

SpamAssassin is the elder, and KAM.cf is one of the best community rulesets ever written for it: thousands of patterns hunting phishing, malware droppers, and the endless tide of “RE: your invoice” that turns out to be an XLSM with a macro that wants to be friends with your domain controller. It’s maintained by Kevin A. McGrail with help from Joe Quinn, Karsten Bräckelmann, Bill Cole, and Giovanni Bechis. It’s genuinely good. It’s also written in SpamAssassin’s dialect, which means it expects SpamAssassin to run it.

Rspamd is the younger, faster animal: event-driven C core, Lua for the logic, and a regexp engine that compiles every pattern once at startup and scans each message in a single pass with Hyperscan. A SpamAssassin box doing KAM.cf might average 800 ms per message under load; the same logic on Rspamd lands closer to 40 ms. If you want the full tour of how Rspamd decides what’s spam, I covered that in how modern spam filtering works. This piece is about feeding it KAM.cf without doing something you’ll regret in front of the change advisory board.

The trap: just point Rspamd at the .cf file

Rspamd ships a spamassassin module. Drop your .cf into the config, point the module at it, reload. And it works. Sort of. Here’s the part the tutorials skip, usually because the person writing the tutorial never ran it past week two.

It parses the entire ruleset on every config load — all 10,000-plus lines — including the hundreds of rules it can’t run: eval: calls into SpamAssassin plugins Rspamd never implemented, metas referencing symbols from plugins you don’t have.

But the one that pages you later is subtler: symbol names don’t match. SpamAssassin calls a passing SPF check SPF_PASS; Rspamd calls it R_SPF_ALLOW. A KAM.cf meta like (FREEMAIL_FROM && SPF_PASS && BITCOIN_ADDR) references a symbol your Rspamd never raises under that name, so the meta silently never fires. It compiles fine. It costs you CPU at startup. It catches nothing. No error. Just a rule that looks active and is functionally dead, which is the worst kind, because you’ll trust it.
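A quick way to see the problem before it pages you is to list which names a meta references that the target never raises. The sketch below is an illustration, not the converter's code: the symbol inventory is hypothetical, and the tokenizer assumes a meta contains only symbol names and boolean operators.

```python
import re

# Assumption: symbol names are plain identifiers; real metas may also
# contain numbers and arithmetic, which this sketch ignores.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def unresolved_symbols(meta_expr, known_symbols):
    """Return, in order of appearance, the symbols the target never raises."""
    missing = []
    for name in TOKEN.findall(meta_expr):
        if name not in known_symbols and name not in missing:
            missing.append(name)
    return missing

# Hypothetical inventory of what one Rspamd box can raise.
rspamd_symbols = {"FREEMAIL_FROM", "R_SPF_ALLOW", "BITCOIN_ADDR"}

meta = "(FREEMAIL_FROM && SPF_PASS && BITCOIN_ADDR)"
print(unresolved_symbols(meta, rspamd_symbols))  # ['SPF_PASS']
```

One unresolved name inside an AND chain is enough to keep the whole meta false forever, which is exactly the silent failure described above.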

You find out the hard way, of course. A spam wave gets through, you grep the logs, and the rule you were counting on never raised a single hit in three months. The user who reported it gets a very calm email. The tutorial author does not.

The fix: transpile, don’t interpret

rspamd-kam-rules takes the other road. It reads the KAM channel — KAM.cf plus the companion KAM_redirectors.cf and nonKAMrules.cf, some 11,400 lines across the three files — with a proper parser, rewrites SA symbol names to their Rspamd equivalents, throws away everything that can never fire, and emits the runtime plus the map. The map carries a SHA-256 of the exact KAM.cf it came from, and rule names are re-validated on load so a tampered map can inject no Lua and no rogue symbol.
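The two integrity checks in that paragraph can be sketched in a few lines. This is a minimal illustration under stated assumptions: the naming rule (letters, digits, underscores, bounded length) and both function names are hypothetical, and the real runtime does its validation in Lua against whatever format the map actually uses.

```python
import hashlib
import re

# Assumed naming rule for this sketch: an identifier of at most 128
# characters. Anything else is rejected before it can become a symbol.
SAFE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,127}")

def is_safe_rule_name(name):
    """True only when the whole string is a plain identifier."""
    return SAFE_NAME.fullmatch(name) is not None

def source_digest(kam_cf_bytes):
    """Hex SHA-256 of the ruleset bytes, for pinning a map to its source."""
    return hashlib.sha256(kam_cf_bytes).hexdigest()

print(is_safe_rule_name("KAM_EXAMPLE_RULE"))           # True
print(is_safe_rule_name('X"); os.execute("id") --'))   # False
```

Rejecting by whitelist rather than trying to escape hostile input keeps the check small enough to review at a glance.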

The current run converts 3,668 rules: 1,269 body, 1,186 header, 862 meta, 211 URI, 82 rawbody, 54 MIME header, and 4 full-message. Another 67 meta rules get dropped on the floor, on purpose, because they depend on symbols that cannot exist on the target. More on that fight in a minute — it used to be a much bigger number.

Figure: Conversion pipeline from KAM.cf through the converter to native Rspamd kam.lua. KAM.cf goes in, the converter maps symbols and prunes dead metas, and a thin kam.lua plus its kam_rules.map come out.

The parser handles the bits that bite, so you don’t have to care: version-aware ifplugin / else / endif gates evaluated with SpamAssassin 4.0 semantics, Perl’s entire zoo of regex delimiters including the paired-bracket kind, replace_tag templates resolved to a fixpoint, and tflags multiple maxhits=N preserved so a rule that hits five times scores five times. Dull, load-bearing stuff. The kind that’s invisible right up until it isn’t.
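To give a feel for the delimiter zoo, here is a minimal sketch of splitting a Perl-style regex literal into pattern and flags, including the paired-bracket forms. The function name is hypothetical and the sketch covers only the match operator; it is not the converter's parser.

```python
# Opening delimiters whose closing delimiter is a different character.
PAIRED = {"{": "}", "(": ")", "[": "]", "<": ">"}

def split_perl_regex(text):
    """Split '/pat/flags' or 'm{pat}flags' into (pattern, flags)."""
    text = text.strip()
    # Drop a leading 'm' only when a non-identifier delimiter follows it.
    if text.startswith("m") and len(text) > 1 and not (text[1].isalnum() or text[1] == "_"):
        text = text[1:]
    if not text:
        raise ValueError("empty regex literal")
    open_ch = text[0]
    close_ch = PAIRED.get(open_ch, open_ch)
    depth = 0
    i = 1
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the escaped character, whatever it is
            continue
        if open_ch != close_ch and ch == open_ch:
            depth += 1  # nested opener inside a paired delimiter
        elif ch == close_ch:
            if depth == 0:
                return text[1:i], text[i + 1:]
            depth -= 1
        i += 1
    raise ValueError("unterminated regex literal")

print(split_perl_regex("m{a{2,3}b}i"))  # ('a{2,3}b', 'i')
```

The depth counter is what keeps a quantifier like `{2,3}` inside `m{...}` from ending the pattern early.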

The symbol map: speaking Rspamd’s language

Here’s where the converter earns its keep. It carries a translation table of 55 remaps from SpamAssassin symbol names to Rspamd native ones: SPF_PASS to R_SPF_ALLOW, DKIM_VALID to R_DKIM_ALLOW, the whole URIBL_* family onto Rspamd’s SURBL and DBL symbols. The near-miss remaps are flagged as approximations right in the source, because pretending a fuzzy mapping is exact is how you end up explaining false positives to management. Map SPF_FAIL to R_SPF_FAIL and the meta fires; leave it unmapped and it sits there forever, compiled and useless.
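The remap itself is a whole-token substitution over the meta expression. The sketch below uses only the three pairs named above; the real table has 55 entries, and the function name is an assumption made for illustration.

```python
import re

# Three remaps quoted in the article; the shipped table is larger.
SA_TO_RSPAMD = {
    "SPF_PASS": "R_SPF_ALLOW",
    "SPF_FAIL": "R_SPF_FAIL",
    "DKIM_VALID": "R_DKIM_ALLOW",
}

# Match whole identifiers so SPF_PASS never rewrites part of a longer name.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def remap_meta(expr, table=SA_TO_RSPAMD):
    """Rewrite SpamAssassin symbol names in a meta to their Rspamd names."""
    return TOKEN.sub(lambda m: table.get(m.group(0), m.group(0)), expr)

print(remap_meta("(FREEMAIL_FROM && SPF_PASS && BITCOIN_ADDR)"))
# (FREEMAIL_FROM && R_SPF_ALLOW && BITCOIN_ADDR)
```

Substituting on token boundaries rather than with plain string replacement is the design choice that matters here: a naive replace would mangle any rule whose name merely contains a mapped name.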

At runtime the plugin calls task:has_symbol() on the mapped Rspamd name, reading the result your existing modules already computed — it isn’t reimplementing SPF, it’s wiring KAM’s logic into the checks Rspamd already runs. And for a handful of SpamAssassin eval: atoms Rspamd has no symbol for — HTML_MESSAGE, three body-length thresholds, a tag-presence check — the runtime implements them natively in Lua instead of declaring them impossible.

Pruning the dead: 172 dropped metas down to 67

A meta depends on other symbols, which may be regex rules, external Rspamd symbols, or other metas. So the converter resolves the whole dependency graph to a fixpoint: anything that transitively bottoms out in a real symbol ships, anything that doesn’t gets dropped — with its missing dependencies recorded in the report so you can see exactly why. Shipping the rest would mean compiling expression trees that can never be true.
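The fixpoint can be sketched as a loop that keeps promoting metas until a pass changes nothing. This version takes the strict reading, where a meta survives only if every dependency resolves; that is an assumption of the sketch, as are the rule names and the function name.

```python
def prune_metas(meta_deps, available):
    """Split metas into kept and dropped.

    meta_deps: meta name -> set of symbol names it references.
    available: symbols that exist without any meta (regex rules,
               external Rspamd symbols).
    Returns (kept, dropped); dropped maps name -> sorted missing deps.
    """
    resolved = set(available)
    changed = True
    while changed:  # repeat until a full pass resolves nothing new
        changed = False
        for name, deps in meta_deps.items():
            if name not in resolved and deps <= resolved:
                resolved.add(name)
                changed = True
    kept = [n for n in meta_deps if n in resolved]
    dropped = {n: sorted(d - resolved)
               for n, d in meta_deps.items() if n not in resolved}
    return kept, dropped

# Hypothetical rules: KAM_C wants an appliance header nobody stamps.
metas = {
    "KAM_A": {"__BODY_1", "R_SPF_ALLOW"},
    "KAM_B": {"KAM_A", "__HDR_1"},
    "KAM_C": {"APPLIANCE_HDR"},
    "KAM_D": {"KAM_C"},
}
kept, dropped = prune_metas(metas, {"__BODY_1", "__HDR_1", "R_SPF_ALLOW"})
print(kept)     # ['KAM_A', 'KAM_B']
print(dropped)  # {'KAM_C': ['APPLIANCE_HDR'], 'KAM_D': ['KAM_C']}
```

Metas caught in a dependency cycle never resolve under this scheme, so they land in the dropped report too instead of shipping as dead weight.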

An earlier build dropped 172 metas. The current one drops 67, and the difference wasn’t luck — it was an audit of every dropped rule to see whether the missing symbol was genuinely impossible or merely unmapped. The recovery came in layers: more symbol remaps; 185 supporting rule definitions lifted from SpamAssassin’s own rule source (Apache-2.0, provenance recorded in the map header, because licence hygiene is cheaper than lawyers); the native Lua evals above; and teaching the build to read the companion channel files instead of KAM.cf alone.

The 67 that remain are the irreducible core — rules wanting headers only a commercial appliance stamps on mail, upstream-internal DNS lists. Those stay dead, and they stay dead visibly, in the report, instead of quietly costing CPU while catching nothing.

You feed the target’s symbol set in via external-symbols.txt — the shipped set is the stock rspamd 4.1 inventory, dumped from the /symbols endpoint. Change stacks, regenerate the dump, rebuild. The output adapts to the box it’s actually going to run on, which is more than the SA module ever does for you.

Why the Lua is fast: register once, scan once

At config load, the plugin registers each pattern with rspamd_config:register_regexp, tagged by message slice — body, rawbody, URL, header and friends. Every pattern of a given type gets compiled into one combined Hyperscan database, and a message is scanned once to find every match in a single pass. Compare that to SpamAssassin’s per-rule, per-message Perl loop that pins your cores. Same patterns, completely different cost model.

Metas compile through rspamd_expression.create into expression trees that short-circuit, with results cached per-task. There’s even a recursion guard, so a pathological meta cycle scores zero instead of taking down your scanner at 4 a.m., which is my time, not yours.
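The caching and the recursion guard fit together in one small closure. What follows is a Python sketch of the idea, not the plugin's Lua: metas are modelled as callables over a lookup function, and every name here is hypothetical.

```python
def make_evaluator(metas, raised):
    """Build a per-message lookup with result caching and a cycle guard.

    metas:  meta name -> function(lookup) returning a truthy value.
    raised: set of non-meta symbols already hit on this message.
    """
    cache = {}
    in_progress = set()

    def lookup(name):
        if name in cache:
            return cache[name]        # each meta is evaluated at most once
        if name not in metas:
            return name in raised     # plain symbol: was it raised?
        if name in in_progress:
            return False              # cycle detected: count it as no hit
        in_progress.add(name)
        try:
            result = bool(metas[name](lookup))
        finally:
            in_progress.discard(name)
        cache[name] = result
        return result

    return lookup

metas = {
    "M1": lambda s: s("A") and s("M2"),
    "M2": lambda s: s("B") or s("M1"),
    "LOOP": lambda s: s("LOOP"),      # pathological self-reference
}
lookup = make_evaluator(metas, raised={"A", "B"})
print(lookup("M1"), lookup("LOOP"))   # True False
```

Python's `and` and `or` short-circuit the same way the expression trees are described as doing, so `M2` never recurses back into `M1` once `B` is known to be raised.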

Installing it, and the auto-update that won’t wake you up

Install is two files, the runtime plus the rule map:

# the thin runtime goes in plugins.d
sudo wget -O /etc/rspamd/plugins.d/kam.lua \
  https://raw.githubusercontent.com/myguard-labs/rspamd-kam-rules/main/dist/kam.lua

# the rule map, at the path the plugin reads at startup
sudo wget -O /etc/rspamd/kam_rules.map \
  https://raw.githubusercontent.com/myguard-labs/rspamd-kam-rules/main/dist/kam_rules.map

sudo chmod 0644 /etc/rspamd/plugins.d/kam.lua /etc/rspamd/kam_rules.map

Why download the map by hand if the plugin updates itself? Because that first copy is a seed, and you only plant it once. Rspamd registers the symbols and builds the Hyperscan database at config load, reading the map straight off disk, so a missing file at that path means a plugin that loads zero rules and feels very smug about it. After that first start you never touch the map again.

A file in plugins.d stays inert until its module is configured, so add this once to /etc/rspamd/rspamd.conf.local:

kam {
    enabled = true;
    # map_path defaults to /etc/rspamd/kam_rules.map
}

Ready-made snippets ship under examples/ in the repo, including groups.conf if you’d rather cap how hard the whole ruleset can push one message:

group "KAM" {
    max_score = 100;   # ceiling for the whole ruleset
}

Then validate and reload:

sudo rspamadm configtest
sudo systemctl reload rspamd   # full reconfigure: re-runs plugin init, rebuilds the Hyperscan DB

Run rspamadm configtest first. Always. Someone reading this is going to skip it and reload straight into a syntax error at 2 a.m. with the queue backing up, and then that someone is going to call me.

Updates, build side. The rebuild runs in GitHub Actions daily at 3 a.m. UTC, and it starts with a single DNS query, not a download. The KAM channel publishes its release serial as a TXT record — dig +short TXT 0.0.4.kam.sa-channels.mcgrail.com — and the build compares that against dist/kam.serial, the serial it last built from. Only when the channel serial is strictly newer does it fetch KAM.cf at all, and a SHA-256 of the download decides whether anything actually changed. One UDP packet a day instead of pulling 600 KB to compare hashes. Your grandfather called that good manners; I call it not being a bandwidth parasite.
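The build's two-stage decision reduces to a serial comparison followed by a hash comparison. The sketch below leaves out the DNS query and the download; it assumes the TXT record is a quoted integer, and the function names and serial values are made up for illustration.

```python
import hashlib

def parse_serial(txt_record):
    """Parse a TXT answer such as '"1234"' (quoted form is an assumption)."""
    return int(txt_record.strip().strip('"'))

def should_fetch(channel_serial, built_serial):
    """Download only when the channel serial is strictly newer."""
    return channel_serial > built_serial

def needs_rebuild(downloaded, last_sha256):
    """Rebuild only when the downloaded bytes differ from the last build."""
    return hashlib.sha256(downloaded).hexdigest() != last_sha256

# Made-up values: what the channel publishes versus what was last built.
channel = parse_serial('"1235"\n')
built = 1234
print(should_fetch(channel, built))  # True
print(should_fetch(built, built))    # False
```

Keeping the cheap check first means the expensive steps run only on days when the channel has actually moved.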

Updates, your side. The plugin keeps itself current: rspamd polls the published map every five minutes and, when it changes, atomically stages the new copy in its own data directory, /var/lib/rspamd — the one place the dropped-privilege rspamd user can actually write, unlike root-owned /etc/rspamd. But symbols, scores, and the Hyperscan database are built at config load, so a staged map goes live only on the next systemctl reload rspamd.

Your only moving part is a dumb daily reload timer with no fetch logic in it at all. The reload is graceful — workers drain, Bayes lives in Redis — so at worst it’s a no-op. If your mail host can’t resolve raw.githubusercontent.com, set map_url = "" and pull the map yourself.

One honest caveat. The “800 ms versus 40 ms” number compares SpamAssassin-the-Perl-daemon against Rspamd, not Rspamd’s SA module against this converter. Those two share the same regexp cache and land in the same ballpark on raw speed. What the converter buys you is correctness and hygiene: symbols that resolve, dead metas pruned, one reviewable file pinned to a known KAM.cf hash. Anyone selling you a 20x speedup from the converter specifically is selling you the wrong number, and you should always check who’s holding the stopwatch.

The converter is MIT-licensed. The generated output is a derivative work of KAM.cf and inherits its Apache-2.0 license and authorship — the credits travel inside the map header itself, so attribution stays with McGrail and the SpamAssassin crew wherever the file ends up.

Frequently asked questions

What is KAM.cf?

KAM.cf is a large community SpamAssassin ruleset maintained by Kevin A. McGrail, with contributions from Joe Quinn, Karsten Bräckelmann, Bill Cole, and Giovanni Bechis. Together with its companion channel files it is over 11,000 lines of patterns for catching phishing, malware, and scam email, distributed under Apache-2.0 from mcgrail.com. It is one of the most widely used add-on rulesets for SpamAssassin.

Why not just load KAM.cf with Rspamd’s spamassassin module?

You can, and it works, but Rspamd parses the whole 10,000-plus-line file on every config load, carries hundreds of rules it cannot run, and never remaps SpamAssassin symbol names like SPF_PASS to Rspamd’s R_SPF_ALLOW. Meta rules that reference unmapped symbols compile but never fire. The converter pre-resolves all of that, maps symbols, drops dead rules, and emits a thin Lua runtime plus a data map instead.

How many rules does the converter actually produce?

The current run converts 3,668 rules from the KAM channel files: 1,269 body, 1,186 header, 862 meta, 211 URI, 82 rawbody, 54 MIME header, and 4 full-message rules. A further 67 meta rules are deliberately dropped because they depend on symbols the target Rspamd cannot provide, such as commercial-appliance headers and upstream-internal DNS lists.

How do I install the generated kam.lua?

Download dist/kam.lua into /etc/rspamd/plugins.d/kam.lua and dist/kam_rules.map into /etc/rspamd/kam_rules.map, enable the kam module in rspamd.conf.local, run rspamadm configtest to validate, then reload with systemctl reload rspamd. Both files must be present before Rspamd starts, because it reads the map and builds the rule set at config load. Always run configtest before reloading a production mail filter.

Does it update automatically when KAM.cf changes?

The build does. A GitHub Actions workflow runs daily at 3 a.m. UTC and checks the KAM channel’s release serial with a single DNS TXT lookup (dig +short TXT 0.0.4.kam.sa-channels.mcgrail.com), compared against the serial tracked in dist/kam.serial. Only when the serial is strictly newer does it download KAM.cf, and a SHA-256 check then decides whether a rebuild is really needed. On your server the plugin updates itself: rspamd polls the published map every map_watch_interval (5 min default) and stages a fresh copy under /var/lib/rspamd when it changes. Because symbols and the Hyperscan database are built only at config load, the staged map goes live on the next systemctl reload rspamd, so a dumb daily reload timer is all you need, with no fetch script. Set map_url to an empty string to disable the poll and pull the map yourself instead.

Is the converter faster than running SpamAssassin?

Running KAM.cf inside Rspamd is far faster than running SpamAssassin’s Perl daemon, often roughly 40 ms per message versus several hundred. The converter’s specific advantage over Rspamd’s built-in SA compatibility module is correctness and hygiene rather than raw speed: mapped symbols, pruned dead metas, natively implemented eval checks, and a single auditable, version-pinned plugin.
