Rspamd Explained: Modern Spam Filtering (Bayes, Neural, RBLs)

Okay, gather round, ladies. Pour the rosé. Lock the cat out of the room. Today we are going to talk about the single most universally hated thing on the internet, and no, I don’t mean your ex’s new girlfriend on Instagram. I mean spam email. And we are going to learn how rspamd, the smartest, fastest, sassiest spam filter on the planet, kicks it straight out of your inbox. We will cover Bayesian classifiers, neural networks, greylisting, RBLs, Pyzor, Razor, OLEFY, DCC, and a glorious romp through the entire history of spam, from a guy named Gary in 1978 all the way to that “Dear Beneficiary” email your aunt definitely clicked on. Yes, really. Buckle up.

Rspamd is basically a tiny robot bouncer for your inbox, and it never gets tired.


First things first, what even is rspamd?

Imagine you run a hotel. Every day, three thousand strangers show up at the front desk demanding a room. Some are lovely guests with reservations. Some are conmen with fake passports. Some are wearing trench coats and clearly trying to sell you knock-off Rolexes. You need a bouncer. A very, very smart bouncer who can read body language, sniff out fake IDs, remember faces, and ban the same scammer who tried it last Tuesday.

That is rspamd. It is a fast, open-source spam filter, written in C and Lua, that plugs into your mail server (usually Postfix, with Dovecot looking after the mailboxes) and judges every single email that tries to walk through your door. It scores each one, decides if it is a guest or a grifter, and acts accordingly: deliver, tag, quarantine, or slam the door shut. And unlike its older, slower cousin SpamAssassin (we love you, Spammy, but it is 2026), rspamd does this in milliseconds, in parallel, with a gorgeous web interface and machine learning baked right in.

It was created in 2008 by Vsevolod Stakhov, a Russian engineer who looked at SpamAssassin and went, “love the idea, hate the speed, let me rewrite this from scratch.” The “r” stands for “rapid.” It is rapid, and honestly, the man delivered.

A brief, slightly unhinged history of spam

Before we praise rspamd, we have to understand the enemy. And the enemy has a long, embarrassing résumé.

1978: The original sin, Gary Thuerk

The very first spam email was sent on 3 May 1978 by a marketing manager at Digital Equipment Corporation named Gary Thuerk. He blasted a product announcement to 393 people on ARPANET, the proto-internet that connected universities and the US Department of Defense. Reception was, shall we say, chilly. People were furious. One government rep called him personally to yell at him. Gary has been defending himself ever since. He claims the blast generated $13 million in sales. He also claims to be “the father of spam,” which is genuinely the worst LinkedIn bio of all time.

1993: The word “spam” is born

The term itself comes from a 1970 Monty Python sketch in which Vikings sing “spam, spam, spam, spam” over and over until you cannot hear anything else. In 1993, a Usenet admin named Joel Furr used the word after a bug in Richard Depew’s moderation software accidentally posted around 200 messages to the news.admin.policy newsgroup. It stuck. So yes, every time you say “spam,” you are quoting British comedians in horned helmets. The internet is a beautiful place.

1994: The Canter and Siegel incident

Two American lawyers, Laurence Canter and Martha Siegel, posted an advertisement for their immigration services to every single Usenet newsgroup. Six thousand of them. People lost their minds. They received death threats, their fax machine was overwhelmed (people would dial it and leave the line open for hours, the original DDoS), and their internet provider’s servers crashed. They wrote a book about it afterwards. Of course they did.

2003: The CAN-SPAM Act

The US Congress finally passed the CAN-SPAM Act (Controlling the Assault of Non-Solicited Pornography And Marketing, yes that is the real name, government acronyms are a hate crime). It required commercial emails to include a working unsubscribe option, honest header information, a real postal address, and no deceptive subject lines. Critics called it “the You Can Spam Act” because it actually legalised a lot of bulk marketing as long as you ticked the boxes. Hard agree, honestly.

The lawsuits and the lawsuits and the lawsuits

Spammers have been sued into oblivion repeatedly. Some greatest hits:

MySpace v. Wallace (2008): Sanford Wallace, “the Spam King,” was hit with a $234 million judgment.
Facebook v. Wallace (2009): same guy, again, $711 million.
Facebook v. Guerbuez (2008): Adam Guerbuez was ordered to pay $873 million for spamming Facebook users. He never paid. He was banned from Facebook for life, which is honestly the better punishment.
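United States v. Soloway (2008): Robert Soloway, another self-styled “Spam King,” was sentenced to 47 months in federal prison, nearly four years, after Microsoft had already won a $7 million civil judgment against him in 2005.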

None of this actually stopped spam. It just moved offshore. Today the spam industry is run from data centres in Eastern Europe, Southeast Asia, and increasingly from poorly secured smart gadgets. Yes, possibly even your fridge.

What’s actually in your spam folder (and why it’s terrifying)

Spam is not just annoying anymore, it is the front line of every major cybercrime category. Here is the modern menu of threats:

Phishing: “Your Netflix account has been suspended, click here.” Designed to steal passwords. Targets your mum, succeeds depressingly often.
Spear phishing: Like phishing, but personalised. “Hi Sarah, attached is the Q2 invoice as discussed.” Sarah does not remember discussing it. Sarah clicks anyway.
Business Email Compromise (BEC): Hackers impersonate your CEO and tell finance to wire $80,000 to a vendor account. The FBI says BEC has caused over $50 billion in losses since 2013. With a B.
Malware droppers: Innocent-looking PDFs and Word docs full of macros that install ransomware. Pays the spammer’s mortgage.
Sextortion: “I hacked your webcam and recorded you, send Bitcoin.” Usually a complete bluff, but the volume is so high that even a 0.01% success rate is profitable.
Pump-and-dump: Fake stock tips designed to inflate penny-stock prices so the spammer can sell. Still alive and well in 2026.
The classic 419 scam: A Nigerian prince needs help moving $25 million. He will give you 30%. Spoiler: he will not.

This is why a good spam filter is not “a nice to have.” It is the seatbelt of the internet. Strap in.

Meet Spamhaus, the spam police

Before we get into rspamd’s clever tricks, you need to know about Spamhaus. Founded in 1998 in London by Steve Linford, Spamhaus is a non-profit, slightly mysterious, weirdly powerful organisation that maintains the world’s most-used blocklists. They track spammers, botnets, and hijacked IP addresses, and publish lists like:

SBL (Spamhaus Block List): known spam-source IPs.
XBL (Exploits Block List): hijacked machines and open proxies.
PBL (Policy Block List): IPs that should never be sending mail directly (think: residential broadband).
DBL (Domain Block List): spammy domains.
ZEN: the combined “give me everything” IP list (SBL, XBL and PBL in a single lookup). Hugely popular with mail providers.

A very large share of the world’s mail servers consult Spamhaus before accepting a message. If your IP ends up on their lists, a huge chunk of your mail simply stops getting delivered. They are the bouncer’s bouncer.

Naturally, spammers hate them. In 2013, Spamhaus was hit by what was at the time the largest DDoS attack ever reported, peaking at around 300 Gbps. The attack was widely blamed on people linked to a Dutch hosting company called Cyberbunker. Reports at the time claimed the internet itself noticeably slowed down. Spamhaus survived. Arrests followed. Iconic.

How rspamd thinks: the scoring system

Here is the secret sauce. Rspamd doesn’t say “spam!” or “not spam!”, it says “this email looks kind of spammy, give it a score.” Each suspicious trait adds (or subtracts) points. At the end, if the total crosses a threshold, the email is rejected, tagged, or quarantined.

     Email arrives
          │
          ▼
  ┌───────────────┐
  │  rspamd runs  │
  │  hundreds of  │
  │    checks     │
  └───────┬───────┘
          │
          ▼
   ┌──────────────┐
   │  Final score │
   └──────┬───────┘
          │
   ┌──────┼───────┬─────────────┐
   ▼      ▼       ▼             ▼
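 < 4   4 to 6   6 to 15       ≥ 15
DELIVER GREYLIST  TAG         REJECT
 ✅ 💌  ⏸️ wait  📬 [SPAM]     🚫 bye

These are rspamd’s stock thresholds (greylist at 4, add a spam header at 6, reject at 15), and you can change them in /etc/rspamd/local.d/actions.conf. Every check (a DNS blacklist hit, weird headers, a suspicious URL, the Bayesian probability) contributes a few points, and good signals such as a valid DKIM signature can subtract points. It is essentially a giant, fast, parallel jury deliberation. Beautiful, right?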
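Here is the jury deliberation as a minimal Python sketch, assuming rspamd’s stock thresholds (greylist at 4, add header at 6, reject at 15). The choose_action function and the symbol weights in the example are made up for illustration; rspamd’s real symbols and scores live in its own config.

```python
# Assumed stock thresholds: the score needed to trigger each action.
DEFAULT_THRESHOLDS = {"greylist": 4.0, "add header": 6.0, "reject": 15.0}


def choose_action(symbol_scores, thresholds=DEFAULT_THRESHOLDS):
    """Sum the per-symbol scores and return (total, action) for the highest threshold reached."""
    total = sum(symbol_scores.values())
    # Check the strictest action first so a score of 20 rejects rather than greylists.
    for action, limit in sorted(thresholds.items(), key=lambda kv: kv[1], reverse=True):
        if total >= limit:
            return total, action
    return total, "no action"


# Hypothetical symbols and weights, purely illustrative.
example = {"BAYES_SPAM": 5.1, "SUSPICIOUS_LINK_HYPOTHETICAL": 2.0, "R_SPF_ALLOW": -0.2}
total, action = choose_action(example)
print(f"score {total:.1f} -> {action}")
```

Note the negative weight: a passing SPF check pulls the total down, which is how good signals offset bad ones. This example lands in the “add header” band.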

The Bayesian classifier, rspamd’s nosy housekeeper

Bayes: the OG of machine learning, still pulling its weight in 2026.

Let’s talk about Bayesian filtering, because it is genuinely beautiful maths dressed up as common sense. The idea was popularised for spam by Paul Graham in his 2002 essay “A Plan for Spam,” and the maths goes back to the Reverend Thomas Bayes, an 18th-century Presbyterian minister. Yes, a clergyman is the grandfather of modern spam filtering. Christianity wins.

Here is how it works, in girls-night-in language: every word in an email gets a probability. Say the word “Viagra” appears in 92% of spam and 1% of real mail: its spam-score is very high. Say the word “meeting” appears in 5% of spam and 60% of real mail: its spam-score is very low. (Strictly speaking, rspamd chops text into short word pairs rather than single words, but the idea is identical.) Rspamd then combines all those probabilities (Robinson-style smoothing for each token, then Fisher’s chi-square method to fold them together), and out pops a final number between 0 and 1: how likely this email is spam.
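For the curious, here is that combining step as a minimal Python sketch. It assumes the textbook recipe popularised by Gary Robinson and the SpamBayes project (smooth each token toward 0.5 when evidence is thin, then fold everything together with Fisher’s inverse chi-square). It is not rspamd’s actual C code, and the strength = 1 and assumed = 0.5 defaults are illustrative.

```python
import math


def token_spam_prob(spam_hits, ham_hits, n_spam, n_ham, strength=1.0, assumed=0.5):
    """Robinson-smoothed probability that a message containing this token is spam."""
    spam_freq = spam_hits / n_spam if n_spam else 0.0
    ham_freq = ham_hits / n_ham if n_ham else 0.0
    if spam_freq + ham_freq == 0:
        return assumed  # never seen: no evidence either way
    raw = spam_freq / (spam_freq + ham_freq)
    seen = spam_hits + ham_hits
    # Pull rarely seen tokens toward the neutral `assumed` value.
    return (strength * assumed + seen * raw) / (strength + seen)


def chi2_survival(x2, dof):
    """P(X >= x2) for chi-square with an even dof.

    Fine for a handful of tokens; exp(-m) underflows for very long messages,
    so production code would work in log space.
    """
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_combine(probs):
    """Fold token probabilities into one 0..1 spam score (SpamBayes-style indicator)."""
    if not probs:
        return 0.5
    probs = [min(max(p, 1e-9), 1.0 - 1e-9) for p in probs]  # avoid log(0)
    n = len(probs)
    spamminess = 1.0 - chi2_survival(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    hamminess = 1.0 - chi2_survival(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (spamminess - hamminess + 1.0) / 2.0


# Illustrative counts: "Viagra" in 92 of 100 spams and 1 of 100 hams,
# "meeting" in 5 of 100 spams and 60 of 100 hams.
viagra = token_spam_prob(92, 1, 100, 100)
meeting = token_spam_prob(5, 60, 100, 100)
print(round(viagra, 3), round(meeting, 3), round(fisher_combine([viagra, meeting]), 3))
```

Why Fisher rather than simply multiplying? A plain product of many probabilities collapses toward 0 or 1 very quickly. The chi-square test asks how surprising the whole set of clues is, which keeps the score well behaved when a message is genuinely mixed.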

The genius part: it learns from you. Every time you mark an email as spam (or as not spam), rspamd updates its token statistics. After a few hundred messages, it knows your mail like a slightly creepy housekeeper. By default those statistics are shared server-wide; switch on per-user statistics and each mailbox gets its own. Then the word “rspamd” in your inbox? Probably ham (the official opposite of spam, naming geniuses we are). The word “rspamd” in a stranger’s inbox? Might be spam. Personalised. Adaptive. Gorgeous.
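Learning is mostly bookkeeping. Here is a minimal Python sketch. It assumes an OSB-style tokenizer (orthogonal sparse bigrams: each word paired with the next few words and tagged with the gap; the window of 5 is my assumption) and an in-memory counter standing in for the Redis-backed statistics rspamd really uses. The TinyBayesStore class is hypothetical. Its min_learns gate mirrors the idea behind rspamd’s option of that name, and the 200 default is an assumption.

```python
import re
from collections import Counter

WINDOW = 5  # assumed window: each word is paired with up to 4 following words


def osb_tokens(text, window=WINDOW):
    """Orthogonal sparse bigrams: (word, later_word, gap) for every pair inside the window."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    tokens = []
    for i, first in enumerate(words):
        for gap in range(1, window):
            if i + gap < len(words):
                tokens.append((first, words[i + gap], gap))
    return tokens


class TinyBayesStore:
    """Hypothetical in-memory stand-in for rspamd's Redis-backed token counts."""

    def __init__(self, min_learns=200):  # 200 is an assumed default
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.spam_learns = 0
        self.ham_learns = 0
        self.min_learns = min_learns

    def learn(self, text, is_spam):
        tokens = set(osb_tokens(text))  # count each token once per message
        if is_spam:
            self.spam_counts.update(tokens)
            self.spam_learns += 1
        else:
            self.ham_counts.update(tokens)
            self.ham_learns += 1

    def ready(self):
        """Only trust the classifier once both classes have enough training examples."""
        return min(self.spam_learns, self.ham_learns) >= self.min_learns
```

Conceptually, marking a message as spam or ham (in the web UI, with rspamc, or via Dovecot) boils down to a call like learn(): bump the counters, and the next classification sees the new numbers.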

The neural network, the new kid with the cool sneakers

Bayes is great, but it only looks at the text itself. Spammers caught on and started writing emails that are structurally weird but word-wise innocent. So rspamd added a neural network module (simply called neural) around version 1.7, and it has gotten dramatically smarter in the releases since.

The neural net doesn’t look at words, it looks at the scores of all the other rspamd checks. So if your email has SPF=pass, DKIM=pass, but the body has 14 links, a base64-encoded PDF, comes from a brand-new domain, and the Bayesian score is 0.6… each of those is an input feature, and the neural net learns the patterns that correlate with spam. Multi-layer perceptron under the hood, trained per-server on your own traffic.
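To show what “the inputs are other checks’ scores” means, here is a minimal Python sketch. It turns the symbols that fired into a fixed-order vector, then runs a one-hidden-layer perceptron over it. The feature list, the two symbols ending in _HYPOTHETICAL and all the weights are invented for illustration. Rspamd learns real weights from your own traffic, and its actual network layout may differ.

```python
import math

# Illustrative feature order; a real setup derives this from its own symbol list.
FEATURES = [
    "R_SPF_ALLOW",
    "R_DKIM_ALLOW",
    "BAYES_SPAM",
    "MANY_LINKS_HYPOTHETICAL",
    "NEW_DOMAIN_HYPOTHETICAL",
]


def to_vector(fired):
    """Turn {symbol: score} for the checks that fired into a fixed-order input vector."""
    return [float(fired.get(name, 0.0)) for name in FEATURES]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One tanh hidden layer, one sigmoid output: returns a spam probability."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# The scenario from the text: SPF and DKIM pass, Bayes says 0.6, lots of links, new domain.
x = to_vector({"R_SPF_ALLOW": -0.2, "R_DKIM_ALLOW": -0.2, "BAYES_SPAM": 0.6,
               "MANY_LINKS_HYPOTHETICAL": 1.0, "NEW_DOMAIN_HYPOTHETICAL": 1.0})
w_hidden = [[0.1, 0.1, 1.5, 0.8, 0.9], [-0.5, -0.5, 0.2, 0.1, 0.0]]  # made-up weights
print(round(mlp_forward(x, w_hidden, [0.0, 0.0], [2.0, -1.0], -0.5), 3))
```

The point is the combination. No single input here is damning, but training can teach the hidden layer that “passes authentication yet is link-heavy and brand new” is a spammy pattern.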

This is the part where rspamd genuinely pulls ahead of most of its rivals. It is not just “AI-washing”: it is real, locally trained machine learning whose inputs are rspamd’s own named checks, so you can see exactly what it was fed, and it catches the weird stuff Bayes misses. Think of it as Bayes’s gen-Z niece who notices everything.

Blacklists, whitelists, and the DNS magic show

RBLs: real-time blocklists are the global gossip network of the email world.

Rspamd checks every incoming email’s sending IP against a whole set of DNS-based block lists (DNSBLs), also called RBLs, “Real-time Blackhole Lists.” How? With a delightfully clever DNS trick: it reverses the order of the IP address’s numbers, prepends the result to the blocklist’s domain, and does a normal DNS lookup. If a record exists, the IP is listed.

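Example: sender IP is 1.2.3.4, list is zen.spamhaus.org. Rspamd queries 4.3.2.1.zen.spamhaus.org. Gets an answer (an address like 127.0.0.2, where the last number says which sub-list matched)? Bad IP. NXDOMAIN, meaning no such record? Probably fine. The whole thing takes milliseconds. One catch: Spamhaus refuses queries that arrive through big public resolvers such as Google’s 8.8.8.8, so run a local caching resolver (Unbound is the usual pick) on your mail server.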
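The reverse-and-append trick is short enough to sketch in Python. It leans on the standard library’s ipaddress reverse_pointer, which also handles IPv6 (reversed nibble by nibble). The 127.255.255.x error convention in interpret() is Spamhaus’s as far as I know; other lists may use different codes, so treat that part as an assumption. Only dnsbl_lookup() actually touches the network.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Reverse the address (octets for IPv4, nibbles for IPv6) and append the list's zone."""
    addr = ipaddress.ip_address(ip)
    arpa_suffix = ".in-addr.arpa" if addr.version == 4 else ".ip6.arpa"
    reversed_part = addr.reverse_pointer[: -len(arpa_suffix)]
    return f"{reversed_part}.{zone}"


def dnsbl_lookup(ip, zone):
    """Live DNS query: returns the A record as a string, or None if the name doesn't exist."""
    try:
        return socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return None  # NXDOMAIN (or a resolver failure) is treated as "not listed"


def interpret(answer):
    """Classify a DNSBL answer; the 127.255.255.x error range is an assumed Spamhaus convention."""
    if answer is None:
        return "not listed"
    if answer.startswith("127.255.255."):
        return "error"  # e.g. the query was refused because it came via a public resolver
    if answer.startswith("127."):
        return "listed"  # the last octet says which sub-list matched
    return "unexpected answer"


print(dnsbl_query_name("1.2.3.4", "zen.spamhaus.org"))
```

Treating “error” separately from “listed” matters. Otherwise a misconfigured resolver would make every sender on earth look guilty.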

The lists rspamd can consult include:

Spamhaus ZEN: the OG combined list.
SORBS: Spam and Open Relay Blocking System, once run by Proofpoint, which shut it down in 2024. If an old config still references it, take it out.
SpamCop: community-reported spam sources.
Barracuda BRBL: corporate-grade list.
URIBL, SURBL: these list domains in URLs found inside spam bodies, not IPs. Brilliant trick.

And the opposite: whitelists (DNSWLs, like dnswl.org) list trusted senders (your bank, mailing lists, Google, etc.), so rspamd can give them a bonus score and is far less likely to flag them by accident.

Greylisting, the velvet-rope nightclub trick

This one is genius and so simple it makes me giggle every time. When a message from a sender rspamd doesn’t know yet lands in the suspicious-but-not-damning score range, rspamd answers with a temporary SMTP failure: “Sorry, server’s busy, try again in 5 minutes.”

Real mail servers retry; that is literally what SMTP is designed to do. Five minutes later the email comes back, gets delivered, and rspamd remembers that sender, recipient and server combination for a while, so future mail sails straight through. Spam-sending botnets, on the other hand, fire and forget: they have a million addresses to hit and no time to retry. They never come back. Bye, Felicia. This single trick blocks a big chunk of low-effort spam at essentially zero cost.
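Here is the classic greylisting “triplet” (client IP, sender, recipient) as a minimal Python sketch. Rspamd’s own module keys and stores things differently, and it only greylists mail whose score already lands in the greylist range. The Greylist class, the 300-second delay and the one-day expiry are illustrative assumptions. The clock is injectable so you can test it without waiting five minutes.

```python
import time


class Greylist:
    """Classic triplet greylisting; delay and expiry are illustrative assumptions."""

    def __init__(self, delay=300, expire=86400, clock=time.time):
        self.delay = delay    # seconds a new triplet must wait before a retry counts
        self.expire = expire  # how long a triplet is remembered
        self.clock = clock
        self.first_seen = {}
        self.passed = {}

    def check(self, client_ip, sender, recipient):
        """Return 'accept' or 'tempfail' (an SMTP 4xx 'try again later')."""
        key = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        last_ok = self.passed.get(key)
        if last_ok is not None and now - last_ok < self.expire:
            self.passed[key] = now  # known-good triplet: refresh and let it through
            return "accept"
        first = self.first_seen.get(key)
        if first is None or now - first >= self.expire:
            self.first_seen[key] = now  # brand new (or long forgotten): start the clock
            return "tempfail"
        if now - first < self.delay:
            return "tempfail"  # retried too soon
        del self.first_seen[key]
        self.passed[key] = now  # came back after the delay: behaves like a real server
        return "accept"
```

A fire-and-forget bot only ever sees the first “tempfail”. A real server’s retry after five minutes hits the third branch and is remembered from then on.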

The hash-sharing networks: Pyzor, Razor, DCC

Now we get to my favourite chapter, collaborative anti-spam. Picture this: a million people get the same scammy email. If one person reports it, everyone benefits. That is the whole idea behind these three legendary hash-sharing networks, and rspamd can talk to all of them (each needs a little extra software on your server).

Pyzor

Pyzor is a Python-based, open-source collaborative spam database. It takes the body of every email and calculates a “digest”: a hash of a normalised copy of the body with whitespace, email addresses, URLs and very long tokens stripped out, so trivial per-recipient tweaks don’t change it. Then it looks the digest up against a public server. If lots of other people have already received the same body and reported it as spam, Pyzor returns a high “count” and rspamd adds points. Crowd-sourced trust. Honestly, beautiful.
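Here is a simplified Pyzor-style digest in Python, to show why trivial tweaks don’t break the match. The exact normalisation rules are assumptions modelled loosely on Pyzor: which patterns get stripped, the 10-character long-token cutoff, and the 8-character minimum line length. This will not produce digests that match the real Pyzor servers.

```python
import hashlib
import re

# Assumed normalisation rules, loosely modelled on Pyzor (not byte-compatible with it).
EMAIL = re.compile(r"\S+@\S+")
URL = re.compile(r"[a-z][a-z0-9+.-]*://\S+", re.IGNORECASE)
LONG_TOKEN = re.compile(r"\S{10,}")
WHITESPACE = re.compile(r"\s+")
MIN_LINE_LENGTH = 8


def normalised_lines(body):
    """Strip per-recipient noise from each line and drop lines that end up too short."""
    kept = []
    for line in body.splitlines():
        for pattern in (EMAIL, URL, LONG_TOKEN, WHITESPACE):
            line = pattern.sub("", line)
        if len(line) >= MIN_LINE_LENGTH:
            kept.append(line)
    return kept


def body_digest(body):
    """SHA-1 over the normalised lines: bodies that differ only in noise share a digest."""
    digest = hashlib.sha1()
    for line in normalised_lines(body):
        digest.update(line.encode("utf-8"))
    return digest.hexdigest()


a = "Dear friend alice@example.com\nClaim your prize at https://win.example/a now please\n"
b = "Dear   friend bob@example.net\nClaim your prize at https://win.example/b   now please\n"
print(body_digest(a) == body_digest(b))
```

The flip side of an exact hash over normalised text: change one real word and the digest changes completely. That gap is what locality-sensitive hashes like Razor’s are for.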

Razor (Vipul’s Razor)

Created by a guy named Vipul Ved Prakash (who went on to co-found Cloudmark, now part of Proofpoint, which runs the Razor servers), Razor does the same trick with fuzzier fingerprints. One of its signature engines has used an algorithm called “Nilsimsa,” a “locality-sensitive hash”: emails that are almost the same produce almost the same hash. So spammers who randomise a few words to dodge filters? Razor still catches them. Take that, Brad.
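Nilsimsa itself is fiddly, so here is simhash instead: a different but closely related locality-sensitive hash, swapped in plainly for illustration. Each three-word shingle votes on every bit of a 64-bit fingerprint. Near-identical texts therefore tend to land only a few bits apart, while unrelated texts differ in about half their bits. Nothing here is Razor’s actual algorithm.

```python
import hashlib


def _hash64(token):
    """Stable 64-bit hash of a string."""
    return int.from_bytes(hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest(), "big")


def simhash(text):
    """64-bit simhash over overlapping three-word shingles."""
    words = text.lower().split()
    # Overlapping shingles (or the whole text if it has fewer than three words).
    shingles = [" ".join(words[i:i + 3]) for i in range(max(len(words) - 2, 1))]
    votes = [0] * 64
    for shingle in shingles:
        h = _hash64(shingle)
        for bit in range(64):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit in range(64):
        if votes[bit] > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


original = "claim your free prize now before it expires tonight at midnight"
tweaked = "claim your free prize today before it expires tonight at midnight"
print(hamming(simhash(original), simhash(tweaked)))
```

A filter built on this would flag anything within a few bits of a known spam fingerprint. How many bits counts as “a few” is a tuning choice.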

DCC

DCC stands for Distributed Checksum Clearinghouses. Created in 2000 by Rhyolite Software, DCC is arguably the biggest of the three by volume. It works by counting how many copies of an email have been seen across all participating servers, because bulk mail is suspicious by definition. DCC does not care whether anyone complained; it only counts. A legit newsletter sent to 5 million consenting subscribers looks just as “bulky” as a spam run. That is why DCC’s verdict is one signal among many, and why mail you actually signed up for usually gets whitelisted locally. The wisdom of the crowd at internet scale.
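To make “it only counts” concrete, here is a toy clearinghouse in Python. The crude checksum (case and whitespace flattened, then SHA-256) and the BulkCounter class with its threshold are hypothetical. Real DCC uses its own fuzzy checksums and a whole network of servers.

```python
import hashlib
import re
from collections import Counter


def checksum(body):
    """Deliberately crude fingerprint: lower-case, collapse whitespace, SHA-256."""
    flat = re.sub(r"\s+", " ", body.strip().lower())
    return hashlib.sha256(flat.encode("utf-8")).hexdigest()


class BulkCounter:
    """Toy clearinghouse: counts copies, knows nothing about spam versus ham."""

    def __init__(self, bulk_threshold):
        self.copies = Counter()
        self.bulk_threshold = bulk_threshold

    def report(self, body):
        """A participating server reports one copy and gets back the running total."""
        key = checksum(body)
        self.copies[key] += 1
        return self.copies[key]

    def is_bulk(self, body):
        return self.copies[checksum(body)] >= self.bulk_threshold
```

Notice that nothing in BulkCounter knows what spam is. A newsletter and a spam run look identical to it, and the “is this wanted?” decision has to happen elsewhere.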

OLEFY, the macro-malware detector

Last but not least: OLEFY. This one is a special-purpose tool that scans Microsoft Office attachments for malicious macros. That covers the legacy OLE formats (.doc, .xls, .ppt, the old binary formats your boomer uncle still uses) and macro-enabled modern files like .docm and .xlsm. It is a small wrapper around oletools’ olevba and tells rspamd things like: “this Word document contains a VBA macro that auto-runs PowerShell on open.” That is a “burn the email” signal if I have ever seen one. Rspamd happily complies.
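OLEFY hands the heavy lifting to olevba, but the very first triage step needs nothing beyond the Python standard library. A legacy OLE2 compound file is recognisable by its 8-byte signature. A modern Office zip can be opened to look for an embedded vbaProject.bin, which is where macro-enabled .docm/.xlsm files keep their VBA. The office_macro_triage function is hypothetical and only spots a macro container; deciding whether a macro is malicious still needs olevba.

```python
import io
import zipfile

OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .doc/.xls/.ppt container
ZIP_SIGNATURE = b"PK\x03\x04"                        # modern Office files are zip archives


def office_macro_triage(data):
    """First-pass look at an attachment; real macro analysis needs olevba."""
    if data.startswith(OLE2_SIGNATURE):
        return "ole2: hand to olevba"
    if data.startswith(ZIP_SIGNATURE):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                names = archive.namelist()
        except zipfile.BadZipFile:
            return "broken zip"
        # Macro-enabled OOXML files (.docm, .xlsm, ...) store VBA in vbaProject.bin.
        if any(name.lower().endswith("vbaproject.bin") for name in names):
            return "zip with VBA project"
        return "zip without VBA project"
    return "not an office container"
```

Cheap checks like this are useful for deciding which attachments deserve the expensive olevba pass.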

SPF, DKIM, DMARC, ARC, the alphabet soup of trust

Rspamd also runs all the modern email-authentication checks:

SPF (Sender Policy Framework): asks: “Is this IP actually allowed to send mail for this domain?” Domain owners publish a list in DNS. Rspamd checks it.
DKIM (DomainKeys Identified Mail): the sending server cryptographically signs the email. Rspamd verifies the signature against the public key in DNS. If it has been tampered with, BOOM, points.
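DMARC: builds on SPF and DKIM. It requires at least one of them to pass for a domain that matches (aligns with) the visible From address. It also tells receivers what to do when neither does (p=none, quarantine, or reject) and where to send reports. It is the policy layer.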
ARC (Authenticated Received Chain): for forwarded mail. Lets each hop add its own signature so the chain of trust survives. Beautifully nerdy.

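Together, these four are the reason a spammer can no longer trivially forge “From: ceo@yourcompany.com,” at least when yourcompany.com publishes an enforcing DMARC policy. Rspamd uses all of them. The result? Phishing that forges your bank’s exact domain gets caught at the door. Lookalike domains are a different story, which is where the rest of rspamd’s checks earn their keep.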
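Here is DMARC’s core decision as a minimal Python sketch. It parses the policy record, checks whether a passing SPF or DKIM domain aligns with the From domain, and falls back to the published policy if neither does. The organisational-domain helper is deliberately naive (it keeps the last two labels). Real implementations use the Public Suffix List, so example.co.uk would trip this one up. The sketch also ignores sp=, pct= and friends, and all function names are hypothetical.

```python
def parse_dmarc(record):
    """Split a DMARC TXT record like 'v=DMARC1; p=reject; rua=mailto:...' into a dict."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def org_domain(domain):
    """NAIVE organisational domain: last two labels (real code needs the Public Suffix List)."""
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])


def aligned(from_domain, other, strict=False):
    a, b = from_domain.lower().rstrip("."), other.lower().rstrip(".")
    return a == b if strict else org_domain(a) == org_domain(b)


def dmarc_result(from_domain, spf_pass, spf_domain, dkim_pass_domains, record):
    """Return ('pass', 'none') or ('fail', <published policy>)."""
    tags = parse_dmarc(record)
    spf_ok = spf_pass and aligned(from_domain, spf_domain, tags.get("aspf", "r") == "s")
    dkim_ok = any(
        aligned(from_domain, d, tags.get("adkim", "r") == "s") for d in dkim_pass_domains
    )
    if spf_ok or dkim_ok:
        return "pass", "none"
    return "fail", tags.get("p", "none")


record = "v=DMARC1; p=reject; rua=mailto:dmarc@bank.example"
# A spammer with perfectly valid SPF and DKIM... for the wrong domain.
print(dmarc_result("bank.example", True, "spammer.example", ["spammer.example"], record))
```

That last line is the whole point of DMARC. Passing SPF and DKIM for your own domain buys you nothing when the From header claims to be someone else.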

The web UI, and yes, it is genuinely cute

Most mail tools are aggressively ugly. SpamAssassin’s “interface” was a config file. Postfix is just logs. Rspamd, on the other hand, ships with a full web interface at http://localhost:11334 that shows real-time stats, lets you scan an email by pasting it into a textbox, train Bayes by clicking buttons, view history, edit symbols and scores, and visualise traffic. It is genuinely pleasant. The first time I saw it I emailed Vsevolod a thank-you. I am not exaggerating.

Installing rspamd on Debian/Ubuntu (the easy version)

If you already set up our Postfix + Dovecot guide, adding rspamd is a 5-minute job:

# Add the official rspamd repository (run as root)
apt install -y lsb-release wget gpg
mkdir -p /etc/apt/keyrings
wget -O- https://rspamd.com/apt-stable/gpg.key | \
  gpg --dearmor > /etc/apt/keyrings/rspamd.gpg
echo "deb [signed-by=/etc/apt/keyrings/rspamd.gpg] \
  http://rspamd.com/apt-stable/ $(lsb_release -cs) main" \
  > /etc/apt/sources.list.d/rspamd.list

apt update && apt install -y rspamd redis-server

# Plug it into Postfix as a milter (rspamd's proxy worker listens on 11332)
postconf -e "smtpd_milters = inet:localhost:11332"
# Also scan mail injected locally via sendmail(1), not just mail arriving over SMTP
postconf -e "non_smtpd_milters = inet:localhost:11332"
postconf -e "milter_protocol = 6"
postconf -e "milter_mail_macros = i {mail_addr} {client_addr} {client_name} {auth_authen}"
# If rspamd is down, deliver mail rather than bounce it
postconf -e "milter_default_action = accept"

systemctl restart rspamd postfix

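That’s it. Generate a password hash with rspamadm pw and put it in /etc/rspamd/local.d/worker-controller.inc. Then point rspamd at Redis in /etc/rspamd/local.d/redis.conf (rspamadm configwizard walks you through both). The web UI listens on http://localhost:11334 by default, so reach it from elsewhere through an SSH tunnel or a reverse proxy. You now have a fully working modern spam filter with Bayes, all the major RBLs and SPF/DKIM/DMARC. After a little extra setup you also get the neural network (it needs enabling), Pyzor/Razor/DCC, and OLEFY. The defaults are sane. Honestly, this is the most “it just works” piece of mail-server software I have ever installed.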

Frequently asked questions

Is rspamd really better than SpamAssassin?

For modern workloads, yes, by a country mile. Rspamd is written in C with Lua plugins; SpamAssassin is Perl. On the same hardware, rspamd is usually several times faster, uses less memory, has built-in neural-network support, and a real web UI. SpamAssassin still has the bigger rule community, but rspamd has caught up and ships as the spam filter in mailcow, Mailu and plenty of other modern mail stacks.

Do I have to train the Bayesian classifier myself?

Yes, eventually, but only a little. Rspamd’s defaults catch most obvious spam without any training. Bayes stays quiet until it has learned a couple of hundred messages of each kind (ham and spam); feed it that and its accuracy climbs noticeably. You can train via the web UI, by piping messages to rspamc learn_spam (or rspamc learn_ham), or by configuring Dovecot to auto-train when you drag an email into or out of the Junk folder. The third option is the chef’s kiss.
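If you would rather script training than click buttons, the controller (the web UI’s backend on port 11334) accepts messages over HTTP. Here is a minimal Python sketch. It assumes the /learnspam and /learnham endpoints and the Password header, as I understand the controller API. The helper names and the CHANGE-ME password are placeholders, and only learn() actually sends anything.

```python
import urllib.request


def build_learn_request(message_bytes, spam=True, host="http://localhost:11334",
                        password="CHANGE-ME"):
    """Prepare (but don't send) a POST to the controller's assumed learn endpoint."""
    endpoint = "/learnspam" if spam else "/learnham"
    return urllib.request.Request(
        host + endpoint,
        data=message_bytes,               # the raw RFC 5322 message
        headers={"Password": password},   # controller password (placeholder)
        method="POST",
    )


def learn(message_bytes, spam=True, **kwargs):
    """Send the request and return (HTTP status, response body)."""
    request = build_learn_request(message_bytes, spam, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status, response.read().decode("utf-8", "replace")


# Usage (needs a running rspamd and a real password):
# with open("junk.eml", "rb") as fh:
#     print(learn(fh.read(), spam=True, password="your-controller-password"))
```

Splitting request building from sending keeps the interesting part testable without a live server.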

Will rspamd ever quarantine legitimate email by accident?

Out of the box, false positives are very rare, somewhere around 0.1% in our experience. The scoring system is forgiving (it takes a score of 15 to be rejected outright, and borderline mail only gets greylisted or tagged), and DNSWL/whitelists protect the big legitimate senders. If you do get a false positive, you can retrain Bayes against it, whitelist the sender locally (a multimap rule does the job), or lower a specific symbol’s weight, either in the web UI or in a file under /etc/rspamd/local.d/.

How much RAM does rspamd need?

For a small home or hobby server, 512 MB is plenty. For a busy mailing-list operator handling a million emails a day, plan for 2–4 GB plus Redis. Compared to SpamAssassin, whose spamd child processes often weigh 100 MB or more each, rspamd’s memory model is dramatically more efficient because it uses async I/O and a fixed number of workers.

Does rspamd help with outgoing spam too?

Yes, and this is hugely underrated. You can configure rspamd to scan outbound mail as well, which catches compromised accounts on your server before they ruin your reputation and get you on Spamhaus. A single hacked WordPress site or mailbox can get your entire domain blocklisted in 24 hours. Outbound scanning is the seatbelt for that scenario. The usual reject threshold applies to outgoing mail too, and the ratelimit module can throttle an authenticated user who suddenly starts blasting. Once they blow through their allowance, it answers with a soft reject (“try again later”).
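Rspamd’s ratelimit module thinks in leaky buckets: a burst allowance that drains at a steady rate. Here is that idea as a minimal Python sketch for a single sender. The LeakyBucket class and the numbers in the example are hypothetical, and the real module keeps its buckets in Redis and has its own config syntax.

```python
class LeakyBucket:
    """Hypothetical per-sender limiter: `burst` messages of headroom, draining at `rate`/second."""

    def __init__(self, burst, rate):
        self.burst = burst
        self.rate = rate
        self.level = 0.0
        self.last = None

    def allow(self, now):
        """Return True to accept this message, False to soft-reject (retry later)."""
        if self.last is not None:
            # Drain whatever has leaked out since the previous message.
            self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1 > self.burst:
            return False  # bucket would overflow: tell the client to back off
        self.level += 1
        return True


# A compromised account trying to send 5 messages in the same second,
# with room for a burst of 3 and a drain rate of 1 message per second.
bucket = LeakyBucket(burst=3, rate=1.0)
print([bucket.allow(0) for _ in range(5)])
```

The burst keeps a normal person’s occasional flurry of emails flowing. The drain rate caps sustained volume, which is exactly what a spamming bot needs and a human doesn’t.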

Can I use rspamd without Redis?

Technically yes, but you’d be giving up Bayesian classifier persistence, greylisting state, neural-network learning, and rate-limiting. Just install Redis. It is one apt command. Or even better, install our hardened Valkey package: Redis-compatible and BSD-licensed.
