Comment Spam

Also known as: Blog spam, Forum spam

Automated posting of promotional or malicious messages to blog comment sections, forums, guestbooks, and other user-generated-content fields, usually to manipulate SEO or distribute links.

What is comment spam?

Comment spam is the flood of machine-posted messages that accumulates on any public-facing blog, forum, or guestbook. Typical content falls into three types: a generic compliment ("Great article!") followed by a link to an unrelated site; pharmaceutical and gambling keyword soup; or short pasted text designed to test whether a particular CMS or plugin will render arbitrary HTML. The goal is rarely to communicate with humans. It is SEO link manipulation, spreading malware URLs, or probing for a platform to exploit.

Why it persists

The economics follow the same pattern as email spam: posting is free, moderation costs time, and even a handful of links that survive long enough to be crawled by search engines produce value. Google gives little or no weight to links from obvious comment-spam sources (most blogging platforms have added rel="nofollow" to comment links by default since the attribute was introduced in 2005), but a long tail of less-strict crawlers and citation indexes still weight them.

How it's posted

Comment spam is almost always posted by bots, not humans. The workflow:

  1. A target list crawler enumerates sites running WordPress, Disqus, phpBB, or similar commenting systems
  2. A poster bot fills out the comment form from a list of prepared templates
  3. Source IPs rotate through open proxies, botnets, and residential proxy networks to bypass per-IP rate limits

Defense

The standard stack catches the vast majority: CAPTCHA on the comment form, spam-filter plugins (Akismet, CleanTalk), rate limits by IP, and content-based rules (link count, keyword matches, known-spam IP ranges). Manual moderation handles the remaining trickle. Blocking IPs that appear on comment-spam blocklists stops those bots before they reach the form, and running repeat offenders through an IP abuse report checker helps show whether they're part of a larger campaign.
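The content-based rules mentioned above (link count, keyword matches, known-spam IP ranges) can be sketched in a few lines. This is a minimal illustration, not how any particular plugin works: the function name `check_comment`, the keyword set, the link threshold, and the blocked network (a reserved documentation range) are all assumptions chosen for the example.

```python
import ipaddress
import re
from typing import List

# Hypothetical rule data for illustration; a real site would load these
# from a maintained blocklist and tune the threshold to its own traffic.
SPAM_KEYWORDS = {"casino", "viagra", "payday loan"}
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # documentation range
MAX_LINKS = 2
URL_PATTERN = re.compile(r"https?://", re.IGNORECASE)


def check_comment(text: str, source_ip: str) -> List[str]:
    """Return the names of the rules the comment trips (empty list means clean)."""
    reasons = []

    # Rule 1: more links than a normal reader comment would contain.
    if len(URL_PATTERN.findall(text)) > MAX_LINKS:
        reasons.append("too_many_links")

    # Rule 2: case-insensitive substring match against known spam keywords.
    lowered = text.lower()
    if any(keyword in lowered for keyword in SPAM_KEYWORDS):
        reasons.append("spam_keyword")

    # Rule 3: source address falls inside a blocklisted network.
    address = ipaddress.ip_address(source_ip)
    if any(address in network for network in BLOCKED_NETWORKS):
        reasons.append("blocked_ip_range")

    return reasons
```

A caller might hold any comment with a non-empty result for moderation and reject outright only when the IP rule fires, since keyword and link rules are more prone to false positives.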

Frequently Asked Questions

Comment spam persists despite nofollow for three reasons. First, nofollow is only an attribute that the publishing platform adds to comment links, and each crawler decides how to treat it: Google gives such links little or no weight, but many smaller search engines, scrapers, and citation indexes still pass weight from them. Second, low-budget SEO services charge by quantity ("we'll post your link to 10,000 sites"), not quality, so the spammers get paid regardless of whether the links work. Third, a fraction of comment spam is a malware-distribution attempt, where the goal is to plant links humans might click, not to influence search rankings.
Spammers find targets with automated platform-fingerprinting crawlers that look for the markers of common commenting systems: the WordPress `/wp-comments-post.php` endpoint, Disqus thread containers, phpBB form structures, and generic guestbook scripts. The crawler builds a target list, then a separate poster bot iterates through it, submitting comments from a template library. Sites are also harvested from search results for queries like "leave a comment" or "powered by WordPress".
For WordPress, Akismet (bundled with the default install, though it needs an API key to activate) and CleanTalk are two widely used filters, and both are reported to catch over 99% of spam with few false positives on small-to-medium sites. For higher volumes or non-WordPress platforms, a bot challenge such as Cloudflare Turnstile or hCaptcha at the form-submission step blocks most automated posters regardless of their content. Combining a content-based filter with a bot challenge produces the strongest result.
A modern CAPTCHA (reCAPTCHA v3, hCaptcha, Turnstile) blocks the vast majority of basic automated submissions, but determined spammers route around it via human CAPTCHA-solving services (charged at fractions of a cent per solve) or by imitating a real browser's fingerprint well enough to look human. CAPTCHAs are necessary but not sufficient: pair them with rate limiting per IP, content-based filtering, and (for high-stakes forms) account-based moderation.
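Per-IP rate limiting is the simplest of those companion measures to illustrate. The sketch below uses an in-memory sliding window; the class name `PerIpRateLimiter` and its default limits are assumptions for the example, and a multi-server deployment would need shared storage instead of a local dictionary.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class PerIpRateLimiter:
    """Allow at most max_posts submissions per IP within a sliding time window."""

    def __init__(self, max_posts: int = 3, window_seconds: float = 60.0):
        self.max_posts = max_posts
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # ip -> timestamps of accepted posts

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        # `now` is injectable so the limiter can be tested without sleeping.
        now = time.monotonic() if now is None else now
        timestamps = self._history[ip]

        # Discard accepted posts that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()

        if len(timestamps) >= self.max_posts:
            return False  # over the limit; rejected attempts are not recorded

        timestamps.append(now)
        return True
```

Because spammers rotate source IPs, a limiter like this only slows down the less sophisticated bots, which is why it is paired with the other layers rather than used alone.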
First-comment moderation holds the first comment from any new identity (email + name combination) for manual approval, then auto-approves subsequent comments from the same identity once it has a clean track record. It can catch 90% or more of spam with minimal moderator effort, since spammers almost never post through the same identity twice. WordPress, Disqus, and most commenting platforms include it as a built-in option, but defaults vary, so check that it is enabled.
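The hold-then-trust logic can be sketched as follows. This is an illustrative model only, not the implementation used by WordPress or Disqus: the class `FirstCommentModerator`, its method names, and the choice to normalise identities by trimming and lowercasing are assumptions.

```python
class FirstCommentModerator:
    """Hold comments from unknown identities; publish those from approved ones."""

    def __init__(self):
        self._approved = set()  # identities with at least one approved comment

    @staticmethod
    def _identity(email: str, name: str) -> tuple:
        # Assumed normalisation: ignore case and surrounding whitespace.
        return (email.strip().lower(), name.strip().lower())

    def submit(self, email: str, name: str) -> str:
        """Return "published" for a known identity, otherwise "held"."""
        if self._identity(email, name) in self._approved:
            return "published"
        return "held"

    def approve(self, email: str, name: str) -> None:
        """Called when a moderator approves a held comment."""
        self._approved.add(self._identity(email, name))


moderator = FirstCommentModerator()
first = moderator.submit("ann@example.com", "Ann")   # held for review
moderator.approve("ann@example.com", "Ann")
second = moderator.submit("ann@example.com", "Ann")  # published automatically
```

Since an email and name pair is trivially forged, this works as a volume filter against throwaway identities rather than as authentication.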