Three Akismet Hacks to Improve Performance, Reduce Spam
As you know, Akismet is a server-side spam filtering solution for WordPress (WP) blogs. It probably works by wrapping a REST interface around an open source product like SpamAssassin, which uses Bayesian classification of comments to work out which are spam and which are real. Because it has a wide user base, its statistical sample is large enough to infer patterns that protect most of its users from most spam.
We can do better.
There are two main problems with the way Akismet deals with recognized spam. First, it is a centralized solution: if Akismet goes down, you will get a torrent of spam. To partially ameliorate this problem, add a quick DNS blacklist check on the incoming IP address. Why bother hitting the Akismet server without first doing a basic DNS check? The second problem is that Akismet lets particular spammers keep happily spamming your blog. Adding spam comments to the internal blacklist solves that problem.
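The order of checks described above can be sketched roughly as follows. This is an illustrative Python sketch, not the code from the modified plugin. The zone name `dnsbl.example.org`, the function names, and the choice to key the local blacklist by sender IP are all assumptions made for the example. The DNS lookup uses the usual DNSBL convention of reversing the IPv4 octets and treating an answer in 127.0.0.0/8 as "listed".

```python
import socket

DNSBL_ZONE = "dnsbl.example.org"  # hypothetical zone; substitute a real blacklist


def dnsbl_query_name(ip, zone=DNSBL_ZONE):
    """Build the DNSBL lookup name: IPv4 octets reversed, then the zone."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError("expected a dotted-quad IPv4 address: %r" % ip)
    return ".".join(reversed(octets)) + "." + zone


def is_dns_blacklisted(ip, zone=DNSBL_ZONE, resolver=socket.gethostbyname):
    """Return True if the zone answers with a 127.x.x.x address for this IP."""
    try:
        answer = resolver(dnsbl_query_name(ip, zone))
    except (OSError, ValueError):
        # No record, resolver failure, or a non-IPv4 address: treat as not listed.
        return False
    return answer.startswith("127.")


def is_spam(ip, comment, local_blacklist, remote_check, dns_check=is_dns_blacklisted):
    """Run the cheap local checks first; call the remote filter only if they pass."""
    if ip in local_blacklist:
        return True
    if dns_check(ip):
        return True
    if remote_check(ip, comment):
        # Assumption: remember the sender so repeat spam never reaches the remote service.
        local_blacklist.add(ip)
        return True
    return False
```

Here `remote_check` stands in for the call to the Akismet service. Note that a failed DNS lookup is treated as "not listed", so a resolver outage lets comments through to the next check rather than blocking everyone.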
There’s also a potential problem with the way Akismet accesses the database. When it deletes all the old comments, which it does constantly, there’s a 20% chance that it optimizes the database. Instead, we want to delete all the really old spam, and optimize the table, once every few hundred spams we get. In the new version, we delete old spam at a 0.2% rate, which works out to about once per 500 spams on average.
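The idea of running an expensive cleanup at a small fixed probability per spam can be sketched like this. It is a minimal Python illustration, not the plugin's actual code: `maybe_cleanup` and the `cleanup` callback are hypothetical names, and the callback stands in for "delete old spam, then optimize the table".

```python
import random

CLEANUP_PROBABILITY = 0.002  # the 0.2% rate mentioned in the post


def maybe_cleanup(cleanup, probability=CLEANUP_PROBABILITY, rng=random.random):
    """Call cleanup() with the given probability; return True if it ran.

    rng is injectable so the behaviour can be tested deterministically.
    """
    if rng() < probability:
        cleanup()
        return True
    return False
```

Calling `maybe_cleanup(delete_old_spam_and_optimize)` once per caught spam means the cleanup runs on average once every 1 / 0.002 = 500 spams, compared with once every 5 at a 20% rate. No counter has to be stored between requests, which is the main appeal of doing it probabilistically.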
You can download the new version of Akismet here: akismet.zip. Simply unzip it and put it into your wp-plugins folder as usual.
This entry was posted on Saturday, September 23rd, 2006 at 3:38 pm.
It works.
The akismet plugin link doesn’t work anymore?
Whitelist, Blacklist, Greylist…
I recently got into a spirited discussion about Akismet. What is Akismet? When a new comment, trackback, or pingback comes to your blog it is submitted to the Akismet web service which runs hundreds of tests on the comment……
Elliott, I’ve gone further than you did: I blocked blacklisted IPs (using various DNS RBLs) from viewing my blog at all, so even referrer spammers were blocked. The main issue was that about 5 to 10% of real visitors were blocked as well.
For sure, using DNS RBLs filters out more spammers. However, it also filters out legitimate comments. The main issue here is that people who post comments often have blogs of their own, and they talk about other blogs. If you reject their comments, they won’t comment on your blog, they won’t come back to read other comments, and they won’t talk about your blog either.
It’s probably better to have to remove spam manually than to prevent opinion leaders (which is what other bloggers are) from participating in your blog by leaving an interesting comment on it.
IP blacklisting is definitely a technique for the heavily spammed. It will lead to more false positives, but if you don’t mind that price, it should also heavily decrease the amount of spam getting through the filter.
Unfortunately the DNS RBLs cause an unusually high false positive rate, which is why we don’t use them anymore.
The database optimize will be removed or reduced in the next version; it can cause problems on larger blogs.