Wednesday, January 5, 2011

Frustrating Spam Filter Issues Continue...

Since Blogger's spam filter cannot be turned off but has been nothing but a disruption on my blog, I went onto the Blogger forum to explore options and lodge complaints. The recurring message is that everyone who is part of Blogger needs to help "train" the spam filter by correcting its mistakes--by marking misidentified messages in the spam box as "Not Spam." Here is the message I posted in relation to this:

If disabling the filter isn't an option, I'd like to help "train" the filter--which has blocked dozens and dozens of legitimate comments on my blog (and one lonely piece of actual spam, while letting some spam through).

But the problem with "training" it is this: My regular commenters are often engaged in lively ongoing philosophical discussions sparked by my post, discussions which are going on during a period when I am away from the blog and can't sit at the spam filter to hit "Not Spam" every time the filter makes a mistake (and, in my case, its error rate approaches 100%--that is, virtually EVERYTHING it regards as spam is not).

Because my commenters are energetically discussing issues they are passionate about, when their comment disappears they try again. In fact, they keep trying until they find a version that passes the spam filter. By the time I check the filter, there can be up to 20 legitimate comments that were blocked.

But here's where things get tricky. If I mark them all as "Not Spam," they all get published, cluttering up my comments page with redundant and out-of-order comments that disrupt the capacity of readers to follow the conversation that has been evolving. So I find myself wasting precious time comparing what is in the spam filter with what got through, to see if I can identify comments that didn't eventually make it through and only marking THEM as "Not Spam." Sometimes--being a busy person--I just don't have the time for that. But even when I do, this is such a selective effort that it is hardly going to educate the spam filter properly.

To properly "train the filter" by letting it know every time it makes a mistake, I need to make the comments page an incoherent mess. But if I don't train it, my commenters will continue to face the frustration of having to fight to figure out how to be part of the conversation. Either way, everybody on my blog loses. As far as I am concerned, this filter has been a disaster (perhaps because spam was never really a problem on my blog and continues to be a trivial issue). Any suggestion about how to train the filter to be less disruptive without the training process being disruptive would be more than welcome.

So that's what I wrote, which may or may not get a response. In the meantime, let me ask all of you who regularly comment on my blog what you would prefer. Here are several options:

I could go through and mark every one of your filtered comments as "Not Spam," resulting in a sudden glut of comments cluttering up comment threads on many posts. This would be a messy option but might help teach the filter more quickly and reduce future problems. (I might then go through and delete every comment that just got posted to clean up the mess--but I'm not sure what that would teach the spam filter, since it may be programmed to learn from what bloggers delete.)

Alternatively, I could continue with my current policy (and try to be better about checking the filter's spam box regularly). This could mean a slower learning curve for the filter and ongoing frustration of the sort you've been dealing with, but wouldn't turn the comments page into a mess.

Or we could all agree for the next week or so not to attempt reposting when a comment is blocked by the filter, and I will try to check the spam box at least once in the morning and once in the evening (except weekends, when I can't promise to make it online)--and mark all comments as "Not Spam" (unless I actually start getting real spam...). This will slow down the pace of conversations, making them partly dependent on my sporadic moderation--but we might find out whether this teaches the filter, leading to a diminishing need for moderation.

Or are there other options people can think of?


  1. LOL...I was wondering what was up when I posted some comments on your blog for the first time the other day....haven't had this issue on blogger myself, or when trying to post elsewhere. How exasperating for you!

    Quick & dirty?? Do whatever makes your life easiest.

  2. I agree with down...whatever makes your life easier. Having to painstakingly compare what's actually posted with what's been blocked seems like way too much trouble.

  3. Eric,

    I have tried to post one of my rejected comments on a test blog of mine and, predictably, it was rejected. I then tagged it as non-spam and tried to post it again, unsuccessfully. I tried 5 or 6 times (each time telling blogger it was not spam) and couldn't get blogger to accept the comment. Apparently the filter needs a *lot* of training...

    Accepting all rejected comments, as you suggest, would be simple and might (eventually) improve the filter. However, this might result in some duplication and you shouldn't have to do the cleanup yourself. Two ideas:

    First, starting with new comments, those with a google account who have reposted a modified version of a rejected comment could go back themselves and remove the redundant entries after the comment has been restored. This would reduce the cluttering to “comment deleted by the author” entries. This wouldn't work for those who don't use this method (Bernard? Keith?). But setting up an account is a simple matter and you don't have to share any personal information (not even email).

    Second, to reduce the waiting time before a rejected comment is posted, you might consider asking one or two trusted friends to help you manage the spam folder. You wouldn't have to give away your own password but this would require, I think, giving them administrator access to your blog (through their own accounts).

    In any case it might be useful to put on the right panel of the blog a permanent link to a post in which you describe the problem and what you expect from the commentators (waiting or reposting+cleanup or ?).

  4. This comment has been removed by the author.

  5. Eric,

    It couldn't have been otherwise... My comment on rejected comments has been rejected...

    I will not try to repost my comment and will wait until you have a chance to accept it.