Reducing Form Spam Without the Use of a CAPTCHA

Posted November 16th, 2009 by Barnaby Knowles in Security, Website Development

The problem of form spam

Form spam is a growing problem for webmasters. Through our “contact us” feedback forms we’ve all received the ubiquitous emails advertising everything from the little blue pill to cut-price designer timepieces. Bloggers will also be used to receiving lots of comments linking back to the poster’s own website or advertising various wares. The vast majority of this form spam is automated, meaning that a bot comes along and submits the form rather than a human being.

Blocking spambots with a CAPTCHA

Probably the most popular way to protect feedback forms from being spammed is to install a CAPTCHA. A CAPTCHA is a Completely Automated Public Turing test to tell Computers and Humans Apart. There are many different types of CAPTCHAs, but the most common is a distorted image containing a word or phrase that the visitor submitting the form must correctly enter into one of the form fields. Bots cannot “see” the image so they cannot enter the correct word or phrase and submit the form.

Hate them as much as you want, but spammers aren’t stupid. As the use of CAPTCHA images increased, spammers started to defeat them by employing OCR technology that could actually “read” the images (a bit like scanning a document and using OCR software to turn it into editable text again). Unless a spammer really wants to submit your web form, it’s unlikely that a bot of this sophistication will be unleashed on your website. So whilst CAPTCHA images are still a good deterrent, they aren’t the last word in form spam prevention.

There are also other reasons why an image CAPTCHA might not be suitable for your website:

  • CAPTCHA images may make it harder for a visually impaired visitor to contact you.
  • Visitors don’t like having to decipher CAPTCHA images!
  • CAPTCHA images won’t stop human spammers filling your form in manually.

That’s not a definitive list of reasons why you might not want to use a CAPTCHA, but the point is that reasons do exist!

How to block spam without using a CAPTCHA

I have had a great deal of success in blocking feedback form spam by filtering user input to identify spam. Spambots tend to operate in similar ways to one another, so patterns can be identified and used to block their form submissions. By scanning input for certain words, phrases or patterns you should be able to virtually eliminate feedback form spam without inconveniencing genuine visitors.

These tips should work in any programming language, whether your website is programmed in PHP, ASP, ColdFusion or something else, as most (if not all) languages have functions to find a text string within a larger text string.

Things to look for in user input

Spambots change their behaviour all the time. The items below do not constitute a definitive list of things to check for, but if you look for these you should greatly reduce the amount of form spam that you receive.

PHP-specific hijacking

PHP has the mail() function that allows the webmaster to send email through his website. It is possible for a spammer to craft his form input so as to inject additional headers into the webmaster’s email and thereby add new recipients to the message. If he successfully accomplishes this he can send large volumes of spam through the victim’s website. Many times this type of hijack will contain the phrase MIME-Version: and/or Content-Type. As these are not phrases that genuine visitors are likely to be using, we can assume that any input that includes these phrases is spam.
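As a rough illustration of this check, here is a minimal sketch in Python (chosen only for readability; the same substring test can be written in PHP or any other language). The two phrases come from the paragraph above, while the function name and the case-insensitive matching are my own assumptions.

```python
# Header phrases that genuine visitors are unlikely to type into a contact form.
# The phrases are the two named in the article; everything else is illustrative.
SUSPICIOUS_HEADERS = ("mime-version:", "content-type")


def looks_like_header_injection(*fields: str) -> bool:
    """Return True if any submitted field contains a suspicious header phrase."""
    for value in fields:
        lowered = value.lower()  # compare case-insensitively
        if any(phrase in lowered for phrase in SUSPICIOUS_HEADERS):
            return True
    return False
```

You would pass every submitted field (name, email, subject, message) to the function and reject the submission if it returns True.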

Email addresses at your website

A lot of the time spambots will use an email address at your website when filling out your form. So if the visitor has used an email address at your website (e.g. sales@your-domain.com) when submitting the form, you can assume that it is spam. Look for @your-domain.com (replacing your-domain.com with your own domain!).
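A sketch of how this check might look, again in Python. The constant and function names are hypothetical, and your-domain.com is the same placeholder used above. This version tests the end of the email field rather than searching the whole input, which is a slightly stricter assumption than a plain substring search.

```python
OWN_DOMAIN = "your-domain.com"  # placeholder: replace with your own domain


def uses_own_domain(email: str, domain: str = OWN_DOMAIN) -> bool:
    """Return True if the submitted email address is at the site's own domain."""
    return email.strip().lower().endswith("@" + domain.lower())
```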

HTML links/code

Spammers often try to submit lots of HTML links in the hope that your form sends you an HTML formatted email and you’ll visit their links. Unless you’re expecting your visitors to be sending you HTML code you can filter out any messages containing a href= as spam.

BBCode

Similarly, spammers often try to submit lots of BBCode links. So unless you’re expecting your visitors to be sending you BBCode you can filter out any messages containing [url as spam.
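The HTML and BBCode checks are both simple substring searches, so they can share one function. This is only a sketch: the two markers are the ones mentioned in the text, while the names and the case-insensitive comparison are assumptions on my part.

```python
# Markers for HTML links and BBCode links, as described in the two tips above.
LINK_MARKERS = ("a href=", "[url")


def contains_link_markup(message: str) -> bool:
    """Return True if the message contains an HTML or BBCode link marker."""
    lowered = message.lower()
    return any(marker in lowered for marker in LINK_MARKERS)
```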

URLs

Along the same lines as the two points above, spambots often try to submit lots of plain URLs. This is a little more complicated than the former two examples of spam because you might want to allow genuine visitors to include URLs in their message. My approach has been to count the number of times http:// appears in the user input and flag any message with more than 2 URLs as spam.
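Counting URLs could be sketched like this. The limit of 2 is the one given above; the names are hypothetical.

```python
MAX_URLS = 2  # messages with more than this many URLs are flagged


def has_too_many_urls(message: str, limit: int = MAX_URLS) -> bool:
    """Return True if 'http://' appears more than `limit` times in the message."""
    return message.lower().count("http://") > limit
```

Note that this counts only http://, as described above. A site that also wants to count https:// links would need a second count, which is an extension rather than part of the original tip.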

Short messages

Spammers will sometimes test your form for things that they can exploit. Typically they’ll just enter a short message such as “Nice site!” or something similar. You can check the length of the message and flag messages shorter than 11 characters as spam. After all, what genuine visitor would send a worthwhile message that contained fewer than 11 characters?
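The length check is a one-liner in most languages. In this sketch the threshold of 11 comes from the text above; stripping leading and trailing whitespace before measuring is my own assumption.

```python
MIN_MESSAGE_LENGTH = 11  # messages shorter than this are flagged


def is_too_short(message: str, minimum: int = MIN_MESSAGE_LENGTH) -> bool:
    """Return True if the message, ignoring surrounding whitespace, is too short."""
    return len(message.strip()) < minimum
```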

Common spam email subjects

Spammers often use the same subject line when completing web forms. I have seen a lot of form spam with the subject “some sites”. It’s not a subject that I would expect any genuine visitors to be using so any form submissions with that subject can be marked as spam.

Source email address

I have also seen a large number of spambots use hotmail.ru email addresses. Unless you expect any Russian visitors to be contacting you, you can flag any form submissions using this domain for the email address as spam.
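The subject check from the previous tip and the email domain check from this one can both be handled with small blocklists. This is a sketch only: the two entries are the examples from the text, and you would add your own as new patterns turn up. The function names are hypothetical.

```python
# Blocklists seeded with the examples from the article; extend them over time.
BLOCKED_SUBJECTS = {"some sites"}
BLOCKED_EMAIL_DOMAINS = {"hotmail.ru"}


def has_blocked_subject(subject: str) -> bool:
    """Return True if the subject exactly matches a known spam subject."""
    return subject.strip().lower() in BLOCKED_SUBJECTS


def has_blocked_email_domain(email: str) -> bool:
    """Return True if the part of the address after the last '@' is blocked."""
    domain = email.strip().lower().rpartition("@")[2]
    return domain in BLOCKED_EMAIL_DOMAINS
```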

Random spam email subjects

Something that I have started to see more of is the subject containing a random string such as fXNmOtGchIdBGvA. OK so if every spam submission subject is random, how do you block it? Well that string is 15 characters long. How many times would the subject of a genuine form submission be 15 characters long with no spaces? That would require the genuine visitor to be using a single word of 15 characters or more, which seems highly unlikely to me.

Filter out this form of spam by checking the length of the subject line. If it’s 15 characters or more, check for the existence of a space. If none exists then it’s likely that the message is spam.

This type of spam is usually accompanied by one or more URLs in the message. So if we want to be careful not to block legitimate visitors’ messages, we can check the subject line and then also check whether the message contains a URL. If both conditions are fulfilled then it’s a pretty safe bet that the message is spam.
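Putting the two conditions together might look like the sketch below. The 15-character threshold and the "no spaces" rule come from the text above; the function names are hypothetical, and the URL test simply looks for http:// in the message.

```python
def has_random_looking_subject(subject: str, min_length: int = 15) -> bool:
    """Return True if the subject is at least `min_length` characters with no spaces."""
    subject = subject.strip()
    return len(subject) >= min_length and " " not in subject


def is_random_subject_spam(subject: str, message: str) -> bool:
    """Flag only when the subject looks random AND the message contains a URL."""
    return has_random_looking_subject(subject) and "http://" in message.lower()
```

Requiring both conditions is the more cautious option: a genuine visitor with an unusually long one-word subject is only blocked if they also include a link.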

Safeguards

Although unlikely, it is possible that a genuine user might trigger one of the spam filters above. Whenever I implement these measures I also assign a useful error message to each filter so that when one is tripped, the user is told exactly why their message has not been accepted. They can then change the offending text. Perhaps you won’t want to reveal your exact phrases or limits just in case a human spammer is accessing your web form, but providing users with an explanation of how to amend their input to pass your filters is a good idea.
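One way to pair each filter with its error message is to keep them together in a list and collect the messages for every filter that trips. The sketch below uses just two of the filters as examples; the structure, names and wording are my own illustration, and the messages deliberately avoid stating the exact limits.

```python
from typing import Callable, List, Tuple

# Each entry pairs a test on the message with the explanation shown to the user.
Filter = Tuple[Callable[[str], bool], str]

FILTERS: List[Filter] = [
    (lambda msg: len(msg.strip()) < 11,
     "Your message is too short. Please add a little more detail."),
    (lambda msg: msg.lower().count("http://") > 2,
     "Your message contains too many links. Please remove some and try again."),
]


def validate_message(message: str) -> List[str]:
    """Return the explanation for every filter the message trips (empty if none)."""
    return [explanation for is_spam, explanation in FILTERS if is_spam(message)]
```

If the returned list is empty the form can be processed as normal; otherwise the explanations can be shown next to the form so the visitor can amend their input.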

Conclusion

By filtering visitor input using these 9 tips you should be able to virtually eliminate form spam. However, as webmasters find new ways to stop spambots, the spammers find new ways to get past our filters. As such, user input filtering is an ongoing process and more filters must be added over time. Luckily for the webmaster, spambots’ submissions usually have some discernible pattern that a human can identify and filter out.
