
    How To Protect Forms From Spam

    Not that long ago, in a city not that far away, a nasty problem appeared. Spam was its name, and hardy and troublesome was he. Some may say that spam can contain interesting content from time to time. Maybe that’s true, but when it is filled with pharmacy offers for famous blue pills and sent to the female half of the company, something is terribly wrong.

    Faced with such a situation, we should first write down our goal. We want a universal, easy-to-use solution that will secure any of our existing or future forms against spam, and we don’t want to suffer any usability losses after implementation. Let’s put everything together as a list of requirements to be fulfilled:

    • no CAPTCHA,
    • easy to implement,
    • no maintenance,
    • hidden from the user,
    • reusable,
    • effective.

    The simplest way to fulfill these requirements would be to add an input field hidden from a normal user but "surely" parsed by spamming crawlers. All we would need is to name the field with some popular word, e.g. “name”, and implement a simple server-side check verifying whether “name” was filled in on submit. If it was, the request sender is no mortal man. Genius, simple, electrifying.
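    The honeypot idea above can be sketched as a tiny server-side check. This is a minimal sketch only; the function name and the plain-object shape of the submitted fields are assumptions, not part of any particular framework:

```javascript
// Minimal sketch of the honeypot check described above.
// `fields` stands for the parsed form submission (names are illustrative).
function isHoneypotTripped(fields) {
    // A real visitor never sees the hidden "name" field,
    // so any non-empty value means a robot filled the form.
    return typeof fields['name'] === 'string' && fields['name'].trim() !== '';
}
```

    A submission containing a filled-in “name” field would be flagged, while one that leaves it out (or empty) would pass.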

    Not at all. What about crawlers that know how to distinguish hidden input fields from visible ones and ignore them once they decide they’re just a nasty trap? Such robots can be written quite easily with software like Selenium, which allows automating browser actions (used, for example, in functional testing). The solution presented above can secure our site from crawlers based on simple page parsing with basic regular expressions, but not against more clever beasts.

    So is there any other solution? Let’s think for a minute about what matters most to the makers of such robots. A robot must crawl over as many pages as it can in the shortest possible time to increase the number of potential "recipients". The critical factor is therefore time: the faster a given instance of the crawler fills and sends the form, the better. Let’s assume that the robot fills and sends forms instantly, or at least much faster than any human could. Based on common sense, we set the minimal time a human being needs to perform these actions. After form submission we check that time and decide whether it’s spam.

    Splendid, you’ll say, but how to implement this solution properly? Just follow these steps:

    1. Put the following code into any of your forms (between the <form></form> tags):
      <input type="hidden" name="form-display-time" value="0">
      
    2. After that, paste the code below between the <head></head> tags OR just before the </body> tag:
      <script type="text/javascript" src="http://code.jquery.com/jquery-x.x.x.min.js"></script>
      <script type="text/javascript">
          var incrementInterval;
          // First, let's wait for the document...
          $(document).ready(function () {
              // incrementDisplayTime() will be executed every second
              // until clearInterval(incrementInterval) is called
              incrementInterval = setInterval(incrementDisplayTime, 1000);
          });
      
          /**
           * Increments every form's hidden input value with name="form-display-time" by 1
           * 
           * @return void
           */
          function incrementDisplayTime() 
          {
              $('form input[type="hidden"][name="form-display-time"]').each(function() {
                  // We use "+ 1" because the ++ operator cannot be
                  // applied to a value returned by a function call
                  $(this).val(parseInt($(this).val(), 10) + 1);
              });
          }
      </script>
      

      The code above requires the jQuery framework. Alternatively, we can use the pure JavaScript code below, which doesn’t require any frameworks (it should be included the same way as the code above):

      <script type="text/javascript">
          var incrementInterval;
          // First, let's wait for the document
          document.addEventListener('DOMContentLoaded', function () {
              // The anonymous function will be executed every second
              // until clearInterval(incrementInterval) is called
              incrementInterval = setInterval(function () {
                  var elements = document.getElementsByName('form-display-time');
                  for (var i = 0; i < elements.length; i++) {
                      elements[i].value = parseInt(elements[i].value, 10) + 1;
                  }
              }, 1000);
          });
      </script>
      

      Both scripts increase the value of each hidden input field added in the first step by 1 every second, which allows us to secure multiple forms at once.

    3. We determine the average minimal time a human being needs to fill in the form or forms; e.g. 6 seconds should be enough for a simple contact form.
    4. All that’s left is the server-side form validation, implemented in the file pointed to by the form’s action attribute in the <form> tag. Example code written in PHP goes like this:
      <?php
      
          if (!isset($_POST['form-display-time']) || (int) $_POST['form-display-time'] < 6) {
              // Some anti-spam operations
          } else {
              // Operations concerning non-spam messages
          }
      

      This way we can, for example, add a "spam" prefix to the e-mail subject or abandon further processing in the code.
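      The server-side decision from the steps above can also be sketched in JavaScript (a sketch only; the function name and return values are illustrative, and we rely on the fact that form fields always arrive as strings):

```javascript
var MIN_HUMAN_SECONDS = 6; // threshold chosen in step 3

// `formDisplayTime` is the raw submitted value of the hidden
// "form-display-time" field.
function classifySubmission(formDisplayTime) {
    var seconds = parseInt(formDisplayTime, 10);
    // A missing or non-numeric value also counts as spam: a crawler
    // that does not run JavaScript submits the initial "0" or nothing.
    if (isNaN(seconds) || seconds < MIN_HUMAN_SECONDS) {
        return 'spam';
    }
    return 'ok';
}
```

      For example, classifySubmission('2') returns 'spam', while classifySubmission('15') returns 'ok'.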

    The presented security mechanism, with an appropriately chosen time threshold for each form, should fulfill all the previously listed requirements. It stops spam from all crawlers that do not execute JavaScript, and also from the slightly cleverer ones that know how to play dirty.
