Feb 2010
While implementing a caching solution (LRU caching) for a project I was working on, I realized that search engine crawlers were flooding the IIS cache, which led to an "out of memory" exception. To fix this, I had to make sure that a request coming from a crawler was never added to the cache. Below is a simple implementation of a web crawler check in C#.
using System.Text.RegularExpressions;
using System.Web;

public static bool IsCrawler(HttpRequest request)
{
    if (request != null)
    {
        // ASP.NET's browser capabilities already flag many well-known crawlers
        bool isCrawler = request.Browser.Crawler;
        if (!isCrawler && !string.IsNullOrEmpty(request.UserAgent))
        {
            // put any additional known crawlers in the Regex below;
            // IgnoreCase covers the lower/upper-case variants of each name
            Regex regEx = new Regex("Twiceler|BaiduSpider|Slurp|Ask|Teoma|Yahoo",
                                    RegexOptions.IgnoreCase);
            isCrawler = regEx.Match(request.UserAgent).Success;
        }
        return isCrawler;
    }

    // no request available: treat it as a crawler so nothing gets cached
    return true;
}
You can then use it anywhere you have access to the current request:

if (IsCrawler(HttpContext.Current.Request))
{
    Response.Write("You are a bot. Piss off!!");
}
else { ... }
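And here is the part that actually solved my problem: guarding the cache insert itself. This is a minimal sketch, assuming a plain Dictionary as a stand-in for the real LRU cache and a buildPage delegate for whatever produces the page content; those names are mine for illustration, not from the original project.

using System;
using System.Collections.Generic;
using System.Web;

public static class PageCache
{
    // stand-in cache; the real project used an LRU cache, which is not shown here
    private static readonly Dictionary<string, string> _cache =
        new Dictionary<string, string>();

    public static string GetPage(string key, HttpRequest request,
                                 Func<string> buildPage)
    {
        // crawlers bypass the cache entirely, so they can no longer flood it
        if (IsCrawler(request))
        {
            return buildPage();
        }

        string page;
        if (!_cache.TryGetValue(key, out page))
        {
            page = buildPage();
            _cache[key] = page;   // only real users' pages take up cache slots
        }
        return page;
    }
}

Crawlers still get the page, they just pay the cost of generating it each time, while the cache stays reserved for real users.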