//Karthik Srinivasan

Product Engineer, CTO & a Beer Enthusiast
Experiments, thoughts and scripts documented for posterity.

Bot/Crawler/Spider Check Current.Request ASP.NET

Feb 2010

While implementing a caching solution (LRU caching) for a project I was working on, I realized that search engine crawlers were flooding the IIS cache, which led to an "out of memory" exception. To fix this, I had to make sure that a request coming from a crawler never adds anything to the cache. The following is a simple implementation of a web crawler check in C#:


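// Requires: using System.Text.RegularExpressions; using System.Web;
public static bool IsCrawler(HttpRequest request)
{
    // No request to inspect: treat it as a crawler so nothing gets cached.
    if (request == null)
    {
        return true;
    }

    // First trust the browser capabilities data that ASP.NET already has.
    bool isCrawler = request.Browser.Crawler;

    // UserAgent can be null when the header is missing, so guard before matching.
    if (!isCrawler && !string.IsNullOrEmpty(request.UserAgent))
    {
        // Put any additional known crawlers in the Regex below (matched case-insensitively).
        Regex regEx = new Regex("Twiceler|BaiDuSpider|Slurp|Ask|Teoma|Yahoo", RegexOptions.IgnoreCase);
        isCrawler = regEx.IsMatch(request.UserAgent);
    }
    return isCrawler;
}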
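To try out the user-agent part of the check without a live web request, something like the console sketch below could help. It is not the code from the project: CrawlerPatternDemo and LooksLikeCrawler are hypothetical names, the user-agent strings are simplified samples made up for illustration, and the case-insensitive match is my assumption for covering the upper and lower case variants of each crawler name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class: exercises only the regex part of the crawler check.
public static class CrawlerPatternDemo
{
    // Crawler names taken from the regex above; built once and reused.
    private static readonly Regex KnownCrawlers = new Regex(
        "Twiceler|BaiDuSpider|Slurp|Ask|Teoma|Yahoo",
        RegexOptions.IgnoreCase);

    public static bool LooksLikeCrawler(string userAgent)
    {
        // A missing or empty User-Agent header is not matched here.
        return !string.IsNullOrEmpty(userAgent) && KnownCrawlers.IsMatch(userAgent);
    }

    public static void Main()
    {
        // Simplified sample user agents, for illustration only.
        Console.WriteLine(LooksLikeCrawler("Mozilla/5.0 (compatible; Yahoo! Slurp)"));    // prints True
        Console.WriteLine(LooksLikeCrawler("Mozilla/5.0 (compatible; Baiduspider/2.0)")); // prints True
        Console.WriteLine(LooksLikeCrawler("Mozilla/5.0 (Windows NT 10.0) Chrome/120.0")); // prints False
        Console.WriteLine(LooksLikeCrawler(null));                                         // prints False

        // Caveat: short names match as substrings, so "ask" inside "TaskRunner" counts too.
        Console.WriteLine(LooksLikeCrawler("TaskRunner/1.0"));                             // prints True
    }
}
```

The last line shows the trade-off of a plain alternation: it is easy to extend, but a short name like "Ask" can flag user agents that merely contain those letters. For a cache that may be acceptable, since a false positive only means one request is not cached.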


USAGE:

if (IsCrawler(HttpContext.Current.Request))
{
    HttpContext.Current.Response.Write("You are a bot. Piss off!!");
}
else { ... }
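
Going back to the original caching problem, the same check can guard the cache write. The following is only a rough sketch under a few assumptions: CrawlerAwareCache and InsertUnlessCrawler are hypothetical names, the built-in ASP.NET Cache stands in for the custom LRU cache (which is not shown in this post), and the crawler check is passed in as a delegate so the snippet compiles on its own in a System.Web project.

```csharp
using System;
using System.Web;

// Hypothetical helper, not part of the original project.
public static class CrawlerAwareCache
{
    // Stores value under key unless the current request comes from a crawler.
    // isCrawler is the check to apply, for example the IsCrawler method above.
    // Returns true only when the value was actually cached.
    public static bool InsertUnlessCrawler(string key, object value, Func<HttpRequest, bool> isCrawler)
    {
        HttpContext context = HttpContext.Current;

        // Outside a web request there is nothing to inspect, and the
        // ASP.NET cache rejects null values, so skip caching in both cases.
        if (context == null || value == null)
        {
            return false;
        }

        // Crawler traffic is served normally but never fills the cache.
        if (isCrawler(context.Request))
        {
            return false;
        }

        context.Cache.Insert(key, value);
        return true;
    }
}

// Possible call site, from the class that defines IsCrawler
// ("product-42" and product are made-up placeholders):
//     CrawlerAwareCache.InsertUnlessCrawler("product-42", product, IsCrawler);
```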