Feb 2010
While implementing a caching solution (LRU caching) for a project I was working on, I realized that search engine crawlers were flooding the IIS cache, which led to an "out of memory" exception. To fix this, I had to make sure that a request coming from a crawler was never added to the cache. Below is a simple implementation of a web crawler check in C#.
using System.Text.RegularExpressions;
using System.Web;

public static bool IsCrawler(HttpRequest request)
{
    if (request != null)
    {
        // ASP.NET's browser capabilities already flag many well-known crawlers
        bool isCrawler = request.Browser.Crawler;
        if (!isCrawler && !string.IsNullOrEmpty(request.UserAgent))
        {
            // put any additional known crawlers in the Regex below;
            // IgnoreCase covers the lower/upper-case variants of each name
            Regex regEx = new Regex("Twiceler|BaiduSpider|Slurp|Ask|Teoma|Yahoo",
                                    RegexOptions.IgnoreCase);
            isCrawler = regEx.Match(request.UserAgent).Success;
        }
        return isCrawler;
    }

    // no request available: treat it as a crawler so nothing gets cached
    return true;
}
You can then use it anywhere you have access to the current request:

if (IsCrawler(HttpContext.Current.Request))
{
    Response.Write("You are a bot. Piss off!!");
}
else { ... }
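And here is the part that actually solved my problem: guarding the cache insert itself. This is a minimal sketch, assuming a plain Dictionary as a stand-in for the real LRU cache and a buildPage delegate for whatever produces the page content; those names are mine for illustration, not from the original project.

using System;
using System.Collections.Generic;
using System.Web;

public static class PageCache
{
    // stand-in cache; the real project used an LRU cache, which is not shown here
    private static readonly Dictionary<string, string> _cache =
        new Dictionary<string, string>();

    public static string GetPage(string key, HttpRequest request,
                                 Func<string> buildPage)
    {
        // crawlers bypass the cache entirely, so they can no longer flood it
        if (IsCrawler(request))
        {
            return buildPage();
        }

        string page;
        if (!_cache.TryGetValue(key, out page))
        {
            page = buildPage();
            _cache[key] = page;   // only real users' pages take up cache slots
        }
        return page;
    }
}

Crawlers still get the page, they just pay the cost of generating it each time, while the cache stays reserved for real users.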