Product Engineer, CTO & a Beer Enthusiast
Experiments, thoughts and scripts documented for posterity.
Aug, 2014
I work on a website that is driven almost entirely by user-generated content, so I have to make sure that no user-entered HTML on a page breaks the CSS or the layout. To handle that, we built an HTML-stripping function that removes the markup on the fly. Initially we used the obvious RegEx technique. As traffic grew, though, page load times started to climb, so we enabled tracing on the page to find the most expensive operation. While refactoring the code, we realized that the HTML stripping was adding noticeably to the page load time. The version below drops the regex and scans the characters directly.
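// Requires: using System.Text;
private static string StripHTML(string pSource)
{
    // Nothing to strip from a null or empty string.
    if (string.IsNullOrEmpty(pSource)) { return pSource; }

    // The output is never longer than the input, so the input length is a safe capacity.
    StringBuilder sb = new StringBuilder(pSource.Length);
    bool inside = false; // true while we are between '<' and '>'

    for (int i = 0; i < pSource.Length; i++)
    {
        char oTempChar = pSource[i];
        // Note: a literal '<' in plain text is also treated as the start of a tag.
        if (oTempChar == '<') { inside = true; continue; }
        if (oTempChar == '>') { inside = false; continue; }
        if (!inside) { sb.Append(oTempChar); }
    }
    return sb.ToString();
}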
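
To make the trade-off concrete, here is a small sketch that runs the same scanning logic next to a regex baseline. The regex pattern `<[^>]*>` is my assumption for illustration; the pattern we originally used is not shown in this post. The class and method names (`StripHtmlDemo`, `StripWithScan`, `StripWithRegex`) are also made up for the example.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class StripHtmlDemo
{
    // Hypothetical regex baseline (assumed pattern, not the original one).
    private static readonly Regex TagPattern =
        new Regex("<[^>]*>", RegexOptions.Compiled);

    public static string StripWithRegex(string source)
    {
        return TagPattern.Replace(source, string.Empty);
    }

    // Same scanning logic as StripHTML above, repeated so this compiles on its own.
    public static string StripWithScan(string source)
    {
        StringBuilder sb = new StringBuilder(source.Length);
        bool inside = false;
        foreach (char c in source)
        {
            if (c == '<') { inside = true; continue; }
            if (c == '>') { inside = false; continue; }
            if (!inside) { sb.Append(c); }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string input = "<p>Hello <b>world</b></p>";
        Console.WriteLine(StripWithScan(input));   // Hello world
        Console.WriteLine(StripWithRegex(input));  // Hello world

        // The two approaches differ when a '<' is never closed.
        string stray = "1 < 2 and done";
        Console.WriteLine("[" + StripWithScan(stray) + "]");   // [1 ]
        Console.WriteLine("[" + StripWithRegex(stray) + "]");  // [1 < 2 and done]
    }
}
```

On well-formed markup both give the same result. The scanner is a single pass with no backtracking, but it drops everything after an unclosed `<`, so it suits display text where losing a stray comparison sign is acceptable. It should not be treated as an HTML sanitizer.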