5:35 PM -- To strip or not to strip? Nope, it's not a question for the drunk people on Bourbon Street -- it's a tactical security question that faces a lot of developers. When you are faced with malicious content, should you strip it or block it? The line between the two may seem fine, but your choice could have a major impact on the end user's experience.
Let's take a real-world example that I'm sure we have all faced on the Net. You've spent 15 minutes crafting the perfect response to a question that someone posed on a Web-board -- only to click "submit," get an error, and find that you cannot recover your lost words. Dostoyevsky himself would have been impressed by your use of language, yet no one will ever see it. Press "back" as you might, the page reloads and your masterpiece has been lost forever.
Sure, the Website believes it has protected itself. But all it has done is stop a wagonload of text from being entered, quite possibly because an innocent word like "wagonload" happens to contain a string the filter distrusts. And, as a consequence, it has harmed the consumer experience. Obviously, this example points to a need for better pattern matching, but let's put aside that issue for now.
Now let's think about it from another angle. The Website code has no idea what "onload" means, or in what context it is being used. But it is scary-looking, so the developer will strip it out to save the end user some pain. So now our sentence -- "I love this site a wagonload!" -- becomes "I love this site a wag!" It's confusing, but the site is theoretically safe. Maybe the developer will build in an "edit" function to allow the person to revise text until it doesn't offend the site's security mechanisms. The world now seems safe.
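To make the two choices concrete, here is a minimal sketch in Python. The one-entry blocklist and the function names `is_blocked` and `strip_once` are made up for illustration; no real site's filter is being quoted here.

```python
# Hypothetical blocklist: a single "scary-looking" string, as in the example above.
BLOCKLIST = ("onload",)


def is_blocked(text: str) -> bool:
    """Blocking: reject the whole submission if any listed string appears in it."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)


def strip_once(text: str) -> str:
    """Stripping: delete each listed string in a single pass and keep the rest."""
    for term in BLOCKLIST:
        text = text.replace(term, "")
    return text


comment = "I love this site a wagonload!"
print(is_blocked(comment))   # True: the whole post is refused
print(strip_once(comment))   # I love this site a wag!
```

Either way the user loses: blocking throws away the whole post over an innocent word, and stripping quietly mangles it.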
Or is it?
Now let's pretend I'm an attacker. I desperately want to put "onload" on the page, because I intend to attack the site. I know it's not allowed and will be stripped. So I enter something like "ononloadload." The filter removes the "onload" in the middle, leaving "on[stripped]load," which reads "onload." So I, as an attacker, have used the stripping mechanism against itself. This is a very simple example, but it is a problem that plagues large sites that want to allow some HTML -- without allowing anything malicious.
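The trick is easy to reproduce. The sketch below assumes a filter that deletes the literal string "onload" in one pass. The function names are hypothetical, and `strip_until_stable` is a common workaround added here for contrast, not something any particular site is known to use.

```python
def strip_once(text: str, term: str = "onload") -> str:
    """Single-pass stripping: remove every occurrence found in one scan."""
    return text.replace(term, "")


def strip_until_stable(text: str, term: str = "onload") -> str:
    """Assumed workaround: keep stripping until the text stops changing."""
    previous = None
    while previous != text:
        previous = text
        text = text.replace(term, "")
    return text


def is_blocked(text: str, term: str = "onload") -> bool:
    """Blocking: refuse the input outright if the term appears anywhere."""
    return term in text.lower()


payload = "ononloadload"
print(strip_once(payload))          # onload  (the filter rebuilt the attack)
print(strip_until_stable(payload))  # prints an empty line: nothing survives
print(is_blocked(payload))          # True
```

Repeating the strip closes this particular hole, but it only covers the exact string being matched. Changes in case, encoding, or HTML parsing quirks would each need their own handling, which is how a filter ends up being patched again and again.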
A prime example of this problem is MySpace, which has been hit by the same vulnerability six times because its stripping has not properly stopped attackers from entering malicious text. In providing a consumer benefit, MySpace has made its site far more dangerous to those very same consumers.
So the tradeoff is clear: security or usability? It's a long-standing question that has plagued the security community for years, but MySpace is a perfect, real-world example of where the tradeoffs are causing many problems. The patching exercise at MySpace is forcing a level of creativity in filtering mechanisms that at worst is obfuscation, and at best is a good attempt at patching a leaky ship.
So which is it? To strip or not to strip? I suppose the answer lies in the cost of closing the security hole and blocking the content outright. I'll tell you one thing though: I'd never advise anyone to strip -- unless they have the body for it.