```php
$secure_text = strip_tags($original_text, '<b><i>'); // we only allow the <b> and <i> tags, everything else will be removed
```

Do you feel a peaceful, calming sense of safety? You shouldn't. ;)
The problem is that even the <b> tag can be dangerous. For example, if the input is

```html
<b onclick="alert('PWNED')">click me</b>
```

then the onclick attribute won't be removed if you simply sanitize the input with strip_tags().
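To see this in practice, you can run the call from the first snippet on that input. This is a minimal sketch that uses only PHP's built-in strip_tags(), so nothing beyond a stock PHP installation is assumed.

```php
<?php
// The malicious input from the example above: an allowed <b> tag carrying an event handler.
$original_text = '<b onclick="alert(\'PWNED\')">click me</b>';

// Allow only <b> and <i>, exactly as in the first snippet.
$secure_text = strip_tags($original_text, '<b><i>');

// strip_tags() only decides which tags survive; it never inspects their attributes,
// so the onclick handler is passed through untouched.
echo $secure_text, PHP_EOL; // prints <b onclick="alert('PWNED')">click me</b>
```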
You may think that you can write a simple function to properly remove the dangerous attributes of the allowed elements and use that instead of strip_tags(), but there are two problems with that:
- you can't do that safely with regular expressions
- if you take the <a> tag into account, the problem gets even more complicated: if you allow links in the user input (and why not, users must be able to put links into their forum posts), then the obviously needed href attribute can also contain malicious code, for example a javascript: URL such as <a href="javascript:alert('PWNED')">click me</a>.
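Both problems are easy to demonstrate with plain PHP. The sketch below uses a deliberately naive helper, naive_remove_handlers(), which is hypothetical and made up for illustration (it is not a library function). It strips double-quoted on* attributes with a regular expression. The sketch then shows that strip_tags() keeps a javascript: link intact when <a> is allowed.

```php
<?php
// Hypothetical, deliberately naive "attribute cleaner" of the kind you might be tempted to write:
// it removes on* event handlers whose value is wrapped in double quotes.
function naive_remove_handlers(string $html): string
{
    return preg_replace('/\s+on\w+\s*=\s*"[^"]*"/i', '', $html);
}

// Works on the obvious case...
$caught = naive_remove_handlers('<b onclick="alert(1)">click me</b>');

// ...but single-quoted attribute values (perfectly valid HTML) slip straight through.
$missed = naive_remove_handlers("<b onclick='alert(1)'>click me</b>");

// Even a flawless handler filter would not help with <a>: the href value itself is the payload,
// and strip_tags() keeps an allowed tag together with all of its attributes.
$link = strip_tags('<a href="javascript:alert(\'PWNED\')">click me</a>', '<a>');

echo $caught, PHP_EOL; // <b>click me</b>
echo $missed, PHP_EOL; // <b onclick='alert(1)'>click me</b>
echo $link, PHP_EOL;   // <a href="javascript:alert('PWNED')">click me</a>
```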
To sum up, strip_tags() is not enough. Let's see what else we have.
If you are up to date with the well-known PHP libraries, you have surely heard of HTMLPurifier. It is known to be a very mature tool for securely handling user input, so it is the first thing you should keep in mind. Second, your template engine may also be responsible for properly escaping unsafe data. I didn't want to check every template engine out there, so I picked Twig (which I have had to use recently) and tried out what it can do. I did the testing with the following PHP script:
After downloading HTMLPurifier and Twig, you can simply try it out.
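A comparable experiment can be sketched as follows. It is not the script used for the tests in this post. It assumes HTMLPurifier was installed via Composer (package ezyang/htmlpurifier), and the allowlist b, i, a[href] is chosen for illustration only. It prints the strip_tags() output next to the HTMLPurifier output for the two malicious samples discussed above.

```php
<?php
// Assumes HTMLPurifier was installed with Composer: composer require ezyang/htmlpurifier
require __DIR__ . '/vendor/autoload.php';

// The malicious samples discussed above.
$inputs = [
    '<b onclick="alert(\'PWNED\')">click me</b>',
    '<a href="javascript:alert(\'PWNED\')">click me</a>',
];

// Illustrative allowlist: the tags from the strip_tags() example, plus links with an href.
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Allowed', 'b,i,a[href]');
// Disable the on-disk definition cache so this quick experiment needs no writable cache directory.
$config->set('Cache.DefinitionImpl', null);
$purifier = new HTMLPurifier($config);

foreach ($inputs as $input) {
    echo 'input:        ', $input, PHP_EOL;
    echo 'strip_tags(): ', strip_tags($input, '<b><i><a>'), PHP_EOL;
    // HTMLPurifier parses the markup and keeps only allowlisted elements, attributes
    // and URI schemes, so the event handler and the javascript: URL are dropped.
    echo 'HTMLPurifier: ', $purifier->purify($input), PHP_EOL, PHP_EOL;
}
```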
For testing, I used the following form data:
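The outputs of strip_tags(), HTMLPurifier and Twig speak for themselves: both strip_tags() and Twig's striptags filter let the malicious attributes through, while HTMLPurifier removes them.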
- If you want to allow some HTML in untrusted input, don't rely on strip_tags(); use HTMLPurifier instead.
- If you want to use Twig, sanitize the input with HTMLPurifier in your controller before rendering, and don't rely on Twig's striptags filter, which does nothing more than call strip_tags(). (Otherwise, Twig is not a bad choice for templating, in my opinion.)
- You may provide a custom formatting syntax for your users and disallow HTML completely. BB code or wiki syntax can be an option in some cases, but due to the lack of WYSIWYG editors for those "languages", it is not feasible in many cases nowadays.
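The Twig recommendation can be sketched like this, assuming Twig 3.x and HTMLPurifier installed via Composer (twig/twig and ezyang/htmlpurifier). The inline template names post and unsafe and the allowlist are hypothetical, chosen for illustration. The untrusted body is purified in the "controller" part of the script and only then printed with the raw filter. The second template shows that striptags followed by raw lets the onclick handler through.

```php
<?php
// Assumes: composer require twig/twig ezyang/htmlpurifier (Twig 3.x API).
require __DIR__ . '/vendor/autoload.php';

use Twig\Environment;
use Twig\Loader\ArrayLoader;

// Untrusted input, e.g. a forum post body taken from a form submission.
$userInput = '<b onclick="alert(\'PWNED\')">hello</b> <a href="javascript:alert(1)">link</a>';

// "Controller" step: sanitize with HTMLPurifier before the data ever reaches the template.
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Allowed', 'b,i,a[href]');   // illustrative allowlist
$config->set('Cache.DefinitionImpl', null);    // no on-disk cache for this sketch
$cleanHtml = (new HTMLPurifier($config))->purify($userInput);

// Hypothetical inline templates, for illustration only.
$twig = new Environment(new ArrayLoader([
    // Safe: the value was already purified, so it can be printed as raw HTML.
    'post'   => '<div class="post">{{ body|raw }}</div>',
    // Unsafe: striptags only calls strip_tags(), so the allowed <b> keeps its onclick.
    'unsafe' => '<div class="post">{{ body|striptags("<b><a>")|raw }}</div>',
]));

echo $twig->render('post', ['body' => $cleanHtml]), PHP_EOL;
echo $twig->render('unsafe', ['body' => $userInput]), PHP_EOL;
```

Twig's autoescaping (enabled by default) would also neutralize the input if you left out raw, but then the user's formatting would appear as literal tags instead of rendered HTML, which is why purifying first and then using raw is the combination that keeps both safety and formatting.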