Why validate?

Validating incoming data is a must. Always assume that users will send you corrupted (or simply wrong) data, sometimes by mistake, sometimes on purpose.

SQL injection and XSS (cross-site scripting) are both possible when incoming data is not verified.
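To see why unverified data is a problem, here is a minimal sketch of a query built by plain string concatenation. The table name, column name and the submitted value are all made up for illustration; nothing is sent to a database, the query is only printed.

```php
<?php
// Hypothetical value typed into a login field by an attacker.
$login = "' OR '1'='1";

// Naive query building: the input is pasted straight into the SQL text.
// "users" and "login" are assumed names, used only for this sketch.
$query = "SELECT * FROM users WHERE login = '" . $login . "'";

// The quote in the input closes the string early and adds a new condition.
echo $query, PHP_EOL;
```

The printed query now contains an extra OR condition that the author of the code never wrote, which is the core of an SQL injection.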

What should I validate?

E-v-e-r-y-t-h-i-n-g. Every piece of incoming data.

While we can assume that an administrator may be allowed (and sometimes should be) to place HTML or JavaScript in message content, we can't say the same about an ordinary user posting a message in a guestbook.

What can we lose?

Data. Logins, passwords and other stuff. In commercial products that could be a disaster.

Let's mess things up!

I've created a simple test "site" where you can play with content injection. Its source code is available for download.

Inject an h1 tag and some content:

<h1>Injected content</h1>

It's not hard to guess what we will see after sending this message: an H1 tag inserted into the page content. Not so bad. But if we can inject HTML, why not CSS and JavaScript? Let's try:

<div style="text-decoration: underline;">

It works! strip_tags alone is not enough. PHP has some other nice functions: addslashes and stripslashes. They add (and remove) backslashes before "dangerous" characters such as '"'. This helps against SQL injection, because the escaped "special" characters are then treated as ordinary ones, although addslashes knows nothing about the database's character set and should not be your only defence. If we send JavaScript and use strip_tags_addslashes, we will see in the source code that the script tag has been removed and the remaining quotes are escaped.
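A small sketch of the two functions used together may help here. The message below is a made-up guestbook entry, and the variable names are only for illustration; this is not the code of the test site.

```php
<?php
// Hypothetical guestbook message as it might arrive from a form field.
$message = '<h1>Injected</h1><script>alert("hi")</script> It\'s "quoted"';

// strip_tags() removes the tags themselves but keeps the text between them,
// so the script body survives as harmless plain text.
$withoutTags = strip_tags($message);

// addslashes() puts a backslash before single quotes, double quotes,
// backslashes and NUL bytes.
$escaped = addslashes($withoutTags);

echo $withoutTags, PHP_EOL;
echo $escaped, PHP_EOL;
```

Note that the text inside the script tag is still there after strip_tags, it just no longer runs, because the surrounding tag is gone.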

Why do we need stripslashes?

We protect ourselves by adding slashes, but before we show the output to the user we should remove them.
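The round trip can be sketched like this, assuming a made-up comment text; the variable names are illustrative only.

```php
<?php
// Hypothetical text entered by a user.
$original = 'She said "it\'s fine"';

// What we keep after escaping: backslashes are added before the quotes.
$stored = addslashes($original);

// Before showing the text to the user, remove the backslashes again.
$shown = stripslashes($stored);

echo $stored, PHP_EOL;
echo $shown, PHP_EOL;
```

After stripslashes the text is identical to what the user typed, so nobody sees stray backslashes in the output.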

Buuu - inconvenient!

You think so? Yeah, you're right. It is a little bit inconvenient, but do not worry, there is another option. If you use a MySQL database, you can use mysql_real_escape_string. It escapes the dangerous characters for the SQL statement only, taking the connection's character set into account, so the data is stored without extra slashes and there is nothing to strip afterwards. (The old mysql_* functions were removed in PHP 7.0; mysqli_real_escape_string is the equivalent in the mysqli extension.)

But what about HTML?

Hmm, but what about the situation where we would like to show HTML code as text? Using the previous methods, we would remove all HTML from the content. There is one function that should help us: htmlspecialchars. It changes "living" HTML into its equivalent (but safe) entity version, so you can display it and it will not be interpreted.
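As a quick sketch, here is htmlspecialchars applied to the h1 message from earlier. The ENT_QUOTES flag and the UTF-8 charset are my assumptions about a sensible setup, not something the test site necessarily uses.

```php
<?php
// The same kind of message we injected before.
$message = '<h1>Injected content</h1>';

// Convert the special characters to HTML entities.
// ENT_QUOTES also converts single quotes; 'UTF-8' is an assumed page charset.
$safe = htmlspecialchars($message, ENT_QUOTES, 'UTF-8');

echo $safe, PHP_EOL; // &lt;h1&gt;Injected content&lt;/h1&gt;
```

The browser displays the entities as the literal text of the tag instead of rendering a heading.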


I've tried to show the "basics of basics" of data validation. Read, test, read, test and never feel safe :)