Most people have heard this already, but it is worth repeating (via Wikipedia):
A cryptographic hash function is a deterministic procedure that takes an arbitrary block of data and returns a fixed-size bit string, the (cryptographic) hash value, such that an accidental or intentional change to the data will change the hash value. The data to be encoded is often called the "message," and the hash value is sometimes called the message digest or simply digest.
But what does it really mean? It is quite simple: we have an input string (or some other data), we feed it into the algorithm, and on output we get a new, fixed-length "digest" string. The operation is one-way, so you cannot reverse it (strictly speaking you can try, by guessing inputs until one matches, but that is really hard). If you use a SHA-2 hash function such as SHA-512, the output looks similar to this (a single 128-character hexadecimal digest, wrapped here over two lines):
4e2ecff8f8be5a7d4d8821266d956d844aa5b8eebd5983edbaaa6fa7fc9bc9e21
de42d443f50d8608a79f6507b7e95c6d4a913615c85710f86a40bc23cdc5d5d
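To make the definition concrete, here is a minimal sketch using `Digest::SHA512` from Ruby's standard library. The helper name `sha512_hex` and the sample inputs are only illustrations and are not part of the original text.

```ruby
require 'digest'

# Hash a message with SHA-512 and return the digest as a hex string.
def sha512_hex(message)
  Digest::SHA512.hexdigest(message)
end

short_digest = sha512_hex('abc')
long_digest  = sha512_hex('abc' * 1000)
changed      = sha512_hex('abd')

puts short_digest.length                 # 128 hex characters = 512 bits
puts long_digest.length                  # same length, whatever the input size
puts short_digest == sha512_hex('abc')   # deterministic: same input, same digest
puts short_digest == changed             # one changed character, different digest
```

The digest length depends only on the chosen function, never on the length of the message.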
When we store users' passwords in our systems (databases, files, etc.), they should be safe. If we get hacked and our database gets stolen, the passwords should still be protected: no one should be able to read them. Most users have one password for all their web activities, so if this password gets stolen, the cracker will probably be able to log in to the victim's Facebook, Twitter and other web accounts.
If we store not the plain password but its hash, then even if it gets stolen, the cracker will not be able to use it directly to log in to any account.
When using a cryptographic hash function, we must remember some rules:
- MD5 should not be used for critical functions such as hashing passwords
- Every hash function with a publicly known algorithm can be attacked by brute force: the attacker hashes guesses until one matches
- Attacks on unsalted hashes can be sped up by using rainbow tables (precomputed tables of hashes)
- Allowing users to create simple passwords is also not recommended
Keep these rules in mind and you will avoid the most common mistakes.
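The brute-force rule can be illustrated with a small sketch. The leaked hash and the candidate list below are invented for the example, and SHA-512 is assumed as the hash function. Real attacks use far larger word lists, but the principle is the same.

```ruby
require 'digest'

# Hypothetical leaked value: an unsalted SHA-512 hash of a weak password.
leaked_hash = Digest::SHA512.hexdigest('qwerty')

# A tiny stand-in for the large word lists attackers actually use.
candidates = %w[123456 password abc qwerty letmein]

# Hash every guess and compare it with the leaked value.
cracked = candidates.find { |guess| Digest::SHA512.hexdigest(guess) == leaked_hash }

puts cracked.inspect
```

Nothing here reverses the hash. The attacker only repeats the forward computation, which is why simple passwords fall quickly.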
First of all, let's select a hash function. MD5 is old (and weak), and SHA-1 also has known vulnerabilities. The most common safe choice is the SHA-2 family, and it is what we will use for hashing passwords here. Keep in mind that a single SHA-2 pass is very fast, which also helps an attacker who is guessing, so it is usually combined with salting and key stretching.
But what about brute-force attacks? Every password should be validated before use: it should be neither too short nor too simple. We can check this with a regular expression like this one:
^(?=.*\d)(?=.*([a-z]|[A-Z]))([\x20-\x7E]){8,40}$
The regexp above ensures that a password has 8 to 40 printable ASCII characters, at least one letter (lower or upper case) and at least one digit. Using this type of regular expression ensures that no user will have a password like "abc" or anything similar. But still, an attacker who has rainbow tables and a lot of password hashes can recover at least some of the passwords. How do we protect ourselves against attacks based on rainbow tables? Use salt.
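A minimal sketch of this check in Ruby follows. The constant `PASSWORD_FORMAT` and the method `acceptable_password?` are hypothetical names. The pattern is the one shown above, with `\A` and `\z` used as anchors because in Ruby `^` and `$` match at line breaks rather than only at the ends of the string.

```ruby
# 8 to 40 printable ASCII characters, at least one digit and at least one letter.
PASSWORD_FORMAT = /\A(?=.*\d)(?=.*[a-zA-Z])[\x20-\x7E]{8,40}\z/

# Returns true when the password satisfies the format above.
def acceptable_password?(password)
  PASSWORD_FORMAT.match?(password)
end

puts acceptable_password?('abc')         # false: too short and no digit
puts acceptable_password?('123qwerty')   # true
puts acceptable_password?('qwertyuiop')  # false: no digit
puts acceptable_password?('12345678')    # false: no letter
```

A format check like this only rejects the weakest passwords. It does not replace salting.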
What is salt? Salt consists of random bits, creating one of the inputs to a one-way hash function. In a typical usage for password authentication, the salt is stored along with the output of the one-way function, sometimes along with the number of iterations to be used in generating the output (for key stretching). Once salt is mixed into the password, any precomputed rainbow table becomes useless.
How to generate and use salt? The easiest way is to use one global salt. Example:
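The key stretching mentioned in the definition can be sketched with PBKDF2 from Ruby's bundled `openssl` library. The iteration count, the output length and the names `stretch` and `record` are assumptions made for this illustration, not recommendations taken from the text.

```ruby
require 'openssl'
require 'securerandom'

ITERATIONS = 100_000  # assumed work factor; tune it for your own hardware
KEY_LENGTH = 64       # bytes of output (128 hex characters)

# Derive a digest from password and salt by iterating HMAC-SHA512 many times.
def stretch(password, salt, iterations = ITERATIONS)
  raw = OpenSSL::PKCS5.pbkdf2_hmac(password, salt, iterations, KEY_LENGTH,
                                   OpenSSL::Digest.new('SHA512'))
  raw.unpack1('H*')
end

salt   = SecureRandom.random_bytes(16)
digest = stretch('123qwerty', salt)

# The salt and the iteration count are stored next to the digest.
record = { salt: salt.unpack1('H*'), iterations: ITERATIONS, digest: digest }
puts record
```

Each guess now costs the attacker 100,000 hash computations instead of one, while a legitimate login pays that price only once.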
# only small letters and digits
Password: "123qwerty"
# small and big letters, special chars and digits
Salt: "%^&*(#@$@K:JKBJVCHKB@QRU)+{KMF er23"
# hash computed over password+salt
Hash: SHA2(Password + Salt)
As you can see above, the salt makes the hashed input much longer and more varied than the password alone, so it will not appear in any precomputed table. However, one global salt has a major disadvantage: if two users have the same password, they will also have the same output hash. So, if we have a lot of users and some of them share a hashed password, an attacker needs to crack only one hash to get access to the accounts of all the users with that hash. An attacker who learns the salt can also generate a rainbow table dedicated to our hash function and salt.
To protect against this we should use a unique salt per user. How to generate such a salt? Combine some per-user data and some random data. Example:
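The weakness of a single global salt is easy to see in a short sketch. The user names and the second password are invented, SHA-512 is assumed as the hash function, and `hash_with_global_salt` is a hypothetical helper.

```ruby
require 'digest'

# The example salt from above (single quotes, so Ruby does not interpolate it).
GLOBAL_SALT = '%^&*(#@$@K:JKBJVCHKB@QRU)+{KMF er23'

def hash_with_global_salt(password)
  Digest::SHA512.hexdigest(password + GLOBAL_SALT)
end

# Hypothetical users; two of them chose the same password.
passwords = { 'alice' => '123qwerty', 'bob' => '123qwerty', 'carol' => 'zxcvbn987' }
hashes = passwords.transform_values { |pw| hash_with_global_salt(pw) }

puts hashes['alice'] == hashes['bob']    # true: same password, same hash
puts hashes['alice'] == hashes['carol']  # false
```

Anyone who looks at the stolen table can tell immediately which accounts share a password, even before cracking anything.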
salt = user.login + user.created_at.to_s + rand(10**5).to_s + '65241770q_ E9089u(&'
We store the salt with the password hash. Don't worry, it is safe. Since each user has their own unique salt, no general rainbow table exists that covers all users. Mix the password with a dynamic and a static salt and precomputed tables stop being a practical attack. Furthermore, when salts and password are mixed in a unique way, a cracker will not know how to generate rainbow tables until he steals both the database and the source code. Example:
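As an alternative sketch for the random part, the salt can come entirely from `SecureRandom` in Ruby's standard library instead of `rand`. This swaps in a different technique from the line above. The helper name `generate_salt` and the 16-byte length are assumptions made for the example.

```ruby
require 'securerandom'

# Generate a fresh random salt; 16 random bytes rendered as 32 hex characters.
def generate_salt
  SecureRandom.hex(16)
end

salt_a = generate_salt
salt_b = generate_salt

puts salt_a.length     # 32
puts salt_a == salt_b  # false in practice: every call gives a new value
```

A salt does not need to be secret, only unique, so a long random value per user is enough.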
hashed_pass = SHA2(user.login+user.password+salt+static_salt)
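Putting the pieces together, registration and login could look roughly like the sketch below. It is only an illustration under assumptions: SHA-512 stands in for "SHA2", the `User` struct replaces a real database row, and `register`, `authenticate?` and `hash_password` are hypothetical names.

```ruby
require 'digest'
require 'securerandom'

STATIC_SALT = '65241770q_ E9089u(&'  # static part, kept in the source code

# Hypothetical user record; in a real application this is a database row.
User = Struct.new(:login, :salt, :hashed_pass)

# Same mixing order as in the line above: login + password + salt + static salt.
def hash_password(login, password, salt)
  Digest::SHA512.hexdigest(login + password + salt + STATIC_SALT)
end

# Create a user with a fresh per-user salt stored next to the hash.
def register(login, password)
  salt = SecureRandom.hex(16)
  User.new(login, salt, hash_password(login, password, salt))
end

# Recompute the hash from the stored salt and compare it with the stored hash.
def authenticate?(user, password)
  hash_password(user.login, password, user.salt) == user.hashed_pass
end

alice = register('alice', '123qwerty')
bob   = register('bob', '123qwerty')

puts authenticate?(alice, '123qwerty')       # true
puts authenticate?(alice, 'wrong-password')  # false
puts alice.hashed_pass == bob.hashed_pass    # false: same password, different hashes
```

Because the salt is stored with the user, login needs nothing beyond the database row and the static salt from the code.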