You're Probably Storing Passwords Incorrectly

Are we reeaally insecure?

The Web is nothing if not a maze of user accounts and logins. Almost everywhere you go, on the web, requires yet another new set of credentials. Users collect usernames and passwords like they do Pokemon. In such a sorry state of the web, let’s say a hacker somehow obtains a list of all our usernames and passwords, it’s only for the better, if the hacker can’t still know passwords right away.
but who stores passwords as plain-text anyway, only very-novice programmers do so, we are secure, right?

few years ago, a backup copy of Reddit database had been stolen. and one of the developers confirm storing the details in plain-text.

You might think it’s relatively unimportant if someone’s forum password is exposed as plain text. After all, what’s an attacker going to do with forum credentials? Post angry messages on the user’s behalf? But most users tend to re-use the same passwords, probably because they can’t remember the two dozen unique usernames and passwords they’re forced to have. So if you obtain their forum password, it’s likely you also have the password to something a lot more dangerous: their online banking and PayPal.

So what can we do?

Attempt 1: STORE THE PASSWORDS UNENCRYPTED

If what you run is a platform with just a few users you might even consider it an advantage to store passwords unencrypted. That way, if someone forgets their password, you can just look it up and tell them what it is.

Don’t do this, for the simple reason that anyone who gets to peek at the file immediately knows how to login as any user.

Worse still, they can see what sort of password a user uses, and given luck, he might end up with credentials for online banking or something equally sensitive.

The point is that neither you, nor any of your fellow system administrators, should be able to look up a user’s password.

It’s not about trust, it’s about definition: a password ought to be like a PIN, treated as a personal identification detail that is no-one else’s business.

Attempt 2: ENCRYPT THE PASSWORDS IN THE DATABASE

Encrypting the passwords is much better right? You can arrange to have decryption key on another server, and retrieve it only when required and only ever keep it in memory.

No! This doesn’t solve that problem mentioned in attempt one, neither you, nor any of your fellow system administrators, should be able to recover a user’s password.

Worse still, if someone manages to steal your database, and also gains access to key, this turns into attempt one.

This probably was the mistake Adobe did in resulting in October 2013 data breach. Not only was it one of the largest breaches of username databases ever, with 150,000,000 records exposed, it was also one of the most embarrassing.

By the way, inept encryption like simple DES has one more serious flaw, length of the encrypted data gives us a clue about the length of the unencrypted password.

With such an approach, one can easily tell, without even the decryption key, that some people have same password, a needless information leak.

before trying to solve our problem with hashing, we insist on the following requirements:

User’s passwords should not be recoverable from data
Similar or even identical passwords should have different hashes
The Database should give no hints as to the length of password

Attempt 3: HASH THE PASSWORDS

One of the requirements is that user’s password should not be recoverable from data, this can be done with what is known as a cryptographic hash, which takes an input of arbitrary length, and mixes up the input bits into a sort of digital soup.

As it runs, the algorithms strains off a fixed amount of random-looking output data, finishing up with a hash that acts as a digital fingerprint for its input.

Mathematically, a hash is a one-way function: you can work out the hash of any message, but you can’t go backwards from the final hash to the input data.

A cryptographic hash is carefully designed to resist even deliberate attempts to subvert it, by mixing, mincing, shredding and liquidising its input so thoroughly that, at least in theory:

You can’t create a file that comes out with a predefined hash by any method better than chance.
You can’t find two files that “collide”, i.e. have the same hash (whatever it might be), by any method better than chance.
You can’t work out anything about the structure of the input, including its length, from the hash alone.

Well known and commonly used hashing algorithms are MD5, SHA-1 and SHA-256.

Of these, MD5 has been found not to have enough “mix-mince-shred-and-liquidise” in its algorithm, with the result that you can find two files with the same hash very much faster than by chance.

This means it does not meet its original cryptographic promise – so do not use it in any new project. SHA-1 is similar to MD5 in algorithm, so we’ll use SHA-256.

All SHA-256 hashes are 256 bit in length, so we are not leaking any information about length of the password.

To verify a users password, only keep the password in memory, never let it touch the disk, compute its hash. If the computed hash matches the stored hash, user has input the right password.

We’ve solved two of the three problems. But what we face now is, identical inputs yield same hash. Also,Hashing the passwords prevents plaintext exposure, but it also means you’ll be vulnerable to the astonishingly effective rainbow table attack (i’ll write about it later).

i.e. A hacker can lookup hashes of popular tables or even hashes of passwords up to a certain length given memory, and crack the entire database in a single lookup.

Attempt 4: SALT AND HASH

Fortunately, the solution for rainbow table attacks is simple enough– add a ‘salt’ to the hashes to make them unique.
hash = SHA256('super-awesome-salt-' + password)
We can adapt the hash that comes out for each password by mixing in some additional data known as a salt, so-called because it “seasons” the hash output. A salt is also known as ‘nonce’ — number used once

Simply put, we generate a random string of chars that we include in our hash calculation along with the actual password.

The salt is not an encryption key, even with the salt, it’s not possible to arrive at original password from hash (again, in theory).

So it can be stored in the password database along with the username – it serves merely to prevent two users with the same password getting the same hash.

For that to happen they would need the same password and the same salt, so if we use 16 bytes or more of salt, the chance of that happening is small enough to be ignored.

Now whenever a user inputs a password, calculate hash using the stored salt as well. Make sure you choose random salts – never use a counter such as 000001, 000002, and so forth, and don’t use a low-quality random number generator like C’s random().

By using sufficiently many bytes from a decent source of random numbers, you as good as guarantee that each salt is unique, and thus that it really is a “number used once.”

We have satisfied all our requirements, so, are we there yet?

Although we have them, modern hash cracking servers are powerful enough to calculate hundreds of billions of hashes per second!

So a determined hacker can crack a single password + salt hash rapidly. Now it only makes sense to make this process slower: hash stretching.

Attempt 5: HASH STRETCHING

Cryptographic hashing algorithms are carefully designed to stop any coming back, but the same result can be achieved by trying to go forward, several times.

If crooks manage to steal hashes, and can work offline, there is no limit except computational power that they can try combining every word in a dictionary ( or every password possible ) with every salt in your database, calculating the hashes and seeing if they get any hits.

It therefore makes sense to slow down offline attacks by running our password hashing algorithm as a loop that requires thousands of individual hash calculations

This lag can’t even be noticed when iterating for a single password, but it will definitely reduce the rate at which a hacker can carry out the attack, in direct proportion to number of iterations you choose.

let length of password be k, and number of iteration be i
for user: i (time complexity of hashing)
for hacker: ((128)**k)(i * (time complexity of hashing))

7 bit ASCII supports 128 characters

Well known algorithms for hash stretching are: PBKDF2, bcrypt or scrypt.

PBKDF2 is a part of RSA Laboratories‘ Public-Key Cryptography Standards (PKCS) series and also published as Internet Engineering Task Force‘s RFC 2898.

PBKDF2 using HMAC-SHA-256 repeated several say 10000 times or more is recommended. As the computing power available to attackers increases, you can increase the number of iterations you use – for example, by doubling the count every year.

HMAC SHA 256 works in a (special way)[https://tools.ietf.org/html/rfc2104] detailed description of which would make an other post.

If you are using an old style hash, when a user logs in successfully, you simply regenerate and update their hashes using the new iteration count, if you change your iteration count or algorithm.

In summary:

Use a strong and secure random generator to create a salt of 16 bytes or longer.
Use PBKDF2 with HMAC-SHA-256 as core
perform a large number of iterations
take 32 bytes from output as hash of password
store hash, salt, iteration count
increase iteration count regularly to keep up with faster cracking tools.

The last word

Do not invent your own “clever” password storage scheme.

I know, you’re smart, and you got this crypto stuff, but through this door lies madness– and abominations like LMHash that have ongoing, worldwide security ramifications we’re still dealing with today.

Take advantage of whatever password storage tools your framework provides, as they’re likely to be a heck of a lot better tested and more battle-proven than any crazy scheme you and your team can come up with on your own. Security vulnerabilities, unlike functionality bugs in your application, run deep and silent. They can lay dormant for years.

And when they come out, they are the game-over bugs. You can lose all your reputation and trust of users you have accumulated over the years. It didn’t end well for Adobe, and it is unlikely to end well for you.