
no matter how elaborate a password you choose, as long as it is based on words and rules, even if there are many words and many rules, it will probably be cracked

So this is what I've been wondering about the current "best practice" of using long passphrases. How are those really any stronger than any other "rule" based password, the "rule" being that they are likely constructed of words and phrases from human language?

Would the passphrase "My first car was a 1972 Monte Carlo" really be harder to crack (once the cracking tools are adapted) than a random 8 character password?



Well, calculating the "true" strength is difficult to do, because even though sophisticated tools are available to aid the process, the attackers are still human, and can input their own guesses that may or may not be more accurate. If the attacker knows (or can closely guess) the password rules used to generate your password, he or she has a better chance of getting a hit.

Let's look at a password like "My first car was a 1972 Monte Carlo". The password is 35 chars: 3 upper case, 7 special (spaces), and 4 digits. The key space is all upper and lowercase English letters, all numbers, and all special characters. That's a key space of 95 characters, over 35 places. Objectively, there are 1.66 x 10^69 possible combinations. Given that the LinkedIn password crackers are slowed down at about 9 chars, it seems like you're incredibly secure. But let's assume the attacker knows something about your password structure. Let's say they know that you use words (many people do, so it's a reasonable guess). Let's also assume the attacker knows that years are popular for password numbers. Now instead of 35 chars, your password has 7 words and a date. We've changed the key space from 95 to about 100,000, and the length from 35 characters to 8 tokens. (The exact number of words there are is a tricky number to pin down, but crackers have some good data on what the most popular ones are.) As for the date, there are really only a couple hundred interesting numbers, including all dates from this and last century, as well as common patterns.

Password strength is (key depth) ^ (key length). An uninformed attacker has 1.66 x 10^69 possible combinations (95^35), while an informed attacker has roughly 1.0 x 10^40 possible combinations (100,000^8). Obviously, the less an attacker knows (or can guess) about your password structure, the better chances your password has against being cracked.

Now, you asked about your password versus a random 8 char password. Let's take a "strong" password like "1~qQ%57h" This password also has upper and lowercase letters, numbers, and symbols. We can assume that there is nothing predictable about this password for this exercise. The password strength is 95^8, or 6.6 x 10^15, obviously much lower than the longer sentence, even if the attacker knows the sentence is 7 words and a date.
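For anyone who wants to check the arithmetic, here is a minimal Python sketch that reproduces the three key-space figures quoted above. The function name and labels are mine, and it assumes every position is an independent, uniform pick, which is exactly the assumption the comment says a real attacker will not grant you.

```python
import math

def keyspace_bits(symbols: int, length: int) -> float:
    """Bits of entropy for `length` independent picks from `symbols` options."""
    return length * math.log2(symbols)

# (symbols per position, number of positions), figures taken from the comment.
cases = {
    "35 chars, 95 symbols each (uninformed attacker)": (95, 35),
    "8 tokens, ~100,000 words each (informed attacker)": (100_000, 8),
    "8 random chars, 95 symbols each": (95, 8),
}

for label, (symbols, length) in cases.items():
    total = symbols ** length  # exact, since Python integers are unbounded
    bits = keyspace_bits(symbols, length)
    print(f"{label}: {total:.2e} combinations, {bits:.1f} bits")
```

Expressing the result in bits makes the comparison easier to read: each extra bit doubles the work for an exhaustive search.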

Now remember, our passwords are being matched against human crackers attempting to guess the ways our passwords are most likely put together. For now, most passwords are 6-12 characters. In fact, most websites only allow passwords of these kinds, so it makes the most sense for crackers to go after these passwords. But it's still an arms race. If we assume that webmasters see the light and allow (or enforce) long, sentence-like passwords, the crackers will adjust. It's plausible I think that 5-10 years from now, we'll see articles like this one that use sentence structure syntax as an attack method.

Until we discover and implement a better system that obsoletes passwords, the best we can really do is have long, complex, and unique passwords for everywhere we go, and have a system to manage them for us. I believe that something like LastPass or KeePass is the way to go for now.

*Disclaimer: This was written on a groggy Sunday morning. Do not rely on my calculations. Do not use any of the examples as passwords. Do please check my work.


Beautifully written. Also worth noting is that sites exist that only use lower(trunc(password, 8)), so your first 8 characters should be sufficiently random. For the grandparent, that leaves "my first", which is especially weak in a dictionary attack.


I don't get it. Is there a reason for some sites to actually do that? (considering that they don't store your password as plaintext)

I guess if someone stole their database it would be impossible to know your real password, but still...

Or am I missing something here?


Some sites lowercase all passwords after they are input to "help" users who hit caps lock or are otherwise challenged by case sensitivity. Then you have DES crypt (as once used by Gawker), which only uses the first 8 characters of the password. A site which uses either or both of these methods may happily let you type in a password of any length or complexity, but the version they use will have significantly lower entropy. I've even seen sites silently strip special characters.
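To make the effect concrete, here is a small Python sketch of what such a site effectively does before hashing. The function `effective_password` is a hypothetical model of the lowercase-and-truncate behaviour described above, not code from any real site.

```python
def effective_password(password: str, max_len: int = 8) -> str:
    """Hypothetical model: keep the first max_len characters, then lowercase."""
    return password[:max_len].lower()

typed = "My first car was a 1972 Monte Carlo"
print(repr(effective_password(typed)))  # 'my first'

# Two quite different inputs become indistinguishable to the site.
print(effective_password("My first car") == effective_password("MY FIRST bicycle"))  # True
```

Everything after the eighth character contributes nothing, which is why the advice in this subthread is to put the randomness up front.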


> I don't get it. Is there a reason for some sites to actually do that?

Yes it's to save space ...

No, wait.

It's so they don't use all the CPU power ...

No, not that either.

It's because the programmer didn't want to use their braincells.

Yeah, that would be it.


I've heard of sites truncating to only use the first 8-12 characters as well. So if you are going to use lots of words, put them after a highly complex first 8 characters.


One improvement: for most people, the risk is not that someone tries to crack your password, it is that someone uses rainbow tables to crack many passwords, one of which may be yours.

Rainbow tables have a degree of freedom: the function that maps hashes back to passwords. You should try and pick a password that that function will never generate. To get that, do something unique. Good options, I think, are including a foreign language word (neither English nor your native language, nor the site's language), reversing a word or a syllable inside it, and made up words that have Hamming distance greater than two to any other 'obvious' word.

Short (<= 8 characters) passwords, I think, are bad choices for that reason, even if they consist of ASCII gibberish.

Disclaimer: I have never looked at what kind of code commonly used rainbow tables use.


I thought GPUs killed rainbow tables? (the storage space alone makes them impractical compared to cracking realtime)


Not that that says much, but I am not aware of that. More importantly, googling for "GPU vs rainbow table" leads me to phrases such as "a fully GPU accelerated set of rainbow table tools". Or has the term changed meaning?



Thanks.


And a lot of systems (e.g. Linux) will reject it as a "dictionary word", even if such words don't appear in its dictionary. Such a password in an obscure foreign language isn't going to be cracked until someone starts doing a full-space search, which is still very difficult once you have more than 8 characters, yet it will still be rejected.


Can you explain more about this rainbow table function? From what I understand, rainbow tables are simply precomputed hashes of common passwords. What you're saying is that we should use passwords that aren't in a rainbow table, which by definition implies that the passwords are not common.


Rainbow tables are a clever way to implement a time/space trade-off for finding the inverse of a hash value in general by doing a lot of precalculation (see Wikipedia; the core idea is nicely explained under "hash chains" on the Rainbow Table page).

Besides, rainbow tables are supposed to be pointless because everyone's supposed to be using salt with their passwords...
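Since the "function that maps hashes back to passwords" came up, here is a toy Python sketch of the hash-chain idea with a per-column reduction function. Everything in it is an assumption made for illustration: a 3-letter lowercase password space, unsalted SHA-256, a chain length of 50, and my own `reduce_hash`. Real tools use far larger spaces and tuned reductions.

```python
import hashlib
from collections import defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
PW_LEN = 3        # toy space: 26^3 = 17,576 passwords
CHAIN_LEN = 50    # hash/reduce steps per chain

def h(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()

def reduce_hash(digest: str, column: int) -> str:
    """Map a hash back into the password space; differs for each column."""
    n = int(digest, 16) + column
    chars = []
    for _ in range(PW_LEN):
        n, r = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[r])
    return "".join(chars)

def chain_end(start: str) -> str:
    pw = start
    for col in range(CHAIN_LEN):
        pw = reduce_hash(h(pw), col)
    return pw

def build_table(starts):
    table = defaultdict(list)  # endpoint -> chain starts (only these are stored)
    for s in starts:
        table[chain_end(s)].append(s)
    return table

def lookup(table, target: str):
    # Guess the column the target hash sits in, then walk to the endpoint.
    for col in range(CHAIN_LEN - 1, -1, -1):
        pw = reduce_hash(target, col)
        for c in range(col + 1, CHAIN_LEN):
            pw = reduce_hash(h(pw), c)
        for start in table.get(pw, []):
            # Replay the stored chain, looking for the password behind target.
            cand = start
            for c in range(CHAIN_LEN):
                if h(cand) == target:
                    return cand
                cand = reduce_hash(h(cand), c)
    return None

table = build_table(["aaa", "abc", "qrs", "xyz"])
secret = "abc"
for col in range(10):  # walk 10 steps so the secret sits mid-chain
    secret = reduce_hash(h(secret), col)
print(lookup(table, h(secret)) == secret)  # True
```

The table stores only four start/end pairs yet covers up to 200 passwords, which is the time/space trade-off. A password is only recoverable if some chain happens to pass through it, and a per-user salt changes `h` for every account, so one precomputed table no longer applies.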


I have no idea. I thought they were, but your comment made me do some really naive analysis:

For a typical password, each character can be one of around 94 characters, depending on what rules are in place - 26 lowercase letters, 26 uppercase letters, 10 digits, and ~32 special characters on the keyboard (I may have miscounted). Other characters could be used, but these are going to be the most common.

This means that your 8 character password can have about 100^8 possibilities. To put that into more familiar, and more easily comparable terms, that's 1x10^16 password possibilities.

According to Oxford Dictionaries, "The Second Edition of the 20-volume Oxford English Dictionary contains full entries for 171,476 words in current use." This means that, without reducing that space, a four word passphrase would have about 8.6x10^20 possibilities.

Admittedly, there are some massive problems here. The most obvious is that most of those 171k words aren't words a normal person would use. For this to be a valid analysis, you would have to believe that the average person would pick a passphrase like "gastroenteritis jurisprudence algorithm aberration", which is clearly ridiculous. Also, most people would, like your example, use a grammatically correct sentence. The possible combinations would be pretty severely reduced in that case.

Now, more combinations are introduced by capitalization, punctuation, and the introduction of "numeric words", like the year 1972 in your example, but I have no idea how to account for that.

In either case, the average person is going to have a much easier time remembering "My first car was a 1972 Monte Carlo" than they will remembering "8gj2;hg^".


>you would have to believe that the average person would pick a passphrase like "gastroenteritis jurisprudence algorithm aberration", which is clearly ridiculous.

Oh how I wish my bank and mortgage lender would let me choose easy-to-remember passwords like that.


I might have to start throwing in a non-dictionary word here and there .. "Don't touch the Snorlax after 4:45" .. "The Grue desires my 25th Triforce" ..


Forget your rainbow tables and bring a pokedex!


> For this to be a valid analysis, you would have to believe that the average person would pick a passphrase like "gastroenteritis jurisprudence algorithm aberration", which is clearly ridiculous.

There are many more short words than long words, thus a person would be very unlucky to pull out that passphrase.

But what if you reduce the space? Instead of using a dictionary with about 175,000 words, why not use the Diceware list, which has only 7776 words? None of them are over 6 letters long (I think). A few words are numbers or have special characters.

Because many websites won't allow you to use a diceware passphrase you'd use a good password safe with a long diceware passphrase. You'd then let the safe generate random passwords for you.
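A rough Python sketch of the Diceware approach, for the curious. Diceware proper uses five physical dice rolls per word (6^5 = 7776); this sketch substitutes Python's `secrets` module, and `demo_words` is a six-word stand-in I made up, not the real list.

```python
import math
import secrets

def passphrase(wordlist, n_words: int = 5) -> str:
    """Pick n_words uniformly at random using a cryptographic RNG."""
    return " ".join(secrets.choice(wordlist) for _ in range(n_words))

def passphrase_bits(list_size: int, n_words: int) -> float:
    """Entropy in bits when every word is an independent uniform pick."""
    return n_words * math.log2(list_size)

# Stand-in list for illustration only; load the real 7776-word list in practice.
demo_words = ["cork", "vanilla", "zebra", "staple", "grue", "moon"]

print(passphrase(demo_words))
print(f"{passphrase_bits(7776, 5):.1f} bits for 5 Diceware words")
```

The entropy figure holds even if the attacker knows the word list and the number of words, because it counts only the random choices.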


With a passphrase, each word comes from a much bigger set than (alphanumeric + special characters), so it stands to reason that it'd be harder to brute-force. There are speech patterns, though, so it's likely that crackers would be able to reduce the search space somewhat by checking common phrases like "my first car".

But change to something like "my first grandma was a 1927 haircut" and you're likely to future-proof it significantly.


Or go completely nonsensical. "Vanilla elephant trampoline inverted cork routine" contains no common phrases and is just as easy to remember.


One of the tricks I've learned, and I think it stands vindicated now, is to use the English translation of words from my vernacular language as passwords. They stick in your memory, unlike "My first car was a 1972 Monte Carlo".

I come from southern India and I speak Malayalam.


Your passphrase reduces to the password Mfcwa1MC. Is Mfcwa1MC easier to crack than a random 8 character password? Even if the attacker "half-cracked" your passphrase and knew its initials, there's still more work to do.

Making an intelligent phrase will affect the distribution of initials, but even something commonplace like "the quick brown zebra jumped over the moon" or tqbzjotm hits the less frequent letters like q and z. It won't be completely random, but it's going to cover way more of the 8 letter space than words do.
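The reduction from phrase to initials is mechanical, as this short Python sketch shows. The helper name `initials` is mine.

```python
def initials(phrase: str) -> str:
    """First character of each whitespace-separated word, case preserved."""
    return "".join(word[0] for word in phrase.split())

print(initials("My first car was a 1972 Monte Carlo"))         # Mfcwa1MC
print(initials("the quick brown zebra jumped over the moon"))  # tqbzjotm
```

Note that a number like 1972 contributes only its first digit, so most of the information in the year is thrown away.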


The way I see it, and I'm no expert on this topic, a longer password is better than a short, completely random one. The attacker doesn't know how long your password is, so he will start with short passwords. Each additional character adds a lot more possible combinations, so that's where you get your safety from. Now if you include lower/upper case letters, digits and special characters you have increased the search space as well, so the attacker will have to try even more combinations.


But that doesn't matter at all if the attacker is targeting your algorithm in particular.

Say my algorithm is to pick the password "1" * 1000 (that's the character 1 repeated 1000 times) and also pretend that 90% of the sites didn't have stupid limits and it was a valid password. It's certainly a long password. The time it would take to brute force it by testing all possible strings in order of increasing length is an unimaginable number. It's not on the scale of the universe - not on the scale of a million universes either.

But now let's say that this "the more characters the better" became a universal truth and everyone jumped on the same bandwagon and did the same quick hack of having 1000 1s. Suddenly, we're all screwed, because the algorithm "pick 1000 ones" is staggeringly weak. In fact, it provides no protection at all - the attacker already knows your password.

The true measure of security measures is not how long they last when no one knows about them - it's how long they last when everybody knows. "Pick 10 random symbols" will last for a while. "Pick 'password'", not even a second.

Where does "pick a meaningful English sentence" fall on the grand scale? That's one incredibly hard question to answer. It's also bloody difficult to break, for reasons of generating sentences, not password entropy.


But does it actually add entropy when a hacker could use a dictionary and combine those words in various ways? The best practice of using several random words is still 'rule based' - the individual 'units' in the password simply become words instead of characters, and the arbitrary length doesn't really matter. Start with the most common 40,000 words in English, and combine four of them in every possible order - that gives 2.6e18 combinations (40,000^4). Compare that to the "random" password of length 10, with say, 40 possible characters: 1e16 combinations (40^10). I think OP has a point about the relative strength.


Add a few fun prefixes and suffixes hither and yon, and you largely eliminate the "token" nature of words as well. Even a couple of well-placed (but ordinarily inappropriate) uns, antis, disens, ousitys and ishnesses increase the problem space dramatically without significantly decreasing (and perhaps even increasing) the memorability.


> They way I see it, and I'm no expert on this topic, a longer password is better than a short, completely random one. The attacker doesn't know how long your password is, so he will start with short passwords.

Did you read the article? It describes exactly what a possible attacker does. And it's not "start with short passwords".

There are only two options:

- Use a really random password string, from a non-broken random generator

- Do something nobody else does

The latter only works if you can stop yourself from bragging about it on public fora. Which is why one of the best pieces of advice for secure passphrases is to include something really, really embarrassing, horrible, shameful, completely unfit for print and absolutely boring. Especially don't use a funny quip or play on words, don't try to be clever, there ought to be no audience to appreciate it. And if at all possible it shouldn't even look like a password.

(kinda OT) I read that advice many years ago, and I don't understand why Julian Assange did not take it to heart. Remember when that Guardian journalist wrote his book and published the passphrase to that AES encrypted data dump (because the nitwit assumed the AES passphrase would be automatically invalidated after a few hours ...), it was something like "a diplomatic history from <date>" with some random uppercasing, special characters, etc. It would have been pretty strong, except it was WAY too clever and typical-super-secret-password-looking to use for the sort of hypersensitive data Assange was carrying about. If he had simply picked some terribly bad and misspelled slashfic involving Martin Luther King, a dead baby and pres. Nixon--like Spider Jerusalem would've done--no way the Guardian journalist would have published that, anywhere.


obligatory xkcd reference: http://xkcd.com/936/


I've always disagreed with this XKCD. Given a passphrase dictionary attack, the passphrase would be discovered in less than a minute.

And technically, if you didn't know the format of the password, and you were just trying to get a random 11 character password, that would take a long time to crack. There are (roughly) 94 characters that you could safely use for your password pretty much universally on any website...

94^11 = 5.06x10^21 which means if your computer can generate 2 million hashes a second it would take: 80 million years to crack a truly random 11 character password.

Passphrases are stupidly insecure unless you throw enough randomness into them.

ex(quotes included): "My Phone Number is `(123)546-8794!!!`"


> Given a passphrase dictionary attack, the passphrase would be discovered in less than a minute.

Wait, what?

2,048^4 == 2^44 == 17,592,186,044,416

At 2 million hashes/second it would still take 101 [edit: actually, on average, 50] days to find this password, if it was unsalted. Perhaps if you had spent a few years of supercomputer time to generate some massive rainbow tables, you might be able to discover it quickly, but absent the need for your LinkedIn password to be resistant to attacks from a nation state, you'd be pretty safe with such a password for a while.

It's entirely unclear how you came to the conclusion that it could be discovered in "under a minute" with a passphrase dictionary attack.


Diverging from your main point a bit: 2MH/s is unrealistically low. For a couple thousand dollars you can build FPGA HW that can do several billion SHA1 hashes/s. The bitcoin mining world is getting 400-450 SHA256 MH/s from a $130 chip. With similar technology, you can brute force a 2^44 SHA1 space in a lot less than 50 days.
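The timing figures in this subthread are easy to reproduce. Here is a back-of-the-envelope Python sketch; the 2 GH/s rate is my own stand-in for "several billion", and the numbers are worst-case exhaustive search (the average is about half).

```python
SECONDS_PER_DAY = 86_400

def days_to_exhaust(keyspace: int, hashes_per_second: float) -> float:
    """Days needed to try every candidate once at a fixed hash rate."""
    return keyspace / hashes_per_second / SECONDS_PER_DAY

xkcd_space = 2048 ** 4   # 2^44, four words from a 2,048-word list
random11 = 94 ** 11      # 11 random characters from 94 symbols

print(f"{days_to_exhaust(xkcd_space, 2e6):.1f} days at 2 MH/s")
print(f"{days_to_exhaust(xkcd_space, 2e9) * 24:.1f} hours at 2 GH/s (assumed rate)")
print(f"{days_to_exhaust(random11, 2e6) / 365.25 / 1e6:.0f} million years at 2 MH/s")
```

A thousandfold faster hash rate turns roughly 100 days into a couple of hours, which is the point being made about fast hashes.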


Which is why no-one should be storing passwords in SHA1.
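For what the alternative might look like: the thread doesn't name one, so this Python sketch uses PBKDF2 from the standard library purely as an example of a salted, deliberately slow password hash. The iteration count is illustrative only and not a recommendation.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # illustrative; tune to your hardware and current guidance

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated per password."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(digest, expected)

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("Tr0ub4dor&3", salt, stored))                   # False
```

The salt defeats precomputed tables, and the iteration count multiplies the attacker's cost per guess, which a single round of SHA1 does not.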


I see xkcd's passphrase is correcthorsebatterystaple and think that it is the wrong way to do it.

The sentence used to memorize that password would work much better as the password than a simple passphrase like that.

I.E. the actual password would be:

    "That's a battery staple. Correct!"
And I don't believe that people will easily be able to crack that with current techniques, even with the minimal randomness that has been put in. Sure, if natural language cracking becomes popular you may have to become a little more creative, like using a made-up word or name or a number. But even your example, if no one knows what your password is:

    "My Phone number is (123) 546-8794."
should be sufficient as a very hard-to-crack password. And again, it is many times better than a simple dictionary passphrase with a few words combined.


And note that assumes there are about 2,048 common words to choose from, not the 171,000 you can find in a dictionary.


Calculating the entropy of such a passphrase is a bit of a black art, but note that it's a lot easier to remember than a random 8-character password!



