
How LinkedIn’s password sloppiness hurts us all

Second data dump makes hackers six times better at cracking future dumps.

Jeremi M Gosney (@jmgosney) is a world-renowned password cracker and security expert. He is the Founder & CEO of the password-cracking firm Sagitta HPC, and a member of the Hashcat development team. Jeremi also helps run the Security BSides Las Vegas, Hushcon, and PasswordsCon conferences.

Me: "The full dump from the 2012 LinkedIn breach just dropped, so you're probably not going to see much of me over the next week."

Wife: "Again?"

Yes, again. If you're just waking up from a coma you would be forgiven for thinking that it's still 2012. But no, it's 2016 and the LinkedIn breach is back from the dead—on its four-year anniversary, no less. If you had a LinkedIn account in 2012, there's a 98 percent chance your password has been cracked.

Back in 2012, fellow professional password cracker d3ad0ne (who sadly passed away in 2013) and I made short work of the first LinkedIn password dump, cracking more than 90 percent of the 6.4 million password hashes in just under one week. Following that effort, I did a short write-up ironically titled The Final Word on the LinkedIn Leak.

But those 6.4 million unique hashes posted on a Russian password-cracking forum in June 2012 only accounted for a fraction of the total LinkedIn database. This second dump, on the other hand, contains 177.5 million password hashes for 164.6 million users, which aligns perfectly with LinkedIn's user count in the second quarter of 2012. After validating the data that I received with several individuals, I concluded that this does appear to be a nearly complete dump of the user table from the 2012 LinkedIn hack.

I say "nearly complete" because there are some e-mail addresses in the dump that do not have hashes associated with them (the hash was replaced with the string "xxx"), and there are also some hashes that are not associated with an e-mail address (the e-mail address is NULL). While I presume the hashes not associated with any e-mail address belong to deleted accounts, I cannot even venture a guess as to why some of the password hashes are missing. That's the way it goes when you're working with second-hand data from an unknown source: you just can't get a pristine database dump these days.

You may think that 178 million password hashes is a lot, and you wouldn’t be wrong. But some 362 million passwords, allegedly from Myspace, were recently posted for sale elsewhere on the darkweb. What makes the LinkedIn breach more notable? Although Myspace has also acknowledged its breach, that data holds very little analytical value because the passwords were dramatically altered before being hashed. They were all converted to lowercase and truncated to just 10 characters, so it's impossible for us to know what the original input was. Further, two of the top 10 passwords on the Myspace list appear to have been created by spammers setting up fake profiles and likely do not reflect the choices of actual end users.
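To see why lowercasing and truncating before hashing destroys the analytical value, here is a minimal sketch. The `normalize` step follows the description above; the choice of SHA-1 and the sample passwords are assumptions made purely for illustration, not details taken from the Myspace data.

```python
import hashlib


def normalize(password: str) -> str:
    # Lowercase, then keep only the first 10 characters,
    # as described for the Myspace passwords.
    return password.lower()[:10]


def stored_hash(password: str) -> str:
    # SHA-1 is an assumed hash function, used here only for illustration.
    return hashlib.sha1(normalize(password).encode("utf-8")).hexdigest()


# Hypothetical user choices that differ in case, length, and symbols.
candidates = ["Password1234", "password12", "PASSWORD12!!", "password13"]

# Group the original passwords by the hash that would end up stored.
groups = {}
for pw in candidates:
    groups.setdefault(stored_hash(pw), []).append(pw)

for digest, originals in groups.items():
    print(digest[:12], originals)
```

The first three candidates collapse into a single stored hash, so cracking that hash recovers only the normalized string and says nothing about which capitalization, length, or trailing symbols the user actually chose.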

So as it stands today, the LinkedIn breach is the largest and most relevant publicly acknowledged password breach in Internet history.

Password cracking and the age of enlightenment

As Ars explained a few months after the first batch of LinkedIn passwords spilled, password cracking is an endless feedback loop. We crack passwords so that we can learn about passwords, which helps us crack more passwords, which we can then analyze and use to crack still more. We start off with a small amount of data that enables us to crack a small number of passwords. Those passwords then give us some insight into how passwords are created, which enables us to crack more in the future.
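The feedback loop can be sketched in a few lines. This is a toy illustration under stated assumptions, not how Hashcat or any real cracking rig works: the `mangle` rules, the `crack_rounds` helper, the sample passwords, and the use of unsalted SHA-1 are all hypothetical choices made for the example.

```python
import hashlib


def sha1_hex(text: str) -> str:
    # Unsalted SHA-1 is an assumption for this sketch.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def mangle(word: str):
    # A handful of hypothetical mangling rules: capitalize, append suffixes.
    yield word
    yield word.capitalize()
    for suffix in ("1", "123", "!"):
        yield word + suffix
        yield word.capitalize() + suffix


def crack_rounds(target_hashes, seed_words, max_rounds=5):
    remaining = set(target_hashes)
    cracked = {}                      # hash -> recovered plaintext
    wordlist = set(seed_words)
    for _ in range(max_rounds):
        newly_cracked = set()
        for word in wordlist:
            for candidate in mangle(word):
                digest = sha1_hex(candidate)
                if digest in remaining:
                    remaining.discard(digest)
                    cracked[digest] = candidate
                    newly_cracked.add(candidate)
        if not newly_cracked:
            break                     # nothing new learned, loop has stalled
        # Feed what was just cracked back in as the next round's wordlist.
        wordlist = newly_cracked
    return cracked


# Hypothetical targets; the last one is never recovered by these rules.
targets = {sha1_hex(p) for p in ["monkey1", "Monkey1!", "monkey123", "zx9#unrelated"]}
cracked = crack_rounds(targets, ["monkey"])
print(sorted(cracked.values()))
```

Here the seed word alone cannot reach every target in one pass; one of the passwords only falls in the second round, after a first-round crack is fed back in as a new base word.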

And it’s not just passwords we’re interested in, either. Any short, low-entropy, human-generated string (usernames and screen names, e-mail addresses, and so on) is potentially useful. We’ve learned that, in the absence of external factors such as password complexity policies, the username selection process is not all that different from the password selection process. The more data we can accumulate and analyze, the more successful we are at cracking passwords.

Back in the early days of password cracking, we didn't have much insight into the way people created passwords on a macro scale. Sure, we knew about passwords like 123456, password, secret, letmein, monkey, etc., but for the most part we were attacking password hashes with rather barbaric techniques—using literal dictionaries and stupid wordlists like klingon_words.txt. Our knowledge of the top 1,000 passwords was at least two decades old. We were damn lucky to find a password database with only a few thousand users, and when you consider the billions of accounts in existence even back then, our window into the way users created passwords was little more than a pinhole.

Those were the dark ages of password cracking. The age of enlightenment came after 32 million non-unique plaintext passwords from RockYou were leaked to the Internet. Suddenly that pinhole turned into a porthole, and for the first time in history we got a solid look at how users were creating passwords on a mass scale.

The RockYou breach revolutionized password cracking. No longer were we using crap like list_of_kitchen_appliance_manufacturers.txt for wordlists. Everyone was just using rockyou.txt, and they were cracking a significant percentage of passwords. Markov statistics, mangling rules, everything was being based off what we learned from the RockYou passwords.
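As a rough illustration of what "Markov statistics" means here, the sketch below trains a first-order character model on a tiny list of known passwords and then generates guesses by following the most frequent transitions. The function names, the `START` marker, and the four training passwords are assumptions for the example; real tools train on lists the size of rockyou.txt and use more elaborate models.

```python
from collections import Counter, defaultdict

START = "^"  # hypothetical marker for "beginning of password"


def train(passwords):
    # Count how often each character follows the previous one.
    counts = defaultdict(Counter)
    for pw in passwords:
        prev = START
        for ch in pw:
            counts[prev][ch] += 1
            prev = ch
    return counts


def most_likely_next(counts, prev, k=3):
    # The k characters most often seen after `prev` in the training data.
    return [ch for ch, _ in counts[prev].most_common(k)]


def generate(counts, length, k=2):
    # Enumerate candidates of the given length, following only the
    # k most frequent transitions at each position.
    frontier = [""]
    for _ in range(length):
        extended = []
        for prefix in frontier:
            prev = prefix[-1] if prefix else START
            for ch in most_likely_next(counts, prev, k):
                extended.append(prefix + ch)
        frontier = extended
    return frontier


# Tiny hypothetical training set standing in for a leaked plaintext list.
sample = ["password", "passw0rd", "pass123", "123456"]
model = train(sample)
print(generate(model, 4, k=1))
print(generate(model, 4, k=2))
```

Raising `k` widens the search to less likely transitions, trading more guesses for more coverage, which is the basic tuning knob in this kind of statistically ordered guessing.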

The RockYou breach coincided with another turning point in password cracking history: the advent of general-purpose GPU computing. By harnessing the parallel processing capabilities of graphics cards we could now crack password hashes tens of times faster than with a regular CPU. Meanwhile, software like Hashcat helped bring GPU password cracking into the mainstream, displacing now-obsolete techniques like rainbow tables. Instead of pushing pixels, we were pushing RockYou-powered passwords, and we were cracking password hashes with unprecedented speed and success. This fueled a wave of new password research, and when other large password breaches came our way—eHarmony, Stratfor, Gawker, and LinkedIn, for instance—we were ready and waiting.

But most post-RockYou breaches have paled in comparison to the latest LinkedIn leak. Breaches from Zappos, Evernote, and LivingSocial (with 24 million, 50 million, and 50 million respectively) would have made for fantastic password statistics, except those hashes never saw the light of day. I'm sure the Adobe breach (at 130 million) was an amazing win for whoever stole the encryption key, but the rest of us are stuck playing a crossword puzzle. It’s certainly possible that there are some other large password databases slowly making their way across the darkweb from companies that don’t even know that they’ve been breached, but as far as confirmed data breaches go, RockYou was the previous password cracking standard for relevant and useful breaches.
