How big is the ideal dick…tionary?

Hey all,

As some of you know, I’ve been working on collecting leaked passwords/other dictionaries. I spent some time this week updating my wiki’s password page. Check it out and let me know what I’m missing, and I’ll go ahead and mirror it.

I’ve had a couple of new developments in my password list, though. Besides having an entirely new layout, I’ve added some really cool data!

RockYou passwords

One of the most exciting things, at least to me, is the RockYou password list. Back in 2009 (I realize that’s a long time ago – my friends would yell ‘OLD!’ if I tried talking about it on IRC), 32.6 million passwords (14.3 million unique passwords) were stolen from RockYou. These passwords were not encrypted/hashed and were stolen through, I believe, SQL injection. This attack was incredibly useful, at least from my perspective, because that’s a HUGE number of passwords. Basically, it’s a perfect cross section of the passwords people use when they aren’t restricted.

I’m mirroring a few versions of the password list on my password page, so go grab a copy if you want one (the full list is 50 MB+ compressed). Just for fun, the top 10 passwords, which were used by 4.66% of all users on RockYou, were:

  1. 123456
  2. 12345
  3. 123456789
  4. password
  5. iloveyou
  6. princess
  7. 1234567
  8. rockyou
  9. 12345678
  10. abc123
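
If you want to reproduce a "top N" ranking like this one from your own data, here is a minimal Python sketch. It assumes you have the passwords as a plain sequence with one entry per account (duplicates included); the function name `top_passwords` and the toy data are made up for illustration and are not taken from the real list.

```python
from collections import Counter


def top_passwords(passwords, n=10):
    """Return the n most common passwords and the share of accounts using them.

    `passwords` is assumed to hold one entry per account, duplicates included.
    """
    counts = Counter(passwords)
    total = sum(counts.values())
    top = counts.most_common(n)
    # Share of all accounts whose password is one of the top n.
    share = sum(count for _, count in top) / total if total else 0.0
    return top, share


# Toy data for illustration only, NOT the real leak.
sample = ["123456"] * 5 + ["password"] * 3 + ["iloveyou"] * 2 + ["x9!kQ", "hunter2"]
top, share = top_passwords(sample, n=2)
print(top, f"{share:.2%}")
```

On a real list with tens of millions of entries you would probably stream the file line by line into the `Counter` rather than build a list first.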

Password coverage

When talking about dictionary sizes, the question often comes up: does size really matter? The answer, I’m assured by experts, is ‘yes’. But what’s the ideal size (for sanctioned penetrations, of course)?

So, here’s the question: how many accounts can be cracked with the top-X passwords? Let’s start by looking at a graph:

As you can see, there are some definite diminishing returns there. I was actually excited that the graph looks exactly how I thought it’d look. Pretty sweet!

Now, let’s look at it in a less exciting but more useful table form:

Passwords Coverage

What does that mean? It means that if you take the top 10 passwords, you would have cracked 4.66% of accounts on RockYou. The top 100 passwords would have gotten you 10.34% of the accounts, and so on. That’s cool to know, but isn’t as useful for penetration testing. Let’s go by coverage instead of count (I’ve included links to the password files, as well – the same links you’ll find on my wiki):

Passwords   Coverage   Download
13          4.99%      rockyou-5.txt (104 bytes)
92          10.00%     rockyou-10.txt (723 bytes)
249         15.01%     rockyou-15.txt (1,943 bytes)
512         20.00%     rockyou-20.txt (3,998 bytes)
929         25.00%     rockyou-25.txt (7,229 bytes)
1556        30.00%     rockyou-30.txt (12,160 bytes)
2506        35.00%     rockyou-35.txt (19,648 bytes)
3957        40.00%     rockyou-40.txt (31,220 bytes)
6164        45.00%     rockyou-45.txt (49,133 bytes)
9438        50.00%     rockyou-50.txt (75,912 bytes)
14236       55.00%     rockyou-55.txt (115,186 bytes)
21041       60.00%     rockyou-60.txt (170,244 bytes)
30290       65.00%     rockyou-65.txt (244,535 bytes)
42661       70.00%     rockyou-70.txt (344,231 bytes)
59187       75.00%     rockyou-75.txt (478,948 bytes)

This is essentially the same table – I just based the rows on the coverage instead of the number of passwords. With this table you can determine, for example, that to crack 10% of users’ passwords, you only need to try the top 92 passwords. I put the same table and links on my password page.
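
For anyone who wants to run the same kind of calculation on another list, here is a rough Python sketch of both directions: coverage for a given top-N, and the smallest N for a given coverage. It is not the script I used for the tables; the function names and the toy counts are hypothetical, and it assumes you already have the per-password account counts sorted from most to least common.

```python
from itertools import accumulate


def coverage(counts, top_n):
    """Fraction of accounts covered by the top_n most common passwords.

    `counts` is assumed to be per-password account counts, sorted descending.
    """
    total = sum(counts)
    return sum(counts[:top_n]) / total if total else 0.0


def passwords_needed(counts, target):
    """Smallest N whose top-N coverage reaches `target` (a fraction, 0 to 1).

    Returns None if the target cannot be reached (for example, an empty list).
    """
    total = sum(counts)
    for n, running in enumerate(accumulate(counts), start=1):
        if running >= target * total:
            return n
    return None


# Toy counts for illustration only, NOT the real data.
counts = sorted([50, 30, 10, 5, 3, 1, 1], reverse=True)
print(coverage(counts, 2))
print(passwords_needed(counts, 0.85))
```

The long tail is why the second function matters: each extra 5% of coverage costs a lot more passwords than the last one did.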

phpBB passwords

One last interesting change on my password page is the addition of Brandon Enright’s cracked phpBB passwords. As I’m sure you all know, phpBB had its password list stolen some time ago (closing in on two years, maybe?). Since then, Brandon has been diligently working to crack every single MD5 password hash, and has mostly succeeded (over 97% cracked, I believe). He was kind enough to share that list with me, and it’s now mirrored on my password page, so check it out!

