Building a Wordlist using Youtube
Thursday, September 13, 2012 | Author: Deep Flash
A good wordlist is essential to increase the probability of cracking hashes. How much it matters depends on the type of algorithm you are attacking.

For fast algorithms such as unsalted MD5 and SHA-1, a brute-force attack on passwords of up to length 9 using a fairly recent GPU is feasible these days.

A good ruleset can bring the best out of your wordlist. While your wordlist may not be very good, an efficient ruleset will help in increasing the probability of cracking hashes using it.

Here is the important point: when you are attacking algorithms like SHA512 (crypt), bcrypt, or WPA/WPA2 handshakes, you rely on the wordlist even more.

To date, these algorithms have not been accelerated (on either GPU or CPU) to an extent that would allow you to brute-force them, or even to run hybrid mask attacks and word-mangling rule-based attacks.

Your chances of cracking a WPA/WPA2 handshake are only as good as your wordlist. If the passphrase is not in your wordlist, you may as well move on to capturing another WPA/WPA2 handshake.

Note: Elcomsoft has demonstrated that cracking of WPA/WPA2 handshakes can be accelerated using an FPGA or an Array of FPGAs, but even this set up does not allow rule based attacks in a reasonable time span.

This is why we need a better wordlist.

Not long ago, someone wrote a blog post about using Twitter to build a wordlist. The idea was to use the Search API that Twitter provides for developers, which lets you query for a specific keyword and retrieve the 'N' most recent tweets.

While this method is effective indeed, your success also depends on the keywords you search for.

The URL provided by Twitter Search API to make queries looks like:

http://search.twitter.com/search.json?q=&rpp=

It sends a JSON response which can be easily parsed using the JSON libraries provided by the scripting language you are using.

In Perl, with JSON and JSON::XS, we can easily parse the response.
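The same request-and-parse step can be sketched in Python. This is only an illustration: the "results" list and the "from_user" field are assumptions about the response layout, the sample body is hand-written rather than real API output, and the helper names are made up for this example.

```python
import json
from urllib.parse import urlencode

SEARCH_URL = "http://search.twitter.com/search.json"  # endpoint quoted in the post


def build_query_url(keyword, per_page):
    """Return the search URL for one keyword (q) and page size (rpp)."""
    return SEARCH_URL + "?" + urlencode({"q": keyword, "rpp": per_page})


def extract_usernames(response_body):
    """Collect usernames from a JSON response body.

    Assumes a top-level "results" list whose items carry a "from_user" key.
    """
    data = json.loads(response_body)
    return [item["from_user"] for item in data.get("results", []) if "from_user" in item]


# Hand-written sample response, not real API output.
sample = '{"results": [{"from_user": "alice_smith", "text": "hi"}, {"from_user": "Bob_99", "text": "yo"}]}'
url = build_query_url("password", 100)
names = extract_usernames(sample)
print(url)
print(names)
```

Keeping the parsing in its own function means the wordlist logic can be exercised on saved responses without sending any requests.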

If you have ever used Twitter's Search API to build a wordlist, you might have encountered a Rate Limit.

Rate limits are set by Twitter on the server side to prevent abuse of the Search API, so that developers do not write code which sends a huge number of requests to fetch tweets and ends up slowing down the server.

The advantage for us is that the Twitter Search API does not have strict rate limits. The number of requests you can make has not been made public, but it is higher than the limit set for other APIs like the REST API.

They also provide good documentation, which states that an HTTP header field will be set in the response from the Twitter server once you reach the rate limit.

Retry-After: x number of seconds.

This is useful for developers because now they can include checks in the scripts to look for this field in the HTTP Response Headers and wait before executing the next operation.

In Perl, it would be:
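The same check can also be sketched in Python. This is a minimal illustration, assuming the response headers are available as a plain dictionary and that Retry-After carries a whole number of seconds; the function names are hypothetical.

```python
import time


def seconds_to_wait(headers):
    """Return the Retry-After delay in seconds, or 0 if the header is absent."""
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return max(0, int(value.strip()))
            except ValueError:
                return 0  # non-numeric value (e.g. an HTTP date): not handled in this sketch
    return 0


def wait_if_rate_limited(headers, sleep=time.sleep):
    """Sleep for the advertised delay before the next request; return the delay."""
    delay = seconds_to_wait(headers)
    if delay:
        sleep(delay)
    return delay
```

A fetch loop would call wait_if_rate_limited() on each response before sending the next query. The sleep argument is there only so the function can be tried out without actually waiting.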


The value of the Retry-After header depends on the number of tweets you are requesting from the Search API. I observed this based on my experiments.

A quick lookup on Wikipedia tells us that Twitter has 500 million users (as of their last update). That makes for a strong wordlist.

Another good point about using Twitter's Search API to build a wordlist is that it is dynamic: two requests sent to Twitter for the same keyword at different times will give different results.

Unlike building a wordlist from Facebook by crawling the Names Directory, where the results depend on the directory you are crawling, Twitter is easier since it is more dynamic.

We can get many more words with less work :)

If you feel that the Twitter API is not so developer friendly due to the rate limits imposed by it, then welcome to GData API of Youtube :)

I use Youtube to build my wordlist. Every user on Youtube has the option to subscribe to other channels on Youtube.

Each channel will have a unique name.

While it is not possible to query a particular channel using the GData API and retrieve all of its subscribers, it is still possible to view the list of all the subscriptions of a particular user.

Youtube's GData API changes often, so there is no telling how long this feature will be kept.

There are also other restrictions imposed by Youtube which make it difficult to build a wordlist using it.

1. The maximum number of subscriptions you can view in one request = 50.
2. You cannot view subscriptions beyond index 1001.
3. The JSON response from Youtube is much more complicated as compared to the response from Twitter's Search API. However, once you have figured it out, it should be easy.
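The first two restrictions mean that a user's subscriptions have to be fetched in fixed-size windows. A rough Python sketch of the paging arithmetic follows. It assumes the windows map to the usual GData start-index and max-results query parameters, and it reads restriction 2 as "index 1000 is the last one you can retrieve"; both are assumptions, not something confirmed here.

```python
def subscription_pages(page_size=50, last_index=1000):
    """Yield (start_index, max_results) pairs covering indices 1..last_index."""
    start = 1
    while start <= last_index:
        count = min(page_size, last_index - start + 1)
        yield start, count
        start += count


pages = list(subscription_pages())
print(len(pages), pages[0], pages[-1])
```

Under these assumptions, at most 20 requests are needed per user, which is why crawling a large user base takes so long.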

Below screenshot shows the code in action:



Due to the large user base of Youtube, this script would need to run for months.

I shall update the statistics regularly in this blog post.

That's all for now :)

Thank you for the APIs to Youtube and Twitter.

Update #1 (as of 19th September 2012)

Twitter - 7527213 (approx 7.52 Million)

Update #2 (as of 19th October 2012)

Twitter - 8898721 (approx 8.89 Million)
Youtube - 218902

Important Observations:

In the case of Twitter, a lot of duplicate usernames are collected while running the script. The counts posted above exclude all duplicates.

Also, most of the usernames contain an underscore character, which stands in for the spaces in the name.

To process the wordlist, I remove all underscores and convert all the entries to lowercase. More combinations will be generated by rulesets at the time of cracking.

cat twitter.txt | sed -e 's/_//g;' | perl -pe '$_=lc($_)'
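For comparison, here is a small Python sketch of the same clean-up. Unlike the one-liner above, it also drops duplicates in the same pass; the function name and the sample entries are made up for illustration.

```python
def normalize(lines):
    """Strip underscores, lowercase, and drop duplicates while keeping first-seen order."""
    seen = set()
    out = []
    for line in lines:
        word = line.strip().replace("_", "").lower()
        if word and word not in seen:
            seen.add(word)
            out.append(word)
    return out


sample = ["Deep_Flash", "deepflash", "John_Doe_99", ""]
cleaned = normalize(sample)
print(cleaned)
```

Deduplicating after normalization matters, because two usernames that differ only in case or underscores collapse to the same wordlist entry.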

Listening Now: Lee Jung Hyun - Bakkwo
Cryptography - Personal Notes #1
Monday, August 27, 2012 | Author: Deep Flash
These are my personal notes for Cryptography and any other subject related to Cryptography. They are for my reference however the fact that they are available here might help someone else who stumbles across this blog.

These notes are short lines, summarized to understand complex subjects. Mathematical Representations are used where required.

The content of this blog evolves with time. More information shall be added.

bcrypt hashing algorithm:

Designed by Niels Provos and David Mazières. Paper presented at USENIX in 1999.

Based on Blowfish, a symmetric-key block cipher with a 64 bit block size.

It uses a key schedule similar to the one in Blowfish, slightly modified. This is often referred to as the Expensive Key Schedule, and it is the step that makes the bcrypt hashing algorithm computationally intensive.

Like other block cipher based hashing algorithms such as DEScrypt, it has a few common properties:

Makes use of a 16 round Feistel Network.
S Boxes (4 of them) and a P Array (consisting of 18 sub keys).
A key schedule which is used to derive the S Boxes and the P Array from the encryption key.

The differences are as follows:

Unlike DES, the S Boxes in bcrypt are not fixed. They depend on the key and the salt. Even though the 4 S Boxes are initialized with the hexadecimal digits of the number Pi, they are later modified using the bits of the salt.

Each S box consists of 256 32 bit words (dwords).

Size of 1 S Box = 256 * 32 bits = 8192 bits = 1024 bytes = 1 KB
There are a total of 4 S Boxes.
Total Size = 4 KB

During the expensive key schedule setup, the contents of these S boxes are modified constantly using the 128 bit salt (64 bits or 2 S box entries at a time).

This makes the bcrypt hashing algorithm memory intensive and hence it cannot be accelerated on GPU easily.

Similar to S Boxes, the contents of the P-Array are also modified using both the salt and the variable length key.

P Array consists of 18 sub keys.
Size of 1 sub key = 32 bits
Size of the P Array = 72 bytes
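The size figures above can be double-checked with a few lines of Python. This is just the arithmetic from these notes written out; the constant names are my own, and the combined state size is derived from the numbers above rather than stated elsewhere in the notes.

```python
SBOX_COUNT = 4       # number of S Boxes
SBOX_ENTRIES = 256   # 32 bit words per S Box
P_SUBKEYS = 18       # sub keys in the P Array
WORD_BITS = 32       # size of one entry or sub key

sbox_bytes = SBOX_ENTRIES * WORD_BITS // 8      # one S Box
all_sboxes_bytes = SBOX_COUNT * sbox_bytes      # all four S Boxes
p_array_bytes = P_SUBKEYS * WORD_BITS // 8      # the P Array
state_bytes = all_sboxes_bytes + p_array_bytes  # S Boxes plus P Array

print(sbox_bytes, all_sboxes_bytes, p_array_bytes, state_bytes)
```

So each hash being computed needs a little over 4 KB of state that is modified throughout the key schedule.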

Number of Blowfish Encryptions per bcrypt hash:

There are 3 different types of invocations of the ExpandKey function in the Eksblowfish function:

1. ExpandKey(state, salt, key) - It uses the current key schedule along with the 128 bit salt and the variable length key to modify the contents of S boxes and P Arrays.

There is only one invocation of the ExpandKey function with the above arguments.

2. ExpandKey(state, 0, key) - Here the second argument is 128 zero bits. Using the current state of the key schedule, it modifies the contents of the S Boxes and the P Array. Since the salt consists of zero bits here, the key schedule setup is the same as in the Blowfish Block Cipher.

The reason is that the XOR operations performed with the bits of the salt during the key schedule setup now have no effect, since XORing a number with 0 gives back the number itself.

3. ExpandKey(state, 0, salt) - This is similar to the above call, except that here the 16 byte salt is used as the key. The key schedule in this case is also the same as in the Blowfish Block Cipher.

The total number of iterations is decided by the work factor given in the bcrypt hash.

Let's assume a work factor of 10 (which is realistic for modern day systems).

This means that ExpandKey(state, 0, key) and ExpandKey(state, 0, salt) are each invoked 2^10 (1024) times.

Now, let us calculate the total number of blowfish encryptions per bcrypt hash.

For one invocation of ExpandKey(state, salt, key) we have:

Each time 64 bits of the salt are Blowfish encrypted, the result is used to replace the contents of 2 Sub Keys.

1 Blowfish Encryption = 2 Sub Keys replaced

Since there are a total of 18 sub keys, we require 9 Blowfish encryptions of 64 bits of the salt to replace all the Sub Key entries in the P Array.

At this point, we have,

9 Blowfish Encryptions = Replace all the entries in the P Array

Each time 64 bits of the salt are Blowfish encrypted, the result replaces 2 S box entries.

1 Blowfish Encryption = Replace 2 S box entries

Since each S box has 256 entries, or 128 pairs of entries, we have,

128 Blowfish Encryptions = Replace 128 pairs of S box entries = Replace all the entries of 1 S box.

Since we have a total of 4 S boxes, we have,

128*4 = 512 Blowfish encryptions to replace all the entries in all the sboxes.

Putting it together, we have,

9+512 blowfish encryptions to replace all the entries in the S boxes and the P Array in key schedule.

Now, we have two more ExpandKey calls, each made 1024 times (assuming a work factor of 10).

Each of these ExpandKey calls also requires (9+512) Blowfish encryptions to replace the contents of all the S boxes and the P Array:

At this point, we have,

9+512 + 1024 * (512+9) * 2 blowfish encryptions to replace all the entries of the S boxes and P Arrays in bcrypt.

bcrypt hashing algorithm also encrypts a 192 bit constant value 64 times.

This 192 bit constant value = OrpheanBeholderScryDoubt

Since, Blowfish is a 64 bit block cipher, this 192 bit value will be blowfish encrypted in 3 iterations (in each iteration 64 bits of the constant value are encrypted).

Also, there are a total of 64 iterations where this 192 bit constant value is encrypted.

So, a total of:

64*3 blowfish encryptions to encrypt the 192 bit constant value in bcrypt.

Once again, putting it all together, we have,

9+512 + 1024 * (512+9) * 2 + 64 * 3 = 521 + 1067008 + 192 = 1067721

Or, if we consider a work factor of 5, then it is:

9+512 + 32 * (512+9) * 2 + 64 *3 = 521 + 33344 + 192 = 34057

This value is in line with the calculation done by Solar Designer.
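The count above can be written as a small Python function of the work factor. It follows the breakdown in these notes (one salted ExpandKey, two unsalted ExpandKey calls per iteration, and 64 passes over the 3 block constant); the function names are my own.

```python
def expand_key_encryptions():
    """Blowfish encryptions needed by one ExpandKey call."""
    p_array = 18 // 2        # 18 sub keys, 2 replaced per encryption
    sboxes = 4 * (256 // 2)  # 4 S boxes, 256 entries each, 2 replaced per encryption
    return p_array + sboxes  # 9 + 512


def bcrypt_blowfish_encryptions(cost):
    """Total Blowfish encryptions per bcrypt hash for a given work factor."""
    setup = expand_key_encryptions()                   # ExpandKey(state, salt, key)
    loop = (2 ** cost) * 2 * expand_key_encryptions()  # key and salt rounds, 2^cost times each
    final = 64 * 3                                     # 192 bit constant, 3 blocks, 64 times
    return setup + loop + final


print(bcrypt_blowfish_encryptions(10))
print(bcrypt_blowfish_encryptions(5))
```

Each increment of the work factor roughly doubles the total, since the loop term dominates everything else.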
c0d3inj3cT@Crack Me If You Can 2012
Sunday, July 29, 2012 | Author: Deep Flash
The Crack Me If You Can contest at DEFCON 2012 ended just a few moments ago. Like last time, I participated as part of InsidePro. We finished in third place.

More statistics here: https://contest-2012.korelogic.com/stats.html

Congratulations to Hashcat and john-users :)

The start of the contest was not good for me. I was all set and ready for the contest to begin and then 5 minutes into the contest after Korelogic released the key to decrypt the PGP'ed tarball, there was a Network Outage in my area. How much can Luck Suck? Well, here is your answer.

At first, I thought it would be resolved within a few minutes. It did not. Then, a few hours? Well it did not. Time was running out and I tried all possible alternatives to connect to Internet so that I could participate in this contest, but nothing seemed to work. It was now, 5 hours into the contest and I was still without Internet Access. This was a mood spoiler indeed!!

After about, 8 hours I managed to find a stable alternative Internet Connection (which would not ditch me :D ).

This year I wanted to see how many hashes I can crack alone to know how many points I would have earned had I participated alone in the contest. I went through various hash types I cracked during the contest and after compiling the list, I am rather happy :D

For the majority of the contest, I focused on the heavy algorithms like MD5 (Unix), MD5 (APR), DCC2, OpenBSD and SHA-512 Unix and here is a brief overview of my statistics:

MD5 (APR) - 415  
MD5 (Unix) - 349
DCC2 (Mscash2) - 17  
DCC (Mscash) - 598  
DES - 387  
SHA-512 Crypt - 4  
OpenBSD (Blowfish) - 2

and along with this the other easy algorithms like MD5, SHA1, NTLM, MD4 and so on.

Applying the point distribution to the above stats and adding approximately 2000 points for the easy algorithms I cracked, the total comes to 415,683 points. This is more than a quarter of the maximum total password crack points earned by any one team in the contest.

This looks quite good to me, considering I was around 8 hours late into the contest :D

I am happy for john-users, because in this contest they performed really well with the toughest algorithm hashes like SHA-512 Crypt, Blowfish, Sun-md5. This was also a good time for Solar Designer and other developers of JtR to test their OpenCL and CUDA Code Implementations in JtR. I am not sure, how many of these hashes were cracked actually by them on the GPU Platform.

What did I find interesting about this contest? 

1. The password patterns were much more realistic than last year's contest. I believe the huge password leaks (LinkedIn/eHarmony/last.fm) right before the contest helped the organizers in coming up with a wide variety of password patterns.

2. The iteration counter (work) for the Blowfish hashes was increased from 5 to 8.

Last year, we had bcrypt hashes as, $2a$05$ (2^5 iterations).

However, this year, it was $2a$08$ (2^8 iterations).

3. The point distribution among the different hash types was good. However, the only hash type for which they need to reconsider the number of points given is DCC2.

Note: Even though this hash type can be accelerated by implementing the algorithm on GPU, it had the fewest cracks of any hash type (excluding bcrypt) by any team, with the maximum being 96.

Each DCC2 hash for 2000 points is a fair enough point allocation according to me. Though it appears that the point distribution has been done more according to the hash algorithm difficulty than the difficulty of the passwords specific to that hash algorithm.

What makes me curious in this contest? 

1. Many aspects of this contest made me curious. Off the top of my head, though, I can recall one specific password pattern. Many hash types, with difficulty ranging from easy to medium, had numerous all-digit passwords of lengths 8, 9, 10, 11 and 12. These are easy to crack using a mask attack with oclhashcat-plus and a fast enough GPU.

However, I found these password patterns for the heavy algorithms like MD5 (Unix) and MD5 (APR) as well. While running certain rule attacks on the already found passwords, I discovered a couple of hashes for above algorithms with 10 digits in them.

For instance:

$apr1$IlZ3iLOl$WKJ5N0j5QzdmFb4fVIa/p/:8979570490
$apr1$FuHAQ3Dw$3oTO0YexLzL/FWbeuu12C1:3059872160

It makes me wonder, were they really expecting us to Bruteforce this hash type as well or was there some pattern in these passwords which we had to identify? It would be interesting to know more about this.

What were the passwords to the toughest algorithms like Blowfish, sun-md5 and SHA-512 (Unix)?

I guess we are all eager to know the answer to this question. The only passwords I was able to crack for these algorithms were all lowercase words or a simple rule applied to a lowercase word. For instance, 2 of them were:

$6$ad.V4U6/ru/mYEZp$AQza3gdhwEFu2JubVaBGZ2H4Rcqu7ijW.1NJ6RubEerKDdQ1ukC6/uzmjOjFUE.CQyDnqWpilk4jfO5.wVFjX/:@password
$6$xX1ZwbZCduQJ6bOG$K37gaEJAwJxcwGreloZtmePJBIS.89PNpD4im.obF5YcjdPa5uzuqrWs1LkdBmmgege0SOCe/sIhYq1u9Jvju0:jpassword

So, according to me, we were supposed to analyze the already cracked passwords and identify base words (such as "password" above) and apply simple rules to that like prefix and append a ?d,?l,?s.
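To make that idea concrete, here is a small Python sketch that takes a base word and generates the "prefix or append one character" candidates. The character sets are meant to approximate hashcat's ?d, ?l and ?s, and the function name is my own; a real attack would use the cracker's rule engine rather than a script like this.

```python
import string

DIGITS = string.digits              # roughly hashcat's ?d
LOWER = string.ascii_lowercase      # roughly hashcat's ?l
SYMBOLS = " " + string.punctuation  # roughly hashcat's ?s (space plus ASCII punctuation)


def prefix_append_candidates(base_word):
    """Return base_word with one ?d/?l/?s character prefixed or appended."""
    charset = DIGITS + LOWER + SYMBOLS
    candidates = []
    for ch in charset:
        candidates.append(ch + base_word)  # prefix rule
        candidates.append(base_word + ch)  # append rule
    return candidates


candidates = prefix_append_candidates("password")
print(len(candidates))
print("@password" in candidates, "jpassword" in candidates)
```

With only 138 candidates per base word, this kind of attack stays practical even against slow algorithms like SHA-512 crypt, which is presumably the point of such passwords in the contest.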

What would have made this contest better for me? 

A Good and Reliable Internet Connection! Making my way into the contest 8 hours late did not allow me to give sufficient time to analyzing passwords and patterns. I hope to have a backup connection, or rather multiple backup connections in future.

That's all for now. A more detailed writeup with various password patterns used in the contest and how to crack them later.

Thanks to the organizers, Korelogic, it was a good experience.

c0d3inj3cT
A Chase to Backup!
Monday, March 12, 2012 | Author: Deep Flash
Today was one of my worst days. The reason being, my laptop's hard disk crashed in the morning. This may not sound like a major incident to most. But it can be, if that hard disk contains your research work of years, your precious data and lots of other stuff.

At first, it was hard to believe. Now that I have got some time to get myself back together, I am in a position to highlight the sequence of events:

1. Wake up at 8 AM in the morning.

2. Start my laptop. And while it boots, I continue with my other daily routine.

3. When I come back, I notice that the system restarted and I am presented with Windows Error Recovery Menu. It provides 2 options:
a) Startup Recovery Mode
b) Start Normally

4. At this point, I did not really take it seriously. So I decided to go for the 2nd option, just to check whether it was a persistent issue. I also wanted to find out what exactly had made the system restart on its own.

5. And there you go, right after Windows Logo appeared, a Blue Screen of Death flashed in front of me and system restarted again.

6. Now was the time to get serious. So I tried to keep my cool and did not let the negative thoughts in my mind take over me. I calmly chose the first option on boot menu, "Startup Recovery Mode".

7. And to my surprise, there was an error thrown during this process stating that it did not recognize my 2 HDDs. Ok, something is going wrong now!

Precise Error Message:

"REM Couldn't perform screening because couldn't detect 2 HDD on this system"

Bang! Taken aback!

8. They say, "hope is good". So, I restarted the machine and once again chose the first option from Error Recovery Menu and it's almost like the system was waiting for me to do that. I was presented with the same error message once again!

Panic time yet? I said to myself, "Well this looks serious, but it is still possible to fix it"

9. As a next step, I restarted and hit the F2 key to enter the Safe Mode. While booting in Safe Mode, Win 7 shows you the list of all the system drivers it is loading. And bang! once again, another BSOD! I noticed that while it was trying to load the classpnp.sys driver, it hung. And then BSOD appeared!

Ok, losing my cool now. But in the midst of all this troubleshooting, I did not have the time to think of the consequences. You only think of them once the consequence itself is faced.

10. Going further, I rebooted and pressed F10 to enter the BIOS Menu. Selected the Diagnostics Tab from the menu.

11. Time to test the Hard Drive for any errors. There are multiple tests provided by the machine for the hard drive. A Quick one and the other Full one.

12. HDD passed the quick test however in the full test I received this message at the end: "Hard Disk 1 Full (305)"

13. After looking up this code on the vendor's site, BOOM! All hope gone at once!

The vendor site said, Hard Disk needs to be replaced!

Lost my cool, panic time started.

The reason for panic was, I had the consequences right in front of me once I read the meaning of that error message. I had not taken any data backup!!

14. What am I going to do now?

15. Ok, so I had an Ubuntu 7.10 CD lying somewhere. I decided to boot using it as a Live CD. And following was the sequence:

fdisk -l

/dev/sda2 -> my hard disk (640 GB)

I felt safe :)

fdisk -s /dev/sda2

640 GB

Felt safe again.

cd /mnt
mkdir windows
mount -t ntfs-3g /dev/sda2 /mnt/windows

ntfs-3g is not installed. You can install it by running the command:

apt-get install ntfs-3g

Since I was using a Live CD, I could not connect to the net and download/install this package.

16. Next, I downloaded a System Rescue CD. Went outside, bought a couple of blank CDs and DVDs.

System Rescue CD is meant for such situations, where you need to take a backup of your hard drive or want to create an image of the hard disk for some other purpose like forensics.

It comes installed with ntfs-3g and also has the mount point for windows (/mnt/windows) created. So, all you have to do is, mount your hard drive.

17. I booted my laptop using the System Rescue CD now.

After getting to the prompt, I typed in:

mount -t ntfs-3g /dev/sda2 /mnt/windows

What do you expect to see as the output?

If all goes well, you should be back at the prompt with no message displayed to you.

In my case, I was neither back at the prompt nor presented with any error message.

"No, please, I hope it is not what I am thinking it is!". This is what was going on in my mind.

18. I closed my eyes, and when I next looked at the console, I had a long array of messages stating that there appeared to be a problem with the NTFS file system and that I should check it with the chkdsk utility of Windows.

I knew the hard drive was gone.

This was really a hard day for me. It is hard to accept but I have no other option.

As a coincidence, I had it on my mind to take a backup of all the data on my hard drive, since it had been a while.

In fact, I had it planned for this weekend. But I never got a chance to do it!

I will end this article with a line said by Tyler Durden to Jack in the movie Fight Club:

"The things you own end up owning you"
Music to Trigger Reminiscences
Friday, January 13, 2012 | Author: Deep Flash
Giorgio Moroder's Masterpiece, Chase (The Midnight Express Soundtrack) is the key to it.

It does not bring back old memories but rather takes me back to them. I cannot define the level at which I feel connected to myself and my past when I listen to this song.

This is also the last song I listened to with my father, a few nights before he passed away. I remember explaining to him how this song makes me feel so special, and I remember him smiling. Those vivid visions.

Listening Now: Giorgio Moroder - Chase