Search for question
Question

Algorithms for Data Science Lab 3 RSA Cracking In this lab, you will gain a better understanding for how quickly running times increase for exponential, O(2"), algorithms. We will explore this by looking at cracking RSA encryption. In 1977, Ronald Rivest, Adi Shamir, and Leonard Adleman came up with the now famous RSA encryption algorithm (based on the work of mathematical greats like Euler and Fermat). This algorithm is one of the cornerstones of public key encryption (PKE). In fact, this algorithm is so important that in 2002, the authors were awarded the highest honor one can receive in the field of computer science: the Turing Award. The underlying principle of RSA is that a one-way computation is performed that is, something that is easy to do in one direction but impossible (or very, very difficult) to reverse without the proper key. The computation used is the multiplication of prime numbers. For example, take the two prime numbers P=13 and Q=23. PQ=13*23=299. This is a simple calculation to perform. A computer can do it with ease, and a human could even do it with nothing more than paper and pencil. However, the reverse operation of finding the P and Q given only PQ is very difficult, even for a computer. This is called finding a prime factorization of PQ. Try it: Given a PQ of 203, what were the P and Q? Give up? How about P=7 and Q=29? How did I know this? Because I picked P and Q in the first place and cheated! OK, so a human has a hard time finding P and Q from PQ. But a computer can do it easily, right? Well, a computer can do it, but it takes a lot of time. And the time it takes is proportional to the length (in bits) of PQ. So, a computer could crack the example given above because P and Q are small numbers. But if a PQ were picked so that it contained 1024 bits or more (e.g., a 1024-bit number can have a range of [0...2^1024], or up to 309 decimal digits!), it would take a computer longer than the estimated age of the universe to crack the code. 1024 is actually a typical key length value for many encryption systems. In addition to finding primes P and Q, as well as PQ, the RSA algorithm also has two other values it needs to compute: E and D. Our public key will be the pair (E, PQ), and the private key will be the pair (D, PQ). P and Q are used to generate all three of these numbers but are then thrown away after PQ, E, and D are found. E and PQ are public, and anyone can use them to encrypt a message to send to the person holding the matching private key. D is private and is used along with PQ to decrypt a message encrypted with E and PQ. So, you never want to give out D. And the only way the encryption code can be cracked is if someone can recreate the factors P and Q from their product, PQ. The details of E and D are not important for this lab. For this lab, you are going to time how long it takes to crack RSA encryption keys. The first step will be to create the key in the first place. To do so, you need to create two functions. The first is def is Prime (p) : This function should take a number, p, and return true if it is prime and false otherwise. You should do this by simply trying to divide p by all the numbers between 2 and p (or if you want to get fancy, the sqrt(p)). If any of them evenly divide it (p%i == 0 ), then it is clearly not prime. If you make it through all the numbers, then it is prime because the only numbers that evenly divide p are 1 and p itself. The second function is def nBit Prime (n): This function should generate a random prime number that is up to n bits long. An n-bit number can range from 0...2^n. For example, if n is 8, then the number can be from 0...256. To do this, first generate a random float using random.random(). This generates a number between 0.0 and 1.0. Then multiply it by 2**n to get a random number in the range we are interested in. If this number is >= 2 and prime, then return it. Otherwise repeat and generate another potential prime random number. Eventually it will generate and return one. After you can generate large n-bit length prime numbers, generate two of them (P and Q), and multiply them together to form PQ. For example, to generate PQ from two 20-bit primes: pq nBit Prime (20) * nBit Prime (20) = Up to this point, all the code you have written would have been done when creating the public/private key pairs. P and Q would then be thrown away, and PQ would be shipped out as a public key that everyone had access to. Thus, in order to “crack" RSA encryption, all one needs to do is to factor PQ into the P and Q that formed it. It turns out that PQ has exactly two factors, P and Q, because of how it was formed from primes. So, all you need to do to factor PQ is to try to divide it by every number between 2 and PQ and find the one that evenly divides it. That will be P, and Q would then be PQ/P (or P and Q could be reversed, but that doesn't matter). I want you to write that function. Have it return both P and Q. def factor (pq) : After the factor is working, you should now time how long it takes to perform just the factor step. The factor step would be the “cracking” step. Use the timing setup from the previous lab to time how long this function takes to run in milliseconds. I then want you to create and factor keys for increasing bit lengths. Start with something small like 15 bits and work your way up until it starts taking longer than around 5 minutes to factor. You should see that the times increase rapidly. In fact, this is an exponential algorithm, O(2"). Most people don't fully understand just how quickly exponential curves increase. To demonstrate this, I want you to place your timing results into a program that can fit curves and provide predictions. Excel will do this as well many free and online curve fitters. You will have pairs like 15 0.2 16 1.1 17 1.4 18 10.6 19 30.2 20 46.6 Put them into the curve fitter, and fit an exponential curve to the data. For Excel, creating a scatterchart is often the best choice for this type of data. When it plots it will show only the collected data points so you will want to add a trend line that connects the points together (right-click the grid points to add a trend line). Note that Excel is picky about its exponential fitting algorithm. If your data has zeros and perhaps other duplicate values, it won't offer you exponential as a choice. If this happens, then the problem is most likely your data at the start of your run. Simply make the chart without this starting data so you can fit an exponential curve to the remaining data. Once you have the trend line, you can estimate how long it would take to crack a key of any length (the forecast option of trend lines). I want you to give me an estimate for how long it would take to crack a 1024-bit key (a typical length for many encryption systems). I want this estimate in both milliseconds and in years. Please include this number as a comment in your code. Keep in mind that what you are really plotting is the relative time it takes to do each crack-that is, as the bit length goes up, the cracking time goes up by some percentage. Because of this, you must run all your timing tests on the same computer system. They will be meaningless if you run half of them on your desktop and half on your laptop. It doesn't matter what machine they are run on, it just needs to be the same machine for all the tests.