Math: Factoring Details

 

FACTORING DETAILS

Forming a list

In dealing with Mersenne numbers, it is easy to prove that if 2P -1 is prime, then P must be a prime. Thus, the first step in the search is to create a list of prime exponents to test.

Trial Factoring

The next step is to eliminate exponents by finding a small factor. There are very efficient algorithms for determining if a number divides 2p-1. For example, let's see if 47 divides 223-1. Convert the exponent 23 to binary, you get 10111. Starting with 1, repeatedly square, remove the top bit of the exponent and if 1 multiply squared value by 2, then compute the remainder upon division by 47.

 

Square

Remove top bit

Optional mul by 2

mod 47

1*1 = 1

1 0111

1*2 = 2

2

2*2 = 4

0 111

no

4

4*4 = 16

1 11

16*2 = 32

32

32*32 = 1024

1 1

1024*2 = 2048

27

27*27 = 729

1

729*2 = 1458

1

 

Thus, 223 = 1 mod 47. Subtract 1 from both sides. 223-1 = 0 mod 47. Since we've shown that 47 is a factor, 223-1 is not prime.

One very nice property of Mersenne numbers is that any factor q of 2P-1 must be of the form 2kp+1. Furthermore, q must be 1 or 7 mod 8. A proof is available. Finally, an efficient program can take advantage of the fact that any potential factor q must be prime.

The GIMPS factoring code creates a modified sieve of Eratosthenes with each bit representing a potential 2kp+1 factor. The sieve then eliminates any potential factors that are divisible by prime numbers below 40,000 or so. Also, bits representing potential factors of 3 or 5 mod 8 are cleared. This process eliminates roughly 95% of potential factors. The remaining potential factors are tested using the powering algorithm above.

Now the only question remaining is how much trial factoring should be done? The answer depends on three variables: the cost of factoring, the chance of finding a factor, and the cost of a primality test. The formula used is:

        factoring_cost < chance_of_finding_factor * 2 * primality_test_cost

That is, the time spent factoring must be less than the expected time saved. If a factor is found we can avoid running both the first-time and double-check primality tests.

Looking at past factoring data we see that the chance of finding a factor between 2X and 2X+1 is about 1/x. The factoring cost and primality test costs are computed by timing the program. At present, the program trial factors to these limits:

Exponents up to

Trial factored to

3960000

260

5160000

261

6515000

262

8250000

263

13380000

264

23390000

265

29690000

266

37800000

267

47450000

268

58520000

269

75670000

270

96830000

271

P-1 Factoring

There is another factoring method that GIMPS uses to find factors and thereby avoid costly primality tests. This method is called Pollard's (P-1) method. If q is a factor of a number, then the P-1 method will find the factor q if q-1 is highly composite - that is it has nothing but small factors.

This method when adapted to Mersenne numbers is even more effective. Remember, that the factor q is of the form 2kp+1. It is easy to modify the P-1 method such that it will find the factor q whenever k is highly composite.

The P-1 method is quite simple. First, pick a bound B1. P-1 will find the factor q as long as all factors of k are less than B1 (k is called B1-smooth).  Second, compute E - the product of all primes less than B1. Third, compute x = 3E*2*P. Finally, check the GCD (x-1, 2P-1) to see if a factor was found.

There is an enhancement to Pollard's algorithm that uses a second bound B2 and will find the factor q if k has just one factor between B1 and B2 and all remaining factors are below B1. GIMPS has used this method to find some impressive factors. For example:

22944999-1 is divisible by 314584703073057080643101377. 314584703073057080643101377 is 2 * 53409984701702289312 * 2944999 + 1. The  value k, 53409984701702289312, is very smooth: 53409984701702289312 = 25 * 3 * 19 * 947 * 7187 * 62297 * 69061

So how does GIMPS intelligently choose B1 and B2? We use a variation of the formula used in trial factoring. We must maximize:                    

          chance_of_finding_factor * 2 * primality_test_cost - factoring_cost

The chance of finding a factor and the factoring cost both vary with different B1 and B2 values. Dickman's function (see Knuth's Art of Computer Programming vol. 2) is used to determine the probability of finding a factor that is k is B1-smooth or B1-smooth with just one factor between B1 and B2. The program tries many values of B1 and if there is sufficient available memory several values of B2, selecting the B1 and B2 values that maximize the formula above.