|
Math: Factoring Details |
In dealing with
Mersenne numbers, it is easy to prove that if 2P
-1 is prime, then P must be a prime. Thus, the first step in the search is to
create a list of prime exponents to test.
The next step is
to eliminate exponents by finding a small factor. There are very efficient
algorithms for determining if a number divides 2p-1.
For example, let's see if 47 divides 223-1.
Convert the exponent 23 to binary, you get 10111. Starting with 1, repeatedly
square, remove the top bit of the exponent and if 1 multiply squared value by
2, then compute the remainder upon division by 47.
Square |
Remove top bit |
Optional mul by 2 |
mod 47 |
1*1 = 1 |
1 0111 |
1*2 = 2 |
2 |
2*2 = 4 |
0 111 |
no |
4 |
4*4 = 16 |
1 11 |
16*2 = 32 |
32 |
32*32 = 1024 |
1 1 |
1024*2 = 2048 |
27 |
27*27 = 729 |
1 |
729*2 = 1458 |
1 |
Thus, 223
= 1 mod 47. Subtract 1 from both sides. 223-1
= 0 mod 47. Since we've shown that 47 is a factor, 223-1
is not prime.
One very nice property
of Mersenne numbers is that any factor q of 2P-1
must be of the form 2kp+1. Furthermore, q must be 1 or 7 mod 8. A proof is
available. Finally, an efficient program can take advantage of the fact that
any potential factor q must be prime.
The GIMPS factoring
code creates a modified sieve
of Eratosthenes with each bit representing a potential 2kp+1 factor. The
sieve then eliminates any potential factors that are divisible by prime numbers
below 40,000 or so. Also, bits representing potential factors of 3 or 5 mod 8
are cleared. This process eliminates roughly 95% of potential factors. The
remaining potential factors are tested using the powering algorithm above.
Now the only question
remaining is how much trial factoring should be done? The answer depends on
three variables: the cost of factoring, the chance of finding a factor, and the
cost of a primality test. The formula used is:
factoring_cost < chance_of_finding_factor * 2 * primality_test_cost
That is, the time
spent factoring must be less than the expected time saved. If a factor is found
we can avoid running both the first-time and double-check primality tests.
Looking at past
factoring data we see that the chance of finding a factor between 2X
and 2X+1 is about 1/x. The
factoring cost and primality test costs are computed by timing the program. At
present, the program trial factors to these limits:
Exponents up to |
Trial factored to |
3960000 |
260 |
5160000 |
261 |
6515000 |
262 |
8250000 |
263 |
13380000 |
264 |
23390000 |
265 |
29690000 |
266 |
37800000 |
267 |
47450000 |
268 |
58520000 |
269 |
75670000 |
270 |
96830000 |
271 |
There is another
factoring method that GIMPS uses to find factors and thereby avoid costly primality
tests. This method is called Pollard's (P-1) method. If q is a factor of a
number, then the P-1 method will find the factor q if q-1 is highly composite -
that is it has nothing but small factors.
This method when
adapted to Mersenne numbers is even more effective. Remember, that the factor q
is of the form 2kp+1. It is easy to modify the P-1 method such that it will
find the factor q whenever k is highly composite.
The P-1 method is
quite simple. First, pick a bound B1. P-1 will find the factor q as long as all
factors of k are less than B1 (k is called B1-smooth). Second, compute E
- the product of all primes less than B1. Third, compute x = 3E*2*P.
Finally, check the GCD (x-1, 2P-1)
to see if a factor was found.
There is an
enhancement to Pollard's algorithm that uses a second bound B2 and will find
the factor q if k has just one factor between B1 and B2 and all remaining
factors are below B1. GIMPS has used this method to find some impressive
factors. For example:
22944999-1
is divisible by 314584703073057080643101377. 314584703073057080643101377 is 2 *
53409984701702289312 * 2944999 + 1. The value k, 53409984701702289312, is
very smooth: 53409984701702289312 = 25 * 3 * 19 * 947 * 7187 * 62297
* 69061
So how does GIMPS
intelligently choose B1 and B2? We use a variation of the formula used in trial
factoring. We must
maximize:
chance_of_finding_factor * 2 * primality_test_cost - factoring_cost
The chance of finding
a factor and the factoring cost both vary with different B1 and B2 values.
Dickman's function (see Knuth's Art of Computer Programming vol. 2) is used to
determine the probability of finding a factor that is k is B1-smooth or
B1-smooth with just one factor between B1 and B2. The program tries many values
of B1 and if there is sufficient available memory several values of B2,
selecting the B1 and B2 values that maximize the formula above.