# Comparison of two arbitrary-precision arithmetic packages

We have presented a demo in which a calculator implementing arbitrary-precision arithmetic using our own EDU.ksu.cis.calculator.LargeInteger class significantly outperforms a calculator using the java.math.BigInteger class distributed by Sun Microsystems, Inc. Here, we discuss why this comparison is unfair, then give a more detailed comparison of the two classes.

The main reason the comparison is unfair is that it unfairly exploits the difference in how the two classes represent arbitrary-precision integers. Sun's BigInteger class stores all values in binary. While this tends to allow efficient individual calculations, it means that a radix conversion must be done before the value can be displayed. This radix conversion is done whenever a value is converted to a String, even if the String gives the binary representation. With a calculator application, each result must be converted to a String in order to be displayed. As we discuss in more detail later, radix conversions tend to be significantly more expensive than arithmetic operations such as multiplication or division.

Our LargeInteger class allows values to be stored in any radix up to 256. (More precisely, the values are stored as arrays of bytes, each byte containing the value of i consecutive base-b radix positions, where i is the largest integer such that bi ≤ 256. For example, if b = 10, each byte contains a value less than 100 representing two consecutive digits.) Thus, expensive radix conversions can usually be avoided. For example, if all input values are presented in decimal and all output values are required to be presented in decimal, no radix conversions will need to be done. Furthermore, when radix conversions do need to be done, our algorithm is much more efficient than the one used in the BigInteger class. We will discuss this difference in more detail later.

In order to compare the costs of the arithmetic operations themselves, we should instead use a program that does not display the results (or at least does not display the result of each operation). Short of this, we can examine and analyze the algorithms used in the respective classes. Prior to version 1.3 of JavaTM, the mutliplication and division routines in the BigInteger class were written in native code, to which we do not have access. However, with version 1.3 (which is used by the applets in the demo), the algorithms are implemented entirely in Java and included in the freely available source distribution. We can therefore compare our algorithms with those used in this version of Java.

A thorough comparison of the two classes should also include a comparison of their capabilities. Sun's BigInteger class contains methods for modular arithmetic, primality testing, and random number generation which we do not include in our LargeInteger class. The reason for the difference has to do with the purpose of each class. Sun's BigInteger class is intended to be a general-purpose class for representing and manipulating arbitrarily large integers. In particular, it provides methods that are useful in cryptographic applications, perhaps the most common applications in which large integers are needed. Our LargeInteger class was written specifically for our arbitrary-precision integer calculator application. More precisely, we wanted to implement some efficient algorithms for multiplication, division, and radix conversion, and we felt like a large-number calculator using a class that represents large integers in arbitrary radix would be a nice way to show off the implementations. One reason that we did not implement any of the other methods is that we know of no better algorithms than those used in the BigInteger class, so we were not particularly interested in implementing them.

The remainder of this document is organized as follows. First, we discuss in turn the algorithms in both classes for multiplication, division, exponentiation, and radix conversion. For each pair of algorithms, we analyze the time complexity and compare the strengths of the respective implementations. We then summarize our analyses and discuss possible improvements to our implementations.

## Multiplication

The multiplication algorithm used in Sun's BigInteger class is essentially the standard algorithm taught in elementary school. Given two integers of lengths u and v, the algorithm operates in O(uv) time. If we let n = u + v, the worst case occurs when u and v are about the same. In this case the algorithm operates in O(n2) time. However, if one of the two numbers is very small, the performance approaches O(n).

In the LargeInteger class, we use the much more complicated Fast Fourier Transform (FFT) to multiply. For the details of the FFT, we refer readers to [1]. Here, we only sketch the main idea of the algorithm, and focus instead on our implementation decisions.

Large positive integers can be thought of quite naturally as polynomials. Let b be the base in which an integer i is represented. We then have

i = a0 + a1b + a2b2 + ... + an-1bn-1

where n is the number of radix positions in the base-b representation. By treating b as a variable, we have represented i as an n-1st degree polynomial p. To obtain the value i, we simply evaluate p(b). Furthermore, the product of two such polynomials represents the product of the integers they represent.

An n-1st degree polynomial can be uniquely represented by its values at n distinct points. Suppose we have the values of two polynomials at n distinct points. We can then compute the values of the product of these polynomials at these n points by pointwise multiplication.

Suppose we have two polynomials, the sum of whose degrees is strictly less than n, where n is a power of 2. The FFT multiplies the two polynomials as follows:

1. The two polynomials are evaluated at n distinct points.
2. The values are multiplied pointwise.
3. The product polynomial is derived from the n pointwise products.

What makes the FFT efficient is the choice of n points. In order for the algorithm to work, the polynomials must be interpreted over a commutative ring with unit element; furthermore, this ring must have a principal nth root of unity (i.e., a value r such that rn/2 = -1), and n must have a multiplicative inverse. For example, the complex field satisfies these properties for all n. The polynomials are then efficiently evaluated at r, r2, ..., rn, which are guaranteed to be distinct points. Whatever ring we choose, the operations of the FFT must be computed over that ring. In order to avoid complex (or real) operations, we would like to use integers as our underlying set. The integers themselves with operations addition and multiplication form a commutative ring with unit element, but this ring has no principal nth root of unity for n > 2, and has no multiplicative inverse for n > 1.

We therefore will use a ring of integers mod m for some m > 1. Whatever the choice of m, we obtain a commutative ring with unit element. However, whether the appropriate roots and inverses exist depends upon the choice of m. Our implementation uses m = 70383776563201. In what follows, we demonstrate that this choice of m yields the desired properties. Perhaps a more interesting question, though, is how we arrived at this value for m. Take a look at our discussion on how the FFT constants were found for an answer to this question.

As is shown in the table of multiplicative inverses, 2n has a multiplicative inverse in this ring for 0 ≤ n ≤ 30. This can be verified by multiplying by the given inverse (mod m) and verifying that the result is 1. Because 230 is the largest power of 2 that can be used as the size of a Java array, it is large enough for our purposes. (An array size of 230 is sufficient to allow us to compute any product whose base-16 representation can be encoded as a Java String.)

It is not hard to show that in any ring with unit element, for n > 1, if r is a principal 2nth root of unity, then r2 is a principal 2n-1st root of unity. Furthermore, it is easily seen that the principal square root of unity is the unique value that is not equal to 1, but whose square is 1 (this is always m-1 in the ring of integers mod m, for any m > 2). Thus, it can be verified that the values given in the table of primitive roots of unity are, in fact, principal 2nth roots of unity for 0 ≤ n ≤ 30. Furthermore, the multiplicative inverses may be verified as above.

There is one other criterion that must be satisfied before the FFT can be used to multiply large integers. That is, we must make sure that the product polynomial, when computed in this ring, is the same as the product we would obtain if the product were computed over the integers. In order to satisfy this, m must be larger than any coefficient in the product polynomial. If the degree of the product polynomial is less than 230, and if all of the coefficients in the polynomials being multiplied are at most 28, then each coefficient in the result is less than 246 < 70383776563201. We therefore choose this ring in which to perform the FFT algorithm. Assuming sufficiently large memory, this will allow us to compute any product whose base-16 representation can be stored in a String in Java.

The FFT algorithm computes the product polynomial in O(n log n) time, where n is the smallest power of 2 larger than the degree of the product polynomial, assuming that a principal nth root of unity, its multiplicative inverse, and the multiplicative inverse of n can be produced in constant time, and that multiplications in the underlying ring can be computed in constant time. As we have already shown, the roots, their inverses, and the inverses of the powers of 2 have been precomputed. These are hard-coded into lookup tables, so they can be retrieved in constant time. Clearly, multiplication mod m can be done in constant time. Thus, we claim that our implementation exhibits O(n log n) behavior.

Note: We have not claimed an O(n log n) algorithm for aribitrary precision multiplication; in fact, we know of no such algorithm. Because we are restricting ourselves to a fixed finite ring, our algorithm only operates on a finite (though extremely large) set of instances. As a result, we could legitimately say that our algorithm operates in constant time; however, this analysis, though technically correct, seems rather useless.

Based on our analysis of the two algorithms, the performance should be comparable when the size of the smaller operand is logarithmic in size of the larger operand. Presumably, the constant of proportionality in this equation could be determined experimentally. As the operands approach the same size, our LargeInteger class performs better, but as one operand becomes much larger than the other, Sun's BigInteger class performs better. Based on worst-case performance alone, our LargeInteger class performs much better.

## Division

The division algorithm used in Sun's BigInteger class is essentially the long division algorithm commonly taught in elementary school. For a dividend with length u and a divisor with length v, it operates in O(v(u-v)) time. For n = u + v, the worst case occurs when v = u/2. In this case, the algorithm operates in O(n2) time. As v approaches either 1 or u, the performance approaches O(n).

The division algorithm we use in LargeInteger is based on Newton's method for finding a reciprocal. The idea is credited to S. Cook in [2, p. 296]. We use a top-down formulation of the algorithm described in [2, pp. 295-297]. In [3], we give a detailed presentation and analysis of the algorithm for the case in which the numbers are stored in binary. For the most part, the extension to arbitrary radix is straightforward. The only exception is the base case. In this case, we use the same technique as described in [2, p. 296]. Assuming a time complexity of O(n log n) for multiplication, the time complexity for our division algorithm, as shown in [3], is in O(n log n), where n is the length of the dividend. The comparison of the two division algorithms therefore mimics the comparison of the two multiplication algorithms.

## Exponentiation

Most of the exponentiation algorithms used in both the BigInteger and LargeInteger classes are some variation on repeated squaring, which has a time complexity linear in that of the multiplication algorithm used. Thus, if n is the length of the result, exponentiation is performed in O(n2) time in Sun's BigInteger class, whereas it is done in O(n log n) time in our LargeInteger class. Our implementation is therefore much more efficient.

The only exception to the above characterization is that raising 2 to some power can be done in linear time using the BigInteger.shiftLeft(int) method. Because our calculator based on Sun's BigInteger class calls this method, the 2x operation is performed in O(n) time by this calculator, whereas it is performed in O(n log n) time by the calculator using our LargeInteger class. (These analyses do not include any radix conversions.)

The radix conversion algorithms used in the two classes are similar. Suppose b is the target radix. Both algorithms may be expressed as follows for large values (all computations are done in the initial radix):
1. Divide the value to be converted by bk for some k, obtaining the quotient and remainder.
2. Convert the quotient and remainder to the target radix.
3. Concatenate the converted values, padding the remainder with 0's if necessary to obtain k radix positions.

The essential difference is in the choice of divisor in step 1. In Sun's BigInteger class, the divisor is the largest power of the radix that will fit into a long (i.e., a 64-bit signed integer). As a result, the remainder of the division can be converted using the Long.toString(long, int) method. Because the divisor is small, each division is performed in O(n) time, where n is the size of the dividend. It is easily seen that the overall performance of the conversion is in O(n2).

In our LargeInteger class, we generalize a technique described in [2, p. 592] and credited to A. Shönhage. The divisor in step 1 is the largest bk such that

• k is a power of 2; and
• bk is less than the value to be converted.
Step 2 then consists of two recursive calls, each of which will result in division by a smaller bk (unless the value is smaller than b). In order to avoid recomputing bk values, our algorithm begins by precomputing values bk until a value as large as the value to be converted is produced. This precomputation is done by repeated squaring in O(n log n) time, where n is the length of the value to be converted. It can then be shown that the conversion is done in O(n log2 n) time. Thus, our radix conversion is significantly faster than that done by Sun's BigInteger class as the value to be converted becomes large.

In some cases, we use a simpler algorithm to achieve a constant-time radix conversion. Let r be the initial radix and b the target radix. Recall that we store in each byte the value of i consecutive radix positions, where i is the largest value such that ri ≤ 256. Let j be the largest value such that bj ≤ 256. If bj = ri, then no change needs to be made the individual bytes.

## Discussion

The multiplication, division, and radix conversion routines in Sun's BigInteger class are all O(n2), whereas our multiplication and division routines are O(n log n), and our radix conversion routine is O(n log2 n). All of the techniques we use are well-known in the algorithms literature.

On the other hand, there is an advantage to simplicity, namely, a greater confidence that the implementation is correct. Our multiplication and division algorithms are much more complicated than the ones in Sun's BigInteger class. As a result, we have less confidence in their correctness (see our disclaimer at the top of the demo page). In fact, some of our decisions have sacrificed performance for clarity, albeit to a lesser extent than what Sun has done. For example, our first attempt at implementing division used a bottom-up technique patterned directly after the algorithm in [2, pp. 295-297], extended to arbitrary bases. However, debugging this implementation became so difficult that we rewrote in a top-down style, using recursion rather than iteration. Later, we discuss other modifications that could improve the performance of our code at the cost of greater code complexity.

One might wonder whether there exist other algorithms, perhaps less efficient than the ones we chose, but still reasonably simple and more efficient than those used by Sun. For multiplication, this is certainly true. There is a divide-and-conquer algorithm (due to Karatsuba and Ofman) which multiplies two large integers a and b by first splitting both into a high-order half and a low-order half; thus:

• a = ah2k + al
• b = bh2k + bl
where k is half the number of bits in the larger operand. The following three products are then recursively computed:
1. c1 = ah bh
2. c2 = (ah + al)(bh + bl)
3. c3 = al bl
It is easily shown that ab = 22kc1 + 2k(c2 - c1 - c3) + c3. This computation can be performed in O(nlg 3) time, where lg is the base-2 logarithm (lg 3 = 1.59...). See [4, pp. 219-223] for more details. Furthermore, as we discuss below, this algorithm can be modified so that its performance is linear when one operand is small.

For division, however, we know of no "simple" algorithm that significantly outperforms the one used by Sun. Furthermore, although the radix conversion algorithm we use is simple enough, without an efficient division algorithm, its performance is even worse than the radix conversion algorithm used by Sun. In particular, if our radix conversion algorithm uses Sun's division algorithm, the performance of radix conversion becomes O(n2 log n).

Finally, we consider ways in which the performance of our implementation could be further improved. For multiplication, our code is more efficient than Sun's when the operands are roughly the same size, but Sun's code is more efficient when one operand is much larger than the other. We could improve our algorithm by breaking the larger operand into pieces of roughly the same size as the smaller operand. Each piece of the larger operand could then be multiplied by the smaller operand and the results combined to give the product. (Note that with this approach, we do not need to apply the FFT to the smaller operand more than once.) If the larger operand is of length u and the smaller operand is of length v, the product would be computed in O(u log v) time. Letting n = u + v, we find that the performance is in O(n log n) if u = v, but approaches O(n) as v decreases. However, we cannot expect this algorithm to match the performance of the simple algorithm when v is very small, because the simple algorithm has so little overhead. Note that this same technique can be applied to the divide-and-conquer algorithm described above, so that it is also linear when one operand is small. Furthermore, we can apply this technique to the division algorithm to yield O(u log v) performance, where u is the lenth of the dividend and v is the length of the divisor.

Unfortunately, neither of these improvements would significantly affect the performance of our radix conversion algorithm because all multiplications are squares, and in most divisions, the divisor is roughly half the length of the dividend. We can, however, achieve a constant-factor improvment by doing more precomputation. As we compute the divisors, we can also compute a reciprocal and apply the FFT to it. However, we know of no way to improve the asymptotic performance.

## Conclusion

Sun's BigInteger class provides users with an implementation of arbitrarily large integers together with a rich set of operations. Included are those operations commonly needed for cryptographic applications (e.g., random prime generation and modular arithmetic operations). Furthermore, the performance of these operations is adequate for cryptographic operations, where the sizes of integers typically do not exceed a few hundred bits. However, because the multiplication and division methods use quadratic algorithms, they do not scale up well.

We have demonstrated that these operations can be implemented in a way that scales up much better. The cost of these performance improvements, however, is that the code is much more complex. As a result, bugs are more difficult to detect.

## References

[1] T. Cormen, C. Leiserson, and R. Rivest, Introduction to Algorithms, McGraw Hill, 1990, pp. 776-800. (2nd Ed. due Summer 2001).

[2] D. Knuth, The Art of Computer Programming, Vol. 2: Seminumerical Algorithms, 2nd Ed., Addison Wesley, 1981.

[3] R. Howell, "Arbitrary-Precision Division", www.cis.ksu.edu/~howell/calculator/division.pdf, 2000.

[4] G. Brassard and P. Bratley, Fundamentals of Algorithmics, Prentice Hall, 1996.