Several algorithms exist to perform division in digital designs. These algorithms fall into two main categories: slow division and fast division. Slow division algorithms produce one digit of the final quotient per iteration. Examples of slow division include restoring, non-performing restoring, non-restoring, and SRT division. Fast division methods start with a close approximation to the final quotient and produce twice as many digits of the final quotient on each iteration. Newton-Raphson and Goldschmidt fall into this category.
The following division methods are all based on the form Q = N/D, where Q is the quotient, N is the numerator (dividend), and D is the denominator (divisor).
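Slow division methods are all based on a standard recurrence equation:
$$P_{j+1} = R \times P_j - q_{n-(j+1)} \times D$$
where $P_j$ is the j-th partial remainder of the division, R is the radix (2 for binary), $q_{n-(j+1)}$ is the quotient digit in position n-(j+1), with digit positions numbered from 0 (least significant) to n-1 (most significant), n is the number of digits in the quotient, and D is the denominator.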
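To make the recurrence $P_{j+1} = R \times P_j - q \times D$ concrete, the following Python sketch performs slow division in an arbitrary radix, producing one quotient digit per iteration. It selects each digit by repeated subtraction, a restoring-style choice made here for simplicity. The function name `slow_divide` and its parameters are illustrative assumptions, and the sketch works on non-negative integers with $0 \le N < D \times R^n$ so that the quotient fits in n digits.

```python
def slow_divide(numerator: int, divisor: int, n_digits: int, radix: int = 10):
    """Slow division following P[j+1] = R*P[j] - q*D, one quotient digit per step.

    Hypothetical helper: assumes 0 <= numerator < divisor * radix**n_digits.
    Returns (quotient, remainder, digits) with digits listed most significant first.
    """
    if divisor <= 0 or not 0 <= numerator < divisor * radix ** n_digits:
        raise ValueError("quotient would not fit in n_digits digits")
    d_shifted = divisor * radix ** n_digits  # D aligned with the top digit position
    p = numerator                            # partial remainder P[0]
    digits = []
    for _ in range(n_digits):
        p *= radix                           # R * P[j]
        q = 0
        while p >= d_shifted:                # largest digit q with q*D <= R*P[j]
            p -= d_shifted
            q += 1
        digits.append(q)                     # p now holds P[j+1] = R*P[j] - q*D
    quotient = 0
    for q in digits:
        quotient = quotient * radix + q
    # After n steps P = R**n * (N - Q*D), so dividing out R**n gives the remainder.
    return quotient, p // radix ** n_digits, digits
```

For example, `slow_divide(1234, 7, 4)` returns `(176, 2, [0, 1, 7, 6])`, i.e. 1234 = 176 × 7 + 2. With `radix=2` the digit loop runs at most once per step, which is the binary case used by the algorithms below.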
Restoring division operates on fixed-point fractional numbers and depends on the following assumptions:
The quotient digits q are formed from the digit set {0,1}.
The basic algorithm for binary (radix 2) restoring division is:
P := N
D := D << n              -- P and D need twice the word width of N and Q
for i = n-1..0 do        -- for example 31..0 for 32 bits
  P := 2P - D            -- trial subtraction from shifted value
  if P >= 0 then
    q(i) := 1            -- result-bit 1
  else
    q(i) := 0            -- result-bit 0
    P := P + D           -- new partial remainder is (restored) shifted value
  end
end
where N=Numerator, D=Denominator, n=#bits, P=Partial remainder, q(i)=bit #i of quotient
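The restoring algorithm above can be transcribed into Python as a minimal sketch for unsigned n-bit integers. Python integers are unbounded, so the double-width registers for P and D are implicit. The function name `restoring_divide` is an assumption, and the final shift that recovers the remainder from P is an addition not spelled out in the pseudocode.

```python
def restoring_divide(numerator: int, divisor: int, n: int) -> tuple[int, int]:
    """Radix-2 restoring division of unsigned n-bit integers.

    Assumes 0 <= numerator < 2**n and divisor > 0; returns (quotient, remainder).
    """
    if divisor <= 0 or not 0 <= numerator < 1 << n:
        raise ValueError("expects 0 <= numerator < 2**n and divisor > 0")
    p = numerator                 # partial remainder P
    d = divisor << n              # D := D << n (a 2n-bit register in hardware)
    q = 0
    for i in range(n - 1, -1, -1):
        p = 2 * p - d             # trial subtraction from the shifted value
        if p >= 0:
            q |= 1 << i           # result bit i is 1
        else:
            p += d                # restore: P becomes the shifted value 2P again
    # After n steps P = 2**n * (N - Q*D), so the remainder is P >> n.
    return q, p >> n
```

For instance, `restoring_divide(7, 2, 3)` returns `(3, 1)`.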
The above restoring division algorithm can avoid the restoring step by saving the shifted value 2P before the subtraction in an additional register T (i.e., T=P<<1) and copying register T to P when the result of the subtraction 2P - D is negative.
Non-performing restoring division is similar to restoring division except that the value of 2P[i] is saved, so D does not need to be added back in for the case where 2P[i] − D < 0.
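A sketch of the non-performing variant, under the same assumptions as an integer restoring divider (unsigned n-bit operands, divisor pre-shifted by n bits), keeps the shifted value 2P in a temporary `t` that plays the role of register T. The function name is hypothetical.

```python
def nonperforming_restoring_divide(numerator: int, divisor: int, n: int) -> tuple[int, int]:
    """Radix-2 division that saves 2P in T instead of adding D back after a failed subtraction."""
    if divisor <= 0 or not 0 <= numerator < 1 << n:
        raise ValueError("expects 0 <= numerator < 2**n and divisor > 0")
    p = numerator
    d = divisor << n
    q = 0
    for i in range(n - 1, -1, -1):
        t = p << 1                # T := 2P, saved before the subtraction
        diff = t - d              # trial subtraction 2P - D
        if diff >= 0:
            q |= 1 << i
            p = diff              # keep the difference as the new partial remainder
        else:
            p = t                 # negative: copy T back, no add-back of D needed
    return q, p >> n
```

It returns the same results as a restoring divider; only the handling of a negative trial difference changes.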
Non-restoring division uses the digit set {−1,1} for the quotient digits instead of {0,1}. The basic algorithm for binary (radix 2) non-restoring division is:
P[0] := N
i := 0
while i < n do
  if P[i] >= 0 then
    q[n-(i+1)] := 1
    P[i+1] := 2*P[i] - D
  else
    q[n-(i+1)] := -1
    P[i+1] := 2*P[i] + D
  end if
  i := i + 1
end while
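The non-restoring loop can be sketched in Python for unsigned n-bit integers, assuming (as an adaptation of the fractional pseudocode) that D is pre-shifted left by n bits. The quotient value is read directly from the {−1, +1} digits here, and the sketch adds the usual final correction for a negative last partial remainder, which the pseudocode does not show. The function name is hypothetical.

```python
def nonrestoring_divide(numerator: int, divisor: int, n: int) -> tuple[list[int], int, int]:
    """Radix-2 non-restoring division of unsigned n-bit integers.

    Returns (digits, quotient, remainder); digits[k] in {-1, +1} has weight 2**k.
    """
    if divisor <= 0 or not 0 <= numerator < 1 << n:
        raise ValueError("expects 0 <= numerator < 2**n and divisor > 0")
    d = divisor << n                  # integer version: D pre-shifted by n bits
    p = numerator                     # P[0] := N
    digits = [0] * n
    for i in range(n):
        k = n - (i + 1)               # quotient digit position q[n-(i+1)]
        if p >= 0:
            digits[k] = 1
            p = 2 * p - d
        else:
            digits[k] = -1
            p = 2 * p + d
    quotient = sum(digit << k for k, digit in enumerate(digits))
    # Final correction (not part of the loop above): a negative final
    # partial remainder means the {-1, +1} quotient is one too large.
    if p < 0:
        quotient -= 1
        p += d
    return digits, quotient, p >> n
```

For example, `nonrestoring_divide(7, 2, 3)` returns `([1, -1, 1], 3, 1)`: the digits +1, −1, +1 (most significant first) have the value 4 − 2 + 1 = 3.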
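Following this algorithm, the quotient is in a non-standard form consisting of digits of −1 and +1. This form needs to be converted to binary to form the final quotient. The conversion to the digit set {0,1} proceeds as follows:
1. Mask the negative term: form N with a 1 in every position where the quotient digit is −1 and a 0 elsewhere.
2. Form the two's complement of N.
3. Form the positive term P with a 1 in every position where the quotient digit is +1 and a 0 elsewhere.
4. Sum P and the two's complement of N; the result, P − N, is the quotient in binary.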
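The four conversion steps can be sketched in Python. The function name `signed_digits_to_binary` is hypothetical, the digits are given most significant first, and the result is an n-bit value computed modulo $2^n$, which assumes the quotient is non-negative and fits in n bits.

```python
def signed_digits_to_binary(digits_msb_first: list[int]) -> int:
    """Convert a {-1, +1} digit string (most significant first) to an n-bit binary value.

    The result is taken modulo 2**n, where n = len(digits_msb_first).
    """
    n = len(digits_msb_first)
    mask = (1 << n) - 1
    negative = positive = 0
    for digit in digits_msb_first:
        if digit not in (-1, 1):
            raise ValueError("digits must be -1 or +1")
        negative = (negative << 1) | int(digit == -1)  # step 1: mask the negative term N
        positive = (positive << 1) | int(digit == 1)   # step 3: positive term P
    negative_twos = (~negative + 1) & mask             # step 2: two's complement of N
    return (positive + negative_twos) & mask           # step 4: P + (-N), carry out dropped
```

With the illustrative digit string $111\bar{1}1\bar{1}1\bar{1}$, i.e. `signed_digits_to_binary([1, 1, 1, -1, 1, -1, 1, -1])`, the negative mask is 00010101, its two's complement is 11101011, the positive term is 11101010, and their sum with the carry dropped is 11010101 (213), which equals 128 + 64 + 32 − 16 + 8 − 4 + 2 − 1.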
Named for its creators (Sweeney, Robertson, and Tocher), SRT division is a popular method for division in many microprocessor implementations. SRT division is similar to non-restoring division, but it uses a lookup table based on the dividend and the divisor to determine each quotient digit. The Intel Pentium processor's infamous floating-point division bug was caused by an incorrectly coded lookup table. Five entries that were believed to be theoretically unreachable had been omitted from more than one thousand table entries.[1]
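A minimal radix-2 SRT sketch in Python follows, assuming a divisor normalized to $1/2 \le D < 1$ and a dividend with $|N| \le D$. In radix 2 the digit-selection table collapses to two comparisons of 2P against ±1/2, and the digit set {−1, 0, +1} lets a step skip the add or subtract when the digit is 0; table-driven designs such as the Pentium's work at a higher radix. The sketch compares the exact 2P for clarity, whereas hardware inspects only a few leading bits and relies on the redundant digit set to absorb that error. `srt_radix2` is a hypothetical name, and exact `Fraction` arithmetic stands in for fixed-point registers.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def srt_radix2(n: Fraction, d: Fraction, steps: int) -> tuple[Fraction, Fraction]:
    """Radix-2 SRT division with quotient digits in {-1, 0, +1}.

    Assumes 1/2 <= d < 1 and |n| <= d. Returns (q, p) with
    n == q * d + p / 2**steps, so |n/d - q| <= 2**-steps.
    """
    if not (HALF <= d < 1 and abs(n) <= d):
        raise ValueError("expects 1/2 <= d < 1 and |n| <= d")
    p = n                              # partial remainder P[0]
    q = Fraction(0)
    for j in range(1, steps + 1):
        two_p = 2 * p
        # Digit selection: compare 2P with the constants +1/2 and -1/2.
        if two_p >= HALF:
            digit = 1
        elif two_p < -HALF:
            digit = -1
        else:
            digit = 0                  # no add or subtract needed this step
        p = two_p - digit * d          # P[j] = 2 P[j-1] - q_j D
        q += Fraction(digit, 2 ** j)   # quotient digit of weight 2**-j
    return q, p
```

For example, `srt_radix2(Fraction(3, 8), Fraction(3, 4), 10)` returns `(Fraction(1, 2), Fraction(0, 1))`; after the first digit every later step selects 0.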
Newton–Raphson uses Newton's method to find the reciprocal of D and multiplies that reciprocal by N to find the final quotient Q.
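The steps of Newton–Raphson are:
1. Calculate an estimate $X_0$ for the reciprocal $1/D$ of the divisor D.
2. Compute successively more accurate estimates $X_1, X_2, \ldots, X_S$ of the reciprocal. This is where the Newton–Raphson iteration itself is used.
3. Compute the quotient by multiplying the dividend by the reciprocal of the divisor: $Q = N X_S$.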
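In order to apply Newton's method to find the reciprocal of D, it is necessary to find a function $f(X)$ which has a zero at $X = 1/D$. The obvious such function is $f(X) = DX - 1$, but the Newton–Raphson iteration for this is unhelpful since it cannot be computed without already knowing the reciprocal of D. A function which does work is $f(X) = 1/X - D$, for which the Newton–Raphson iteration gives
$$X_{i+1} = X_i - \frac{f(X_i)}{f'(X_i)} = X_i - \frac{1/X_i - D}{-1/X_i^2} = X_i + X_i(1 - DX_i) = X_i(2 - DX_i)$$
which can be calculated from $X_i$ using only multiplication and subtraction, or using two fused multiply–adds.
If the error is defined as $\varepsilon_i = 1 - DX_i$, then
$$\varepsilon_{i+1} = 1 - DX_i(2 - DX_i) = (1 - DX_i)^2 = \varepsilon_i^2$$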
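Apply a bit-shift to the divisor D to scale it so that 0.5 ≤ D ≤ 1. The same bit-shift should be applied to the numerator N so that the quotient does not change. Then one could use a linear approximation in the form
$$X_0 = T_1 + T_2 D \approx \frac{1}{D}$$
to initialize Newton–Raphson. To minimize the maximum of the absolute value of the error of this approximation on the interval $[0.5, 1]$, one should use
$$X_0 = \frac{48}{17} - \frac{32}{17} D$$
Using this approximation, the absolute value of the error of the initial value is at most
$$|\varepsilon_0| \le \frac{1}{17} \approx 0.059$$
Since for this method the convergence is exactly quadratic, it follows that
$$|\varepsilon_S| \le \left(\frac{1}{17}\right)^{2^S}$$
so
$$S = \left\lceil \log_2 \frac{p + 1}{\log_2 17} \right\rceil$$
steps are enough to calculate the value up to p binary places.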
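The three Newton–Raphson steps can be sketched in Python with floating-point numbers. This is an illustration under stated assumptions: a positive divisor, `math.frexp` used as the power-of-two scaling into [0.5, 1), the linear start $X_0 = 48/17 - (32/17) D$, and the step count $S = \lceil \log_2((p+1)/\log_2 17) \rceil$. Because every operation is itself rounded, the result is only as accurate as double precision allows; the function name and the `precision_bits` parameter are hypothetical.

```python
import math


def newton_raphson_divide(n: float, d: float, precision_bits: int = 53) -> float:
    """Approximate n / d (d > 0) via the reciprocal iteration X <- X * (2 - D * X)."""
    if d <= 0:
        raise ValueError("this sketch assumes a positive divisor")
    # Scale D into [0.5, 1) by a power of two and apply the same shift to N.
    d_scaled, exponent = math.frexp(d)      # d == d_scaled * 2**exponent
    n_scaled = math.ldexp(n, -exponent)
    # Minimax linear start: |1 - D*X0| <= 1/17 on [0.5, 1].
    x = 48 / 17 - 32 / 17 * d_scaled
    # Quadratic convergence: S = ceil(log2((p + 1) / log2(17))) steps suffice.
    steps = math.ceil(math.log2((precision_bits + 1) / math.log2(17)))
    for _ in range(steps):
        x = x * (2 - d_scaled * x)          # error e_{i+1} = e_i ** 2
    return n_scaled * x                     # Q = N * X_S
```

With the default 53 bits, the step-count formula gives S = 4 iterations.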
Goldschmidt (after Robert Elliott Goldschmidt)[2] division uses an iterative process to repeatedly multiply both the dividend and divisor by a common factor $F_i$, chosen such that the divisor D converges to 1 while the dividend N converges to the quotient Q:
$$Q = \frac{N}{D} \cdot \frac{F_1}{F_1} \cdot \frac{F_2}{F_2} \cdot \frac{F_3}{F_3} \cdots$$
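The steps for Goldschmidt division are:
1. Generate an estimate for the multiplication factor $F_i$.
2. Multiply the dividend and divisor by $F_i$.
3. If the divisor is sufficiently close to 1, return the dividend; otherwise, loop to step 1.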
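Assuming N/D has been scaled so that 0 < D < 1, each $F_i$ is based on D:
$$F_{i+1} = 2 - D_i$$
Multiplying the dividend and divisor by the factor yields:
$$\frac{N_{i+1}}{D_{i+1}} = \frac{N_i}{D_i} \cdot \frac{F_{i+1}}{F_{i+1}}$$
Since $1 - D_{i+1} = 1 - D_i(2 - D_i) = (1 - D_i)^2$, the divisor converges quadratically to 1. After a sufficient number of iterations k:
$$Q = N_k$$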
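A floating-point sketch of the loop follows, assuming N and D have already been scaled so that 0 < D < 1. Each factor is $F_{i+1} = 2 - D_i$, and the loop stops once D is within a tolerance of 1. The `tolerance` and `max_iter` parameters, like the function name, are assumptions of this sketch.

```python
def goldschmidt_divide(n: float, d: float, tolerance: float = 1e-15, max_iter: int = 100) -> float:
    """Goldschmidt division assuming n/d is already scaled so that 0 < d < 1."""
    if not 0 < d < 1:
        raise ValueError("scale n and d first so that 0 < d < 1")
    for _ in range(max_iter):
        f = 2 - d                 # F_{i+1} = 2 - D_i
        n, d = n * f, d * f       # multiply dividend and divisor by the same factor
        if abs(1 - d) <= tolerance:
            break                 # divisor is (numerically) 1, so the dividend is Q
    return n
```

Because $1 - D_{i+1} = (1 - D_i)^2$, the distance to 1 is squared on every pass, so even D = 0.007 reaches the default tolerance in about a dozen iterations.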
The Goldschmidt method is used in AMD Athlon CPUs and later models.[3][4]
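The Goldschmidt method can be used with factors that allow simplifications by the binomial theorem. Assume N/D has been scaled by a power of two such that $D \in \left(\tfrac{1}{2}, 1\right]$. We choose $D = 1 - x$ and $F_i = 1 + x^{2^i}$. This yields
$$\frac{N}{1 - x} = \frac{N(1 + x)}{1 - x^2} = \frac{N(1 + x)(1 + x^2)}{1 - x^4} = \cdots = \frac{N \prod_{i=0}^{n-1}\left(1 + x^{2^i}\right)}{1 - x^{2^n}}$$
Since $x \in \left[0, \tfrac{1}{2}\right)$, after n steps $x^{2^n} \in \left[0, 2^{-2^n}\right)$, so we can round $1 - x^{2^n}$ to 1 with a relative error of at most $2^{-2^n}$, and thus we obtain $2^n$ binary digits of precision. This algorithm is referred to as the IBM method in [5].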
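The binomial variant can be sketched in Python, assuming D has already been scaled into (1/2, 1]. Each factor $F_i = 1 + x^{2^i}$ is obtained by squaring the previous power of x, so a step costs one squaring and one multiplication of the dividend. The function name and its `steps` parameter are illustrative assumptions.

```python
def goldschmidt_binomial(n: float, d: float, steps: int) -> float:
    """Goldschmidt division with F_i = 1 + x**(2**i), assuming 1/2 < d <= 1 and x = 1 - d."""
    if not 0.5 < d <= 1:
        raise ValueError("scale d into (1/2, 1] by a power of two first")
    x = 1 - d
    power = x                 # holds x**(2**i), starting at i = 0
    for _ in range(steps):
        n *= 1 + power        # multiply the dividend by F_i
        power *= power        # x**(2**(i+1)) by one squaring
    # The divisor would now be 1 - x**(2**steps); treating it as 1 leaves a
    # relative error below 2**-(2**steps).
    return n
```

With D = 0.75 (x = 0.25) and 4 steps, the exact-arithmetic relative error would be $0.25^{16} = 2^{-32}$, well below the $2^{-16}$ bound.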
Methods designed for hardware implementation generally do not scale to integers with thousands or millions of decimal digits; these frequently occur, for example, in modular reductions in cryptography. For these large integers, more efficient division algorithms transform the problem to use a small number of multiplications, which can then be done using an asymptotically efficient multiplication algorithm such as Toom–Cook multiplication or the Schönhage–Strassen algorithm. Examples include reduction to multiplication by Newton's method as described above[6] as well as the slightly faster Barrett reduction algorithm.[7] Newton's method is particularly efficient in scenarios where one must divide by the same divisor many times, since after the initial Newton inversion only one (truncated) multiplication is needed for each division.
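As a sketch of the "same divisor many times" case, the Python class below implements Barrett reduction: it precomputes $\mu = \lfloor 4^k / m \rfloor$ once for a k-bit modulus m, after which each division needs only multiplications, shifts and at most two corrective subtractions. Python's built-in integers stand in for an asymptotically fast multiplication routine. The class and method names are hypothetical, and the precomputation uses ordinary integer division where a large-integer library could use Newton's method.

```python
class BarrettDivider:
    """Divide many integers by one fixed modulus m using a precomputed reciprocal."""

    def __init__(self, m: int):
        if m <= 0:
            raise ValueError("modulus must be positive")
        self.m = m
        self.k = m.bit_length()
        # One-time reciprocal floor(4**k / m); plain integer division keeps the sketch short.
        self.mu = (1 << (2 * self.k)) // m

    def div_mod(self, a: int) -> tuple[int, int]:
        """Return (a // m, a % m) for 0 <= a < 4**k using multiplications and shifts."""
        if not 0 <= a < 1 << (2 * self.k):
            raise ValueError("a is out of range for this modulus")
        q = (a * self.mu) >> (2 * self.k)   # quotient estimate, at most 2 too small
        r = a - q * self.m
        while r >= self.m:                  # at most two correction steps
            q += 1
            r -= self.m
        return q, r
```

A divider such as `BarrettDivider(10**40 + 7)` can then be reused for every dividend below $4^k$.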
Division by a constant D is equivalent to multiplication by its reciprocal. Since the denominator is constant, so is its reciprocal $(1/D)$. Thus it is possible to compute the value of $(1/D)$ once at compile time, and at run time perform the multiplication $N \cdot (1/D)$ rather than the division $N/D$.
When doing floating-point arithmetic, the use of $(1/D)$ presents no problem. But when doing integer arithmetic it is problematic, as $(1/D)$ will always evaluate to zero (assuming D > 1), so it is necessary to do some manipulations to make it work.
Note that it is not necessary to use $(1/D)$. Any value $(X/Y)$ will work as long as it reduces to $(1/D)$. For example, for division by 3 the reciprocal is 1/3. So the division could be changed to multiplying by 1/3, but it could also be a multiplication by 2/6, or 3/9, or 194/582. So the desired operation $N \cdot (1/D)$ can be changed to $N \cdot (X/Y)$, where $(X/Y)$ equals $(1/D)$. Although the quotient $(X/Y)$ would still evaluate to zero, it is possible to do another adjustment and reorder the operations to produce $(N \cdot X)/Y$.
This form appears to be less efficient because it involves both a multiplication and a division, but if Y is a power of two, then the division can be replaced by a fast bit shift. So the effect is to replace a division by a multiply and a shift.
There is one final obstacle to overcome: in general, it is not possible to find values X and Y such that Y is a power of 2 and $X/Y = 1/D$. But it turns out that it is not necessary for $X/Y$ to be exactly equal to $1/D$ in order to get the correct final result. It is sufficient to find values for X and Y such that $X/Y$ is "close enough" to $1/D$. Note that the shift operation loses information by throwing away bits. It is always possible to find values of X and Y (with Y being a power of 2) such that the error introduced by the fact that $X/Y$ is only approximately equal to $1/D$ is in the bits that are discarded. For further details, see the cited reference.[8]
As a concrete example, for 32-bit unsigned integers, division by 3 can be replaced with a multiply by $\frac{2863311531}{2^{33}}$, that is, $(N \times 2863311531) \gg 33$. The denominator in this case is equal to $2^{33}$.
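The search for suitable X and Y can be sketched in Python. Assuming N ranges over all unsigned `bits`-bit integers, the sketch tries $Y = 2^s$ for increasing s, sets $X = \lceil 2^s / D \rceil$, and accepts the first s with $X \cdot D - 2^s \le 2^{s - \text{bits}}$. That condition keeps the error of $N \cdot X / Y$ below $1/D$, which is too small to change the integer part of $N/D$. The function name `find_magic` is hypothetical, and X may need one more bit than the operands (as happens for D = 7).

```python
def find_magic(d: int, bits: int = 32) -> tuple[int, int]:
    """Find X and s (Y = 2**s) with (n * X) >> s == n // d for every 0 <= n < 2**bits.

    Picks the smallest s >= bits with X = ceil(2**s / d) and X*d - 2**s <= 2**(s - bits).
    """
    if d <= 0:
        raise ValueError("divisor must be positive")
    s = bits
    while True:
        x = -((-1 << s) // d)                 # ceil(2**s / d) in integer arithmetic
        if x * d - (1 << s) <= 1 << (s - bits):
            return x, s                       # always reached by s = bits + ceil(log2(d))
        s += 1
```

`find_magic(3)` returns `(2863311531, 33)`, so `(n * 2863311531) >> 33 == n // 3` for every 32-bit unsigned n.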
In some cases, division by a constant can be accomplished in even less time by converting the "multiply by a constant" into a series of shifts and adds or subtracts.[9]
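One such shift-and-add scheme for unsigned 32-bit division by 3 is sketched below in Python. It first forms an underestimate of n/3 by multiplying by the binary fraction 0.01010101... through shifts and adds, then corrects it using the small remainder; the multiplication of the remainder by 11 is likewise written as shifts and adds. The function name is hypothetical, and the error bound that justifies the final correction holds only for 32-bit inputs.

```python
def divu3_shift_add(n: int) -> int:
    """n // 3 for 0 <= n < 2**32 using only shifts, adds and subtracts."""
    if not 0 <= n < 1 << 32:
        raise ValueError("expects a 32-bit unsigned value")
    # q approximates n * 0.0101...01 (binary), slightly less than n/3.
    q = (n >> 2) + (n >> 4)       # n * (1/4 + 1/16)
    q += q >> 4                   # ... * (1 + 2**-4)
    q += q >> 8                   # ... * (1 + 2**-8)
    q += q >> 16                  # ... * (1 + 2**-16)
    r = n - ((q << 1) + q)        # remainder n - 3q: non-negative and well below 32
    # Add r // 3, computed as (11 * r) >> 5, which is exact for 0 <= r <= 31.
    return q + (((r << 3) + (r << 1) + r) >> 5)
```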
Round-off error can be introduced by division operations due to limited precision.