Talk:Binary GCD algorithm
From Wikipedia, the free encyclopedia
Contents |
[edit] unsigned integers
If don't see, why the algorithm is restricted to unsigned integers. As far as I can see it works equally on negative numbers. Even more the algorithm would be simplified because no v>=u comparison is necessary. 134.102.210.237 11:54, 31 March 2006 (UTC)
[edit] ARM implementation
The ARM implementation lacks the initial u = 0 test. The code also does not appear to make good use of the conditional instruction set, so I have some optimizations to suggest:
[edit] remove_twos_loop optimization
gcd: @ Arguments arrive in registers r0 and r1 mov r3, #0 @ Initialize r3, the power of 2 in the result orr r12, r0, r1 @ r12 = (r0 | r1) tst r12, #1 @ Test least significant bit of (r0 | r1) bne gcd_loop @ If nonzero, either r0 or r1 are odd, jump into middle of next loop remove_twos_loop: mov r0, r0, lsr #1 @ Shift r0 right, dividing it by 2 mov r1, r1, lsr #1 @ Shift r1 right, dividing it by 2 add r3, r3, #1 @ Increment r3 tst r0, #1 @ Check least significant bit of r0 bne gcd_loop @ If nonzero, r0 is odd, jump into middle of next loop tst r1, #1 @ Check least significant bit of r1 beq remove_twos_loop @ If zero, r0 and r1 still even, keep looping
[edit] optimization 1
gcd: @ Arguments arrive in registers r0 and r1 mov r3, #0 @ Initialize r3, the power of 2 in the result orr r12, r0, r1 @ r12 = (r0 | r1) remove_twos_loop: tst r12, #1 @ Test least significant bit of (r0 | r1) moveq r0, r0, lsr #1 @ Shift r0 right, dividing it by 2 moveq r1, r1, lsr #1 @ Shift r1 right, dividing it by 2 moveq r12, r12, lsr #1 @ Shift r12 right, dividing it by 2 beq remove_twos_loop @ If zero, r0 and r1 still even, keep looping
[edit] optimization 2
gcd: @ Arguments arrive in registers r0 and r1 mov r3, #0 @ Initialize r3, the power of 2 in the result remove_twos_loop: tst r0, #1 @ Test least significant bit of r0 tsteq r1, #1 @ Test least significant bit of r1 moveq r0, r0, lsr #1 @ Shift r0 right, dividing it by 2 moveq r1, r1, lsr #1 @ Shift r1 right, dividing it by 2 beq remove_twos_loop @ If zero, r0 and r1 still even, keep looping
[edit] gcd_loop optimization
gcd_loop: @ Loop finding gcd of r0, r1 not both even tst r0, #1 @ Check least significant bit of r0 moveq r0, r0, lsr #1 @ If r0 is even, shift r0 right, dividing it by 2, and... beq gcd_loop_continue @ ... continue to next iteration of loop. tst r1, #1 @ Check least significant bit of r1 moveq r1, r1, lsr #1 @ If r1 is even, shift it right by 1 and... beq gcd_loop_continue @ ... continue to next iteration of loop. cmp r0, r1 @ Compare r0 to r1 subcs r2, r0, r1 @ If r0 >= r1, take r0 minus r1, and... movcs r0, r2, lsr #1 @ ... shift it right and put it in r0. subcc r2, r1, r0 @ If r0 < r1, take r1 minus r0, and... movcc r1, r2, lsr #1 @ ... shift it right and put it in r1. gcd_loop_continue: cmp r0, #0 @ Has r0 dropped to zero? bne gcd_loop @ If not, iterate mov r0, r1, asl r3 @ Put r1 << r3 in the result register bx lr @ Return to caller
[edit] optimization
gcd_loop: @ Loop finding gcd of r0, r1 not both even tst r0, #1 @ Check least significant bit of r0 moveq r0, r0, lsr #1 @ If r0 is even, shift r0 right, dividing it by 2, and... beq gcd_loop @ ... continue to next iteration of loop. gcd_loop_2: @ Loop finding gcd of r0, r1 tst r1, #1 @ Check least significant bit of r1 moveq r1, r1, lsr #1 @ If r1 is even, shift it right by 1 and... beq gcd_loop_2 @ ... continue to next iteration of loop. cmp r0, r1 @ Compare r0 to r1 subcc r2, r1, r0 @ If r0 < r1, take r1 minus r0, and... movcc r1, r2, lsr #1 @ ... shift it right and put it in r1. bcc gcd_loop_2 @ ... continue to next iteration of loop (r0 is still odd). sub r2, r0, r1 @ If r0 >= r1, take r0 minus r1, and... movs r0, r2, lsr #1 @ ... shift it right and put it in r0. bne gcd_loop @ Has r0 dropped to zero? If not, iterate mov r0, r1, asl r3 @ Put r1 << r3 in the result register bx lr @ Return to caller
- Sounds good to me. The point of this code sample is just to illustrate how efficient binary GCD is on a real machine. Feel free to plug in the most optimized version you can imagine. Deco 19:50, 31 October 2005 (UTC)
[edit] Implementations
I've removed the ML implementation, because it teaches more about ML than about the algorithm. I have my doubts about the value of the assembly version, but I can't really assess it as I can't read it. IMHO, the C implementation is important because it exemplifies the use of bitwise operations for efficiency. Qwertyus 22:31, 13 April 2006 (UTC)
- I think you did the right thing. I just restored the C and ARM implementations after User:Arvindn blanked them, for these reasons: IMHO, C isn't important for showing bitwise operations; it should be obvious that you'd use bitwise ops where appropriate. But the C implementation is easier to read than the English algorithm at the moment, so it needs to stay at least until there's an appropriate substitute (pseudocode? Knuth-style algorithm description?). IMHO, the ARM implementation is really enlightening, because it shows the real size and speed benefit of the algorithm in a way that no higher-level language's compiler even approaches. (In particular, it totally stomps the C implementation.) Without some hand-optimized assembly implementation, the advantage of binary GCD over Euclid is not obvious, IMHO. --Quuxplusone 21:18, 1 January 2007 (UTC)