Multiply-with-carry

In computer science, multiply-with-carry (MWC) is a method invented by George Marsaglia for generating sequences of random integers based on an initial set from two to many thousands of randomly chosen seed values. The main advantages of the MWC method are that it invokes simple computer integer arithmetic and leads to very fast generation of sequences of random numbers with immense periods, ranging from around 260 to 22000000.

As with most pseudorandom number generators, the resulting sequences are functions of the randomly chosen seed values.

General theory

A MWC sequence is based on arithmetic modulo a base b, usually b = 232, because arithmetic modulo of that b is automatic in most computers. However, sometimes a base such as b = 232  1 is used, because arithmetic for modulus 232  1 requires only a simple adjustment from that for 232, and theory for MWC sequences based on modulus 232 has some nagging difficulties avoided by using b = 232  1.

In its most common form, a lag-r MWC generator requires a base b, a multiplier a, and a set of r+1 random seed values, consisting of r residues of b,

x0, x1, x2 ,..., xr1,

and an initial carry cr1 < a.

The lag-r MWC sequence is then a sequence of pairs xn, cn determined by

 x_n=(ax_{n-r}+c_{n-1})\,\bmod\,b,\ c_n=\left\lfloor\frac{ax_{n-r}+c_{n-1}}{b}\right\rfloor,\ n\ge r,

and the MWC generator output is the sequence of x's,

xr , xr+1 , xr+2, ...

The period of a lag-r MWC generator is the order of b in the multiplicative group of numbers modulo abr  1. It is customary to choose a's so that p = abr  1 is a prime for which the order of b can be determined. Because 2 is a quadratic residue of numbers of the form 8k±1, b = 232 cannot be a primitive root of p = abr  1. Therefore there are no MWC generators for base 232 that have the maximum possible period, one of the difficulties that use of b = 232  1 overcomes.

A theoretical problem with MWC generators, pointed out by Couture and l'Ecuyer (1997) is that the most significant bits are slightly biased; complementary-multiply-with-carry generators do not share this problem: "We shall see that, for the complementary MWC, each bit of the output value is fair, that is, the two binary digits will appear equally often in a full period, a property not shared by MWC generators." They do not appear to elaborate further as to the extent of the bias. Complementary-multiply-with-carry generators also require slightly more computation time per iteration, so there is a tradeoff to evaluate depending on implementation requirements.

Comparisons with linear congruential generators

Linear congruential generators are implemented as

x_{n+1}=(ax_n+c)\ \bmod\,2^{32},

because most arithmetic processors are able to put the multiplier a and the current x in 32-bit registers, form the 64-bit product in adjoining registers, and take the lower 32 bits as the product, that is, form

a\times x\ \bmod\,2^{32}.

Adding the 32-bit c to that lower half then provides (ax+c) mod 232. If a mod 8 is 1 or 5 and c is odd, the resulting base 232 congruential sequence will have period 232.

A lag-1 multiply-with-carry generator allows us to make the period nearly 263 by using those same computer operations, except that this time the top half of the 64-bit product is used rather than ignored after the 64 bits are formed. It is used as a new carry value c rather than the fixed carry value of the standard congruential sequence: Get ax+c in 64-bits, then form a new c as the top half of those 64 bits, and the new x as the bottom half.

With multiplier a specified, each pair of input values x, c is converted to a new pair,

x\leftarrow (ax+c)\,\bmod\,2^{32},\ \ c\leftarrow \left\lfloor\frac{ax+c}{2^{32}}\right\rfloor.

If x and c are not both zero, then the period of the resulting multiply-with-carry sequence will be the order of b = 232 in the multiplicative group of residues modulo ab  1, that is, the smallest n such that bn = 1 mod (ab  1). If we choose an a of 28 to 31 bits such that ab−1 is a "safe prime", that is both ab  1 and ab/2  1 are prime, then the period will be ab/2  1, approaching 263, which in practice may be an acceptably large subset of the number of possible 32-bit pairs (x, c).

Following are some maximal values of a for computer applications which satisfy the above safe prime condition:

Bits in a b Maximum a Such That ab−1 is a Safe Prime Period
15 216 215−50 = 32,718 1,072,103,423
16 216 216−352 = 65,184 2,135,949,311
31 232 231−563 = 2,147,483,085 4,611,684,809,394,094,079
32 232 232−178 = 4,294,967,118 9,223,371,654,602,686,463
64 264 264−742 = 18,446,744,073,709,550,874 170,141,183,460,469,224,887,945,252,369,640,456,191
128 2128 2128−10,408 2127(2128−10,408)−1
256 2256 2256−9166 2255(2256−9166)−1
512 2512 2512−150,736 2511(2512−150,736)-1

However, as being a safe prime does not affect the randomness of the sequence, one may instead simply choose a such that the order of b is ab/2  1. The following are again maximum values of a of various sizes.

Bits in a b Maximum a Such that b has order ab/21 Period
15 216 215−29 = 32,739 1,072,791,551
16 216 216−22 = 65,514 2,146,762,751
31 232 231−68 = 2,147,483,580 4,611,685,872,398,499,839
32 232 232−76 = 4,294,967,220 9,223,371,873,646,018,559
63 264 263−140 = 9,223,372,036,854,775,668 85,070,591,730,234,614,574,571,566,698,273,439,743
64 264 264−116 = 18,446,744,073,709,551,500 170,141,183,460,469,230,661,776,147,440,730,111,999

Here is a comparison of congruential and MWC sequences for the simple case of arithmetic modulo 10; here the "registers" are a single digit, adjoining registers are two digits:

Starting with x_0=1, the congruential sequence

x_n=(7x_{n-1}+3)\,\bmod\,10,

has this sequence of adjoining registers:

10,03,24,31,10,03,24,31,10,\ldots,

and the output sequence of x's, (the rightmost register), has period 4:

0,3,4,1,0,3,4,1,0,3,4,1,\ldots

Starting with x_0=1,c_0=3, the MWC sequence

x_n=(7x_{n-1}+c_{n-1})\,\bmod\,10,\ c_n=\left\lfloor\frac{7x_{n-1}+c_{n-1}}{10}\right\rfloor,

has this sequence of adjoining registers

10,01,07,49,67,55,40,04,28,58,61,13,22,16,43,25,37,52,19,64,34,31 10,01,07,...

with output sequence of x's having period 22:

0,1,7,9,7,5,0,4,8,8,1,3,2,6,3,5,7,2,9,4,4,1 0,1,7,9,7,5,0,...

Notice that if those repeated segments of x values are put in reverse order starting from a x_{22n+20},

449275\cdots97101\,449275\cdots9710144\cdots

we get the expansion j/(ab−1) with a=7, b=10, j=31:

\frac{31}{69}=.4492753623188405797101\,4492753623\ldots

This is true in general: The sequence of x's produced by a lag-r MWC generator:

 x_n=(ax_{n-r}+c_{n-1})\bmod\,b\,,\ \ c_n=\left\lfloor\frac{ax_{n-r}+c_{n-1}}{b}\right\rfloor,

when put in reverse order, will be the base-b expansion of a rational j/(abr  1) for some 0 < j < abr.

Also notice that if, starting with x_0=34, we generate the ordinary congruential sequence

x_n=7x_{n-1}\,\bmod\,69,

we get the period 22 sequence

31,10,1,7,49,67,55,40,4,28,58,61,13,22,16,43,25,37,52,19,64,34, 31,10,1,7,...

and that sequence, reduced mod 10, is

1,0,1,7,9,7,5,0,4,8,8,1,3,2,6,3,5,7,2,9,4,4, 1,0,1,7,9,7,5,0,...

the same sequence of x's resulting from the MWC sequence.

This is true in general, (but apparently only for lag-1 MWC sequences):

Given initial values x_0,c_0, the sequence x_1,x_2,\ldots resulting from the lag-1 MWC sequence

 x_n=(ax_{n-1}+c_{n-1})\,\bmod b\,,\ \ c_n=\left\lfloor\frac{ax_{n-1}+c_{n-1}}{b}\right\rfloor

is exactly the congruential sequence yn = ayn  1 mod(ab  1), reduced modulo b.

Choice of initial value y0 merely rotates the cycle of x's.

Complementary-multiply-with-carry generators

Establishing the period of a lag-r MWC generator usually entails choosing multiplier a so that p=abr  1 is prime. If p is a safe prime, then the order of b will be p  1 or (p  1)/2. Otherwise, it is likely that p  1 will have to be factored in order to find the order of b mod p, and p = abr  1 may be difficult to factor.

But a prime of the form p = abr + 1 will make p−1 easy to factor, so a version of multiply-with-carry that involves the order of b for a prime p = abr + 1 would reduce considerably the computational number theory required to establish the period of a MWC sequence.

Fortunately, a slight modification of the MWC procedure leads to primes of the form abr + 1. The new procedure is called complementary-multiply-with-carry (CMWC),

and the setup is the same as that for lag-r MWC: multiplier a, base b, r + 1 seeds

x0, x1, x2, ..., xr1, and cr  1.

There is a slight change in the generation of a new pair (x, c):  x_n=(b-1)-(ax_{n-r}+c_{n-1})\,\bmod\,b,\ c_n=\left\lfloor\frac{ax_{n-r}+c_{n-1}}{b}\right\rfloor.

That is, take the complement, (b−1)−x, when forming the new x.

The resulting sequence of x's produced by the CMWC RNG will have period the order of b in the multiplicative group of residues modulo abr+1, and the output x's, in reverse order, will form the base b expansion of j/(abr+1) for some 0<j<abr.

Use of lag-r CMWC makes it much easier to find periods for r's as large as 512, 1024, 2048, etc. (Making r a power of 2 makes it slightly easier (and faster) to access elements in the array containing the r most recent x's.)

Some examples: With b=232, the period of the lag-1024 CMWC

 x_n=(b-1)-(ax_{n-1024}+c_{n-1})\,\bmod\,b,\ c_n=\left\lfloor\frac{ax_{n-1024}+c_{n-1}}{b}\right\rfloor.

will be a\cdot232762, about 109867 for these three as: 109111 or 108798 or 108517.

With b = 232 and a = 3636507990, p = ab1359  1 is a safe prime, so the MWC sequence based on that a has period 3636507990\cdot243487 \approx1013101.

With b = 232, a CMWC RNG with near record period may be based on the prime p = 15455296b42658 + 1. The order of b for that prime is 241489*21365056, about 10410928.

Implementation

The following is an implementation of the CMWC algorithm in the C programming language. Also, included in the program is a sample initialization function. In this implementation the lag r=4096. The period of the resulting generator is about 2^{131104 }.

#include <stdint.h>
 
#define PHI 0x9e3779b9
 
static uint32_t Q[4096], c = 362436;
 
void init_rand(uint32_t x)
{
	int i;
 
	Q[0] = x;
	Q[1] = x + PHI;
	Q[2] = x + PHI + PHI;
 
	for (i = 3; i < 4096; i++)
		Q[i] = Q[i - 3] ^ Q[i - 2] ^ PHI ^ i;
}
 
uint32_t rand_cmwc(void)
{
	static uint32_t i = 4095;
	uint64_t t;
 
	i = (i + 1) & 4095;
	t = (18705ULL * Q[i]) + c;
	c = t >> 32;
	Q[i] = 0xfffffffe - t;
 
	return Q[i];
}

See also

References