Random access noise: Avalanche effect through chaotic waveshaping

By Joel K. Pettersson. Added 2022-03-14. Updated 2023-01-07.

See also: A Gaussian white noise follow-up.

A classic type of audio noise generator produces new pseudo-random numbers at a rate, or frequency, and connects those dots in some way, for example producing squarish hold lines between values. Sweeping the frequency of such a generator up and down produces some sounds which many have heard for decades. But what if you want to spice things up a little further and support "through-zero" FM synthesis behavior (reversing the waveshape in time when frequency becomes negative)? Or what if you want to use extreme phase modulation to select noise values at some (possibly large) distance from a center? Now I think I have a good solution, in the form of a simple function that turns an orderly series of values into something chaotic enough to sound like white noise, which may possibly be used in a future version of the saugns program.

On some reflection, I decided that the problem to solve was: how to produce a stream of white noise values efficiently, while allowing random access to noise values within a roughly 2^32 period? Efficient random access, meaning perfect predictability and the ability to simply jump around, is of course at odds with the cryptographic criteria for good random numbers that people most often strive for, but it's exactly what's wanted if instead the goal is a flexible audio signal generator. Forget about pre-computing and caching noise values, as that's inefficient, and the period is also somewhat arbitrary and could as well be a 64-bit period.

Update: Below, step by step I describe the development of my ranoise32 functions, which use a plain 32-bit counter for state. The lower-quality versions are usable in place of e.g. plain 32-bit LCGs. For the impatient, the final section has test results and final versions in C99. More can be found in my "ranoise" git repository on Codeberg. The statistically best full function (which is only a handful of lines long) tests in PractRand as delivering 8 GB of good-enough randomness, where 16 GB would be ideal for a PRNG with only 32 bits of state and a 32-bit output word size. But another version is better chosen for performance instead.

In hindsight, a possible alternative would have been to use high-quality integer hash functions of the usual form which alternate xor-and-rightshift with multiplications. Essentially, the SplitMix64 random number generator uses a 64-bit function of that kind, and so could easily be modified to allow random access. The same can be done for a 32-bit function. The approach I arrived at, by contrast, uses a multiplication of a value by a variably bitrotated copy of the value, which throws around bits (especially the upper ones) enough that one multiplication less is needed for a pretty good result. For higher-quality output, a few xor-and-rightshifts are also needed, but for lower quality they can be left out.
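
A sketch of such a 32-bit function of the usual form (not code used in this project; the shift and multiplier constants here are, I believe, those of Chris Wellons' published "lowbias32" from his hash prospector searches, included only as an example of the alternative shape):

```c
#include <stdint.h>

/* Integer hash of the common xor-and-rightshift plus multiply form.
   The constants are believed to be Chris Wellons' "lowbias32";
   treat this as an illustrative example, not a recommendation. */
uint32_t hash32_xorshift_mul(uint32_t x) {
        x ^= x >> 16;
        x *= 0x7feb352dUL;
        x ^= x >> 15;
        x *= 0x846ca68bUL;
        x ^= x >> 16;
        return x;
}
```

Like my functions below, this maps 0 to 0, and as it's built from bijective steps, it produces each 32-bit output exactly once over the 2^32 period.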


Good avalanching and a white noise sound

From the world of cryptography and hash functions, the avalanche effect is the property of a function that the bits of the result change greatly enough each time just one bit of the input changes. Compare a sawtooth wave ramp (linearly increasing values) to a white noise sequence (random numbers), and this is obviously also relevant for digital audio white noise synthesis. There's no neater solution to my problem than a cheap and simple function that turns the former into the latter, i.e. some function which provides good avalanching with minimal computation; then the value of a linear counter will provide the sample position in a corresponding stream of noise, and the counter can also be read and modified arbitrarily.

An alternative solution would be a simple conventional random number generator which can be used in two directions, e.g. a linear congruential generator – for which it is simple to calculate how to create a function to retrieve the previous value, which can be used alongside the function for providing the next value. But moving several steps back or forth would then require looping, so it's not a good solution for random access. Update: I didn't know it beforehand, but it's possible to make a LCG jump around in logarithmic time (handling the number of steps to move bit by bit), so it would have worked in practice as a solution, though less optimally.
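
For reference (this is not part of my approach), logarithmic-time LCG jumping works by composing the step function with itself: applying x' = a*x + c twice gives a' = a*a and c' = (a + 1)*c, so handling the step count bit by bit moves any distance in at most 32 doublings. A sketch, using the well-known Numerical Recipes constants as stand-ins:

```c
#include <stdint.h>

/* One step of an LCG: x' = a*x + c (mod 2^32). */
uint32_t lcg_step(uint32_t x) {
        return x * 1664525UL + 1013904223UL;
}

/* Jump the same LCG forward n steps in O(log n) time, by repeatedly
   composing the step function with itself (doubling the stride). */
uint32_t lcg_jump(uint32_t x, uint32_t n) {
        uint32_t a = 1664525UL, c = 1013904223UL;
        while (n > 0) {
                if (n & 1)
                        x = a * x + c; // apply current stride
                c = (a + 1) * c;       // compose step with itself...
                a = a * a;             // ...doubling the stride
                n >>= 1;
        }
        return x;
}
```

Here lcg_jump(x, n) matches n calls of lcg_step(x), but in logarithmic rather than linear time.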

So, returning to the idea of a simple-enough bit-scrambling function, what to use? There are some more well-known functions used as part of hash functions, to mix up the bits some more after producing some number (e.g. MurmurHash fmix, and other multiply and/or xor and shift/rotate variations), or sometimes even for plain random number generation purposes. Whatever the purpose, they tend either to scramble input values far too poorly for the result to sound like white noise, or the opposite, using rather heavy-duty cryptographic computations that seem overkill – though in this context, looping a few simple operations more than a few times per sample is already slow, and unnecessary as it turns out.

(Update: I underestimated the potential of using several multiplications for throwing bits around into a random-seeming pattern. Indeed, some pretty simple PRNGs can use that to very good effect, requiring only one more multiplication than my approach for a good "random access noise" result. A note has been added to the intro about alternatives I didn't look carefully at before I began experimenting.)

I arrived at something intuitively, through experimenting, searching out building blocks as I went. Essentially, I ended up thinking of the problem of creating diffusion or avalanching as a two-dimensional problem, where two different things tackle two different aspects of what is to be done. The first, spreading values out across the target range, could be solved with a multiplication. The second, making the spacing of values less regular, could be solved in part by using a variable amount of bitrotation.

Fibonacci hashing

In order to spread a sequence of values out evenly across a target range, in the sense that regardless of whether a few or a lot of input values are used, they end up covering the same range, Fibonacci hashing can be used. It uses a simple multiplication with a constant; for 32-bit integers, 2^32 divided by the golden ratio gives 2654435769 as the multiplier. The result for a linear input sequence is, however, still evenly-spaced values rising or falling and wrapping around. In other words, it still sounds perfectly periodic as long as frequency (spacing between successive input values) remains constant.
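
As a small illustration, multiplying a counter by the constant throws each output far from the last across the 32-bit range, but with perfectly constant spacing between successive outputs:

```c
#include <stdint.h>

#define FIBH32 2654435769UL // 32-bit Fibonacci hashing constant

/* Fibonacci hashing of a counter value: outputs land spread across
   the whole 32-bit range, but successive outputs keep a constant
   spacing (mod 2^32), so a swept counter still sounds periodic. */
uint32_t fib_hash(uint32_t n) {
        return n * FIBH32;
}
```

For instance, fib_hash(2) minus fib_hash(1), modulo 2^32, is exactly the constant again, and so on for every step.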

The multiplication part of Fibonacci hashing seems like a good function for initially changing the 32-bit input in the function to construct, making lower and higher numbers much more similar, and spreading out bits some. Bit-twiddling when done afterwards gives much more bang for the computational buck.

Exclusive-or with bitshift and multiply

I tried to use exclusive-or (xor) of bitshifted values in various ways for simple additional transformations, but by itself it's simply not good enough to be a useful addition. Many variations, when placed following the Fibonacci hashing multiplication, each result in its own peculiar metallic sound, noisy but very different from white noise. Throwing in more and more fixed shifts (or fixed rotations) may improve results, but is not the most effective way to do so.

It began to seem promising when twice using a xor with the bitshifted value, and one of the times also using a multiplication of the shifted value with the unshifted value. Suddenly it sounded better – almost but not quite like white noise. Doing it that way, audibly some parts of the frequency spectrum still contain too-repetitive noise, but at which frequencies the function sounds good vs. not quite right depends on the specifics of the shifts, multiplications, and how they are ordered.

Below are four variations of a function which each produce a variation of noise almost, but not quite, like white noise. Without some change to the approach, it seems it's not possible to get rid of the defects without many more operations, only to make trade-offs and audibly shuffle around the problems.

 * Four simple variations of a function usable with a counter.
 * Produces noise somewhat but not quite like white noise.
 * Less useful; from earlier testing, before the new approach.

int32_t warp1(uint32_t n) {
        uint32_t s = n * 2654435769UL;
        s ^= (s >> 16) * s;
        s ^= (s << 16);
        return s;
}

int32_t warp2(uint32_t n) {
        uint32_t s = n * 2654435769UL;
        s ^= (s << 16) * s;
        s ^= (s >> 16);
        return s;
}

int32_t warp3(uint32_t n) {
        uint32_t s = n * 2654435769UL;
        s ^= (s >> 16);
        s ^= (s << 16) * s;
        return s;
}

int32_t warp4(uint32_t n) {
        uint32_t s = n * 2654435769UL;
        s ^= (s << 16);
        s ^= (s >> 16) * s;
        return s;
}

All of these, and white noise, result in spectrograms which look very much alike; that particular measurement approach doesn't really work at and beyond this point.

Variable bitrotation and multiply

Using bitrotation instead of bitshifts can give a little more, but the most important improvement on the above came from using a variable length of shift or rotation, rather than a fixed length. The lowest bits seem to vary well enough, so why not just use them? This change to the above approach got rid of the audibly repetitive parts of the noise here and there in the frequency spectrum. Thereafter it turned out that most of what else I'd added could be simplified away without audible loss of quality. I found that every xor could be dropped.

Here is the most minimal code I've written that sounds right, passable to use as a source of white noise. (For my listening, the upper 16 bits went into 16-bit samples at one or another sample rate.) The variable x specifies the noise sample to compute, giving random access, though it can in the simplest case be the value of an increasing counter. While x is unsigned, it's fine to pass values from a signed integer signal as input. Furthermore, the value 0 remains 0. The function can, in other words, be used like a waveshaping function with a chaotic result for non-silence.

#define FIBH32 2654435769UL // 32-bit Fibonacci hashing constant

#define ROR32(x, r) \
        ((uint32_t)(x) >> ((r) & 31) | (uint32_t)(x) << ((32-(r)) & 31))

int32_t ranoise32_minimal(uint32_t x) {
        x *= FIBH32;
        x *= ROR32(x, x + 16);
        return x;
}

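As a usage sketch (the buffer-filling helper here is mine, for illustration, and not from any particular program), random access means a stretch of noise samples can be generated starting from any position:

```c
#include <stddef.h>
#include <stdint.h>

#define FIBH32 2654435769UL // 32-bit Fibonacci hashing constant

#define ROR32(x, r) \
        ((uint32_t)(x) >> ((r) & 31) | (uint32_t)(x) << ((32-(r)) & 31))

int32_t ranoise32_minimal(uint32_t x) {
        x *= FIBH32;
        x *= ROR32(x, x + 16);
        return x;
}

/* Hypothetical driver: fill a buffer of 16-bit samples with noise
   starting at any position pos, keeping the upper 16 bits of each
   32-bit result (as described in the text). */
void fill_noise16(int16_t *buf, size_t len, uint32_t pos) {
        for (size_t i = 0; i < len; ++i)
                buf[i] = (int16_t)((uint32_t)ranoise32_minimal(pos + i) >> 16);
}
```
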
What about the x + 16 above? The function sounded slightly rough before I changed the bitrotation length, e.g. by adding an additive offset, though further fine-tuning is tricky and I don't fully understand it. The effect of different further tweaks for an input stream can be "listened to" in some form after modifying the function to simply assign, rather than multiply-assign, the bitrotation result, though it's difficult to tell how the coloration of the resulting intermediate noise translates into a much subtler and probably largely imperceptible final impact when the multiplication is put back in. Update: Related or not, the Diehard "Squeeze" statistical test – more on such testing further below – fails unless the offset is at least 5 and at most 27; as 16 is the natural center, I changed the function on this page back to using it, the first value I originally went with, instead of one of the also-good alternatives, 11. But is any of the good offsets truly the best? I don't know.

Different measurement methods, i.e. not going by ear (nor by spectrogram) may point towards other ways of improving the function for greater output quality, whether or not the improvement is worth any extra computational cost in practice.

Update: This is the core of my approach, but I missed one thing earlier, which is very easy to change – the uppermost 5 bits are the best to use for the variable bitrotation, because they vary well not only when the argument to the function is changed between calls in increments with the lower bits set, but also when it's changed with larger increments without lower bits set. So, instead of x + 16, it's better to use (x >> 27) + 16.
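
With that change, the minimal function becomes (the function name here is just a label for this variant):

```c
#include <stdint.h>

#define FIBH32 2654435769UL // 32-bit Fibonacci hashing constant

#define ROR32(x, r) \
        ((uint32_t)(x) >> ((r) & 31) | (uint32_t)(x) << ((32-(r)) & 31))

/* The minimal function, with the rotation length now picked by the
   uppermost 5 bits rather than the lowest. */
int32_t ranoise32_minimal2(uint32_t x) {
        x *= FIBH32;
        x *= ROR32(x, (x >> 27) + 16);
        return x;
}
```
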

Another small addition to try which I missed at first, which has more of an impact on quality than the offset for the bitrotation, is adding an or-by-1 to one side of the multiplication, as in x = (x | 1) * ROR32(x, x >> 27);. But by itself, this is not enough to make the function do well in statistical tests for randomness. The journey to pretty decent results of that kind, and making use of the or-by-1 addition, took more twists and turns...

Statistical testing with dieharder and further experimenting

In short, the above version of my algorithm is (as I'd expected) bad for general purpose random number generation, but may behave well for some other special purposes beyond white noise which sounds, or in graphics looks, good enough. (2022-03-17) I found a small addition, further below, can mitigate flaws and make it suitable for more purposes. (2022-03-20)

Out of curiosity, I ran the dieharder (v3.31.1) tests, with stdin input from variations of a trivial program I wrote, which uses my function above in an infinite loop with a counter to make and write values to stdout. (I varied the program to test passing simple linear and non-linear modifications of the counter value to the function.) Failed tests for uses of the above version of my algorithm usually include 7–8 out of 17 of the "diehard_*" tests, and other tests which deal with bit counting and bit patterns. But tests which use random numbers for other purposes such as "dice throwing" and sums of values seem to do well much of the time.

Initially, I was a little surprised that so many of the tests passed, though the predictable failure of about half of the varieties suggests, as I would have suggested anyway, that my algorithm in the above version shouldn't be used for general random number generation purposes. A little later I found that the worst flaws can be fixed with a small xor-and-bitshift addition, making it less bad for general purposes, and in the simplest version reducing the failed Diehard tests to only 2 of them, and the STS tests also finally passing along with more (almost all) of the RGB tests. With a double xor-and-bitshift addition, all tests may pass.

Beyond audio, where in any case the noise sounds good enough, I do think that graphics is a likely area of usefulness with or without the addition described later, since in graphics it's whether or not it looks good enough that counts. Similarly for creating variation in other structures generated – and random access may help in cases where a variable level of detail is wanted, to shift between more coarse-grained and fine-grained pseudo-random number series (at different octaves, or at different resolutions or zooms, etc.). I'm guessing there's also other special purpose uses I haven't thought of, where developers dealing with those areas could test and confirm if results are sensible and if it's useful. And it would be interesting to hear of other cases where it finds use.

Improving quantity rather than quality?

Following failed first attempts to make more tests pass by complicating the minimal function above in various ways, I changed track for a while and tried to see how much the algorithm could produce. The x + 16 part can be varied by changing the number, allowing for 32 possible outputs that the tests judged to be of fairly similar quality. (That offset must however be in the range of 5–27 inclusive for passing "diehard_squeeze", so the other 9 variations may be best to exclude.)

One idea is to produce several streams in parallel. And e.g. for audio purposes, re-using the initial multiplication part of the function, while varying copies of the part from its second line to assign a new output each time using a different offset value, could be used to produce multi-channel noise on the cheap (where some more numerical relatedness of differing values in different outputs doesn't matter much).
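
A hypothetical sketch of that multi-channel idea, sharing the Fibonacci hashing multiplication and varying only the bitrotation offset per channel (the function name and the offset pair are just examples):

```c
#include <stdint.h>

#define FIBH32 2654435769UL // 32-bit Fibonacci hashing constant

#define ROR32(x, r) \
        ((uint32_t)(x) >> ((r) & 31) | (uint32_t)(x) << ((32-(r)) & 31))

/* Two noise channels from one counter position, re-using the initial
   multiplication and differing only in the bitrotation offset. */
void ranoise32_stereo(uint32_t n, int32_t *left, int32_t *right) {
        uint32_t x = n * FIBH32; // shared first step
        uint32_t l = x, r = x;
        l *= ROR32(l, l + 16);   // offset 16 for one channel
        r *= ROR32(r, r + 11);   // offset 11 for the other
        *left = (int32_t)l;
        *right = (int32_t)r;
}
```
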

On further testing, it even seems easy to generate up to three 32-bit pseudo-random values from each counter position by varying the bitrotation offset and emit them as part of the same output stream, without weakening the result to the point where more tests fail. When taking it beyond 3, the result degrades gradually, if the offset is (in accordance with the principle of Fibonacci hashing) increased by 19 between each generated value. For example... (The constant 11 below is less important.)

#include <stdio.h>
#include <stdint.h>

#define FIBH32 2654435769UL // 32-bit Fibonacci hashing constant

#define ROR32(x, r) \
        ((uint32_t)(x) >> ((r) & 31) | (uint32_t)(x) << ((32-(r)) & 31))

int main(int argc, char *argv[]) {
        int i = 0;
        for (;;) {
                /* chaos waveshaper test */
                uint32_t x = i++;
                x *= FIBH32;
                for (int j = 0; j < 8; ++j) // 1 gives best quality, 8 lower
                        putw(x * ROR32(x, x + j * 19 + 11), stdout);
        }
        return 0;
}
The worsening of results as more of the inner loop is used takes the form of more types of tests failing, while other types continue to pass much longer. Essentially (going by which tests fail), I think fair dice throws no longer work when a series of values are related in the above way. Perhaps in each sub-series of 4, 8, 16, or whichever above-1 value is entered, the dice are loaded in the direction of that sub-series until the next sub-series is reached.

Listening a bit more, bit by bit, and fixing the worst flaws

1-bit random noise sounds like white noise, too, having a flat frequency spectrum. When listening to the 1-bit noise extracted from individual and adjacent bits in the result from my function, I found that the quality of randomness varied a lot among the bits. The highest bits are the most random, and the lowest the least, in a gradient of quality; the lowest bits don't sound anything like white noise (rather like some kind of rich metallic noise). This explains both the statistical flaws and why the 32-bit noise sounds good, since the highest bits determine most of the amplitude level and thus have the largest impact on how it sounds, while the poorest-quality bits are largely unheard.

There's a simple fix for the bad-quality bits – use the highest bits to add randomness to the lowest bits, with a xor operation...

int32_t ranoise32_fix1(uint32_t x) {
        x *= FIBH32;
        x *= ROR32(x, x + 16);
        x ^= x >> 16; // improve worse lower bits using better higher bits
        return x;
}

This makes most of the statistical tests, which failed before, now pass; a few still fail. In theory, the function when used with a counter for x is now significantly better as a random number generator (but still a bit too rough for general purposes).

How about use for lower-precision signals?

For audio purposes, signals are often in the form of single-precision floating point, or converted to it. 32-bit floats have 24-bit precision, so the lowest 8 bits don't matter in that case. Furthermore, when listening to the final audio signal, in practice usually only 16 bits (or a bit less) matter, whether or not more are stored. (This also assumes the noise is used at high volume.) This means that in practice, adding the above fix may not matter for audio purposes. Similarly for graphics and perceivable brightness levels, and a few other purposes.
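
For illustration, a typical scaling of the 32-bit result to a single-precision float (my own sketch, one of several reasonable conventions), where only the top 24 bits survive the conversion:

```c
#include <stdint.h>

/* Map a 32-bit signed noise value to a float in roughly [-1, 1],
   by scaling with 2^-31; only the top 24 bits survive. */
float noise_to_float(int32_t v) {
        return (float)v * (1.0f / 2147483648.0f);
}
```
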

When higher bits matter more, it could be more worthwhile to instead use a smaller bitshift, like 8 or less instead of 16 or more, bringing some improvement to middle-high bits below the highest at the cost of improving the lowest bits less. (This brings a slightly different, on the whole smaller, improvement to which statistical tests pass.) It may or may not be useful to use by itself.

For the most general usefulness

Combining the two ideas – adding two xor-by-bitshift, to improve both mid-high and mid-low bits – should in theory improve e.g. audio quality a little and, also, turns out to improve the results in the statistical tests more than the first fix alone. When two xor-by-bitshift are used, it further seems like the combination of the two helps weaker bits enough that making both shifts yet a little smaller has a beneficial result.

Below is a "deluxe version" of the function, among the first which passed all dieharder statistical tests when I tested it. Assuming you generally have a few spare CPU cycles to burn, which you probably do, it's simply the best version so far.

int32_t ranoise32_fix2(uint32_t x) {
        x *= FIBH32;
        x *= ROR32(x, x + 16);
        x ^= (x >> 6) ^ (x >> 15); // improve worse lower bits with higher bits
        return x;
}

Note that for an x which is not increased or decreased in evenly spaced steps between calls, e.g. a squared counter, the result is usually weaker and some tests will usually still fail.

Finally, I'm not sure the bitshift numbers above are the very best; a few other numbers for the smaller and larger bitshift seem to work similarly well.

TestU01 testing and bigger changes

Update: A little program called TestU01_stdin makes this kind of testing easier, allowing the same test programs piped into dieharder (and later PractRand, see further below) to be used for TestU01 as well. My version of TestU01_stdin adds options to it for testing with bit order reversed in each 32-bit word, as generally suggested as an extra thing to check by M.E. O'Neill.

On the website for the PCG random number generator, M.E. O'Neill offers a quick how-to for trying out TestU01, a statistical test system which generally raises the bar further compared to dieharder. I found the page useful for quickly getting going testing my ranoise32 functions, and it's trivial to change the provided example to do so.

I don't aim for this side-project of mine to achieve pseudo-perfect randomness, or even to pass the BigCrush test. I'm not even concerned about my functions showing some flaws on the smaller tests, as long as they remain useful and without unnecessary flaws (those avoidable without more complexity). And they already do much better in testing than practically useful PRNGs like plain 32-bit LCGs and xorshift32 (a PRNG I like when it's good enough), with TestU01 as with dieharder. (Both those alternatives have some SmallCrush failures.)

Update: I ended up experimenting until I reached what seems the limit of how good I can make a simple function, at 5 BigCrush failures (mainly 4 "Gap" tests that fail in all variations of my function, while other failures vary more); this may be close to the practical limit for PRNGs with only 32 bits of state (general-purpose PRNGs with truly high-quality output have larger state). Lower bits may still be weaker than higher, though. For the best-in-test version of my function, skip down one section.

For practical audio purposes – what people can hear – it instead makes sense to strip the later code down to something with results comparable to LCGs and xorshift32 in SmallCrush tests, i.e. some failures. The resulting signal sounds nice to me, and seems to have a little more variation smoothly shifting over time than the output of an LCG.

Bitrotation tweak statistics

The first series of tests (2022-03-23) was followed by another (2022-03-26) for a new bitshift length selection which was the best for a few offset numbers. This is about Crush tests and a not-quite-final version.

Tweaking the bitrotation offset is worth testing further, though it's also potentially a main parameter to vary for the purpose of producing different streams from the same input. Between e.g. 8–23, the 16 varieties of noise are all of reasonable quality. The other constants in the function, 6 and 15 for the bitshift length of the extra two xor-with-bitshift, can also be varied a little up and down; so far 9 seems like a good difference to have between them; changing either or both affects which other variations pass TestU01 SmallCrush in a somewhat random-seeming way, but a few picks also make performance better in the medium-sized Crush test.

To save a bit of time (half an hour per Crush test), for now I've mainly tested the less-bad 5–27 range of offset numbers which passed Diehard Squeeze in earlier tests (mentioned further above). Testing with TestU01 in the simplest way, using a plain counter increased when calling my function, typically 0–1 tests fail in SmallCrush, depending on the tweak tested (which test ends up failing varies). For the medium-sized Crush test, which is the focus below, the number of failures was 10 for the offset 16 and bitshifts 6 and 15. One particular tweak (using an offset of 14 instead) was able to halve the failures (and SmallCrush also passed with that version), and a tweak of bitshifts brought failures down to just 3.

Below are tables of the number of failed Crush tests for different offset numbers, when calling the function with a counter increasing from a 0 start. (The offsets in parentheses also have a SmallCrush failure.) I also listed (to be taken with a grain of salt) the number of differently-named Crush tests which failed as the number of types of failed tests.

For bitshift lengths 6 and 15
Bitrotation offset Failures Types
(4) 21 10
5 14 10
6 16 10
7 11 8
(8) 8 7
9 9 9
10 7 6
11 9 9
12 6 6
13 7 6
14 5 4
(15) 6 6
(16) 10 8
17 8 8
18 10 9
19 8 7
20 8 8
21 7 6
(22) 7 7
23 7 6
24 9 6
25 7 6
26 11 8
27 13 8
(28) 23 9
For bitshift lengths 7 and 16
Bitrotation offset Failures Types
4 18 10
5 14 9
6 9 6
(7) 8 7
8 6 6
9 6 6
10 4 4
11 5 5
12 5 5
13 4 4
14 3 3
15 4 4
16 9 7
17 7 7
(18) 11 9
(19) 6 6
20 7 7
21 5 5
22 5 5
23 5 5
24 6 6
25 7 6
26 8 6
27 14 9
28 16 9

The offset number 14 does the best in these Crush tests. But there's not that clear a correspondence between these scores and how good it sounds, to the best of my own discernment. I'm convinced that trying to pick these variations by ear wouldn't minimize the number of failed Crush tests, but would rather result in picking some random number which is mediocre or better. (As an aside, beyond these tests I would expect as a general rule that the best-sounding noise may be a little too regular for chance in some ways, e.g. how amplitude fluctuates.) Subjectively, I'm happy with the sound of the version with 14, among some others.

Going back to and expanding on some of my old remarks about "intermediate noise" (which can be listened to when temporarily changing the algorithm to assign rather than multiply-assign the result of the bitrotation), there's a fairly big difference in how the intermediate noise sounds for 14 and 16. (The output for testing by ear in this way was scaled down to 16-bit samples at 96 kHz.) Where 16 gave a clearer and sharper metallic sound, 14 had a more subdued and slightly bassier coloration (and was one of several softer-sounding offsets). However, those qualities change into something much subtler and different in the real output (when the missing multiplication is added back in). The number 16 is unique in making the bitrotation for an "average" number 0 (as 16 + 16 modulo 32 makes 0).
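
For reference, the temporary modification used for such listening can be written as follows (the name is just a label; this is the listening-test variant, not an output function):

```c
#include <stdint.h>

#define FIBH32 2654435769UL // 32-bit Fibonacci hashing constant

#define ROR32(x, r) \
        ((uint32_t)(x) >> ((r) & 31) | (uint32_t)(x) << ((32-(r)) & 31))

/* "Intermediate noise" variant for listening tests: assign the
   bitrotation result instead of multiplying by it. */
int32_t ranoise32_intermediate(uint32_t x, uint32_t offset) {
        x *= FIBH32;
        x = ROR32(x, x + offset); // assign, not multiply-assign
        return x;
}
```
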

All that musing aside, a version of my function with the offset 14 seems in order. It seems to make sense to my ears along with the statistical tests.

A best pick

Update: I now call this version "ranoise32_old". Compared to later versions which do better with TestU01 testing and PractRand testing, it has characteristics (including statistical flaws) which are more similar to those of LCGs. It is "too smooth", a little like LCGs.

int32_t ranoise32_old(uint32_t x) {
        x *= FIBH32;
        x *= ROR32(x, x + 14);
        x ^= (x >> 7) ^ (x >> 16); // improve worse lower bits with higher bits
        return x;
}

If you want 16-bit output, then you can obviously reduce the third part to just x ^= (x >> 7);, and then keep the highest 16 bits. But if you use the function to create a random amplitude signal, for that purpose you may be able to get away with removing that whole line, since the difference in the quality of lower bits will (despite making the function a bad random number generator) generally be difficult to hear or see (unless perhaps the noise is examined in detail in isolation).
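
A sketch of that 16-bit reduction (hypothetical name, following the remark above):

```c
#include <stdint.h>

#define FIBH32 2654435769UL // 32-bit Fibonacci hashing constant

#define ROR32(x, r) \
        ((uint32_t)(x) >> ((r) & 31) | (uint32_t)(x) << ((32-(r)) & 31))

/* 16-bit output variant: only the smaller xor-with-bitshift is kept,
   and the highest 16 bits of the result are returned. */
int16_t ranoise16_old(uint32_t x) {
        x *= FIBH32;
        x *= ROR32(x, x + 14);
        x ^= x >> 7;
        return (int16_t)(x >> 16);
}
```
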

Other small changes that do more

What more would be needed to reduce remaining flaws? I missed a few basic things that do more to improve the function than the above tweak, one of which allows passing all Crush tests; when one or more of those changes are made, the above tables are obsolete. When all are made, the number of BigCrush failures is reduced but some still remain.

Or-by-1, or what?

It turns out adding an or-by-1 is enough to make the above function fully pass the medium-sized Crush test, when used with an increasing counter. Furthermore, doing so greatly changes the picture of how well other tweaks to the function work, e.g. all bitrotation offsets become basically fine to use. Taking the next step with BigCrush testing, there are still 13 failures for the below version. Setting one or two well-chosen bits can be well worth it, but which ones are best?

int32_t ranoise32_or1(uint32_t x) {
        x *= FIBH32;
        x = (x | 1) * ROR32(x, x + 14);
        x ^= (x >> 7) ^ (x >> 16); // improve worse lower bits with higher bits
        return x;
}

An or-by-1 addition for one side of the main multiplication almost always seems worth it according to the statistical tests, but another bit may also be worth setting. I made a separate crude test for looking at avalanching for different starting points, and realized that for each bitrotation offset, a corresponding bit in the multiplier can be set as a minimal fix to "normalize" what the line of code does, in a certain sense. Without that and no "or", for some large power-of-two input numbers, the result is zero when the offset is not zero. An or-by-1 fixes lower but not upper bits. Adding an or, but leaving out the 1 bit, makes for worse avalanching.

Bitrotation value source

There's also a different type of flaw in my function, which has to do with its use with arguments that are changed in increments that are large and lack lower bits. It would be better to use the upper 5 bits rather than the lower 5 bits to decide the bitrotation, because then the function is maximally sensitive to a wide range of argument changes and can be used as flexibly as possible. Combined with the "or" change, this also improves BigCrush results further. (With BigCrush, my testing of variations is however not exhaustive, because it takes a long time, so keep that in mind; I've skipped to testing other things upon finding – also through other means – each new simple improvement.)

For simpler testing of variations, I'll now introduce a new operation, which builds upon "ror32" (rotate right 32 bits), namely "muvaror32" (multiplicatively mix with variable rotation to the right, 32 bits), with the "or"-adjustments included in the macro.

#define FIBH32 2654435769UL // 32-bit Fibonacci hashing constant

#define ROR32(x, r) \
        ((uint32_t)(x) >> ((r) & 31) | (uint32_t)(x) << ((32-(r)) & 31))

/** Multiplicatively mix bits using varying right-rotation,
    for 32-bit unsigned \p x value, \p r rotation, \p ro offset. */
#define MUVAROR32(x, r, ro) \
        (((uint32_t)(x) | ((1UL<<((ro) & 31))|1)) * ROR32((x), (r)+(ro)))

Changing the function to use it (e.g. x = MUVAROR32(x, x >> 27, 14);), the bitrotation offset is no longer as important to the quality of the result – using 0 (which simplifies away the addition) gives as good a result as using 14, while some other numbers give only somewhat different results – which seems good for the purpose of varying the offset for 32 different variations of noise. The number 16 is now better, with 6 failed tests (of 3 types), while 0 and 14 both have 8 similar failures (of 4 types). I didn't at this point test further to the point of picking one best number.
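
Written out with the macro and the offset 0, such a function looks like this (the name is just a label for this variant):

```c
#include <stdint.h>

#define FIBH32 2654435769UL // 32-bit Fibonacci hashing constant

#define ROR32(x, r) \
        ((uint32_t)(x) >> ((r) & 31) | (uint32_t)(x) << ((32-(r)) & 31))

/** Multiplicatively mix bits using varying right-rotation,
    for 32-bit unsigned \p x value, \p r rotation, \p ro offset. */
#define MUVAROR32(x, r, ro) \
        (((uint32_t)(x) | ((1UL<<((ro) & 31))|1)) * ROR32((x), (r)+(ro)))

/* The function using MUVAROR32 with the upper-5-bits rotation and a
   0 offset (which simplifies away the addition). */
int32_t ranoise32_mva(uint32_t x) {
        x *= FIBH32;
        x = MUVAROR32(x, x >> 27, 0);
        x ^= (x >> 7) ^ (x >> 16);
        return x;
}
```
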

Placement of xor-and-rightshift?

Comparing the structure of my function with conventional integer hash functions, and thinking more about how the alternating xor-and-rightshift and multiplication sequences work in them, I began to think it was another silly mistake to put both my xor-and-rightshifts at the end of my function. While the variable bitrotation made the result usable anyway, it seemed suboptimal not to reorganize it. Placing one before and one after the main multiplication, and picking new bitshift values for them, also improved BigCrush results further – for the special case of a 0 bitrotation offset. (As an aside, it's also easy to change such a function to use simply x = (x | 1) * x; for the main multiplication and still have it pass SmallCrush.) Later, I realized that for other offsets, the older arrangement however works better.

While testing functions of this form, a first bitshift (before the multiplication) of 14 made it easiest to get good results across variations of the function. For the final shift after the multiplication, 13 worked best together with the rest of the function; 15 and 14 are another pair that works roughly as well in TestU01 testing.

A simplest code pick

Versions which simplify away the bitrotation offset (by making it 0) have slightly slimmer code than the others now in testing. They are also simple to write without the new macro. Below is such a function, which also uses the different xor-and-rightshift arrangement, and passes Crush but has 5 BigCrush failures – mainly four failed "Gap" tests – when tested with an increasing counter passed as the argument.

int32_t ranoise32a(uint32_t x) {
        x *= FIBH32;
        x ^= x >> 14;
        x = (x | 1) * ROR32(x, x >> 27);
        x ^= x >> 13;
        return x;
}

Quick and dirty practical use for audio

Update: This code is too slow to be worthwhile using on x86-64 if aiming for speed. Something without a variable bitrotation, like the ranfast32 variation, or alternatively picking some SplitMix32 version, runs much faster.

For audio purposes, I simply used the value scaling and the MUVAROR32 macro above directly, rather than any of the full ranoise32 functions. The two lines applying them contain the core of the latest algorithm, and that macro is perhaps the essence of this mini project. When the rest (the xors and shifts) is peeled away, it still sounds fine, but SmallCrush failures appear in testing.

For running several noise generators with different audio outputs, I used another PRNG to set a different initial position for each. I found the stereo sound when running two, one in each channel, nicely varied over time, and to me it sounds a little more lively than the somewhat more flatly even "noisescape" from doing the same thing with two 32-bit LCGs.

When a plain sequence of values is wanted (no random access), only one multiplication is needed per value; the one outside the MUVAROR32 macro can be replaced with an addition to a position state, done for each new value to produce. For such sequential values, it also works well enough for audio to simply do MUVAROR32(x, x, 0) instead of MUVAROR32(x, x >> 27, 0) for varying 32-bit inputs, though more failures then appear in SmallCrush (mainly "RandomWalk1" appearing multiple times) – thus racking up test failures comparable to those of a plain 32-bit LCG.

Below are minimalistic functions for generating floating point values.

/** Get next noise sample for, and update, current position \p pos. */
static inline float noise_next(uint32_t *pos) {
        uint32_t x = *pos += FIBH32;
        return ((int32_t) MUVAROR32(x, x, 0)) * (1.f / INT32_MAX);
}

/** Get noise sample at arbitrary 32-bit position \p i. */
static inline float noise_get(uint32_t i) {
        uint32_t x = i * FIBH32;
        return ((int32_t) MUVAROR32(x, x >> 27, 0)) * (1.f / INT32_MAX);
}

(If you want the outputs of the two functions to match, then the same second line should be used in both, though.)

Further testing with PractRand and final changes

Back in April, I also tested with PractRand and made some further changes accordingly. PractRand is sensitive to a broader range of statistical flaws than TestU01, and stops testing upon a clear failure; testing PRNGs with only 32 bits of state to the fullest takes much less time with PractRand than it does using TestU01.

Most old commonly used PRNGs, including e.g. LCGs and xorshift32, fail on a 1 MB test pass. PractRand keeps doubling the size from the start size used until a pass fails – and it always eventually does with a PRNG that has only 32 bits of state. In theory, 16 GB is the maximum size of passable output (2^32 output words of 4 bytes each) if all flaws are caught and output words are also 32 bits in size.

Of my ranoise32 functions, the weaker ones, including the smooth "ranoise32_old", are also among those that fail at 1 MB. The version performing more strongly with TestU01 ("ranoise32a" above) also does better in PractRand – failing during the 2 GB stage, a large difference from failing at 1 MB. That's also a little better than how well good 32-bit integer-to-integer hash functions I tried perform; e.g. some variations on a 32-bit "SplitMix" usually fail at either 512 MB or 1 GB. With further tweaks, I got a new "ranoise32b" function to pass until failing during the 16 GB stage.

int32_t ranoise32b(uint32_t x) {
        x *= FIBH32;
        x ^= x >> 14;
        x = (x | 1) * ROR32(x, (x >> 27) + 16);
        x ^= x >> 13;
        return x;
}

Here, I've discarded the extra bitrotation offset "compensation" I added earlier in a "MUVAROR32" operation/macro. Simply adding an offset of 16 to code otherwise working the same as ranoise32a did the best of what I tried. (However, at one point I had a freak success I can't replicate, where PractRand passed 16 GB with a "VERY SUSPICIOUS" and a great many failures then appeared at the next, 32 GB stage. I'm not sure whether I tested the wrong version of a file and lost a ranoise32 variation which can do that, or if it was due to PractRand somehow working glitchily at the time.)

PractRand results, along with those for the other test suites used, can be found in the "ranoise" git repository on Codeberg – including some comparisons with other PRNGs, mainly ones with 32 bits of state. The code for all PRNGs I include results for can also be found there.

What's the fastest function?

In January 2023, I finally got around to benchmarking the performance (as in speed) of my ranoise32 functions, and found that a few things about them suck when aiming for speed on x86-64:

  1. The variable bitrotation makes even the slimmest ranoise32 version significantly slower than SplitMix32 variations.
  2. Adding a constant to the bitrotation amount additionally makes for a surprisingly large performance hit.

As such, SplitMix32 variations easily beat all of the above versions of my PRNG for speed. But I had another option, which turned out a little faster than SplitMix32, namely peeling off the bitrotation part and just multiplying by the value.

int32_t ranfast32(uint32_t x) {
        x *= FIBH32;
        x ^= x >> 14;
        x = (x | 1) * x;
        x ^= x >> 13;
        return x;
}

This is a form of the function I had temporarily used earlier, while experimenting to find the constants which ended up as 14 and 13. In this weakened version of what became ranoise32a, bad choices for the constants usually failed after a short time in TestU01 SmallCrush, rather than after a long time in larger tests.

This function does worse in the statistical tests than ranoise32a and ranoise32b, and also compared to SplitMix32 variations. It does however beat 32-bit LCGs, xorshift32, my own earlier simpler code versions, etc., though that's not so impressive. But it's good enough for audio purposes, for sure.