Is there a faster algorithm for max(ctz(x), ctz(y))? – Algorithm

by
Ali Hasan
algorithm bit-manipulation c++ micro-optimization rust-analyzer

The Problem:

Design an efficient algorithm to calculate the maximum of the count of trailing zeros for two given non-negative integers x and y. The count of trailing zeros (ctz) for a non-negative integer is the number of consecutive zero bits at the low end of its binary representation. For example, for x = 10 (binary 1010) and y = 12 (binary 1100), the count of trailing zeros for x is 1 and the count of trailing zeros for y is 2, so the desired result is 2. The desired algorithm should be faster than the naive approach of finding the count of trailing zeros for each integer separately and then taking the maximum. The input integers x and y can be represented as 32-bit or 64-bit unsigned integers.

The Solutions:

Solution 1: Math-Identity

The following formula holds:

max(ctz(x), ctz(y)) = ctz((x|-x) & (y|-y))

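This works because x|-x keeps the lowest set bit of x and sets every bit above it, so the AND of the two masks has its lowest set bit at the larger of the two positions. Since min(ctz(x), ctz(y)) = ctz(x|y), the sum identity max(ctz(x), ctz(y)) = ctz(x)+ctz(y)-ctz(x|y) gives the same result but needs three ctz calls, whereas the formula above needs only one. It can be implemented efficiently using only a few CPU instructions.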

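// C++: ctz stands for a 64-bit count-trailing-zeros such as std::countr_zero
int32_t test2(uint64_t x, uint64_t y) {
    return ctz((x|-x) & (y|-y));
}

// Rust: the same formula (wrapping_neg is two's-complement negation)
pub fn test2(x: u64, y: u64) -> u32 {
    ((x | x.wrapping_neg()) & (y | y.wrapping_neg())).trailing_zeros()
}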
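As a rough check of the identity, the sketch below compares it with the naive two-call version. The names `ctz64`, `max_ctz_naive` and `max_ctz_identity` are made up for this illustration, and `ctz64` is a portable stand-in for a hardware count-trailing-zeros (in C++20, `std::countr_zero` from `<bit>` could be used instead).

```cpp
#include <cstdint>

// Hypothetical helper: portable 64-bit count-trailing-zeros.
// Returns 64 for v == 0, the same convention as std::countr_zero.
constexpr int ctz64(std::uint64_t v) {
    if (v == 0) return 64;
    int n = 0;
    while ((v & 1) == 0) {
        v >>= 1;
        ++n;
    }
    return n;
}

// Naive baseline: two ctz calls followed by a maximum.
constexpr int max_ctz_naive(std::uint64_t x, std::uint64_t y) {
    const int cx = ctz64(x);
    const int cy = ctz64(y);
    return cx > cy ? cx : cy;
}

// Identity from above: x | -x keeps the lowest set bit of x and sets all
// bits above it, so the AND has its lowest set bit at max(ctz(x), ctz(y)).
constexpr int max_ctz_identity(std::uint64_t x, std::uint64_t y) {
    return ctz64((x | -x) & (y | -y));
}
```

For x = 10 and y = 12 the two masks end in ...1110 and ...1100, their AND ends in ...1100, and both functions return 2. With this zero convention the two versions also agree when an input is 0.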

Solution 2: Martin Kealey’s approach

Martin Kealey’s approach arguably combines ideas from the naive approach and the sum approach. We know from above that min(ctz(x), ctz(y)) = ctz(x | y) (which we will refer to as t in this comment for brevity). The naive approach calculates ctz(x) and ctz(y) and then selects the maximum value, while the sum approach calculates ctz(x) + ctz(y) - t.

Kealey’s approach calculates ctz(max(x, y) | t). This works because ctz(x | y) = ctz(max(x, y) + t):

  • If x >= y, then max(x, y) + t = x + t = x | y.
  • If x <= y, then max(x, y) + t = y + t = y | y = y.

This gives an efficient way to calculate the maximum result without actually knowing ctz(x) and ctz(y) separately. It is likely that the optimizer will reduce this directly to ctz(x | y | x - y), which is what Nielsen’s algorithm boils down to with some additional bit hacking.

Solution 3: The bit manipulation approach

We can use bit manipulation techniques to efficiently calculate `max(ctz(x), ctz(y))`. Here’s an explanation of how it works:

  • First, we calculate the `low zeros` of both `x` and `y` using bitwise operations:

    ```cpp
    uint64_t loxs = ~x & (x-1); // low zeros of x
    uint64_t loys = ~y & (y-1); // low zeros of y
    ```

  • The `low zeros` of a number form a mask with a 1 in each position where the number has a trailing zero and a 0 everywhere else. For example, if `x = 0b11010000`, then `loxs = 0b00001111`.
  • Next, we combine the `low zeros` of `x` and `y` using a bitwise OR operation and add 1 to it:

    ```cpp
    uint64_t combined = (loxs | loys) + 1;
    ```

  • The OR keeps the longer of the two masks, so adding 1 carries through it and leaves `combined` with a single set bit at index `max(ctz(x), ctz(y))`. If either input is 0, its mask is all ones and the addition wraps around to 0, for which `std::countr_zero` returns 64.
  • Finally, we use the `std::countr_zero` function to count the number of trailing zeros in the `combined` value. This gives us `max(ctz(x), ctz(y))`.

    ```cpp
    return std::countr_zero(combined);
    ```

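This bit manipulation approach needs only one `std::countr_zero` call instead of two calls and a maximum. Each of these operations takes constant time regardless of the values of `x` and `y`, so any speedup over the naive approach depends on the compiler and target CPU and should be measured.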
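The three steps can be assembled into one function, as in the sketch below. The names `max_ctz_low_zeros` and `ctz64` are made up for this illustration, and `ctz64` is a portable stand-in so the sketch also compiles before C++20 (with C++20, `std::countr_zero` from `<bit>` could be called directly).

```cpp
#include <cstdint>

// Hypothetical helper: portable 64-bit count-trailing-zeros.
// Returns 64 for v == 0, the same convention as std::countr_zero.
constexpr int ctz64(std::uint64_t v) {
    if (v == 0) return 64;
    int n = 0;
    while ((v & 1) == 0) {
        v >>= 1;
        ++n;
    }
    return n;
}

constexpr int max_ctz_low_zeros(std::uint64_t x, std::uint64_t y) {
    const std::uint64_t loxs = ~x & (x - 1);  // ones where x has trailing zeros
    const std::uint64_t loys = ~y & (y - 1);  // ones where y has trailing zeros
    // The OR is the longer mask; +1 turns it into a single bit just above it
    // (or wraps to 0 when one of the masks is all ones, i.e. an input was 0).
    const std::uint64_t combined = (loxs | loys) + 1;
    return ctz64(combined);
}
```

With `x = 0b11010000` and `y = 0b100`, the masks are `0b1111` and `0b11`, `combined` is `0b10000`, and the result is 4.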

Solution 4: Least Significant 1s of x and y

This solution introduces the getMaxTzInput function to determine the input for ctz that yields the maximum value. The function operates as follows:

  1. It isolates the least significant 1 (LSB) of x and y using bitwise operations.
  2. It combines the LSBs of x and y using bitwise OR, resulting in a value containing the LSBs of both inputs (which may be identical).
  3. It finds the least significant 1 among the combined LSBs using bitwise operations.
  4. If the LSBs of x and y are distinct, it removes the least significant 1 from the combined LSBs to obtain the next significant 1.
  5. It returns the maximum LSB value among x and y, which serves as the input for ctz to determine the maximum value of ctz(x) and ctz(y).

By precomputing the appropriate input using getMaxTzInput, this solution ensures that ctz is called only once to find the maximum value, potentially improving performance.
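The original getMaxTzInput code is not shown here, so the following is only one possible reading of the five steps, assuming both inputs are nonzero. Apart from the function name, every identifier is an assumption of this sketch, and `ctz64` is a portable stand-in for a hardware count-trailing-zeros.

```cpp
#include <cstdint>

// Hypothetical helper: portable 64-bit count-trailing-zeros.
// Returns 64 for v == 0, the same convention as std::countr_zero.
constexpr int ctz64(std::uint64_t v) {
    if (v == 0) return 64;
    int n = 0;
    while ((v & 1) == 0) {
        v >>= 1;
        ++n;
    }
    return n;
}

// One possible reading of the steps above; assumes x != 0 and y != 0.
constexpr std::uint64_t getMaxTzInput(std::uint64_t x, std::uint64_t y) {
    const std::uint64_t lsbX = x & -x;                  // step 1: lowest 1 of x
    const std::uint64_t lsbY = y & -y;                  // step 1: lowest 1 of y
    const std::uint64_t combined = lsbX | lsbY;         // step 2: one or two bits set
    const std::uint64_t lowest = combined & -combined;  // step 3: lowest 1 of the pair
    const std::uint64_t rest = combined ^ lowest;       // step 4: 0 if the LSBs coincide
    return rest != 0 ? rest : combined;                 // step 5: the higher LSB
}
```

A caller would then compute `ctz64(getMaxTzInput(x, y))`. For x = 10 and y = 12 the isolated bits are 2 and 4, the function returns 4, and the count is 2. If exactly one input is 0, this sketch returns the other input's lowest bit rather than following the ctz(0) = 64 convention, which is why nonzero inputs are assumed.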

Q&A

Are there faster algorithms for max(ctz(x), ctz(y))?

The naive approach works best.

Is ctz((a|-a)&(b|-b)) equivalent to max(ctz(a),ctz(b))?

Yes, and it takes at most 6 CPU instructions.

Is there a non-branching alternative to max(ctz(a),ctz(b))?

Yes, but it takes at least 4 instructions.
