The Problem:
Design an efficient algorithm to calculate the maximum of the trailing-zero counts of two given non-negative integers x and y. The count of trailing zeros (ctz) of a non-negative integer is the number of zero bits at the end of its binary representation. For example, for x = 10 (binary 1010) and y = 12 (binary 1100), the count of trailing zeros for x is 1 and the count of trailing zeros for y is 2, so the desired result is 2. The desired algorithm should be faster than the naive approach of finding the count of trailing zeros for each integer separately and then taking the maximum. The input integers x and y can be represented as 32-bit or 64-bit unsigned integers.
The Solutions:
Solution 1: Math-Identity
The following formula holds:
max(ctz(x), ctz(y)) = ctz((x|-x) & (y|-y))
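The math identity ctz(x) + ctz(y) - ctz(x|y) is equivalent to the above formula, because ctz(x|y) = min(ctz(x), ctz(y)) and the sum of two numbers minus their minimum is their maximum. Both forms can be implemented efficiently using only a few CPU instructions.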
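```cpp
#include <bit>
#include <cstdint>
// x | -x keeps the lowest set bit of x and sets every bit above it.
int32_t test2(uint64_t x, uint64_t y) {
    return std::countr_zero((x | -x) & (y | -y));
}
```
```rust
// Naive baseline: count each input separately, then take the maximum.
pub fn test2(x: u64, y: u64) -> u32 {
    x.trailing_zeros().max(y.trailing_zeros())
}
```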
Solution 2: Martin Kealey’s approach
Martin Kealey’s approach arguably combines ideas from the naive approach and the sum approach. We know from above that ctz(x | y) = min(ctz(x), ctz(y)) (which we will refer to as t in this comment for brevity). The naive approach calculates ctz(x) and ctz(y) and then selects the maximum value. This is equivalent to calculating ctz(x) + ctz(y) - t, because t is always the smaller of the two counts:
- If ctz(x) >= ctz(y), then t = ctz(y), so ctz(x) + ctz(y) - t = ctz(x).
- If ctz(x) <= ctz(y), then t = ctz(x), so ctz(x) + ctz(y) - t = ctz(y).
Kealey’s approach aims to reach the same result with a single ctz call on a value derived from x, y, and x | y, which gives a way to calculate the maximum without actually knowing ctz(x) and ctz(y) separately. It is likely that the optimizer will reduce this to much the same bit operations as Nielsen’s algorithm, which gets there with some additional bit hacking.
Solution 3: The bit manipulation approach
We can use bit manipulation techniques to efficiently calculate `max(ctz(x), ctz(y))`. Here’s an explanation of how it works:
- First, we calculate the `low zeros` of both `x` and `y` using bitwise operations:
```cpp
uint64_t loxs = ~x & (x-1); // low zeros of x
uint64_t loys = ~y & (y-1); // low zeros of y
```
- The `low zeros` of a number are a mask with a 1 in each position where its binary representation has a trailing zero. For example, if `x = 0b11010000`, then `loxs = 0b00001111`.
- Next, we combine the `low zeros` of `x` and `y` using a bitwise OR operation and add 1 to it:
```cpp
uint64_t combined = (loxs | loys) + 1;
```
- The OR keeps the longer of the two runs of ones, and adding 1 carries through that run, so `combined` has exactly one bit set, at position `max(ctz(x), ctz(y))`.
- Finally, we use the `std::countr_zero` function to count the number of trailing zeros in the `combined` value. This gives us `max(ctz(x), ctz(y))`.
```cpp
return std::countr_zero(combined);
```
This bit manipulation approach needs only one `std::countr_zero` call instead of two calls plus a maximum, but whether that makes it faster than the naive approach depends on the target CPU and compiler and should be measured.
Solution 4: Least Significant 1s of x and y
This solution introduces the getMaxTzInput function to determine the input for ctz that yields the maximum value. The function operates as follows:
- It isolates the least significant 1 (LSB) of x and y using bitwise operations.
- It combines the LSBs of x and y using bitwise OR, resulting in a value containing the LSBs of both inputs (which may be identical).
- It finds the least significant 1 among the combined LSBs using bitwise operations.
- If the LSBs of x and y are distinct, it removes the least significant 1 from the combined LSBs to obtain the next significant 1.
- It returns the maximum LSB value among x and y, which serves as the input for ctz to determine the maximum value of ctz(x) and ctz(y).
By precomputing the appropriate input using getMaxTzInput, this solution ensures that ctz is called only once to find the maximum value, potentially improving performance.
Q&A
Are there faster algorithms for max(ctz(x), ctz(y))?
The naive approach works best.
Is ctz((a|-a)&(b|-b)) equivalent to max(ctz(a),ctz(b))?
Yes, and it takes at most 6 CPU instructions.
Is there a non-branching alternative to max(ctz(a),ctz(b))?
Yes, but it takes at least 4 instructions.
Video Explanation:
The following video, titled "Faster algorithm for max(ctz(x), ctz(y))? - YouTube", provides additional insights and in-depth exploration related to the topics discussed in this post.