Wednesday, August 19, 2009

Ruby's Complement Operator

So I was working with bits today and needed a bitwise complement. It seems that Ruby's Fixnum complement operator (~) doesn't actually do a bitwise complement. Working with Ruby 1.8.7 on a 32-bit machine, I get the following:

irb(main):002:0> a=5
=> 5
irb(main):003:0> a.to_s(2)
=> "101"
irb(main):004:0> a.size
=> 4

This tells us that the binary representation of 5 is 101, and that Fixnums are 4 bytes long. Suppose we want to take the bitwise complement of 5. We would expect, for a 32-bit signed integer, that
~5 = ~101
   = ~00000000 00000000 00000000 00000101
   =  11111111 11111111 11111111 11111010
   = -6

Indeed, we see that ~5 = -6, as Ruby shows:

irb(main):005:0> ~a
=> -6
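The arithmetic above can be checked mechanically. This is only a rough sketch: the 32-bit width and the helper name to_signed32 are my own assumptions for illustration, not anything built into Ruby.

```ruby
WIDTH = 32                 # assumed word size, matching the example above
MASK  = (1 << WIDTH) - 1   # 32 one-bits

# Hypothetical helper: read an unsigned WIDTH-bit pattern as a signed
# two's complement value.
def to_signed32(bits)
  bits >= (1 << (WIDTH - 1)) ? bits - (1 << WIDTH) : bits
end

pattern = 5 ^ MASK           # flip the low 32 bits of 5
puts pattern.to_s(2)         # → 11111111111111111111111111111010
puts to_signed32(pattern)    # → -6
```

So flipping the low 32 bits of 5 and reading the result as a signed 32-bit value gives -6, the same number ~a returns.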

However, when we look at the binary, we see a completely different binary representation:

irb(main):006:0> (~a).to_s(2)
=> "-110"
irb(main):007:0> (-6).to_s(2)
=> "-110"

What we see here is not at all a 32-bit two's complement representation of -6; it is the number 6 (110) with a negative sign in front of it! Judging by this output, Ruby apparently stores signed integers up to 31 bits, but does not use that last bit for the sign. Instead, it seems to store the sign separately from the magnitude. So Ruby says, "take the number 6, make a binary representation of it, and then stick a minus sign in front of it to make it negative"...not the universal binary representation you would expect from any other (good) language.

Thus, the ~ operator in Ruby means {-x - 1} instead of {~x}. If a programmer wanted {-x - 1}, he would just write that! It's so rarely needed (in my experience) that it does not make sense to have an operator for it! But if a programmer needs to work with bits in Ruby (which is much more common), he is left without a suitable bitwise complement operator. You could implement something like this:

class Fixnum
  def ~
    self ^ ((1 << 32) - 1)
  end
end

...and you would get (~5).to_s(2) => 11111111 11111111 11111111 11111010, but this is no longer equal to -6. Since it has overflowed 31 bits, it was automatically converted into a Bignum and now represents 4294967290.
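If redefining ~ for every Fixnum feels too heavy-handed, the same idea could live in an ordinary method instead. A minimal sketch, where bit_complement and its width parameter are hypothetical names of my own:

```ruby
# Hypothetical helper: complement of value within a fixed bit width.
# The result is always a non-negative integer below 2**width.
def bit_complement(value, width = 32)
  ~value & ((1 << width) - 1)
end

puts bit_complement(5)       # → 4294967290
puts bit_complement(5, 8)    # → 250
```

For values that fit in the chosen width this gives the same number as the XOR version above, and the built-in ~ is left alone for any other code that relies on it.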

Once again, Ruby has taken something that every other language has and removed it by design. For bit manipulation, use C. For string processing, use Perl. For websites and/or database access, use PHP. For passing around arbitrary blocks of code and not knowing what type your variables are, use Ruby.
</RANT>

1 comment:

Matijs van Zuijlen said...

Probably, ~ works as advertised, but .to_s(2) doesn't do what you think it does.

Note for example that:

>> ~5 & 0b11111111
=> 250

So it looks like more bits are set than just '101' and the sign.
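One way to poke at this further is to read individual bits with the integer [] method and to mask before formatting. This is only a sketch; the 32-bit mask is an assumption taken from the post, and MASK32 is a name made up here.

```ruby
n = ~5
MASK32 = 0xFFFFFFFF   # assumed width of 32 bits

# n[i] returns bit i of n; collect the low eight bits, bit 0 first.
low_bits = (0..7).map { |i| n[i] }
puts low_bits.inspect               # → [0, 1, 0, 1, 1, 1, 1, 1]
puts n[31]                          # → 1

# Masking to 32 bits gives a non-negative number whose digits are the
# two's complement pattern.
puts format("%032b", n & MASK32)    # → 11111111111111111111111111111010
```

The bits read back this way match the two's complement pattern, which suggests the "-110" comes from how to_s(2) prints negative numbers rather than from how they are stored.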