'MD5 implementation in Ruby

I am trying to implement MD5 in Ruby, following the pseudo code written in the wiki.

Here is the codes, not working well:

# : All variables are unsigned 32 bit and wrap modulo 2^32 when calculating
# s specifies the per-round shift amounts
s = []
s.concat([7, 12, 17, 22,  7, 12, 17, 22,  7, 12, 17, 22,  7, 12, 17, 22])
s.concat([5,  9, 14, 20,  5,  9, 14, 20,  5,  9, 14, 20,  5,  9, 14, 20])
s.concat([4, 11, 16, 23,  4, 11, 16, 23,  4, 11, 16, 23,  4, 11, 16, 23])
s.concat([6, 10, 15, 21,  6, 10, 15, 21,  6, 10, 15, 21,  6, 10, 15, 21])

# Use binary integer part of the sines of integers (Radians) as constants:
k = 0.upto(63).map do |i|
  (Math.sin(i+1).abs * 2 ** 32).floor
end

# Initialize variables:
a0 = 0x67452301 # A
b0 = 0xefcdab89 # B
c0 = 0x98badcfe # C
d0 = 0x10325476 # D

message = File.read(ARGV[0])

# Pre-processing
# with bit stream "string" (MSB)
bits = message.unpack('B*')[0]

org_len = bits.size

bits << '1' # adding a single 1 bit
bits << '0' while !((bits.size + 64) % 512 == 0) # padding with zeros
bits << (org_len % 2 ** 64).to_s(2).rjust(64, '0')


message32 = [bits].pack('B*').unpack('V*') # ?
# 1. bits.scan(/(.{8})/).flatten.map { |b| b.to_i(2).to_s(16).rjust(2, '0') }.each_slice(16) { |c| puts c.join(' ')} => for test
# 2. [bits].pack('B*').unpack('N*') == bits.scan(/(.{32})/).flatten.map { |b| b.to_i(2).to_s(16).rjust(8, '0').to_i(16) } => true



# custom operations for wrapping the results as modulo 2 ** 32
class Integer
  def r
    self & 0xFFFFFFFF
  end

  def rotate_l(count)
    (self << count).r | (self >> (32 - count))
  end
end

# Process the message in successive 512-bit chunks:
message32.each_slice(16).each do |m|
  a = a0
  b = b0
  c = c0
  d = d0

  0.upto(63) do |i|
    if i < 16
      f = d ^ (b & (c ^ d))
      g = i
    elsif i < 32
      f = c ^ (d & (b ^ c))
      g = (5 * i + 1) % 16
    elsif i < 48
      f = b ^ c ^ d
      g = (3 * i + 5) % 16
    elsif i < 64
      f = c ^ (b | ~d)
      g = (7 * i) % 16
    end

    f = (f + a + k[i] + m[g]).r
    a = d
    d = c
    c = b
    b = (b + f.rotate_l(s[i])).r
  end

  a0 = (a0 + a).r
  b0 = (b0 + b).r
  c0 = (c0 + c).r
  d0 = (d0 + d).r
end

puts [a0, b0, c0, d0].pack('V*').unpack('H*')

I'm testing with messages, well known for collision in just one block:

They are resulted in the same value, but not correct:

❯ ruby md5.rb message1.bin
816922b82e2f8d5bd3abf90777ad72c9

❯ ruby md5.rb message2.bin
816922b82e2f8d5bd3abf90777ad72c9

❯ md5 message*
MD5 (/Users/hansuk/Downloads/message1.bin) = 008ee33a9d58b51cfeb425b0959121c9
MD5 (/Users/hansuk/Downloads/message2.bin) = 008ee33a9d58b51cfeb425b0959121c9

I have an uncertainty about pre-processing steps.

I checked the bit stream after pre-processing, with the comments on the line 34 and 35, the original message written in same and the padding bits are right:

❯ hexdump message1.bin
0000000 4d c9 68 ff 0e e3 5c 20 95 72 d4 77 7b 72 15 87
0000010 d3 6f a7 b2 1b dc 56 b7 4a 3d c0 78 3e 7b 95 18
0000020 af bf a2 00 a8 28 4b f3 6e 8e 4b 55 b3 5f 42 75
0000030 93 d8 49 67 6d a0 d1 55 5d 83 60 fb 5f 07 fe a2
0000040
(byebug) bits.scan(/(.{8})/).flatten.map { |b| b.to_i(2).to_s(16).rjust(2, '0') }.each_slice(16) { |c| puts c.join(' ')}
4d c9 68 ff 0e e3 5c 20 95 72 d4 77 7b 72 15 87
d3 6f a7 b2 1b dc 56 b7 4a 3d c0 78 3e 7b 95 18
af bf a2 00 a8 28 4b f3 6e 8e 4b 55 b3 5f 42 75
93 d8 49 67 6d a0 d1 55 5d 83 60 fb 5f 07 fe a2
80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 00
nil

What am I missed?



Solution 1:[1]

The one most classical mistake in implementing MD5 is botching endianness: the padded message and the message length are to be turned to 32-bit words per little-endian convention, so that message 'abc' in ASCII (0x61 0x62 0x63) is turned to a 16-word padded message block with m[0]=0x80636261, m[14]=0x18, and m[1…13,15]=0.

I never wrote anything in Ruby, but I get the feeling the code yields m[0]=0x61626380, m[15]=0x18, and m[1…14]=0.

Also: the 4-word result is to be turned to 16 bytes per little-endian convention too.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1