Category: Software Design
Fastest hashing algorithm

Hi, I've got to build a hash of a string in order to check for changed ones. As I'll have to do this on thousands to hundreds of thousands of strings, I'm looking for the fastest algorithm, maybe MD5, CRC32 or something else.
Thanks in advance for your help!
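To make the use case concrete, here is a rough sketch of what I mean (CRC32 is used only because it is one of the candidates above; the key names and values are made up for illustration):

import zlib

def digest(s: str) -> int:
    # CRC32 of the UTF-8 bytes; collisions are possible, so this only
    # signals "probably changed" / "probably unchanged".
    return zlib.crc32(s.encode("utf-8"))

# Hypothetical digests stored from a previous run.
known = {"row-17": digest("old contents")}

# Later, recompute and compare to detect a change.
if digest("new contents") != known["row-17"]:
    print("row-17 changed")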

Emmh.. I feel sorry that this thread NEVER GOT A RESPONSE!!
So I ran a benchmark for myself today, and wanted to share it.

On 1,000,000 iterations (using a 16-character string as data). Also, please use these numbers for ratio purposes only and don't rely on them; they change with CPU and memory speed, motherboard, etc. It's just for a ratio comparison.

Values are averaged over 15 tests run on 2 different systems :)


MD5: 6500 ms
CRC32: 70 ms

Now, with a longer string (128 characters):

MD5: 8110 ms
CRC32: 515 ms

And with a 256-character buffer:

MD5: 9110 ms
CRC32: 1050 ms


As you can see, as the data buffer gets bigger, the CRC32 algorithm gets much slower than MD5. But I would still use it over MD5, unless you had a good reason not to (which I can't judge, since I don't have the details of your project).

I hope it helps someone at least.. and I wish I could have come sooner.
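For anyone who wants to re-run a comparison like this, here is a rough sketch using Python's standard library. It is not the code that produced the numbers above, so compare only the ratios, not the absolute times:

import hashlib
import time
import zlib

def time_hash(func, data, iterations=1_000_000):
    # Wall-clock time for `iterations` calls of func(data), in seconds.
    start = time.perf_counter()
    for _ in range(iterations):
        func(data)
    return time.perf_counter() - start

data = b"x" * 16  # 16-byte test string, as in the first run above

md5_time = time_hash(lambda d: hashlib.md5(d).digest(), data)
crc_time = time_hash(zlib.crc32, data)
print(f"MD5:   {md5_time * 1000:.0f} ms")
print(f"CRC32: {crc_time * 1000:.0f} ms")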

As you can see, as the data buffer gets bigger, the CRC32 algorithm gets much slower than MD5.
Care to back up this assertion? In none of your tests does MD5 actually outperform CRC32.

Hi, I've got to build a hash of a string in order to check for changed ones. As I'll have to do this on thousands to hundreds of thousands of strings, I'm looking for the fastest algorithm, maybe MD5, CRC32 or something else.
Thanks in advance for your help!
You probably also want a good hashing algorithm, right? I.e., few collisions, etc.?

Because otherwise, I have the fastest hashing algorithm ever:
return 42; <-- the number doesn't matter; the point is that it's useless, because it gives no information about the input
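If you do care about both speed and collisions, a simple non-cryptographic hash is a common middle ground. As one illustration (FNV-1a is not mentioned elsewhere in this thread; it is just a well-known option), a minimal sketch:

def fnv1a_32(data: bytes) -> int:
    # 32-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
    h = 0x811C9DC5                          # FNV-1a 32-bit offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF   # FNV-1a 32-bit prime, kept to 32 bits
    return h

print(hex(fnv1a_32(b"hello")))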

I think the fastest hashing algorithm is something XOR-based: read in however many bytes you want for your hash and set that as the initial hash, then in the main loop read the next hash-length chunk and XOR it with the current hash to make the new one, and repeat until you have read the whole file.

Of course, this is almost never used, because it is probably one of the worst hashes in terms of collisions etc., but if the only goal is speed, then this will do, and it is used where speed is the main concern (I believe the data checks done on a PCI bus and ECC RAM are basically this hash).
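A rough sketch of that XOR-folding idea, assuming a 4-byte hash and zero-padding of the last block (both arbitrary choices):

def xor_fold(data: bytes, block_size: int = 4) -> bytes:
    # The first block becomes the initial hash; every later block of the same
    # size is XORed in. Zero-pad so the data splits evenly into blocks.
    if len(data) % block_size:
        data += b"\x00" * (block_size - len(data) % block_size)
    hash_block = data[:block_size]
    for offset in range(block_size, len(data), block_size):
        block = data[offset:offset + block_size]
        hash_block = bytes(a ^ b for a, b in zip(hash_block, block))
    return hash_block

print(xor_fold(b"example string").hex())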

[edit]
Hmm, this thread is 3 years old.. please don't bring zombies back :)

How many changes on average do you expect to see?

Do you have to deliver even response times, or can degenerate cases suck?

Is memory constrained?

There's not enough information about the problem here to even begin to pick a reasonable solution.

I'd like to thank jorgejones for his answer, even if it is two years late.
I won't add details, as the problem is a bit ... outdated ;) but those questions are definitely reasonable.









