As you probably know, BSD and GNU have similar tools, but often with completely different codebases. So, for example, there is a BSD grep and a GNU grep; they are developed by different people and take different parameters (though some of the most popular ones, like -r or -n, are the same).
What I didn’t know is that the performance of the two is actually quite different! This is an email written by the creator of GNU grep explaining some of the tricks they use to make it faster. The email is from 2010, so you might think that the BSD implementation would have caught up by now, but it looks like it hasn’t.
According to this thread (from a few days ago), this is the performance difference:
$ time /usr/local/bin/ggrep "foobarbaz" application.log
real 0m1.319s
user 0m0.948s
sys 0m0.345s
$ time /usr/bin/grep "foobarbaz" application.log
real 0m37.225s
user 0m31.036s
sys 0m1.286s
For the same 1.5GB file, the GNU implementation finishes in 1.3 seconds, while the BSD implementation takes 37 seconds, which is roughly 28 times slower! That’s quite a significant difference!
Also, this was tested on the latest OSX version, because OSX uses the BSD grep (although I don’t know whether it ships the latest version; maybe I should run the same test on the latest BSD release for a better comparison).
The tricks that GNU grep uses to make it faster are actually interesting, and it could be worthwhile to read the source code, study the algorithm they use, and try to make a similar clone that is as fast as possible.
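To get a feel for the main idea, here is a minimal sketch of a skip-table search. As far as I understand the email, GNU grep relies on Boyer-Moore, which checks the last byte of the pattern first and uses a lookup table to skip ahead without examining every input byte. The code below is the simpler Horspool variant, written by me for illustration; the function names are my own and this is not GNU grep's actual code.

```python
def build_shift_table(pattern: bytes) -> dict[int, int]:
    """Map each byte of the pattern (except the last) to a safe skip distance."""
    m = len(pattern)
    # Later occurrences overwrite earlier ones, so the rightmost one wins.
    return {byte: m - 1 - i for i, byte in enumerate(pattern[:-1])}


def horspool_find(haystack: bytes, pattern: bytes) -> int:
    """Return the index of the first match, or -1 if there is none."""
    m, n = len(pattern), len(haystack)
    if m == 0:
        return 0
    shift = build_shift_table(pattern)
    pos = 0
    while pos <= n - m:
        # Look first at the byte aligned with the END of the pattern.
        last = haystack[pos + m - 1]
        if last == pattern[-1] and haystack[pos:pos + m] == pattern:
            return pos
        # Bytes that never occur in the pattern let us jump a full
        # pattern length, so most of the input is never inspected.
        pos += shift.get(last, m)
    return -1


if __name__ == "__main__":
    line = b"2010-08-21 some log line containing foobarbaz somewhere"
    print(horspool_find(line, b"foobarbaz"))
```

The longer the pattern, the bigger the jumps, which is why a search for a 9-byte string like "foobarbaz" can touch only a fraction of a 1.5GB file. A real implementation would also need the I/O tricks from the email (large raw reads, not splitting the input into lines), which this sketch ignores.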