armadillo, lapack, and intel’s MKL
I normally use ruby/gsl for everyday numerics because it’s fast to code and extend, and semantically clear. You can also do cool things like previewing an energy spectrum from within irb by interfacing with the gnuplot module; normally you’d write to a tmp file, switch to the gnuplot console, and plot it there.
The problem with rb/gsl is speed. The other day, my boss walked in and compiled his fortran code on my machine. It beat my ruby code hands-down. One would think gsl, being a C library, should be on par with lapack, but apparently it’s not, and it’s not (entirely) ruby’s fault: I rewrote the same stuff in C, using gsl, and it’s still about 4 times slower than my boss’s fortran code using lapack. (I guess it’s partly because gsl is using its own blas implementation.)
So I want something faster but still with natural semantics (e.g., matrix multiplication should be A*B, not dgemm(...A...B...)), so C++ seems a reasonable place to look. So far I find armadillo suits my needs quite well. It is a C++ linear algebra library that interfaces with lapack/blas.
On gentoo, this is what one needs:
1. If you have an Intel CPU, emerge sci-libs/mkl. This is a drop-in blas/lapack replacement from Intel. You need to go to Intel’s website and register (free) for a non-commercial usage license. AMD has a similar library. Otherwise, just set up your blas/lapack of choice.
2. Use eselect (eselect blas list and eselect blas set, and likewise for cblas and lapack) to choose mkl as the blas/lapack implementation to use.
3. Now emerge sci-libs/armadillo. It is important to install it after steps 1 and 2 because it decides at build time which library it links against.
Now, #include <armadillo> in your cpp code, and use
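g++ -O3 your_code.cpp -o your_exe -larmadillo
to compile. Listing -larmadillo after the source file is the safer order, since some linker setups (e.g. with --as-needed) drop libraries that appear before the code that uses them. You don’t have to add -lblas -llapack, as these are already taken care of by -larmadillo (this was settled in step 3 above, which is why the emerge order is important).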
On Mac OS X (my office workstation), the appropriate way to compile is instead:
g++ -O3 -framework Accelerate your_code.cpp -o your_exe
as per armadillo’s readme file.
I transcribed my ruby/gsl code into C++/armadillo, and on the Mac it is actually 2 times faster than my boss’s fortran code. Since both are using the same underlying blas/lapack implementation, I suppose there’s some unnecessary computation in the fortran code. At any rate, it seems safe to say the C++ overhead is not significant, so I’ll happily settle for armadillo for the moment.