Friday, November 30, 2012

Magma1.3 + CUDA 5.0 + Ubuntu 11.10

At work, we are trying to switch to linear algebra solvers that outperform LAPACK, because of the size of the problems we are interested in.

I just successfully installed and tested Magma 1.3 along with CUDA 5.0 on an Ubuntu 11.10 server. I hope these notes are helpful for anyone who has just switched to CUDA 5.0 and wants to use Magma 1.3 with the improved CUBLAS library.

1. I downloaded Magma 1.3 from http://icl.cs.utk.edu/magma/software/index.html and extracted it.

2. To compile Magma, I had to provide a make.inc file. Thankfully, the Magma team provides sample make.inc files for different environments. There are four options:
  1. Intel's MKL
  2. AMD's ACML
  3. Netlib's LAPACK + ATLAS
  4. Netlib's LAPACK + GotoBLAS2
For each option, a sample make.inc.$(LIB) file is provided. Following the advice and instructions from http://www.pavanky.com/installing-magma-with-gotoblas2/, I picked LAPACK + GotoBLAS2 as well. Therefore, GotoBLAS2 has to be installed first.

3. GotoBLAS2 can be downloaded from the Texas Advanced Computing Center. GotoBLAS2 is no longer in active development, so it is not surprising that, after downloading and untarring it, I had to make some minor changes. Thankfully, the link I was referencing provided the trick. In the f_check file, add a new line after line 237:
$link =~ s/\ -rpath\ s+/\ -rpath\@/g; # line 237
$link =~ s/\-l\ /\-l/g; # Add this. 
  
I'm using gcc 4.6, which doesn't like an empty linking string after "-l". The patch fixed the problem.
After compilation, copy libgoto2_barcelonap-r1.13.a to the library folder.
Note: For Intel Xeon-based architectures, we need to uncomment the Nehalem line in the file getarch.c.
Change /* #define FORCE_NEHALEM */ to #define FORCE_NEHALEM
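
To make the effect of the added f_check line easier to see, here is a rough shell sketch that mimics the same substitution with sed instead of Perl. The helper name join_l_flags and the sample link string are made up for illustration; they are not taken from f_check or from real compiler output.

```shell
# Mimic the added substitution s/\-l\ /\-l/g with sed:
# every "-l " (flag followed by a space) is joined to the token after it.
join_l_flags() {
    printf '%s\n' "$1" | sed 's/-l /-l/g'
}

# Hypothetical link string with detached -l flags.
join_l_flags "-L/usr/lib -l gfortran -l m"
```

With this input, the detached "-l gfortran" and "-l m" come out as "-lgfortran" and "-lm", while "-L/usr/lib" is left alone.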

 
4. I referenced a post on the Magma forum for modifying make.inc. To compile the Magma test files, which use gcc-4.5 and gfortran-4.5, I had to make soft links from gcc-4.6 to gcc-4.5 and from gfortran-4.6 to gfortran-4.5. I also had to change -lgoto to -lgoto2 and add the flags -lblas, -lgfortran, -ldl and -lstdc++. Of course, the other variables are set to match my machine as well.
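
The soft links could be created along these lines. This is only a sketch: it assumes gcc-4.6 and gfortran-4.6 are on the PATH, and LINK_DIR is a hypothetical user-writable directory of my choosing (it has to be on the PATH as well, so that make can find the links).

```shell
# Create NAME-4.5 symlinks that point at the installed NAME-4.6 compilers.
# LINK_DIR is an assumed location; adjust it to a directory on your PATH.
LINK_DIR="${LINK_DIR:-$HOME/bin}"
mkdir -p "$LINK_DIR"

for tool in gcc gfortran; do
    if target="$(command -v "$tool-4.6")"; then
        ln -sf "$target" "$LINK_DIR/$tool-4.5"
        echo "linked $LINK_DIR/$tool-4.5 -> $target"
    else
        echo "$tool-4.6 not found on PATH; nothing linked" >&2
    fi
done
```
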
My make.inc is as follows:
 
GPU_TARGET = Tesla
CC = gcc-4.5
NVCC = nvcc
FORT = gfortran-4.5
ARCH = ar
ARCHFLAGS = cr
RANLIB = ranlib
OPTS = -O3 -DADD_
F77OPTS = -O3 -DADD_
FOPTS = -O3 -DADD_ -x f95-cpp-input
NVOPTS = -O3 -DADD_ --compiler-options -fno-strict-aliasing -DUNIX
LDOPTS = -fPIC -Xlinker -zmuldefs
LIB = -lblas -lgfortran -lgoto2 -lpthread -lcublas -lcudart -llapack -lm -ldl -lstdc++
CUDADIR = /usr/local/cuda
LIBDIR = -L/usr/local/cuda/lib64 -L/usr/lib64
INC = -I$(CUDADIR)/include -I/usr/include
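
Before running make, a quick sanity check like the following can show which of the libraries named in LIB are present in the LIBDIR directories. check_libs is a hypothetical helper written for this note, not part of Magma. Also, "missing" only means the file is not in the directories listed; the linker searches its default paths too.

```shell
# Report where each -l library from a LIB line can be found.
# Usage: check_libs "<-l flags>" dir1 [dir2 ...]
check_libs() {
    flags="$1"
    shift
    for flag in $flags; do
        name="${flag#-l}"      # strip the leading -l
        found="missing"
        for dir in "$@"; do
            # Accept a static (.a) or shared (.so, .so.N) library.
            for candidate in "$dir/lib$name.a" "$dir/lib$name".so*; do
                if [ -e "$candidate" ]; then
                    found="$dir"
                    break 2
                fi
            done
        done
        echo "$name: $found"
    done
}

# Flags and directories copied from the make.inc above.
check_libs "-lblas -lgfortran -lgoto2 -lpthread -lcublas -lcudart -llapack -lm -ldl -lstdc++" \
    /usr/local/cuda/lib64 /usr/lib64
```
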
                                                   
5. The last thing is that, for some reason, /usr/local/cuda-5.0/src/fortran_common.h skipped
defining the macro CUBLAS_GFORTRAN for Linux-based machines, which breaks the compilation of the
test files. I added a few lines at line 61 to define the macro CUBLAS_GFORTRAN, as follows:
#if defined(_WIN32)
#ifndef CUBLAS_INTEL_FORTRAN
#define CUBLAS_INTEL_FORTRAN
#endif
#elif defined(__linux)
#ifndef CUBLAS_GFORTRAN
#define CUBLAS_GFORTRAN
#endif
#endif
I'm not sure whether everybody needs this step, because I suspect I might have done something a while ago that led the CUDA 5 installation script to skip the Fortran option.
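
To find out whether your copy of the header needs this change, a check along these lines may help. header_defines is a hypothetical helper, and the CUDA_FORTRAN_HEADER variable exists only so the path can be overridden; the default is the path mentioned above.

```shell
# Succeed if FILE contains a "#define MACRO" line.
# Usage: header_defines FILE MACRO
header_defines() {
    grep -Eq "^[[:space:]]*#[[:space:]]*define[[:space:]]+$2([[:space:]]|\$)" "$1"
}

# Assumed default location; override CUDA_FORTRAN_HEADER if yours differs.
header="${CUDA_FORTRAN_HEADER:-/usr/local/cuda-5.0/src/fortran_common.h}"

if [ -f "$header" ] && header_defines "$header" CUBLAS_GFORTRAN; then
    echo "CUBLAS_GFORTRAN is already defined in $header"
else
    echo "no #define CUBLAS_GFORTRAN found in $header"
fi
```

The check only looks for a #define line, so a header that merely tests defined(CUBLAS_GFORTRAN) is still reported as not defining it.
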
If you are following my steps, you should now be good to go. I'm leaving work now to ...hopefully see Stanford win the Pac-12 Championship, but here is the
speedup from an old machine equipped with a Tesla C1060 and nForce 980a/780a (my playground :p):

  N     CPU Time(s)    GPU Time(s)
===================================
 1024       2.12           0.64
 2048      17.00           1.22
 3072      56.24           3.31
 4032     126.69           6.43
 5184     279.86          12.52
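
The speedup itself is just the CPU time divided by the GPU time. As a small sketch, the ratios for the table above can be computed with awk (the function name speedups is my own):

```shell
# N, CPU seconds, GPU seconds, copied from the table above.
timings="1024 2.12 0.64
2048 17.00 1.22
3072 56.24 3.31
4032 126.69 6.43
5184 279.86 12.52"

# Print N and the CPU/GPU time ratio, rounded to one decimal place.
speedups() {
    printf '%s\n' "$1" | awk '{ printf "%5d  %.1fx\n", $1, $2 / $3 }'
}

speedups "$timings"
```

That works out to roughly a 3x speedup at N = 1024, growing to about 22x at N = 5184.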



I'm already excited about the speedup. Good luck!




  
