Machine Learning With GPU (1): CUDA

ade sueb · Published in The Startup · Jul 29, 2020

If you are working on machine learning with a GPU, this story is for you.

The story behind this: when I tried to run a TensorFlow Python script, it came up with this warning message:

2020-07-29 17:11:59.393693: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory

So I searched on Google for what libcudart.so is, and found that libcudart is the CUDA runtime library from NVIDIA that lets code run on the GPU.

Wow, that was something for me, because so far I had only used my GPU for gaming, and now I can code with it.

OK.. let’s get started…

CUDA

We need a bridge from our code to the GPU. If your GPU is from NVIDIA, then CUDA is that bridge.
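
Just to make the idea concrete, here is a minimal sketch (not from the original story; it assumes the CUDA Toolkit is already installed, which we do further down) that asks the CUDA runtime which GPUs it can see. The file name devices.cu is just an example:

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
  int count = 0;
  cudaGetDeviceCount(&count);
  printf("CUDA-capable devices: %d\n", count);

  for (int i = 0; i < count; i++) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, i);
    // The compute capability is what frameworks like TensorFlow care about.
    printf("  %d: %s (compute capability %d.%d)\n",
           i, prop.name, prop.major, prop.minor);
  }
  return 0;
}

Compile it with $ nvcc devices.cu -o devices and run ./devices once the toolkit from the next section is installed.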

Hardware requirements

You need an NVIDIA GPU (a CUDA-capable card).

Software requirements

The following NVIDIA® software must be installed on your system: the NVIDIA GPU driver, the CUDA Toolkit, and cuDNN (more on cuDNN at the end of this story).

Make sure the CUDA Toolkit and driver are compatible

Before you install the CUDA Toolkit, look at the compatibility table in NVIDIA's release notes and make sure your installed NVIDIA driver supports the toolkit version you are about to install.
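
Once the toolkit is installed, you can also double-check the pairing from code. This is a small sketch (not from the original story): the CUDA version supported by the driver has to be at least as new as the runtime (toolkit) version, otherwise CUDA calls will fail. The file name versions.cu is just an example:

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
  int driverVersion = 0, runtimeVersion = 0;

  cudaDriverGetVersion(&driverVersion);   // highest CUDA version the installed driver supports
  cudaRuntimeGetVersion(&runtimeVersion); // version of the installed CUDA runtime / toolkit

  // Versions are encoded as 1000*major + 10*minor, e.g. 10010 means CUDA 10.1.
  printf("Driver supports CUDA: %d.%d\n", driverVersion / 1000, (driverVersion % 100) / 10);
  printf("Runtime (toolkit)   : %d.%d\n", runtimeVersion / 1000, (runtimeVersion % 100) / 10);

  if (driverVersion < runtimeVersion)
    printf("Driver is older than the toolkit: update the driver or install an older toolkit.\n");
  return 0;
}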

How to install CUDA

The easiest way to install CUDA on Ubuntu is to execute the following commands:

$ sudo apt update 
$ sudo apt install nvidia-cuda-toolkit

All should be ready now. Check your CUDA version:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

Note: if you want the latest version or a specific version of CUDA, you can follow this link.

Confirm the installation by compiling an example CUDA C program. Save the following code into a file named e.g. hello.cu:

#include <stdio.h>
#include <math.h>

// SAXPY kernel: y[i] = a*x[i] + y[i], one thread per element
__global__
void saxpy(int n, float a, float *x, float *y)
{
  int i = blockIdx.x*blockDim.x + threadIdx.x;
  if (i < n) y[i] = a*x[i] + y[i];
}

int main(void)
{
  int N = 1<<20;
  float *x, *y, *d_x, *d_y;

  // Allocate host and device buffers
  x = (float*)malloc(N*sizeof(float));
  y = (float*)malloc(N*sizeof(float));
  cudaMalloc(&d_x, N*sizeof(float));
  cudaMalloc(&d_y, N*sizeof(float));

  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  // Copy input data to the GPU
  cudaMemcpy(d_x, x, N*sizeof(float), cudaMemcpyHostToDevice);
  cudaMemcpy(d_y, y, N*sizeof(float), cudaMemcpyHostToDevice);

  // Perform SAXPY on 1M elements
  saxpy<<<(N+255)/256, 256>>>(N, 2.0f, d_x, d_y);

  // Copy the result back and check it (expected value is 2*1 + 2 = 4)
  cudaMemcpy(y, d_y, N*sizeof(float), cudaMemcpyDeviceToHost);

  float maxError = 0.0f;
  for (int i = 0; i < N; i++)
    maxError = fmaxf(maxError, fabsf(y[i]-4.0f));
  printf("Max error: %f\n", maxError);

  cudaFree(d_x);
  cudaFree(d_y);
  free(x);
  free(y);
}

Use nvcc, the NVIDIA CUDA compiler, to compile the code, and then run the newly compiled binary:

$ nvcc -o hello hello.cu 
$ ./hello
Max error: 0.000000

The max error is supposed to be 0.000000. If you get 2.000000 instead, there is probably a mismatch between your NVIDIA driver and the CUDA Toolkit.
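
If you do get a non-zero max error, you can ask CUDA itself what went wrong. This is a minimal sketch (not part of the original example) of the lines you could add to hello.cu right after the saxpy kernel launch:

cudaError_t launchErr = cudaGetLastError();      // error from the launch itself
cudaError_t syncErr   = cudaDeviceSynchronize(); // error raised while the kernel ran

if (launchErr != cudaSuccess || syncErr != cudaSuccess) {
  // For example, cudaErrorInsufficientDriver means the installed driver
  // is too old for the installed CUDA Toolkit.
  printf("CUDA error: %s\n",
         cudaGetErrorString(launchErr != cudaSuccess ? launchErr : syncErr));
}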

Run the Script

So now, after successfully installing CUDA, I can run the script without the warning message.

Sometimes you have CUDA 10.2 installed, but TensorFlow requires CUDA 10.1. In that case you can create a symlink so the library resolves under the name TensorFlow expects:

$ ln -s /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2 /usr/lib/x86_64-linux-gnu/libcudart.so.10.1

Keep in mind this is a workaround rather than an officially supported setup; the cleanest fix is to install the CUDA version TensorFlow was built against.

cuDNN??

Are you working with machine learning? CUDA alone is not enough for that; you also need cuDNN, a GPU-accelerated library of primitives for deep neural networks. Don't worry, you can follow this link.
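
Once cuDNN is installed (the linked story covers that), a quick sanity check could look like this sketch. The file name cudnn_version.cu is just an example; it assumes cudnn.h and libcudnn are on your include and library paths:

#include <stdio.h>
#include <cudnn.h>

int main(void)
{
  // CUDNN_VERSION comes from the header you compiled against,
  // cudnnGetVersion() reports the library that was actually loaded.
  printf("Compiled against cuDNN %d\n", CUDNN_VERSION);
  printf("Loaded cuDNN library   %zu\n", (size_t) cudnnGetVersion());
  return 0;
}

Compile it with $ nvcc cudnn_version.cu -lcudnn -o cudnn_version.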
