
CUDA – Matrix Multiplication

November 2, 2013

First update on my Bachelor’s Project work.

To demonstrate the computing power of a GPU, I performed matrix multiplication on both a CPU and a GPU. As expected, the GPU beat the CPU by a satisfactory ratio, considering that this GPU belongs to one of the older generations. I used an Nvidia GeForce 210 for the computation. Since it has only 16 CUDA cores (2 multiprocessors with 8 cores each), I did not expect a huge speed-up.
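The post does not include its code, so here is a minimal sketch of what a CPU baseline for this kind of comparison typically looks like. It is not the author's implementation. It assumes matrices stored as Python lists of rows and uses the plain triple loop: each output element C[i][j] is the dot product of row i of A with column j of B. The function name `matmul_cpu` is hypothetical.

```python
def matmul_cpu(a, b):
    """Naive product of an M x K matrix and a K x N matrix (lists of rows)."""
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions must match: A is M x K, so B must be K x N")
    n = len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):          # each output row
        for j in range(n):      # each output column
            acc = 0.0
            for p in range(k):  # dot product of row i of A with column j of B
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c


if __name__ == "__main__":
    print(matmul_cpu([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]]))
```

The n×n case costs on the order of n³ multiply-adds. That cubic growth is what makes large matrices worth moving to a GPU.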

My code supports non-square matrix multiplication, but I only compared timings for square matrices. The following results were obtained:



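Supporting non-square matrices on the GPU mostly comes down to how threads are mapped to output elements. A common naive CUDA kernel gives each thread one element C[row][col]. It computes `row = blockIdx.y * blockDim.y + threadIdx.y` and `col = blockIdx.x * blockDim.x + threadIdx.x`, launches enough blocks to cover the M×N output, and makes threads that fall past the edge do nothing. The sketch below replays that mapping serially in Python so the indexing can be checked without a GPU. The 16×16 block size and the function names are assumptions for illustration, not taken from the author's code.

```python
import math

BLOCK = 16  # hypothetical block edge: 16 x 16 = 256 threads per block


def grid_dims(m, n, block=BLOCK):
    """Blocks needed to cover an M x N output, as (blocks along columns, blocks along rows)."""
    return math.ceil(n / block), math.ceil(m / block)


def emulate_kernel(a, b, block=BLOCK):
    """Serially replay a one-thread-per-output-element matmul kernel.

    Returns the M x N result and the number of (emulated) threads launched.
    """
    m, k, n = len(a), len(b), len(b[0])
    if len(a[0]) != k:
        raise ValueError("inner dimensions must match")
    c = [[0.0] * n for _ in range(m)]
    gx, gy = grid_dims(m, n, block)
    launched = 0
    for by in range(gy):                # blockIdx.y
        for bx in range(gx):            # blockIdx.x
            for ty in range(block):     # threadIdx.y
                for tx in range(block): # threadIdx.x
                    launched += 1
                    row = by * block + ty
                    col = bx * block + tx
                    if row >= m or col >= n:
                        continue  # padding thread outside the output: does no work
                    acc = 0.0
                    for p in range(k):
                        acc += a[row][p] * b[p][col]
                    c[row][col] = acc
    return c, launched
```

For example, a 100×50 output with 16×16 blocks needs `grid_dims(100, 50) == (4, 7)`, that is 28 blocks or 7168 threads, of which only 5000 produce an element. The bounds check is what lets the same kernel handle any M, K and N.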
I stopped at 1000 elements, although going further is not really a problem for the processor, and I can provide results for larger dimensions on request. As the results show, a GPU is more efficient than a CPU only when the task is computationally expensive. For 3×3 matrices, and even up to 25×25, the GPU kernel actually ran slower than the CPU function. At these sizes the task is not really compute-bound, so parallelism brings no benefit. A plot of the computation times is shown below:


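A rough cost model shows why the small cases lose. Multiplying two n×n matrices takes the usual count of 2n³ floating-point operations. On a discrete card like this one, however, A and B must also be copied to the GPU and C copied back, which moves 3n² values. The sketch below assumes 4-byte single-precision floats and exactly one transfer of each matrix. Under those assumptions the work per byte moved is n/6, so it grows linearly with dimension. The function name is hypothetical, and the printed numbers are model outputs, not measurements.

```python
def matmul_costs(n, bytes_per_elem=4):
    """Rough cost model for an n x n product on a discrete GPU.

    Returns (flops, bytes_moved, flops_per_byte).
    """
    flops = 2 * n ** 3                        # usual count: n^3 multiplies + n^3 additions
    bytes_moved = 3 * n * n * bytes_per_elem  # A and B copied to the device, C copied back
    return flops, bytes_moved, flops / bytes_moved


for n in (3, 25, 50, 1000):
    flops, moved, intensity = matmul_costs(n)
    print(f"n={n:5d}  flops={flops:>13,d}  bytes={moved:>11,d}  flops/byte={intensity:8.2f}")
```

At n = 3 there is only half a floating-point operation per byte transferred, so transfer and kernel-launch overheads dominate. At n = 1000 there are about 167, which leaves plenty of arithmetic to hide those fixed costs.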
As you cross the 25×25 barrier and move toward 50×50, the GPU starts to make the most of its parallel architecture, and from there on it dominates. The speed-up ratio grows faster and faster as the dimension increases. This shows that the more computationally expensive the task, the bigger the GPU's margin over the CPU. A plot of speed-up ratio versus dimension is shown below:



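For reference, the speed-up ratio is conventionally the CPU time divided by the GPU time for the same problem size. Below is a minimal sketch of the CPU side of such a measurement, assuming a best-of-N wall-clock timing with `time.perf_counter`. The GPU time would be measured separately, for example with CUDA events (`cudaEventRecord` / `cudaEventElapsedTime`). Whether host-to-device copies are included in the GPU time noticeably changes the ratio at small sizes. The helper names are hypothetical.

```python
import time


def matmul(a, b):
    """Naive CPU product of lists-of-rows matrices (row of A dotted with column of B)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def best_time(fn, *args, repeats=3):
    """Best-of-N wall-clock time in seconds; the minimum filters out scheduler noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def speed_up(t_cpu, t_gpu):
    """Speed-up ratio: CPU time divided by GPU time (values above 1 favour the GPU)."""
    return t_cpu / t_gpu


if __name__ == "__main__":
    for n in (3, 25, 50):
        a = [[1.0] * n for _ in range(n)]
        print(n, best_time(matmul, a, a))
```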
If we go beyond the 1000-element mark, I expect an even greater speed-up. I am currently working on the Cholesky decomposition algorithm for matrix inversion. Apparently, a modern GPU can invert large matrices faster than MATLAB's built-in inversion based on the Cholesky algorithm. Let's see.
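Cholesky-based inversion applies to symmetric positive definite matrices. It factors A = L Lᵀ, with L lower triangular. It then obtains each column of A⁻¹ by solving L y = eᵢ (forward substitution) followed by Lᵀ x = y (back substitution). Below is a sequential, pure-Python sketch of that textbook approach. It is not the GPU implementation the post refers to, and the function names are hypothetical. The n column solves are independent of one another, which is one natural place for a GPU to find parallelism.

```python
import math


def cholesky(a):
    """Return lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def inverse_spd(a):
    """Invert an SPD matrix column by column using its Cholesky factor."""
    n = len(a)
    L = cholesky(a)
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        # Forward substitution: solve L y = e_col.
        y = [0.0] * n
        for i in range(n):
            rhs = (1.0 if i == col else 0.0) - sum(L[i][k] * y[k] for k in range(i))
            y[i] = rhs / L[i][i]
        # Back substitution: solve L^T x = y, using L^T[i][k] == L[k][i].
        x = [0.0] * n
        for i in range(n - 1, -1, -1):
            rhs = y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))
            x[i] = rhs / L[i][i]
        for i in range(n):
            inv[i][col] = x[i]
    return inv
```

For example, `inverse_spd([[4.0, 2.0], [2.0, 3.0]])` gives approximately `[[0.375, -0.25], [-0.25, 0.5]]`, which matches the closed-form 2×2 inverse (1/8)·[[3, −2], [−2, 4]].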

Note: Code is available on request.



  1. GPUStudent

    Hello Rohit. I was wondering if I could have your email. I need to ask a few questions regarding matrix multiplication on CUDA.
