Running CUDA on Raspberry Pi

(UPDATE) DISCLAIMER: Please note that V-GPU emulates virtual GPU(s) and dispatches CUDA kernels to a remote GPU server transparently, so not a single line of CUDA kernel code actually runs on the Raspberry Pi (and CUDA will probably never run on any ARM SoC except Maxwell).

We have been working on the V-GPU platform for more than a year and a half, and we are sorry that development has been slow. One thing we know for sure is that everything has to work before the actual release: no crashes, no unexpected behavior, 100% compatibility, and attention to the needs of our customers along the way.

Many people have emailed us to ask whether V-GPU can be used to build cloud gaming (streaming) or VDI applications. Honestly, even though it is possible and completely doable, we don’t think that’s the right move for us: NVIDIA has released its GeForce GRID and VGX technologies specifically for VDI and cloud gaming, and many solution vendors such as Citrix and Ubitus have already integrated these technologies into product and service offerings with an existing customer base. We designed V-GPU for GPU computing (not gaming) from the very beginning because we believe that’s the future of high performance computing in the cloud, and we believe there will be many non-scientific applications for GPUs in the very near future.

At the same time, as many of you know, the ARM architecture is gaining momentum and expanding from smartphones to servers in the data center. Starting this year, we may see micro-servers built on the ARMv8 instruction set with 64-bit support. Their performance will not match Intel Xeon, but in terms of power efficiency and cost, ARM is the clear winner.

So what if we still need to run computationally intensive applications while saving on the power bill? What if we need to provide ARM-based video analytics as a web service if Google Glass really takes off next year?

That’s why NVIDIA is combining ARM and powerful GPU cores in its Maxwell architecture, planned for 2014. That’s also why we have made V-GPU run on the most inexpensive ARM board available today, the Raspberry Pi.

Let’s see it in action.

The Raspberry Pi Stack

The Stacked Raspberry Pi

CUDA Sample App Running on Raspberry Pi

The Pi boards and the V-GPU server are connected to the same switch. Since the Pi has only 10/100M Ethernet connectivity, performance is quite limited, especially when a large amount of data is transferred between the host and the virtual GPU. Nevertheless, V-GPU also works with other ARM boards that have Gigabit (or even 10G) Ethernet; we use the Pi simply because it’s cheap and easy to get.
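To give a rough feel for why the link speed matters, here is a back-of-envelope sketch in Python. It is not a measurement of V-GPU: it assumes the link runs at its full nominal rate and ignores protocol overhead, so real transfers will be slower. The 256 MiB payload and the function name are hypothetical and chosen only for illustration.

```python
# Ideal-case estimate of host <-> virtual GPU transfer time over Ethernet.
# Assumption: the link sustains its nominal rate with no protocol overhead,
# so these figures are a lower bound, not a benchmark of V-GPU.

def transfer_seconds(num_bytes: int, link_mbit_per_s: float) -> float:
    """Return the ideal time in seconds to move num_bytes over the link."""
    bytes_per_second = link_mbit_per_s * 1_000_000 / 8
    return num_bytes / bytes_per_second


# Hypothetical input buffer copied from the Pi to the remote GPU.
payload = 256 * 1024 * 1024  # 256 MiB

links = [
    ("100M Ethernet (Pi)", 100),
    ("Gigabit Ethernet", 1_000),
    ("10G Ethernet", 10_000),
]

for name, rate in links:
    print(f"{name:>20}: {transfer_seconds(payload, rate):7.2f} s")
```

Even in this ideal case, the hypothetical 256 MiB buffer needs more than 20 seconds on the Pi’s 100M link and about a tenth of that on Gigabit, so applications that move little data per kernel launch suit the Pi best.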

You can run the CUDA SDK sample applications (except those that use the OpenGL API) without any modification to the source code. Every CUDA application runs transparently on virtual GPU(s), and a single GPU can be shared among multiple Raspberry Pi boards at the same time, so the total cost can be greatly reduced by improving utilization.

What’s the application? Honestly, we don’t know the best use for running CUDA on a Raspberry Pi, but here are a few ideas if you find it interesting. You could use it to build a classroom for CUDA training without buying a GPU card for everyone. You could use it to build a GPU-accelerated Raspberry Pi farm as a flexible yet powerful cluster. You could use it in home automation powered by low-cost Pi+Arduino devices and process images using CUDA-accelerated OpenCV in the same process. You may find other ideas on the CARMA forum as well, and if you have a cool idea of your own, let us know (in the comments or by email) and we can provide you with the V-GPU binary :)

And if you want to see it in person, we will be exhibiting at the NVIDIA GPU Technology Conference (GTC) (#B612) this year at the San Jose Convention Center, March 18-21. Feel free to stop by and talk to us!
