Arguably the most useful feature of deep learning libraries is their ability to move workloads to the GPU, and PyTorch makes this very easy by providing a to() function that you can pass a device object to. The documentation isn't very clear about it, but Libtorch makes this just as easy as PyTorch does.
In this post, I'll demonstrate the basics of using a GPU with Libtorch and show how you can use the MPS backend to train on an M-series Mac GPU. All examples in this post reference the example code from the Libtorch documentation page, so if you are unclear on how to build a model or training loop, start there.
Checking if the Device is Available
First, check whether Libtorch can see your device by running one of the following calls.
// For NVIDIA GPU
torch::cuda::is_available()
// For Mac M1 GPU
torch::mps::is_available()
These functions return true (printed as 1) when the device is available and false (printed as 0) when it is not, so you can print the result with std::cout or use them in an if statement to pick the GPU when it is available and fall back to the CPU otherwise.
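For example, here is a minimal sketch (assuming a recent Libtorch build with MPS support, as described later in this post) that prints both flags and falls back to the CPU when no GPU is found:
// Minimal sketch: print the availability flags and pick a device.
#include <torch/torch.h>
#include <iostream>

int main() {
  std::cout << "CUDA available: " << torch::cuda::is_available() << std::endl;
  std::cout << "MPS available: " << torch::mps::is_available() << std::endl;

  // Use the Mac GPU when available, otherwise fall back to the CPU.
  torch::Device device = torch::mps::is_available()
      ? torch::Device(torch::kMPS)
      : torch::Device(torch::kCPU);
  std::cout << "Using device: " << device << std::endl;
  return 0;
}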
Creating a Device Object
Once you know a GPU is available, you can create a device object. To do this, construct a torch::Device with a device type constant such as torch::kCUDA or torch::kMPS. For a more comprehensive list of available device types, see the variables section on this page.
// To set the device to CPU
torch::Device device(torch::kCPU);
// To set the device to NVIDIA GPU
torch::Device device(torch::kCUDA);
// To set the device to M1 GPU
torch::Device device(torch::kMPS);
Now you have to assign your model and data to this device when running your training loop.
Applying a Device to Your Model
To move your model onto a device, apply the device object to the model. This is almost the same as in PyTorch: just pass the device object to the model's to() function.
// Create a new Net
auto net = std::make_shared<Net>();
// Assign the model to the device
net->to(device);
Next, make sure to do the same with your data so that you don't hit device-mismatch errors during training.
Applying a Device to Your Data
Assigning your data to a device looks like this.
for (auto& batch : *data_loader) {
batch.data = batch.data.to(device);
batch.target = batch.target.to(device);
// Continue the loop here
}
Now when you train, the work runs on your GPU device. It's as simple as that.
The Next Step
As you write more complicated training scripts with Libtorch, you will most likely want to have a validation and test loop as well. During those loops you will likely want to disable the gradients. To do this you can follow the reference from the official documentation.
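As a minimal sketch of what that can look like (assuming the Net model and forward() method from the Libtorch basic example above, plus a hypothetical val_loader for your validation set), you can declare a torch::NoGradGuard before the validation loop:
// Validation pass with gradients disabled.
// NoGradGuard keeps autograd off for the rest of the enclosing scope,
// similar to Python's `with torch.no_grad():`.
net->eval();                 // put modules like dropout into eval mode
torch::NoGradGuard no_grad;  // gradients stay disabled until this goes out of scope
double total_loss = 0.0;
for (auto& batch : *val_loader) {  // val_loader is a hypothetical validation loader
  auto data = batch.data.to(device);
  auto target = batch.target.to(device);
  auto output = net->forward(data);
  total_loss += torch::nll_loss(output, target).item<double>();
}
std::cout << "Validation loss: " << total_loss << std::endl;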
A Note about MPS
At the time of writing, I was unable to get a speedup from training with MPS. In fact, my training loop was significantly slower than training on the CPU. It's worth trying out for fun, but keep that in mind if you are trying to do anything serious.
Example Code
I have prepared a small example based on the Libtorch basic example that uses the MPS backend for training. You could also change the device type to kCUDA and train on an NVIDIA GPU if you wanted.
To set it up, first download the MNIST dataset (in my case, I actually used Fashion MNIST) into the data folder, and make sure you download the version of Libtorch built specifically for M-series MacBooks from the PyTorch website.