Getting Started with Libtorch

In the interest of attracting as many readers as quickly as possible, I'm going to open the first post on this blog with the most exciting topic I can think of: C++. Sorry, not sorry.

In the future, I'll be writing about everything from computer perception and machine learning to more general software engineering topics, but to get things started, I'm going to lead with something I've been obsessing over a bit lately: using libtorch for machine learning in C++. If you are just getting started with libtorch, or you are curious about what deep learning in C++ looks like, welcome!

Just an early warning, though: I am going to assume you already know a programming language of some sort and are at least familiar with some of the concepts of deep learning.

Prerequisites:
– Intermediate programming knowledge
– Basic deep learning knowledge
– Ideally, some prior experience with PyTorch

Note: If you need help setting up libtorch to start developing, take a look at my guide on setting up a development environment and CMake file.

So what is Libtorch?

Libtorch is the C++ version of the popular deep learning framework PyTorch. PyTorch itself runs on a C++ backend, and the PyTorch team maintains C++ versions of almost all of the components in the PyTorch library, so for the most part, anything you can do in PyTorch you should also be able to do in libtorch.

For anyone who has already used PyTorch before, a lot of the syntax will look fairly familiar. Things like setting the device to be used, creating a training loop, or marking a section of code as no_grad are all written very similarly to their PyTorch counterparts.
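
For example, here is a minimal sketch of what a couple of those pieces look like in libtorch (the tensor shape here is just a placeholder):

torch::Device device = torch::cuda::is_available() ? torch::Device(torch::kCUDA)
                                                   : torch::Device(torch::kCPU);

torch::Tensor x = torch::randn({2, 3}, device);

{
  // Equivalent of Python's `with torch.no_grad():` block; gradient
  // tracking is disabled for the lifetime of this guard.
  torch::NoGradGuard no_grad;
  torch::Tensor y = x * 2;
}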

Breaking Down the Main Example

To get started, let's go ahead and look at the main example from the libtorch documentation. The example defines a model and then, in the main function, initializes the model and an MNIST data loader before training the model in a training loop.

Creating a Model

Here is a simplified version of the model in the example. If you are coming from PyTorch, the model layers being defined in the constructor and used in the forward function should look really familiar to you.

struct Model : torch::nn::Module {
  Model() {
    fc1 = register_module("fc1", torch::nn::Linear(784, 64));
    fc2 = register_module("fc2", torch::nn::Linear(64, 10));
  }

  torch::Tensor forward(torch::Tensor x) {
    x = torch::relu(fc1->forward(x.reshape({x.size(0), 784})));
    x = torch::log_softmax(fc2->forward(x), /*dim=*/1);
    return x;
  }

  torch::nn::Linear fc1{nullptr}, fc2{nullptr};
};

Something that is a bit different from PyTorch is that you first declare the layers as null members and then use the register_module function to register each layer under its member name. Take note that torch::nn::Linear is a module holder that wraps a pointer to the underlying module, so the layers are called with arrow syntax, like fc1->forward(x), the way you would call a member function through a pointer.

For those unfamiliar with C++, the model here is defined as a struct, which is simply a class whose members are public by default. Unlike Python, C++ lets you explicitly make a member private, meaning it cannot be accessed by anything other than the class's own member functions. If you need this behavior in your model, you can define it as a class instead.
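
As a quick illustration (this variant is my own, not from the tutorial), the same model written as a class with private layer members would look like this:

class Model : public torch::nn::Module {
 public:
  Model() {
    fc1 = register_module("fc1", torch::nn::Linear(784, 64));
    fc2 = register_module("fc2", torch::nn::Linear(64, 10));
  }

  torch::Tensor forward(torch::Tensor x) {
    x = torch::relu(fc1->forward(x.reshape({x.size(0), 784})));
    return torch::log_softmax(fc2->forward(x), /*dim=*/1);
  }

 private:
  // Only Model's own member functions can touch these now.
  torch::nn::Linear fc1{nullptr}, fc2{nullptr};
};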

Another C++ note: you could declare the model in a header file (.h) and then define its functions in a .cpp file, as sketched below. This isn't required, but you will often see it in C++ projects using libtorch.
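
As a rough sketch of that split (the file names here are just examples), it might look like this:

// model.h
#pragma once
#include <torch/torch.h>

struct Model : torch::nn::Module {
  Model();
  torch::Tensor forward(torch::Tensor x);

  torch::nn::Linear fc1{nullptr}, fc2{nullptr};
};

// model.cpp
#include "model.h"

Model::Model() {
  fc1 = register_module("fc1", torch::nn::Linear(784, 64));
  fc2 = register_module("fc2", torch::nn::Linear(64, 10));
}

torch::Tensor Model::forward(torch::Tensor x) {
  x = torch::relu(fc1->forward(x.reshape({x.size(0), 784})));
  return torch::log_softmax(fc2->forward(x), /*dim=*/1);
}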

Initializing Your Model and Dataloader

This is fairly similar to how you would do it in PyTorch. The tutorial creates the model with std::make_shared, which returns a std::shared_ptr: a reference-counted smart pointer that automatically deletes the model once the last reference to it goes away. You could likely initialize the model as a plain value instead, but we will keep the tutorial's syntax for now.

int batch_size = 64;
float learning_rate = 0.01;

// Instantiating the Model
auto model = std::make_shared<Model>();

// Instantiating the Dataloader
auto dataset = torch::data::datasets::MNIST("./data").map(
          torch::data::transforms::Stack<>());
auto data_loader = torch::data::make_data_loader(dataset, batch_size);

torch::optim::SGD optimizer(model->parameters(), learning_rate);

My version of the code is slightly different from the libtorch tutorial because I break the dataset out into its own variable to make it a bit easier to read. Otherwise, similar to PyTorch, you just pass your dataset and batch size to the data loader, which will then create the batches for you.

After that you also need your optimizer. The general go-to optimizer is SGD, so that is what we will use. You have to pass it your model's parameters, again using the arrow syntax to call a member function through the pointer.
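
If you want to configure more than the learning rate, each optimizer in the C++ frontend also takes an options object. A minimal sketch (the momentum value here is just an example):

torch::optim::SGD optimizer(
    model->parameters(),
    torch::optim::SGDOptions(learning_rate).momentum(0.9));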

The Training Loop

Now we finally get to the main course: the training loop. The outer loop simply runs for however many epochs num_epochs is set to. The inner loop iterates over all the batches in the data loader, which is also accessed through a pointer (note the *data_loader dereference). The batches themselves are not pointers, though, so you access their data and targets with dot notation.

int num_epochs = 10;

for (size_t epoch = 1; epoch <= num_epochs; ++epoch) {
    size_t batch_index = 0;
    
    float epoch_loss = 0.0;
    
    for (auto& batch : *data_loader) {
      
      optimizer.zero_grad();
      
      torch::Tensor prediction = model->forward(batch.data);
      
      torch::Tensor loss = torch::nll_loss(prediction, batch.target);
      
      loss.backward();
      
      optimizer.step();
      
      epoch_loss += loss.item<float>();

      // Count the batch so we can average the loss over the epoch.
      ++batch_index;
    }

    std::cout << "Epoch: " << epoch << std::endl;
    std::cout << "Loss: " << epoch_loss / batch_index << std::endl;
}

After getting your batch, everything is fairly standard deep learning: you zero the optimizer's gradients, pass the data to the model's forward function, and collect the outputs. The outputs are used together with the targets to calculate the loss, backward is called on the loss to compute the gradients, and then the optimizer's step function updates the model's parameters. You can also use the predictions to calculate the model's accuracy, and, as I did here, accumulate the loss over an epoch to get an epoch loss value.
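
As a quick sketch of that accuracy idea (this snippet is my own addition, not part of the tutorial), you could track correct predictions alongside the loss:

// Before the batch loop:
int64_t num_correct = 0;
int64_t num_samples = 0;

// Inside the batch loop: the predicted class is the index of the
// largest log-probability along dimension 1.
torch::Tensor predicted_classes = prediction.argmax(/*dim=*/1);
num_correct += predicted_classes.eq(batch.target).sum().item<int64_t>();
num_samples += batch.data.size(0);

// After the batch loop: accuracy is the fraction of correct predictions.
float accuracy = static_cast<float>(num_correct) / num_samples;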

Conclusion

Despite the slightly high barrier to entry caused by C++, libtorch is a very clean library that allows you to write deep learning code almost as easily as if you were using PyTorch and Python. This post was just a brief overview of a very basic model and training loop using libtorch, but from here on I hope to dive deeper into each of the components and talk about how you could structure your own deep learning project using libtorch.
