Einops: Making Life a Bit Easier (Mostly)

When working in PyTorch, you often need to reshape multi-dimensional tensors or combine dimensions together. There’s a wealth of functions for exactly that: view, permute, stack, tile, and concat are some of the most common ones, and the list goes on. One big problem with these functions, though, is that it can be really hard to wrap your head around what they are doing to your tensors.

Enter Einops. This module was introduced to me by a friend also working in the CV industry, and it was very clear from the beginning how it can help you to write tensor manipulation code in a way that is a bit easier to reason about, and hopefully clearer to whoever is reading the code afterwards as well.

For a deep dive into everything Einops can do, check out their webpage. In this article, I will cover some clear use cases that I see for Computer Vision and then talk about a few instances where I would probably not use Einops over the default options.

Let’s quickly set up a data loader before we start so we can use it for some real-world examples.

Python

import torch
from torchvision import datasets
from torchvision.transforms import ToTensor
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt

from einops import rearrange, reduce, repeat, pack, unpack

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=16, shuffle=True)

images, labels = next(iter(train_dataloader))

Some Potential Use Cases

Stacking Images

Having experimented with the library, I can see a few areas in my own projects that would immediately benefit from using einops. The first, most obvious situation is joining images into a stack. There are a variety of reasons to do this, but a very simple one is wanting to visualize a batch of images quickly.

As someone who writes PyTorch fairly regularly, my first instinct is to grab stack or concat to do this. However, if you attempt that, you’ll be reminded that stack and concat both expect a list or tuple of tensors rather than a single tensor. Instead, you would want to use view in this situation. Here is what that code might look like:

Python

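# Merge the batch and height axes into one tall image.
# This only works as written because FashionMNIST images have a single channel.
stacked_image = images.view(images.shape[0] * images.shape[2], images.shape[3])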

figure = plt.figure(figsize=(10, 10))
plt.imshow(stacked_image.squeeze(), cmap="gray")

As you can see, it is a bit clunky. You have to multiply different values from the shape of the tensor together to get the new tensor shape that you want. You could simplify this a bit by splitting the shape into intuitively named variables and then using those to calculate the new shape instead. However, it is still a little bit uncomfortable, and it isn’t always easy to remember what view does right away. If you use einops, not only can you assign an intuitive name to each dimension, but you can do it all in one function, and it is very clear that you are “rearranging” the tensor, which is exactly what is happening.

Python

stacked_image = rearrange(images, 'b c h w -> c (b h) w')

figure = plt.figure(figsize=(10, 10))
plt.imshow(stacked_image.squeeze(), cmap="gray")

Manipulating Bounding Boxes

Another situation that comes up a lot, for me anyway, is the need to join and separate bounding boxes and their corresponding predictions. Generally, I would use torch.concat to join the bounding boxes and predictions together. However, torch.concat requires that you specify which dimension to join on, and sometimes it can be hard to picture how your data is getting concatenated at first glance.

Python

torch.concat([bounding_boxes, predictions.unsqueeze(dim=1)], dim=1)

In this case, you could use the einops pack function instead. Using the einsum-inspired notation, the pattern 'h *' shows that the first axis (the “height”, one row per box) is kept while everything else is joined together along the “width”. To me it is slightly debatable whether this syntax is actually more readable than a regular concat call, but one very nice thing about it is that pack also returns a second value, the “Packed Shapes”, which is essentially a record of the way your data was packed.

Python

bbox_and_pred, ps = pack([bounding_boxes, predictions], 'h *')

This can be very useful if you are concatenating and splitting the same data over and over again. Splitting my bounding boxes and predictions apart by hand would require a bit of fussing with tensor indices, but unpack takes the Packed Shapes variable and splits the data back into the original parts very simply.

Python

bounding_boxes, predictions = unpack(bbox_and_pred, ps, 'h *')

The only downside of this that I can see is having to keep track of the Packed Shapes variable later on. There is a good chance that if you returned your packed data from a function you would accidentally lose the Packed Shapes variable or you would have to work out a way to also return that from the function as well. All in all, I don’t think it is a huge problem though.

Where I Wouldn’t Use Einops

Among all of the wonderful examples on the einops documentation page, there are also some that reimplement things like MaxPool and image augmentation. To be honest, as much as I think rearrange is clearer than concat or view, I don’t believe that using einops in place of MaxPool is easier to reason about.

My recommendation for anyone using einops is to think hard about what is actually more readable. In many cases rearrange and pack can be much more intuitive than the names and usage of default torch methods. However, as in any programming language, attempting to be too clever can result in unreadable code.

Source Code

As always, here is the source code for my post. This time it’s just a Jupyter notebook, since I’m simply illustrating some of the einops functions.

Conclusion

Einops is a great package for tensor manipulation that can make your code more readable and save you time when wrapping your head around how to manipulate tensors. Just remember to be rational when using it and always use the simplest way to write something rather than the most clever way to write something.
