Understanding Autograd and Gradient Calculation in PyTorch

Introduction

Autograd is PyTorch’s automatic differentiation engine that powers neural network training. It records the operations performed on tensors and automatically computes gradients, making it easier to implement backpropagation for deep learning models.

Setting requires_grad

To track operations on tensors for gradient computation, set requires_grad=True when creating them.

import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = x * 2
z = y.sum()

print("z:", z)

Computing Gradients with backward()

Call backward() on a scalar output to compute its gradient with respect to every tensor in the graph that has requires_grad=True.

z.backward()
print("Gradient of x:", x.grad)

Explanation:

  • x is a tensor with gradients enabled.
  • z = sum(x * 2) creates a computation graph.
  • z.backward() computes dz/dx.
  • x.grad now holds the gradients: [2.0, 2.0].
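
Note that backward() can only be called without arguments on a scalar output. For a non-scalar tensor you must pass a gradient of the same shape (the vector used for the vector-Jacobian product). A small sketch:

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = x * 2                       # y is a vector, so y.backward() alone would raise an error
y.backward(torch.ones_like(y))  # supply the gradient explicitly
print(x.grad)                   # tensor([2., 2.])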

Disabling Gradient Tracking

Use torch.no_grad() to temporarily turn off gradient tracking, or detach() to get a tensor that is cut off from the graph (both are useful during inference):

with torch.no_grad():
    y = x * 3

# OR

y = x.detach()
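
You can confirm the effect by checking the requires_grad flag of the results (using the x defined above):

print((x * 2).requires_grad)        # True: tracked as usual

with torch.no_grad():
    print((x * 2).requires_grad)    # False: tracking suspended inside the block

print(x.detach().requires_grad)     # False: the detached copy has no history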

Gradient Accumulation

Gradients are accumulated (summed) into .grad by default. Use x.grad.zero_() to reset them between training steps.

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = x * 2
z = y.sum()
z.backward()

print(x.grad)  # First backward

# Clear gradients before the next backward pass
x.grad.zero_()

z = (x * 3).sum()
z.backward()

print(x.grad)  # Second backward
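
To see why the reset matters, here is a small sketch of what happens when zero_() is skipped: the second backward() adds its gradients on top of the first.

x = torch.tensor([2.0, 3.0], requires_grad=True)

(x * 2).sum().backward()
print(x.grad)               # tensor([2., 2.])

(x * 3).sum().backward()    # no zero_() in between
print(x.grad)               # tensor([5., 5.]) -> 2 + 3 accumulated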

Visualizing the Computation Graph

Every tensor with requires_grad=True is part of a computation graph that tracks operations to compute gradients. You can inspect a tensor's creator function using .grad_fn.

print(z.grad_fn)  # e.g. <SumBackward0 object at 0x...>
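
Each grad_fn also exposes next_functions, which lets you walk the recorded graph backwards. A rough sketch for the simple chain above:

fn = z.grad_fn
while fn is not None:
    print(fn)   # SumBackward0, then MulBackward0, then AccumulateGrad
    fn = fn.next_functions[0][0] if fn.next_functions else None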

Partial Derivatives with Multiple Inputs

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * x
z = y.mean()

z.backward()
print(x.grad)
  • Here, z = mean(x²). The backward step calculates gradients for each xᵢ as dz/dxᵢ = 2xᵢ / n.
  • Since the tensor has 3 elements, each gradient will be 2xᵢ / 3.
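
The hand-derived formula can be checked directly against the stored gradients:

expected = 2 * x.detach() / 3
print(x.grad)                            # tensor([0.6667, 1.3333, 2.0000])
print(torch.allclose(x.grad, expected))  # True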

Accessing grad_fn in Autograd

x = torch.tensor([5.0], requires_grad=True)
y = x * 3
z = y + 4

print(z.grad_fn)  # <AddBackward0 object at 0x...>
print(y.grad_fn)  # <MulBackward0 object at 0x...>
  • .grad_fn shows the function that created the tensor as part of the computation graph.
  • This is useful for understanding the chain of operations during debugging or visualization.

Chaining Multiple Operations

x = torch.tensor(1.0, requires_grad=True)
y = x * 2
z = y * y

z.backward()
print(x.grad)
  • y = 2x, z = y² = 4x².
  • dz/dx = 8x, so at x = 1 the gradient is 8.0.
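
As an alternative to backward(), torch.autograd.grad() returns the gradient directly instead of accumulating it into x.grad. A minimal sketch reproducing the same result:

x = torch.tensor(1.0, requires_grad=True)
y = x * 2
z = y * y

grads = torch.autograd.grad(z, x)
print(grads)    # (tensor(8.),)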

Use Case: Loss Function Gradient

import torch.nn as nn

x = torch.tensor([0.5, 1.5], requires_grad=True)
target = torch.tensor([1.0, 2.0])
loss_fn = nn.MSELoss()

loss = loss_fn(x, target)
loss.backward()

print(x.grad)
  • Mean Squared Error is used as the loss function.
  • The gradient helps us understand how much each input element contributes to the error.
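
The result can be verified against the closed-form MSE gradient, 2 * (xᵢ - targetᵢ) / n:

manual = 2 * (x.detach() - target) / x.numel()
print(manual)                          # tensor([-0.5000, -0.5000])
print(torch.allclose(x.grad, manual))  # True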

Detaching Without Breaking the Computation Chain

x = torch.tensor([2.0], requires_grad=True)
y = x * 5

with torch.no_grad():
    z = y * 3

print(z.requires_grad)  # False

# Or using detach
z = (y * 3).detach()
print(z.requires_grad)  # False
  • torch.no_grad() or .detach() prevents unnecessary computation graph building.
  • This is especially helpful during inference to reduce memory and speed up execution.
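
torch.no_grad() also works as a decorator, which is convenient for wrapping a whole inference function. In the sketch below, predict() is just a hypothetical stand-in for a model call:

@torch.no_grad()
def predict(inp):
    return inp * 3           # no graph is built inside this function

out = predict(x)
print(out.requires_grad)     # False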

Use Case: Gradient Accumulation in Training

x = torch.tensor([1.0, 2.0], requires_grad=True)

y1 = x * 2
z1 = y1.sum()
z1.backward()

print("Grad after first backward:", x.grad)

x.grad.zero_()

y2 = x * 3
z2 = y2.sum()
z2.backward()

print("Grad after second backward:", x.grad)
  • Gradients from the first backward call are accumulated in x.grad.
  • Use x.grad.zero_() to reset them before starting the next training step.
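
In a full training loop the same reset is usually done with optimizer.zero_grad(), which clears .grad on every registered parameter. A minimal sketch with a small stand-in model:

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(2, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

inp = torch.tensor([[1.0, 2.0]])
target = torch.tensor([[1.0]])

for _ in range(3):
    optimizer.zero_grad()                  # reset gradients from the previous step
    loss = loss_fn(model(inp), target)
    loss.backward()                        # fill .grad on the model parameters
    optimizer.step()                       # apply the update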

Use Case: Custom Gradients (Function Override)

from torch.autograd import Function

class MultiplyByTwo(Function):
    @staticmethod
    def forward(ctx, input):
        # forward pass: y = 2 * input; save the input in case backward needs it
        ctx.save_for_backward(input)
        return input * 2

    @staticmethod
    def backward(ctx, grad_output):
        # backward pass: dy/dinput = 2, so scale the incoming gradient by 2
        input, = ctx.saved_tensors
        return grad_output * 2

x = torch.tensor(3.0, requires_grad=True)
y = MultiplyByTwo.apply(x)
y.backward()

print(x.grad)
  • We define a custom function where both forward and backward passes are explicitly written.
  • This is used when default gradient behavior needs to be customized for a new operation.
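
A custom Function's backward can be sanity-checked with torch.autograd.gradcheck, which compares the analytical gradient against numerical finite differences (it requires double-precision inputs):

from torch.autograd import gradcheck

x = torch.tensor(3.0, dtype=torch.double, requires_grad=True)
print(gradcheck(MultiplyByTwo.apply, (x,)))   # True if backward matches the numerical gradient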

Conclusion

PyTorch’s autograd system simplifies gradient computation, enabling efficient model training and backpropagation. By understanding how to control and inspect gradients, you gain deeper insight into how models learn from data.


Next: Building Neural Networks using nn.Module »

