Code example for optimizer import from torch.optim in Python

The primary focus is on problems where the optimization target is noisy. Next, we look at how an optimizer such as Adam is imported from torch.optim and used, and then examine a few worked examples.
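As a minimal sketch of the import and setup, here is one way to construct an Adam optimizer from torch.optim; the tensor name, learning rate, and toy loss are illustrative and not taken from the answers below.

import torch

# A single learnable tensor standing in for model parameters (illustrative only).
w = torch.randn(3, requires_grad=True)

# Import path: the optimizer classes live in torch.optim.
optimizer = torch.optim.Adam([w], lr=1e-3)

# One optimization step on a toy objective: the squared norm of w.
loss = (w ** 2).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()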


Solution 1:

You can observe that the gradients are constantly zero as follows:

import torch
z = torch.rand((1, 6))
z.requires_grad_(True)
z.retain_grad()
optimizer = torch.optim.SGD([z], lr=0.1)
criteria = torch.nn.MSELoss()
for i in range(5):
    optimizer.zero_grad()
    # print(z)
    loss = criteria(z, z + torch.rand(1))
    # print(loss)
    loss.backward()
    print(z.grad)
    optimizer.step()
tensor([[0., 0., 0., 0., 0., 0.]])
tensor([[0., 0., 0., 0., 0., 0.]])
...

Although I cannot give a formal mathematical proof, the cause is most likely that the target z + torch.rand(1) is tied to the input z by a simple addition: the residual z - (z + c) is the constant -c, so the loss does not depend on z at all and its gradient with respect to z is identically zero.
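As a quick check of this explanation (a sketch, not part of the original answer), detaching the target from z restores a nonzero gradient, because the residual no longer cancels:

import torch

z = torch.rand((1, 6), requires_grad=True)
criteria = torch.nn.MSELoss()
noise = torch.rand(1)

# The target is still built from z, but cut out of the graph: gradients now
# flow only through the first argument, so z.grad is no longer zero.
loss = criteria(z, (z + noise).detach())
loss.backward()
print(z.grad)  # roughly -noise/3 in every entry, i.e. nonzero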


Solution 2:


Even though everything appears to be working, your criterion does not make sense. The difference between z and z + rand() cannot be reduced by optimization, because it is completely independent of z: it is always the constant rand(). I do not see the point of this example.
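To see that the loss really is a constant determined only by rand(), here is a small sketch (not from the original answer) that evaluates the same criterion for two very different values of z:

import torch

criteria = torch.nn.MSELoss()
c = torch.rand(1)

z1 = torch.rand((1, 6))
z2 = 100 * torch.rand((1, 6))

# Both losses equal c**2: the value of z cancels out of (z - (z + c))**2.
print(criteria(z1, z1 + c), criteria(z2, z2 + c), c ** 2)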

Experiment with code such as loss = criteria(z, torch.zeros_like(z)) instead, and you will observe z rapidly converging towards 0 within a few iterations.
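Putting that suggestion back into the loop from Solution 1 gives something like the following sketch; the learning rate and iteration count are illustrative, not from the original answer:

import torch

z = torch.rand((1, 6), requires_grad=True)
optimizer = torch.optim.SGD([z], lr=1.0)
criteria = torch.nn.MSELoss()

for i in range(10):
    optimizer.zero_grad()
    # The target is fixed at zero, so the loss genuinely depends on z.
    loss = criteria(z, torch.zeros_like(z))
    loss.backward()
    optimizer.step()  # with this learning rate, z shrinks by a factor of 2/3 per step
    print(loss.item(), z)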


Solution 3:


The problem in this question was with the target values; in my case, however, the gradient calculation was blocked in the graph by a torch.no_grad wrapper. To assist future users, here are the task specifics.

import torch
z = torch.rand((1, 6))
z.requires_grad_(True)
optimizer = torch.optim.SGD([z], lr=0.1, momentum=0.9, weight_decay=1e-4)
criteria = torch.nn.MSELoss()
l = torch.nn.Linear(6, 6)  # .cuda()
p = torch.nn.Linear(6, 6)  # .cuda()
# optimizer = torch.optim.SGD(parameters, lr=0.5)
for i in range(5):
    optimizer.zero_grad()
    print(z)
    fun = l(z)
    out = p(fun)
    loss1 = criteria(z, fun)
    print(loss1)
    loss2 = criteria(out, fun)
    # print(loss)
    loss = loss1 + loss2
    loss.backward()
    optimizer.step()
    print(z.grad)
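The torch.no_grad issue mentioned above is not visible in this snippet; as a hedged illustration, wrapping the forward pass in torch.no_grad records no graph at all, so no gradients can ever reach z:

import torch

z = torch.rand((1, 6), requires_grad=True)
l = torch.nn.Linear(6, 6)
criteria = torch.nn.MSELoss()

with torch.no_grad():
    # No computation graph is recorded inside this block.
    fun = l(z)
    loss = criteria(z, fun)

print(loss.requires_grad)  # False: the graph was blocked
# loss.backward() would raise a RuntimeError here, and z.grad would stay None.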

Frequently Asked Questions