The focus here is a noisy optimization problem: the gradients stay constantly zero when a tensor is optimized directly with a PyTorch optimizer against a random target. The answers below work through small examples to explain why.

Solution 1:

You can confirm that the gradients are constantly zero as follows:

```
import torch

z = torch.rand((1, 6))
z.requires_grad_(True)
z.retain_grad()
optimizer = torch.optim.SGD([z], lr=0.1)
criteria = torch.nn.MSELoss()

for i in range(5):
    optimizer.zero_grad()
    loss = criteria(z, z + torch.rand(1))
    loss.backward()
    print(z.grad)
    optimizer.step()
```

```
tensor([[0., 0., 0., 0., 0., 0.]])
tensor([[0., 0., 0., 0., 0., 0.]])
...
```

I can't give a mathematical proof, but the likely cause is that the target `z + torch.rand(1)` is linked to the input `z` through that addition: the target moves together with `z`, so the difference between the two is constant and the gradient with respect to `z` vanishes.
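This can be checked directly. Both arguments of the loss depend on `z`, so the two gradient contributions cancel exactly; detaching the target restores a non-zero gradient. A minimal sketch (using `torch.nn.functional.mse_loss` instead of the `MSELoss` module from the snippet above):

```python
import torch

z = torch.rand((1, 6), requires_grad=True)
c = torch.rand(1)

# Target moves with z, so the residual is the constant c: gradient is zero.
loss = torch.nn.functional.mse_loss(z, z + c)
loss.backward()
grad_linked = z.grad.clone()
print(grad_linked)  # tensor([[0., 0., 0., 0., 0., 0.]])

# Detach the target so it is treated as a constant: gradient is non-zero.
z.grad = None
loss = torch.nn.functional.mse_loss(z, (z + c).detach())
loss.backward()
grad_detached = z.grad.clone()
print(grad_detached)
```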

Solution 2:

Everything appears to work, but your criterion makes no sense. The difference between `z` and `z + rand()` cannot be reduced by optimization, because it is completely independent of `z`: it is always exactly `rand()`. I don't see the point of this example.

Try a target that does not depend on `z`, such as `loss = criteria(z, torch.zeros_like(z))`, and you will see `z` converge towards 0.
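As a sketch of that suggestion (with the same `lr=0.1` as the earlier snippet the decay per step is geometric and fairly slow, so a couple of hundred steps are used here to make the convergence obvious):

```python
import torch

z = torch.rand((1, 6), requires_grad=True)
optimizer = torch.optim.SGD([z], lr=0.1)
criteria = torch.nn.MSELoss()

# The target is a fixed constant, so the loss genuinely depends on z
# and every SGD step shrinks z towards zero.
for i in range(200):
    optimizer.zero_grad()
    loss = criteria(z, torch.zeros_like(z))
    loss.backward()
    optimizer.step()

print(z)  # all entries close to 0
```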

Solution 3:

The problem in the original question was not only the target values: gradient computation was also blocked because part of the graph was built inside a `torch.no_grad` wrapper. To help future readers, here are the task specifics.

```
import torch

z = torch.rand((1, 6))
z.requires_grad_(True)
optimizer = torch.optim.SGD([z], lr=0.1, momentum=0.9, weight_decay=1e-4)
criteria = torch.nn.MSELoss()
l = torch.nn.Linear(6, 6)  # .cuda()
p = torch.nn.Linear(6, 6)  # .cuda()

for i in range(5):
    optimizer.zero_grad()
    print(z)
    fun = l(z)
    out = p(fun)
    loss1 = criteria(z, fun)
    print(loss1)
    loss2 = criteria(out, fun)
    loss = loss1 + loss2
    loss.backward()
    optimizer.step()
    print(z.grad)
```
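The graph-blocking effect described above can be reproduced in isolation. A minimal sketch (hypothetical variable names, not the asker's original code): computing the layer output inside `torch.no_grad()` records no graph, so nothing can flow back to `z`.

```python
import torch

z = torch.rand((1, 6), requires_grad=True)
layer = torch.nn.Linear(6, 6)

# Inside torch.no_grad() no graph is recorded: the output is detached,
# and calling backward() through it would raise a RuntimeError.
with torch.no_grad():
    fun_blocked = layer(z)
print(fun_blocked.requires_grad)  # False

# Outside the wrapper the graph is intact and z receives a gradient.
fun = layer(z)
loss = torch.nn.functional.mse_loss(fun, torch.zeros_like(fun))
loss.backward()
print(z.grad is not None)  # True
```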