The basic idea is to use the derivative of the loss `$L(\theta)$` with respect to `$\theta$` to figure out which way the loss is decreasing, then "move" the parameter guess in that direction.
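
As a minimal sketch of this idea, the loop below repeatedly applies the update `$\theta \leftarrow \theta - \eta \, L'(\theta)$`. The quadratic loss `$L(\theta) = (\theta - 3)^2$`, the step size `$\eta$` (`learning_rate`), the starting guess, and all function names are assumptions chosen for illustration; none of them come from the text above.

```python
# Illustrative only: a hypothetical one-dimensional loss with its minimum at theta = 3.
def loss(theta):
    return (theta - 3.0) ** 2


def grad(theta):
    # Derivative of the loss above with respect to theta.
    return 2.0 * (theta - 3.0)


def gradient_descent(theta0, learning_rate=0.1, steps=100):
    """Repeatedly move theta against the sign of the derivative."""
    theta = theta0
    for _ in range(steps):
        # A positive derivative means the loss rises as theta grows,
        # so we step the other way (and vice versa).
        theta = theta - learning_rate * grad(theta)
    return theta


theta_hat = gradient_descent(theta0=0.0)
print(theta_hat, loss(theta_hat))
```

With these assumed settings each step shrinks the distance to the minimizer by a constant factor, so the guess approaches 3. A step size that is too large for this loss (above 1.0 here) would make the iterates diverge instead.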
