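The basic idea is to use the derivative of the loss $L(\theta)$ with respect to $\theta$ to determine the direction in which the loss decreases, then "move" the parameter guess a small step in that direction. Because the derivative points in the direction of increasing loss, the step is taken against its sign.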
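To make this concrete, here is a minimal sketch of the idea for a single scalar parameter. The update it applies is $\theta \leftarrow \theta - \eta \, L'(\theta)$. The step size $\eta$ (`learning_rate`), the number of steps, and the quadratic example loss are all assumptions chosen for illustration; they do not come from the text above.

```python
def loss(theta):
    # Hypothetical example loss, chosen so the minimum is at theta = 3.
    return (theta - 3.0) ** 2


def loss_derivative(theta):
    # Analytic derivative of the example loss above.
    return 2.0 * (theta - 3.0)


def gradient_descent(theta, learning_rate=0.1, num_steps=100):
    # Repeatedly step against the sign of the derivative,
    # which is the direction in which the loss decreases.
    for _ in range(num_steps):
        theta = theta - learning_rate * loss_derivative(theta)
    return theta


theta_hat = gradient_descent(theta=0.0)
print(theta_hat, loss(theta_hat))
```

Starting from a guess of 0, each step moves the guess closer to 3 and lowers the loss. With a step size that is too large the guess can overshoot the minimum, so in practice `learning_rate` usually needs tuning.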