Gradient Descent
An algorithm for finding the parameter values at which an equation/curve reaches its minimum. Essentially we pick a starting point (see page 13) and then keep travelling along the curve in the negative gradient direction, by a given step size, until we can't get any lower.
Gradient descent uses the gradient to tell us how the error will change given a change in the parameters. The curve plots the error on the Y-axis and the current parameter on the X-axis; we want to find the \( x \) with the smallest error.
Issues with gradient descent algorithms include:
- Avoiding local minima when searching for the global minimum (see the sketch after this list)
- Converging upon the minimum quickly enough
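To make the first issue concrete, here is a minimal sketch (in Python, using a hypothetical non-convex error curve \( E(x) = x^4 - 3x^2 + x \) chosen purely for illustration) showing that plain gradient descent can settle in a local minimum depending on where it starts:

```python
# Sketch: plain gradient descent on an illustrative non-convex error
# curve E(x) = x^4 - 3x^2 + x, which has two minima. Which minimum
# we reach depends entirely on the starting point.

def grad_E(x):
    # dE/dx for E(x) = x^4 - 3x^2 + x
    return 4 * x**3 - 6 * x + 1

def descend(x, eta=0.01, steps=1000):
    for _ in range(steps):
        x = x - eta * grad_E(x)  # step in the negative gradient direction
    return x

print(descend(x=2.0))   # ≈  1.13: a local minimum only
print(descend(x=-2.0))  # ≈ -1.30: the global minimum
```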
Definition
We define the new value of the parameter as the sum of the current value and an update term \( \Delta{x} \): \[ x \rightarrow x + \Delta{x} \]
Where \( \Delta{x} \) is proportional to the negative of the rate of change of the error \( E \) of the system with respect to the parameter \( x \): \[ \Delta{x} = - \eta \frac{\partial E}{\partial x} \]
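As a concrete illustration, take \( E(x) = x^2 \) (so \( \frac{\partial E}{\partial x} = 2x \)) and start from \( x = 1 \) with \( \eta = 0.1 \): \[ \Delta{x} = -0.1 \times (2 \times 1) = -0.2, \qquad x \rightarrow 1 + (-0.2) = 0.8 \] Repeating the update moves \( x \) step by step towards the minimum at \( x = 0 \).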
\(\eta\), the learning rate, is a hyper-parameter used to tune the rate of descent. If the steps the algorithm takes are too large we might overshoot the minimum, but if they're too small the algorithm might take too long to converge.
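This trade-off can be seen in a minimal sketch (again on the illustrative error curve \( E(x) = x^2 \); the function name `descend` and the particular learning rates are arbitrary choices): a small \( \eta \) crawls towards the minimum, while an overly large one overshoots it and diverges.

```python
# Sketch: the update x <- x - eta * dE/dx applied to the illustrative
# error curve E(x) = x^2 (so dE/dx = 2x), with different learning rates.

def descend(x, eta, steps=50):
    for _ in range(steps):
        x = x - eta * 2 * x  # dE/dx = 2x for E(x) = x^2
    return x

print(descend(1.0, eta=0.01))  # ≈ 0.36  : converging, but slowly
print(descend(1.0, eta=0.4))   # ≈ 1e-35 : converges quickly
print(descend(1.0, eta=1.1))   # ≈ 9e+03 : steps too large, the iterates diverge
```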