Single Layer Feed-Forward Neural Network
A [see page 4, form] of neural network where [see page 4, each] neuron in the input layer projects to every neuron in the output layer. There is no backpropagation through hidden layers, because there are none.
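The forward pass of such a network can be sketched in NumPy. The sigmoid activation and the layer sizes here are illustrative assumptions, not something the notes specify:

```python
import numpy as np

def forward(x, W, b):
    """Single-layer forward pass: every input neuron feeds every output neuron.

    x: (n_inputs,) input vector
    W: (n_outputs, n_inputs) weight matrix, one row per output neuron
    b: (n_outputs,) bias vector
    """
    h = W @ x + b                     # potential h_i of each output neuron
    return 1.0 / (1.0 + np.exp(-h))  # assumed activation f: sigmoid

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))  # 3 inputs fully connected to 2 outputs
b = np.zeros(2)
y = forward(np.array([1.0, 0.5, -0.5]), W, b)
```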
Training
Training a neural network involves tuning the weights of the network through [see page 12, least mean square] gradient descent.
We [see page 8, define] the error function (AKA cost function) \( E = F(\text{input}, \text{output}; w) \)
as a function which quantifies how wrong the network's outputs are; its derivative
( \( \frac{\partial{E}}{\partial{{w}_{ij}}} \) ) then tells us how the error changes given a change in the weights. For the mean
square error we define it as:
\begin{align} E = \frac{1}{2N}\sum_{\mu}\sum_{i} {({y}_{\mu,i} - {t}_{\mu,i})}^{2} \label{eq:ef} \end{align}
With:
- \( \mu \): indexing over the datapoints
- \( i \): indexing over the output neurons
- \( y \): the output value of the current neuron
- \( t \): The desired output of the neuron (what we want \( y \) to be)
Note: The \( \frac{1}{2} \) factor at the start is just a constant that'll be cancelled out after differentiation. Furthermore you don't see any \( w_{ij} \) in the equation above, but \( E \) does [see page 8, depend] on the weights through the output of each neuron \( y \).
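The cost in eq:ef can be computed directly. A minimal NumPy sketch, where the array shapes and example values are assumptions for illustration:

```python
import numpy as np

def mse_cost(Y, T):
    """E = 1/(2N) * sum over datapoints mu and output neurons i of (y - t)^2.

    Y: (N, n_outputs) network outputs y_{mu,i}
    T: (N, n_outputs) desired outputs t_{mu,i}
    """
    N = Y.shape[0]
    return np.sum((Y - T) ** 2) / (2 * N)

Y = np.array([[0.9, 0.1], [0.2, 0.8]])  # hypothetical outputs
T = np.array([[1.0, 0.0], [0.0, 1.0]])  # hypothetical targets
E = mse_cost(Y, T)  # → 0.025
```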
TODO: Insert summary from [see page 29, com3240-w01-lab].
Through the [see page 21, chain rule] we can derive from eq:ef the back-propagation rule (AKA learning rule, gradient rule), which specifies the actual change in error corresponding to a change in the weights:
\begin{align}
\label{eq:up-eq}
\frac{\partial{E}}{\partial{{w}_{ij}}} = \frac{1}{N} \sum_{\mu} \sum_{a} ({y}_{\mu,a} - {t}_{\mu,a}) f'({h}_{\mu,a}) \sum_{b} \frac{\partial{{w}_{ab}}}{\partial{{w}_{ij}}} {x}_{\mu,b}
\end{align}
Note however that:
\begin{align*} \frac{\partial{{w}_{ab}}}{\partial{{w}_{ij}}} &= \begin{cases}
1 & \text{if } a = i \text{ and } b = j \\
0 & \text{otherwise}
\end{cases} \end{align*}
meaning that in the sums over \(a\) and \(b\) the only term that contributes is the one with \( a=i \) and \( b=j \) (leaving just \( {x}_{\mu,j} \)), which leads to a final update equation of:
\begin{align} \label{eq:up-eq-final} \frac{\partial{E}}{\partial{{w}_{ij}}} = \frac{1}{N} \sum_{\mu} ({y}_{\mu,i} - {t}_{\mu,i}) f'({h}_{\mu,i}){x}_{\mu,j} \end{align}
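Eq:up-eq-final can be vectorised over all weights at once. A NumPy sketch, assuming a sigmoid activation (so \( f'(h) = y(1-y) \)) since the notes leave \( f \) unspecified:

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def mse_gradient(X, T, W):
    """dE/dw_ij = 1/N * sum_mu (y_{mu,i} - t_{mu,i}) f'(h_{mu,i}) x_{mu,j}.

    X: (N, n_inputs) datapoints, T: (N, n_outputs) targets,
    W: (n_outputs, n_inputs) weights. Returns a gradient with W's shape.
    """
    N = X.shape[0]
    H = X @ W.T               # potentials h_{mu,i}
    Y = sigmoid(H)            # outputs y_{mu,i}
    fprime = Y * (1 - Y)      # sigmoid derivative f'(h) = y(1 - y)
    return ((Y - T) * fprime).T @ X / N
```

A quick sanity check is to compare one entry of this gradient against a finite-difference estimate of the cost.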
Given eq:up-eq-final we can now define the delta-rule:
\begin{align} \Delta{{w}_{ij}} &= - \eta \frac{\partial{E}}{\partial{{w}_{ij}}} \\
&= \frac{\eta}{N}\sum_{\mu}({t}_{\mu,i} - {y}_{\mu,i})f'({h}_{\mu,i}){x}_{\mu,j} \\
&= \frac{\eta}{N} \sum_{\mu} \delta_{\mu,i} \, {x}_{\mu,j}
\end{align}
Where we define:
\begin{align*} \delta_{\mu,i} &= \text{local gradient of neuron } i \\
&= ({t}_{\mu,i} - {y}_{\mu,i})f'({h}_{\mu,i})
\end{align*}
Note \( \Delta{w_{ij}} \) decreases (negative change) the weights that push the output in the wrong direction and increases (positive change) the weights that push it in the right direction. For example, if we have a 3-class problem with 3 output-layer neurons, we can attach a class to each output neuron and predict the class whose neuron has the largest output potential. In this case we'd like to increase the weights feeding the correct class's neuron, and decrease the weights feeding the incorrect classes' neurons.
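A single delta-rule update step, sketched in NumPy. As before, the sigmoid activation is an assumption (the notes don't fix \( f \)), and \( \eta \) is an arbitrary illustrative learning rate:

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def delta_rule_update(X, T, W, eta=0.1):
    """One delta-rule step: Delta w_ij = eta/N * sum_mu delta_{mu,i} x_{mu,j}.

    X: (N, n_inputs), T: (N, n_outputs), W: (n_outputs, n_inputs).
    """
    N = X.shape[0]
    Y = sigmoid(X @ W.T)
    delta = (T - Y) * Y * (1 - Y)    # local gradients delta_{mu,i}
    return W + eta / N * delta.T @ X
```

Applying this update repeatedly should drive the mean square error down, since each step moves the weights against the gradient.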
Matrix Notation
Note: We can also express the local gradients in matrix notation (here for a network with two output neurons):
\begin{align*} \delta =
\begin{pmatrix}
\delta_1 \\
\delta_2
\end{pmatrix}
=
\begin{pmatrix}
(t_1 - y_1) f'(h_1) \\
(t_2 - y_2) f'(h_2)
\end{pmatrix}
\end{align*}
With the delta rule being:
\begin{align*} \Delta{w} = \frac{\eta}{N} \delta x^T
= \frac{\eta}{N}
\begin{pmatrix}
\delta_1 \\
\delta_2
\end{pmatrix}
\begin{pmatrix}
x_1 & x_2 & x_3
\end{pmatrix}
\end{align*}
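The matrix form \( \Delta{w} = \frac{\eta}{N} \delta x^T \) is just an outer product. A short NumPy sketch for a single datapoint with hypothetical values (2 output neurons, 3 inputs):

```python
import numpy as np

delta = np.array([0.2, -0.1])        # hypothetical local gradients delta_i
x = np.array([1.0, 0.5, -1.0])       # hypothetical input x_j
eta, N = 0.1, 1

# Outer product: dW[i, j] = eta/N * delta_i * x_j, one entry per weight w_ij.
dW = eta / N * np.outer(delta, x)
```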
Decision Boundary
The decision boundary for a single-layer perceptron is a [see page 10, linear-equation] defined in terms of the weight vector \( w \), the input sample \( x \) and the boundary intercept \( b \).
\begin{align*} h &= w^T x + b & y &= m x + c \end{align*}
For this decision boundary, the dot product \( w^T x \) measures how parallel the input is to the weight vector. \( b \) simply pushes the decision boundary forwards or backwards to minimise misclassifications.
Note: A decision boundary is drawn perpendicular to the weight vector, so changing \( b \) doesn't alter the value of \( w^T x \).
Warn: Because the decision boundary for a single-layer network is a linear (straight-line) equation, these networks can't solve non-linearly-separable (/brain/20210216040014-decision_boundary/#linear-decision-boundary) problems such as XOR.
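Classification with such a boundary amounts to checking which side of the line \( h = w^T x + b = 0 \) the input falls on. A sketch with hypothetical weights:

```python
import numpy as np

w = np.array([1.0, -1.0])  # hypothetical weight vector
b = 0.5                    # hypothetical intercept

def classify(x):
    """Return 1 if x lies on the positive side of the boundary w.x + b = 0."""
    return 1 if w @ x + b > 0 else 0
```

The boundary itself is the set of points where \( w^T x + b = 0 \), a line perpendicular to \( w \).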