Brain Dump

Single Layer Feed-Forward Neural Network

Tags
adaptive-intelligence

A [see page 4, form] of neural network where [see page 4, each] neuron in the input layer projects to all the neurons in the output layer. There's no back propagation.
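As a rough sketch of what this means in code (the sigmoid activation, NumPy, and the 3-input/2-output shapes are assumptions for illustration, not part of the definition), a forward pass is a single matrix multiplication from the input layer to the output layer:

```python
import numpy as np

def sigmoid(h):
    """Example activation function f(h); any differentiable f would do."""
    return 1.0 / (1.0 + np.exp(-h))

# 3 input neurons fully connected to 2 output neurons.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(2, 3))  # weight matrix w[i, j]: input j -> output i
b = np.zeros(2)                         # biases of the output neurons

x = np.array([0.5, -1.0, 2.0])          # one input sample
h = w @ x + b                           # net input (potential) of each output neuron
y = sigmoid(h)                          # output of the network; no hidden layers involved
```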

Training

Training a neural network involves tuning the weights of the network through [see page 12, least mean square] gradient descent.

We [see page 8, define] the error function (AKA cost function) \( E = F(\text{input}, \text{output}; w) \) as a function which quantifies how wrong the network's outputs are; its gradient with respect to the weights, \( \frac{\partial E}{\partial w_{ij}} \), then tells us how the error will change given a change in weights. For the mean square error this definition becomes:

\begin{align} E = \frac{1}{2N}\sum_{\mu}\sum_{i} {({y}_{\mu,i} - {t}_{\mu,i})}^{2} \label{eq:ef} \end{align}

With:

  • \( \mu \): indexing over the datapoints
  • \( i \): indexing over the output neurons
  • \( y \): the output value of output neuron \( i \) for datapoint \( \mu \)
  • \( t \): the desired output of the neuron (what we want \( y \) to be)

Note: The \( \frac{1}{2} \) factor at the front is just a constant that cancels out when we differentiate. Furthermore, you don't see any \( w_{ij} \) in the equation above, but \( E \) does [see page 8, depend] on it through the output of a neuron, \( y \).
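A minimal NumPy sketch of this cost function (the array names and shapes below are my own assumptions):

```python
import numpy as np

def mean_square_error(y, t):
    """E = 1/(2N) * sum over datapoints mu and output neurons i of (y - t)^2.

    y, t: arrays of shape (N, n_outputs) holding the actual and desired outputs.
    """
    n = y.shape[0]
    return np.sum((y - t) ** 2) / (2 * n)

# Tiny example: 2 datapoints, 3 output neurons.
y = np.array([[0.9, 0.1, 0.2],
              [0.3, 0.8, 0.1]])
t = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
print(mean_square_error(y, t))  # -> 0.05 for these made-up numbers
```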

TODO: Insert summary from [see page 29, com3240-w01-lab].

Through the [see page 21, chain rule] we can derive from eq:ef the back-propagation rule (AKA learning rule, gradient rule) which specifies the actual change in error corresponding to a change in weights:

\begin{align} \label{eq:up-eq} \frac{\partial E}{\partial w_{ij}} = \frac{1}{N} \sum_{\mu} \sum_{a} ({y}_{\mu,a} - {t}_{\mu,a}) f'({h}_{\mu,a}) \sum_{b} \frac{\partial w_{ab}}{\partial w_{ij}} {x}_{\mu,b} \end{align}
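Making the chain-rule step explicit (a sketch assuming the usual single-layer definitions \( y_{\mu,a} = f(h_{\mu,a}) \) and \( h_{\mu,a} = \sum_b w_{ab}\, x_{\mu,b} \)):

\begin{align*}
\frac{\partial E}{\partial w_{ij}}
  &= \sum_{\mu}\sum_{a} \frac{\partial E}{\partial {y}_{\mu,a}}\,
     \frac{\partial {y}_{\mu,a}}{\partial {h}_{\mu,a}}\,
     \frac{\partial {h}_{\mu,a}}{\partial w_{ij}} \\
  &= \sum_{\mu}\sum_{a} \frac{1}{N}({y}_{\mu,a} - {t}_{\mu,a})\, f'({h}_{\mu,a}) \sum_{b} \frac{\partial w_{ab}}{\partial w_{ij}} {x}_{\mu,b}
\end{align*}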

Note however that:

\begin{align*}
\frac{\partial w_{ab}}{\partial w_{ij}} &=
    \begin{cases}
        1 & \text{if } a = i \text{ and } b = j \\
        0 & \text{otherwise}
    \end{cases}
\end{align*}

meaning that in the sums over \(a\) and \(b\) the only term that contributes is the one where \( a=i \) and \( b=j \) (leaving just \( {x}_{\mu,j} \)), which leads to a final update equation of:

\begin{align} \label{eq:up-eq-final} \frac{\partial E}{\partial w_{ij}} = \frac{1}{N} \sum_{\mu} ({y}_{\mu,i} - {t}_{\mu,i}) f'({h}_{\mu,i}){x}_{\mu,j} \end{align}
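This gradient is straightforward to vectorise; a sketch in NumPy, assuming a sigmoid activation and batch-shaped arrays (all names and shapes are illustrative assumptions):

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def error_gradient(w, x, t):
    """dE/dw_ij = 1/N * sum_mu (y - t) * f'(h) * x, vectorised over datapoints.

    w: (n_out, n_in) weights, x: (N, n_in) inputs, t: (N, n_out) targets.
    """
    n = x.shape[0]
    h = x @ w.T                    # (N, n_out) potentials h_{mu,i}
    y = sigmoid(h)                 # (N, n_out) outputs    y_{mu,i}
    f_prime = y * (1.0 - y)        # derivative of the sigmoid at h
    return ((y - t) * f_prime).T @ x / n   # (n_out, n_in), one entry per w_ij
```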

Given eq:up-eq-final we can now define the delta-rule:

\begin{align}
\Delta{{w}_{ij}} &= - \eta \frac{\partial E}{\partial w_{ij}} \\
                 &= \frac{\eta}{N}\sum_{\mu}({t}_{\mu,i} - {y}_{\mu,i})f'({h}_{\mu,i}){x}_{\mu,j} \\
                 &= \frac{\eta}{N} \sum_{\mu} \delta_{\mu,i} \times {x}_{\mu,j}
\end{align}

Where we define:

\begin{align*}
\delta_{\mu,i} &= \text{local gradient of neuron } i \\
               &= ({t}_{\mu,i} - {y}_{\mu,i})f'({h}_{\mu,i})
\end{align*}

Note: \( \Delta{w_{ij}} \) is negative (decreases the weight) when the neuron's output is larger than its target, and positive (increases the weight) when the output is smaller than its target (for a positive input \( x_{\mu,j} \)). For example, if we have a 3-class problem with 3 output-layer neurons, we can attach a class to each output neuron and pick the class whose neuron has the largest output potential. In this case we'd like to increase the weights feeding into the correct class's neuron and decrease the weights feeding into the incorrect classes' neurons.
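Putting the delta-rule together as a single batch update (again a sketch: the sigmoid activation, learning rate, shapes and the toy 3-class data are all assumptions):

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def delta_rule_step(w, x, t, eta=0.1):
    """One batch update: w <- w + (eta/N) * sum_mu delta_{mu,i} * x_{mu,j}."""
    n = x.shape[0]
    h = x @ w.T
    y = sigmoid(h)
    delta = (t - y) * y * (1.0 - y)        # local gradients delta_{mu,i}
    return w + (eta / n) * delta.T @ x     # positive sign: delta already uses (t - y)

# Toy 3-class problem: 4 inputs, 3 output neurons (one per class).
rng = np.random.default_rng(1)
w = rng.normal(scale=0.1, size=(3, 4))
x = rng.normal(size=(10, 4))
t = np.eye(3)[rng.integers(0, 3, size=10)]  # one-hot targets
for _ in range(100):
    w = delta_rule_step(w, x, t)
```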

Matrix Notation

Note: We can also express the local gradients of the output neurons in matrix notation (here for a network with 2 output neurons):

\begin{align*} \delta =
    \begin{pmatrix}
        \delta_1 \\
        \delta_2
    \end{pmatrix}
=
    \begin{pmatrix}
        (t_1 - y_1) f'(h_1) \\
        (t_2 - y_2) f'(h_2)
    \end{pmatrix}
\end{align*}

With the delta rule being:

\begin{align*} \Delta{w} = \frac{\eta}{N} \delta x^T
   = \frac{\eta}{N}
     \begin{pmatrix}
         \delta_1 \\
         \delta_2
     \end{pmatrix}
     \begin{pmatrix}
       x_1 & x_2 & x_3
     \end{pmatrix}
\end{align*}
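For a single datapoint (\( N = 1 \)) this matrix form is just an outer product, which is easy to check in NumPy (the numbers below are arbitrary):

```python
import numpy as np

eta = 0.1
delta = np.array([0.2, -0.1])          # local gradients of the 2 output neurons
x = np.array([1.0, 0.5, -2.0])         # the 3 input values

# Delta w = (eta/N) * delta x^T; with N = 1 this is just an outer product.
delta_w = eta * np.outer(delta, x)     # shape (2, 3), matching the weight matrix
```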

Decision Boundary

The decision boundary for a single-layer perceptron is a [see page 10, linear-equation] defined in terms of the weight vector \( w \), the input sample \( x \) and the boundary intercept \( b \).

\begin{align*} h &= w^T x + b & y &= m x + c \end{align*}

For this decision boundary the term \( w^T x \) measures how parallel our input and our weight vector are (their dot product), while \( b \) simply pushes the decision boundary forwards or backwards to minimise misclassifications.

Note: The decision boundary is drawn perpendicular to the weight vector, so changing \( b \) shifts the boundary along \( w \) without altering the value of \( w^T x \).
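A small sketch of how such a boundary classifies a point (the 2-D weights, bias and the threshold at zero are assumptions for illustration):

```python
import numpy as np

w = np.array([2.0, -1.0])   # weight vector: the boundary is perpendicular to this
b = -0.5                    # bias / intercept: shifts the boundary along w

def classify(x):
    """Return +1 or -1 depending on which side of the line w^T x + b = 0 we're on."""
    h = w @ x + b
    return 1 if h >= 0 else -1

print(classify(np.array([1.0, 0.5])))   # h = 2 - 0.5 - 0.5 = 1.0  -> +1
print(classify(np.array([-1.0, 1.0])))  # h = -2 - 1 - 0.5 = -3.5  -> -1
```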

Warn: Because the decision boundary for a single-layer network is a linear (straight-line) equation, these networks can't solve problems that need a non-linear (/brain/20210216040014-decision_boundary/#linear-decision-boundary) decision boundary (e.g. XOR).