Brain Dump

Saturation

Tags
adaptive-intelligence

Is a property of a function whereby the output asymptotically approaches a finite limit as the input goes to \(-\infty\) or \(+\infty\) (for the logistic sigmoid, 0 and 1 respectively). That is, if the input is already large in magnitude, then increasing it slightly will change the output by only a small amount compared to the region close to 0.
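
For reference, the logistic sigmoid and its limits (a standard definition, written out here for concreteness):

\[
sigmoid(h) = \frac{1}{1 + e^{-h}}, \qquad \lim_{h \to -\infty} sigmoid(h) = 0, \qquad \lim_{h \to +\infty} sigmoid(h) = 1
\]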

For a numerical example (reproduced in the code sketch after this list):

  • If \(h = 0\), then \(sigmoid(0) = 0.5\); increasing the input by 1 gives \(sigmoid(1) \approx 0.731\), an increase of about \(0.231\).
  • If \(h = 5\), then \(sigmoid(5) \approx 0.9933\); increasing the input by 1 again gives \(sigmoid(6) \approx 0.9975\), an increase of only about \(0.0042\).
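
A minimal Python sketch to reproduce these numbers (standard library only; `sigmoid` is just a local helper, not from any package):

```python
import math

def sigmoid(h: float) -> float:
    """Logistic sigmoid: 1 / (1 + exp(-h))."""
    return 1.0 / (1.0 + math.exp(-h))

# Near 0 the output is sensitive to the input:
print(sigmoid(1) - sigmoid(0))   # ~0.231

# Far from 0 the same unit step barely moves the output (saturation):
print(sigmoid(6) - sigmoid(5))   # ~0.0042
```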

This applies to the logistic sigmoid and tanh functions (tanh saturates at -1 and 1), whose plots show the output quickly approaching these limits. The same is visible in the gradient: for both of these activation functions the gradient asymptotically approaches 0 for both large positive and large negative inputs. ReLU doesn't have this problem for large positive inputs, since it is linear there, but if it is badly initialised (e.g. so that all inputs are negative) it can run into a similar problem, because its output and gradient are exactly 0 for all negative inputs.
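
A small sketch comparing the gradients, using the standard closed-form identities \(sigmoid'(h) = sigmoid(h)(1 - sigmoid(h))\) and \(tanh'(h) = 1 - tanh(h)^2\):

```python
import math

def sigmoid(h: float) -> float:
    return 1.0 / (1.0 + math.exp(-h))

def sigmoid_grad(h: float) -> float:
    # Standard identity: sigmoid'(h) = sigmoid(h) * (1 - sigmoid(h)).
    s = sigmoid(h)
    return s * (1.0 - s)

def tanh_grad(h: float) -> float:
    # Standard identity: tanh'(h) = 1 - tanh(h)^2.
    return 1.0 - math.tanh(h) ** 2

def relu_grad(h: float) -> float:
    # ReLU's gradient is 1 for positive inputs, 0 for negative ones.
    return 1.0 if h > 0 else 0.0

for h in (-10.0, -5.0, 0.0, 5.0, 10.0):
    print(f"h={h:+6.1f}  sigmoid'={sigmoid_grad(h):.6f}  "
          f"tanh'={tanh_grad(h):.6f}  ReLU'={relu_grad(h):.0f}")
```

At \(|h| = 10\) both the sigmoid and tanh gradients are effectively 0, while ReLU's gradient stays at 1 for any positive input; the flip side is that it is exactly 0 for all negative inputs, which is the badly-initialised failure mode described above.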
