Epsilon-Greedy Reward Policy

A reward policy that uses a probability parameter \( \epsilon \), with \( 0 \le \epsilon \le 1 \), to balance exploration against exploitation during action selection.

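The algorithm takes the greedy action (the one with the highest estimated value) with probability \( 1 - \epsilon \) and a random action with probability \( \epsilon \). In the common formulation the random action is drawn uniformly from all actions, the greedy one included.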
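
The selection rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `epsilon_greedy`, the `q_values` list of per-action value estimates, and the optional `rng` argument are all assumptions introduced here. The exploration branch draws uniformly from every action (greedy included), which is one common convention.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index chosen epsilon-greedily.

    q_values: hypothetical list of estimated values, one per action.
    epsilon:  exploration probability in [0, 1].
    rng:      any object with random() and randrange(), e.g. random.Random(seed).
    """
    if rng.random() < epsilon:
        # Explore: pick uniformly among all actions (assumed convention).
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimate (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    rng = random.Random(0)
    estimates = [0.1, 0.9, 0.3]
    picks = [epsilon_greedy(estimates, 0.1, rng) for _ in range(1000)]
    # With a small epsilon, most picks should be the greedy action (index 1).
    print(picks.count(1) / len(picks))
```

Passing a seeded `random.Random` instance keeps runs reproducible, which helps when comparing different values of \( \epsilon \).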

\( \epsilon \) is often started high, to favor exploration early on, and then decreased over iterations so that the policy increasingly exploits what it has learned.
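
One possible way to decrease \( \epsilon \) is an exponential decay with a lower bound. The sketch below is only an assumed schedule for illustration: the name `decayed_epsilon` and the values of `start`, `end`, and `decay` are hypothetical choices, and other schedules (linear, stepwise) are equally valid.

```python
def decayed_epsilon(step, start=1.0, end=0.05, decay=0.99):
    """Return epsilon for a given iteration under exponential decay.

    start, end and decay are illustrative defaults, not recommended values.
    The result never drops below `end`, so some exploration always remains.
    """
    return max(end, start * decay ** step)


if __name__ == "__main__":
    for step in (0, 10, 100, 1000):
        print(step, decayed_epsilon(step))
```

Keeping a small floor such as `end` is a design choice: it prevents the policy from becoming purely greedy, at the cost of occasionally taking a suboptimal action.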