Epsilon-Greedy Reward Policy
A policy that uses a probability parameter (\(\epsilon\)) to balance exploration and exploitation during action selection.
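The algorithm takes the greedy action (the one with the highest estimated value) with probability \( 1 - \epsilon \) and, with probability \( \epsilon \), takes an action chosen at random (in the common formulation, uniformly over all actions, so the greedy one may still be picked).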
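The selection rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `epsilon_greedy`, the `q_values` list of per-action value estimates, and the optional `rng` argument are all assumptions made for the example, and the random branch samples uniformly over all actions.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index chosen epsilon-greedily.

    q_values: hypothetical list of estimated values, one per action.
    epsilon:  probability of taking a random (exploratory) action.
    """
    if rng.random() < epsilon:
        # Explore: sample uniformly over all actions
        # (this may also happen to pick the greedy one).
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon = 0` the function always returns the greedy action, and with `epsilon = 1` it always samples at random.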
\( \epsilon \) often starts high (close to 1, favoring exploration) and is then decreased over iterations so that the policy becomes increasingly greedy.
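One possible way to decrease \( \epsilon \) is an exponential decay toward a small floor. The sketch below is only an assumption about how such a schedule might look: the name `decayed_epsilon` and the values of `eps_start`, `eps_end`, and `decay_rate` are illustrative choices, and other schedules (for example linear decay) are equally plausible.

```python
def decayed_epsilon(step, eps_start=1.0, eps_end=0.05, decay_rate=0.99):
    """Exponentially decay epsilon from eps_start toward eps_end.

    All default values are illustrative assumptions, not recommendations.
    """
    return eps_end + (eps_start - eps_end) * decay_rate ** step
```

At step 0 this returns `eps_start`, and as `step` grows the value approaches `eps_end`, so some exploration always remains.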