Q-learning
A variant (see page 11) of the SARSA algorithm that, instead of deferring to the policy for the future-reward action, always uses the greedy policy (pick the future action which maximises the estimated action value, Q).
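Known as an off-policy algorithm because its update does not depend on the policy being followed: the future action used in the update is always the greedy one, whatever action the agent actually takes next.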
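
The difference from SARSA can be seen by putting the two update steps side by side. This is a minimal sketch rather than a definitive implementation. It assumes a tabular Q stored in a dictionary keyed by (state, action), a learning rate alpha and a discount factor gamma. The function names, parameter names and default values are illustrative and do not come from these notes, and terminal-state handling is left out.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """One Q-learning step (hypothetical helper, tabular case)."""
    # Off-policy: bootstrap from the best next action, regardless of
    # which action the agent will actually take in next_state.
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """One SARSA step (hypothetical helper, tabular case)."""
    # On-policy: bootstrap from the next action chosen by the policy.
    target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def make_q_table():
    """Q-table whose unseen entries default to 0.0 (an assumption)."""
    return defaultdict(float)
```

The only difference between the two functions is the bootstrap term: Q-learning takes the maximum over the available next actions, while SARSA needs the next action as an argument because it uses whatever the policy picked.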