
Sutton's Paradox
Sutton's Paradox arises in reinforcement learning, where an agent learns a behavior policy by maximizing the rewards it receives. The paradox is that adding a small constant reward for waiting or doing nothing can, counterintuitively, discourage the agent from acting promptly: the agent may prefer to keep collecting the waiting reward until a larger reward becomes available. A continuous small reward that serves no practical purpose can therefore delay action, because the agent responds to the structure of the reward signal rather than to the designer's intent. The paradox underscores the importance of designing reward signals carefully so that the behavior they favor is both optimal and timely.
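
The effect can be illustrated with a small calculation. The sketch below is a toy model and is not taken from any particular formulation of the paradox: it assumes an agent that may wait for k steps, collecting a hypothetical wait_reward each step, and then act once for a hypothetical act_reward, with future rewards discounted by gamma. All names and numbers are illustrative assumptions.

```python
def return_if_act_after(k, wait_reward, act_reward, gamma):
    """Discounted return for waiting k steps, then acting once (episode ends).

    Toy model (an assumption, not from the source): each waiting step pays
    wait_reward, and the single action pays act_reward at step k.
    """
    waiting = sum(wait_reward * gamma**t for t in range(k))
    return waiting + gamma**k * act_reward


def best_delay(wait_reward, act_reward, gamma, horizon):
    """Return the delay k in 0..horizon that maximizes the discounted return."""
    returns = [
        return_if_act_after(k, wait_reward, act_reward, gamma)
        for k in range(horizon + 1)
    ]
    # max() returns the first maximizer, so ties resolve to the shortest delay.
    return max(range(horizon + 1), key=lambda k: returns[k])


if __name__ == "__main__":
    gamma, act_reward, horizon = 0.99, 1.0, 50
    for wait_reward in (0.0, 0.005, 0.02):
        k = best_delay(wait_reward, act_reward, gamma, horizon)
        print(f"wait_reward={wait_reward}: best delay = {k} steps")
```

In this toy model, waiting forever is worth wait_reward / (1 - gamma), so delaying becomes attractive as soon as wait_reward exceeds act_reward * (1 - gamma). With gamma = 0.99 and act_reward = 1.0, that threshold is only 0.01, which suggests how small a waiting reward can be and still postpone action.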