TD(λ)

TD(λ), or temporal-difference learning with trace-decay parameter λ, is a reinforcement learning method for predicting future rewards from experience. Rather than waiting until the end of an episode, it updates its value estimates incrementally as outcomes are observed, interpolating between one-step TD learning (TD(0)) and Monte Carlo methods. The parameter λ ∈ [0, 1] controls this interpolation: λ = 0 relies entirely on the one-step bootstrapped target, λ = 1 recovers the full Monte Carlo return, and intermediate values blend n-step returns, weighting longer ones geometrically by λ. In practice the algorithm is implemented with eligibility traces, which mark recently visited states so that each TD error can update all of them at once. By combining immediate feedback with longer-horizon returns, TD(λ) often learns faster and more stably than either extreme alone.
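As a concrete illustration, here is a minimal sketch of tabular TD(λ) with accumulating eligibility traces, applied to a simple random-walk prediction task (the task, state layout, and all parameter values are illustrative choices, not part of the original text):

```python
import random

def td_lambda(num_states=5, episodes=500, alpha=0.1, gamma=1.0, lam=0.8, seed=0):
    """Tabular TD(lambda) value prediction on a random walk.

    States 0..num_states-1; each episode starts in the middle, moves left or
    right with equal probability, and terminates off either end. Reward is +1
    for stepping off the right end, 0 otherwise. (Illustrative toy problem.)
    """
    rng = random.Random(seed)
    V = [0.0] * num_states                       # value estimates
    for _ in range(episodes):
        e = [0.0] * num_states                   # eligibility traces, reset per episode
        s = num_states // 2
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:                       # fell off the left end
                r, v_next, done = 0.0, 0.0, True
            elif s_next >= num_states:           # fell off the right end
                r, v_next, done = 1.0, 0.0, True
            else:
                r, v_next, done = 0.0, V[s_next], False
            delta = r + gamma * v_next - V[s]    # TD error for this step
            e[s] += 1.0                          # accumulating trace for current state
            for i in range(num_states):
                V[i] += alpha * delta * e[i]     # credit every recently visited state
                e[i] *= gamma * lam              # decay traces toward zero
            if done:
                break
            s = s_next
    return V

print(td_lambda())
```

For this walk the true values are 1/6, 2/6, …, 5/6 from left to right, so after a few hundred episodes the estimates should rise roughly linearly across the states. Setting `lam=0` reduces the inner loop to a one-step TD(0) update, while `lam=1` (with `gamma=1`) makes each TD error propagate to every state visited earlier in the episode, mimicking a Monte Carlo target.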