Markov Decision Process

A Markov Decision Process (MDP) is a mathematical framework used to model situations where decisions lead to different outcomes, often with some randomness involved. It involves states (current situations), actions (choices), and rewards (benefits). At each step, based on the current state and chosen action, the system transitions to a new state and earns a reward. MDPs help determine the best sequence of decisions to maximize rewards over time, assuming future states depend only on the current state and action, not past history. They're widely used in areas like robotics, finance, and artificial intelligence for decision-making under uncertainty.