Simulated blackjack hands

broken image
broken image

The environment is usually modeled as an MDP, which is defined by a set of states, actions, transition probabilities, and expected rewards (Sutton & Barto, 1998).

broken image

Feedback takes the form of a numerical reward signal, and guides the agent in developing its policy. Reinforcement learning techniques rely on feedback from the environment in order to learn.

broken image