The environment is usually modeled as a Markov decision process (MDP), which is defined by a set of states, a set of actions, transition probabilities, and expected rewards (Sutton & Barto, 1998).
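As a rough illustration of the four components named above, an MDP can be written down as a small data structure. The sketch below is only one possible encoding: the class name `MDP`, its field names, the `validate` helper, and the two-state toy problem are all hypothetical and chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MDP:
    """Hypothetical container for the four components of a finite MDP."""
    states: tuple
    actions: tuple
    # transitions[(s, a)] maps each next state s' to P(s' | s, a)
    transitions: dict
    # rewards[(s, a)] is the expected immediate reward for taking a in s
    rewards: dict

    def validate(self, tol=1e-9):
        """Check that every (state, action) pair has a proper distribution."""
        for s in self.states:
            for a in self.actions:
                total = sum(self.transitions[(s, a)].values())
                if abs(total - 1.0) > tol:
                    raise ValueError(f"P(. | {s}, {a}) sums to {total}")
        return True


# Toy two-state problem (illustrative values only).
toy = MDP(
    states=("s0", "s1"),
    actions=("stay", "move"),
    transitions={
        ("s0", "stay"): {"s0": 1.0},
        ("s0", "move"): {"s0": 0.2, "s1": 0.8},
        ("s1", "stay"): {"s1": 1.0},
        ("s1", "move"): {"s0": 0.8, "s1": 0.2},
    },
    rewards={
        ("s0", "stay"): 0.0,
        ("s0", "move"): 0.0,
        ("s1", "stay"): 1.0,
        ("s1", "move"): 0.0,
    },
)
```

Calling `toy.validate()` confirms that the transition probabilities for each state-action pair sum to one, which is the main consistency requirement on this representation.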
Reinforcement learning techniques rely on feedback from the environment in order to learn. This feedback takes the form of a numerical reward signal, which guides the agent in developing its policy.
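To make the role of the reward signal concrete, the following minimal sketch shows an agent that adjusts its action-value estimates from numerical rewards alone. Tabular Q-learning is used here purely as one illustrative technique; the text does not single out a specific algorithm. The environment (`step`), the state and action names, and all hyperparameter values are assumptions made for this example.

```python
import random

STATES = ("s0", "s1")
ACTIONS = ("stay", "move")


def step(state, action):
    """Hypothetical deterministic environment: reward 1.0 for landing in s1."""
    if action == "move":
        next_state = "s1" if state == "s0" else "s0"
    else:
        next_state = state
    reward = 1.0 if next_state == "s1" else 0.0
    return next_state, reward


def q_learning(steps=20000, alpha=0.1, gamma=0.9, epsilon=0.3, seed=0):
    """Learn action values from reward feedback (illustrative settings)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = "s0"
    for _ in range(steps):
        # Epsilon-greedy: explore occasionally, otherwise act greedily.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state, reward = step(state, action)
        # The reward is the only feedback used to adjust the estimate.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state
    return q


def greedy_policy(q):
    """Derive a policy by picking the highest-valued action in each state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


q_values = q_learning()
policy = greedy_policy(q_values)
```

In this toy setting the agent is never told which action is correct; the policy it ends up with (move toward `s1`, then stay there) is derived entirely from the rewards it observes.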