Question

3. Q-Learning. Suppose that we have 4 rooms in a building connected by doors, as shown in the figure below. The rooms are numbered 1 to 4. The outside of the building can be thought of as one big room, numbered 5. Notice that doors 2 and 3 lead into the building from room 5 (the outside).

[Figure: rooms 1 to 4 and the outside (room 5) connected by doors, with the goal state marked]

Build the final R and Q matrices, and draw the final state diagram with rewards, assuming:

• The doors that lead immediately to the goal have an instant reward of 100. Other doors not directly connected to the target room have zero reward.
• Each arrow contains an instant reward value as shown below.

Learning rate = 0.8, and the initial state is Room 1.

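The reward rule in the question (100 for a door into the goal, 0 for any other door) and the 0.8 parameter can be tried out with a short Python sketch. Several things here are assumptions, because the figure is not reproduced in the text: the door layout `DOORS` is hypothetical except for the 5-2 and 5-3 doors, the goal is taken to be room 5, the 0.8 "learning rate" is used as the discount factor, and the update is the simplified rule Q(s, a) = R(s, a) + 0.8 · max Q(s', a'). The tables are stored as nested dictionaries rather than matrices.

```python
import random

GAMMA = 0.8   # the question's "learning rate", used here as the discount factor (assumption)
GOAL = 5      # assumption: the outside (room 5) is the goal state

# Hypothetical door layout: only the 5-2 and 5-3 doors come from the question text.
# Replace the other entries with the connections drawn in the figure.
DOORS = {
    1: [3],
    2: [4, 5],
    3: [1, 4, 5],
    4: [2, 3],
    5: [2, 3, 5],  # 5 -> 5 keeps the agent at the goal (assumption)
}

def build_rewards(doors, goal):
    """R[s][a] is 100 if the door from s leads to the goal, else 0.
    Pairs of rooms without a door simply have no entry."""
    return {s: {a: (100 if a == goal else 0) for a in nxt} for s, nxt in doors.items()}

def train(doors, rewards, goal, gamma, episodes=5000, seed=0):
    """Run random-walk episodes and apply Q(s,a) = R(s,a) + gamma * max Q(s',a')."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in nxt} for s, nxt in doors.items()}
    states = list(doors)
    for _ in range(episodes):
        state = rng.choice(states)            # random start for each episode
        while True:
            action = rng.choice(doors[state])  # the action is the room moved into
            q[state][action] = rewards[state][action] + gamma * max(q[action].values())
            state = action
            if state == goal:
                break
    return q

def greedy_path(q, start, goal, max_steps=20):
    """Follow the highest-valued door from each room until the goal is reached."""
    path = [start]
    while path[-1] != goal and len(path) <= max_steps:
        current = path[-1]
        path.append(max(q[current], key=q[current].get))
    return path

R = build_rewards(DOORS, GOAL)
Q = train(DOORS, R, GOAL, GAMMA)
print(greedy_path(Q, start=1, goal=GOAL))
```

To answer the actual question, `DOORS` and `GOAL` would need to match the figure; the printed path then starts from Room 1, the initial state given in the question.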