
DS 402, Section 001: TREND IN DATA SCI (22411--UP---P-D...)
Assignment details

Objective

This lab is meant to familiarize you with the process of Q-Learning. This includes investigating the impacts of hyperparameters on the performance of a Q-Learning agent, as well as how we measure performance via learning curves. (A "learning curve" is a line chart in which the x-axis is the epoch and the y-axis is the average reward. We have seen examples in the AlphaZero paper; see Figure 1.)

Recipe Ingredients

Add the following files to your project from Lab 2:
• agent/QLearningAgent.py
• mainLab3.py
• A folder called "output" (we will have files landing here)

Place QLearningAgent.py in the agent folder. The other files should be in the top-level directory of your code project.

Your Tasks

1. Task 1: Understand the updates to the Q-value table:
   A. Investigate QLearningAgent.py's implementation to understand how Q-learning updates the table, as well as how it uses the table to select actions.
   B. Run test1 in mainLab3.py to see the updates of the Q-value table.
   C. Understand the policy (what action the agent selects in each state) that the Q-value table encodes.
   D. (TURN THIS IN) Provide a written interpretation comparing and contrasting the learned values and policy after the 1st trajectory with the final learned values and policy (after the 2nd trajectory).

2. Task 2: Create a learning curve of the Q-learning agent on the parking MDP:
   A. Run test2 in mainLab3.py to understand the training of the Q-learning agent on a small Parking MDP.
   B. (TURN THIS IN) Provide a written interpretation comparing and contrasting the learned values and policy after the 1st trajectory with the final learned values and policy (after the 2nd trajectory). Be sure to discuss differences and similarities with what you observed in Task 1, part D.
   C. Run test3 to train the Q-learning agent on a hard Parking MDP.
   D. (TURN THIS IN) Create a learning curve for the output of test3 (look in the output folder). The average reward of each training epoch is stored in BasicQLearner_Large Parking MDP_rewardCurveData.txt, and the full list is available in the other file with the same name stem, but ending in "...rewardList.txt".

3. Task 3: Investigate the impacts of the hyperparameters on the performance of Q-learning:
   A. Run test4 to run Q-learning algorithms with three different probabilities of making a greedy action choice (probGreedy).
   B. (TURN THIS IN) Create learning curves for the three Q-Learning training processes found in test4 (varying greed).
   C. Run test5 to see how Q-learning behaves with three different learningRate values.
   D. (TURN THIS IN) Create learning curves for the three Q-Learning training processes found in test5 (varying learning rate).
   E. (TURN THIS IN) Provide a written interpretation comparing and contrasting what you observe in your learning curves where we vary the greed and learning rate hyperparameters.

4. Task 4: Test the Q-learning agent on different MDPs (previously we had held the MDP fixed):
   A. Run test6 to train an agent on three Parking MDPs of varying difficulty.
   B. (TURN THIS IN) Create learning curves for the three training processes of the Q-learning agents found in test6.
   C. Run test7 to see the impacts of probGreedy on the Q-learning agent for different Parking MDPs of varying difficulty (note that the difference between test7 and test4 is that test7 varies BOTH the agent and the MDP).
   D. (TURN THIS IN) Create learning curves for the training processes found in test7.
   E. (TURN THIS IN) Provide a written interpretation about what you see in your charts. Be sure to discuss how parameters defining the MDP affect Q-learning, as well as how hyperparameters controlling the Q-Learning agent affect performance on different MDPs.

5. Task 5: Leading toward Line and Grid Search: We have now seen how to vary MDP parameters, as well as test multiple agents of
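The Q-value update and policy extraction you are asked to investigate in Task 1 can be sketched in a few lines. This is a minimal illustrative version, not the actual code in QLearningAgent.py; the names `q_table`, `update`, and `greedy_action` are assumptions for illustration.

```python
def update(q_table, state, action, reward, next_state, actions,
           learning_rate=0.5, discount=0.9):
    """One tabular Q-learning update:
    Q(s,a) <- Q(s,a) + lr * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + learning_rate * (reward + discount * best_next - old)

def greedy_action(q_table, state, actions):
    """The policy the table encodes: pick the action with the highest Q-value."""
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))

q = {}
# One hypothetical transition: in state "s0", taking "park" yields reward 1.0
# and leads to "s1" (whose Q-values are still all zero).
update(q, "s0", "park", 1.0, "s1", ["park", "drive"])
print(q[("s0", "park")])                           # 0.5 after one update with lr=0.5
print(greedy_action(q, "s0", ["park", "drive"]))   # park
```

Watching how entries like this change after the 1st versus the 2nd trajectory is exactly the comparison Task 1, part D asks for.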
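For Task 3, a probGreedy-style hyperparameter typically controls the exploration/exploitation trade-off: with probability probGreedy the agent exploits (takes the greedy action), otherwise it explores (takes a uniformly random action). A hedged sketch, assuming that interpretation (the function name `choose_action` is illustrative, not the lab's API):

```python
import random

def choose_action(q_table, state, actions, prob_greedy, rng=random):
    """Epsilon-greedy-style selection: exploit with probability prob_greedy,
    otherwise pick a uniformly random action."""
    if rng.random() < prob_greedy:
        return max(actions, key=lambda a: q_table.get((state, a), 0.0))
    return rng.choice(actions)

q = {("s0", "park"): 1.0, ("s0", "drive"): 0.0}
# prob_greedy=1.0 always exploits the table; lower values mix in random exploration.
print(choose_action(q, "s0", ["park", "drive"], prob_greedy=1.0))  # park
```

Low probGreedy values explore more (noisier, slower-rising learning curves early on); high values exploit more but can lock onto poor actions, which is worth discussing in Task 3, part E.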
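The learning curves requested in Tasks 2-4 can be plotted directly from the files test3 (and later tests) write to the output folder. A sketch, assuming the rewardCurveData file holds one average reward per line (check the actual file format before relying on this):

```python
import os

def load_curve(path):
    """Parse a reward-curve file, assumed to hold one average reward per line."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]

def plot_curve(rewards, out_path):
    """Plot epoch (x-axis) vs. average reward (y-axis) and save the chart."""
    import matplotlib.pyplot as plt  # imported here so load_curve works without it
    plt.plot(range(1, len(rewards) + 1), rewards)
    plt.xlabel("Epoch")
    plt.ylabel("Average reward")
    plt.savefig(out_path)

if __name__ == "__main__":
    path = os.path.join("output", "BasicQLearner_Large Parking MDP_rewardCurveData.txt")
    if os.path.exists(path):
        plot_curve(load_curve(path), os.path.join("output", "learning_curve_test3.png"))
```

For test4-test7, calling plt.plot once per training run before saving puts all three curves on one chart, which makes the comparisons in parts B and D easier to discuss.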