Assignment details
DS 402, Section 001: TREND IN DATA SCI (22411--UP---P-D...
Objective
This lab is meant to familiarize you with the process of Q-Learning. This includes investigating the impacts of hyperparameters on the performance of a Q-Learning Agent, as well as how we measure performance via learning curves. (A "learning curve" is a line chart in which the x-axis is the training epoch and the y-axis is the average reward. We have seen examples in the AlphaZero paper; see Figure 1.)
Recipe Ingredients
Add the following files to your project from Lab2:
⚫ agent/QLearningAgent.py
⚫ testLab3.py
⚫ A folder called "output" (we will have files landing here)
Place QLearningAgent.py in the agent folder. The other files should be in the top-level directory of your code project.
Your Tasks
1. Task 1: Understand the updates to the Q-value table:
   A. Investigate QLearningAgent.py's implementation to understand how Q-learning updates the table, as well as how it uses the table to select actions.
   B. Run test1 in mainLab3.py to see the updates of the Q-value table.
   C. Understand the policy (what action the agent selects in each state) that the Q-value table encodes.
   D. (TURN THIS IN) Provide a written interpretation comparing and contrasting the learned values and policy after the 1st trajectory with the final learned values and policy (after the 2nd trajectory).
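
For reference while working through Task 1, here is a minimal sketch of a tabular Q-learning update and of the greedy policy a Q-value table encodes. The names (q_update, q_table, learning_rate, discount, greedy_action) are illustrative assumptions, not the actual API of QLearningAgent.py.

    from collections import defaultdict

    # One tabular Q-learning update:
    # Q(s, a) <- Q(s, a) + learning_rate * (reward + discount * max_a' Q(s', a') - Q(s, a))
    def q_update(q_table, state, action, reward, next_state, actions,
                 learning_rate=0.1, discount=0.9):
        best_next = max(q_table[(next_state, a)] for a in actions)
        td_error = reward + discount * best_next - q_table[(state, action)]
        q_table[(state, action)] += learning_rate * td_error

    # The policy the table encodes: the highest-valued action in each state.
    def greedy_action(q_table, state, actions):
        return max(actions, key=lambda a: q_table[(state, a)])

    # Toy usage on a made-up transition:
    q_table = defaultdict(float)
    actions = ["park", "drive"]
    q_update(q_table, state=0, action="drive", reward=-1.0, next_state=1, actions=actions)
    print(dict(q_table), greedy_action(q_table, 0, actions))
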
2. Task 2: Create a learning curve of the Q-learning agent on the Parking MDP:
   A. Run test2 in mainLab3.py to understand the training of the Q-learning agent on a small Parking MDP.
   B. (TURN THIS IN) Provide a written interpretation comparing and contrasting the learned values and policy after the 1st trajectory with the final learned values and policy (after the 2nd trajectory). Be sure to discuss differences and similarities with what you observed in Task 1 part D.
   C. Run test3 to train the Q-learning agent on a hard Parking MDP.
   D. (TURN THIS IN) Create a learning curve for the output of test3 (look in the output folder). The average reward of each training epoch is stored in BasicQLearner_Large Parking MDP_rewardCurveData.txt, and the full reward list is available in the other file with the same name stem, but ending in "...rewardList.txt".
3. Task 3: Investigate the impacts of the hyperparameters on the performance of Q-learning:
   A. Run test4 to run the Q-learning algorithm with three different probabilities of making a greedy action choice (probGreedy).
   B. (TURN THIS IN) Create learning curves for the three Q-Learning training processes found in test4 (varying greed).
   C. Run test5 to see how Q-learning behaves with three different learningRate values.
   D. (TURN THIS IN) Create learning curves for the three Q-Learning training processes found in test5 (varying learning rate).
   E. (TURN THIS IN) Provide a written interpretation comparing and contrasting what you observe in your learning curves when we vary the greed and learning-rate hyperparameters.
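
To make the probGreedy hyperparameter in Task 3 concrete: the agent takes the highest-valued action with probability probGreedy and otherwise explores with a random action. The sketch below is an illustrative assumption about how such a selection rule typically looks, not the actual logic in QLearningAgent.py; the learningRate hyperparameter corresponds to the learning_rate factor in the update sketch after Task 1.

    import random

    # Illustrative probGreedy action selection (epsilon-greedy style):
    # exploit the current Q-values with probability prob_greedy, otherwise explore.
    def select_action(q_table, state, actions, prob_greedy=0.8):
        if random.random() < prob_greedy:
            return max(actions, key=lambda a: q_table.get((state, a), 0.0))
        return random.choice(actions)

Higher probGreedy values exploit the current estimates more aggressively, while lower values explore more; the three test4 runs let you see this trade-off in the learning curves.
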
4. Task 4: Test Q-learning on different MDPs (previously we had held the MDP fixed):
   A. Run test6 to train an agent on three Parking MDPs of varying difficulty.
   B. (TURN THIS IN) Create learning curves for the three training processes of the Q-learning agents found in test6.
   C. Run test7 to see the impacts of probGreedy on the Q-learning agent for different Parking MDPs of varying difficulty (note that the difference between test7 and test4 is that test7 varies BOTH the agent and the MDP).
   D. (TURN THIS IN) Create learning curves for the training processes found in test7.
   E. (TURN THIS IN) Provide a written interpretation of what you see in your charts. Be sure to discuss how the parameters defining the MDP affect Q-learning, as well as how the hyperparameters controlling the Q-Learning Agent affect performance on different MDPs.
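
The curves requested in Tasks 3 and 4 are easiest to compare when the runs being contrasted are overlaid in one chart. A minimal sketch, assuming each training run writes its own ...rewardCurveData.txt file to the output folder (the exact filenames depend on how test4 through test7 label their agents and MDPs):

    import glob
    import os
    import matplotlib.pyplot as plt

    # Assumption: each run produces one *_rewardCurveData.txt file in output/,
    # with one average-reward value per line, one line per epoch.
    for path in sorted(glob.glob("output/*rewardCurveData.txt")):
        with open(path) as f:
            avg_rewards = [float(line) for line in f if line.strip()]
        label = os.path.basename(path).replace("_rewardCurveData.txt", "")
        plt.plot(range(1, len(avg_rewards) + 1), avg_rewards, label=label)

    plt.xlabel("Epoch")
    plt.ylabel("Average reward")
    plt.legend()
    plt.title("Q-learning curves across hyperparameters and MDPs")
    plt.savefig("learning_curves_comparison.png")
    plt.show()
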
5. Task 5: Leading toward Line and Grid Search: We have now seen how to vary MDP parameters, as well as test multiple agents of