In this part of the assignment, you will adapt the Q-Learning code from Part 1 to an Atari game environment of your choosing. Additionally,
you will implement the Proximal Policy Optimization (PPO) algorithm and evaluate it on the same Atari game environment.
Objective: Adapt the Q-Learning code to an Atari game environment and implement the PPO algorithm for comparison.
Tasks:
1. Choose an Atari game environment from the OpenAI Gym library[1].
2. Adapt the Q-Learning code from Part 1 to work with the chosen Atari game environment.
3. Train your Q-Learning agent on the Atari game environment.
4. Implement the PPO algorithm, following the guidelines provided in the two linked resources. You may consult other PPO
implementations as references, but you must write your own code. Cite any references you use.
5. Train your PPO agent on the same Atari game environment.
6. Compare the performance of the Q-Learning and PPO agents on the chosen Atari game environment by plotting the mean reward
over time (e.g., mean episode reward averaged over a sliding window of recent episodes).
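For task 4, the heart of PPO is the clipped surrogate objective. Below is a minimal, framework-agnostic sketch of that objective in NumPy; the function name `ppo_clip_loss` and the default clip range `eps=0.2` are illustrative choices (0.2 is the value suggested in the PPO paper), and a real implementation would compute the probability ratios and advantages from your policy network and rollout buffer.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate loss from PPO (to be minimized).

    ratio:     pi_new(a|s) / pi_old(a|s) for each sampled (s, a)
    advantage: estimated advantage A(s, a) for each sample
    eps:       clip range; 0.2 is the commonly used default
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    # Unclipped term: standard policy-gradient surrogate.
    unclipped = ratio * advantage
    # Clipped term: ratio restricted to [1 - eps, 1 + eps] removes the
    # incentive to move the policy far from the old one in a single update.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Take the pessimistic (elementwise minimum) bound, negate for minimization.
    return -np.minimum(unclipped, clipped).mean()
```

For example, with `ratio = 1.5` and `advantage = 1.0`, the clipped term caps the objective at `1.2 * 1.0`, so the loss is `-1.2` rather than `-1.5`.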
Fig. 1
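For task 6, one simple way to produce the "mean reward over time" curves is a sliding-window average over per-episode rewards, computed once per agent and plotted on shared axes. The helper below is a hypothetical sketch (the name `sliding_mean_reward` and the window size are illustrative, not part of the assignment).

```python
import numpy as np

def sliding_mean_reward(episode_rewards, window=100):
    """Mean episode reward over a sliding window of recent episodes.

    Returns one value per window position; shorter inputs fall back
    to a window covering the whole sequence.
    """
    rewards = np.asarray(episode_rewards, dtype=float)
    window = min(window, len(rewards)) or 1
    kernel = np.ones(window) / window
    # 'valid' mode averages only fully populated windows.
    return np.convolve(rewards, kernel, mode="valid")
```

Applying this to the episode-reward logs of both agents, then plotting the two resulting curves against episode index, gives a direct visual comparison of learning speed and final performance.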