
(60 points) Part 2: Proximal Policy Optimization on an Atari Game Environment

In this part of the assignment, you will adapt the Q-Learning code from Part 1 to an Atari game environment of your choosing. Additionally, you will implement the Proximal Policy Optimization (PPO) algorithm and evaluate it on the same Atari game environment.

Objective: Adapt the Q-Learning code to an Atari game environment and implement the PPO algorithm for comparison.

Tasks:

1. Choose an Atari game environment from the OpenAI Gym library [1].

2. Adapt the Q-Learning code from Part 1 to work with the chosen Atari game environment.

3. Train your Q-Learning agent on the Atari game environment.

4. Implement the PPO algorithm, following the guidelines provided here and here. You may use other implementations of PPO as a reference, but you must write your own code. Please cite any references you use.

5. Train your PPO agent on the same Atari game environment.

6. Compare the performance of the Q-Learning and PPO agents on the chosen Atari game environment by looking at the mean reward over time.
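For tasks 1–3, the main adaptation is that Atari observations are raw pixel frames rather than small discrete states, so tabular Q-Learning needs a way to turn a frame into a hashable key. Below is a minimal sketch of one common preprocessing approach (grayscale plus downsampling); the function names `preprocess` and `state_key` are illustrative, not part of Gym, and a real pipeline typically also adds cropping, frame-skipping, and frame stacking. The environment id (e.g. `"ALE/Breakout-v5"`) depends on your Gym/ALE version.

```python
import numpy as np
from collections import defaultdict

def preprocess(frame, size=(84, 84)):
    """Reduce a raw Atari RGB frame (210x160x3 uint8) to a small
    grayscale array, a common first step before tabular hashing
    or a neural network."""
    gray = frame.mean(axis=2)                 # crude grayscale
    h, w = gray.shape
    sh, sw = size
    # nearest-neighbour downsample via strided indexing
    small = gray[::max(h // sh, 1), ::max(w // sw, 1)][:sh, :sw]
    return small.astype(np.uint8)

def state_key(frame):
    # Hashable key so a tabular Q dictionary can index pixel observations.
    return preprocess(frame).tobytes()

def make_q_table(n_actions):
    # Unseen states default to zero-initialized action values.
    return defaultdict(lambda: np.zeros(n_actions))

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Standard tabular Q-Learning update, unchanged from Part 1;
    # only the state representation differs.
    Q[s][a] += alpha * (r + gamma * Q[s_next].max() - Q[s][a])
```

In a training loop you would call `state_key(obs)` on each observation returned by `env.step` before the `q_update`. Be aware that even downsampled frames produce a very large state space, which is part of the motivation for comparing against PPO.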
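For task 4, the core of PPO is the clipped surrogate objective, usually combined with generalized advantage estimation (GAE). The sketch below shows both pieces in plain NumPy under the usual formulation (clip parameter ε = 0.2, discount γ, GAE parameter λ); it is a reference for the math only, and assumes you will port the loss into your chosen deep-learning framework, since the actual policy update needs gradients through the network.

```python
import numpy as np

def compute_gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation.
    `values` must have length len(rewards)+1 (bootstrap value appended)."""
    adv = np.zeros(len(rewards))
    gae = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error at time t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of TD errors.
        gae = delta + gamma * lam * gae
        adv[t] = gae
    return adv

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate loss (negated, so lower is better)."""
    # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), in log space
    # for numerical stability.
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the elementwise minimum; return the negative mean
    # so it can be minimized with gradient descent.
    return -np.mean(np.minimum(unclipped, clipped))
```

Note that when the new and old policies coincide (`logp_new == logp_old`), the ratio is 1 and the loss reduces to the negated mean advantage, which is a useful sanity check for your own implementation.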
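For task 6, per-episode Atari rewards are noisy, so the mean reward over time is usually reported as a moving average over a fixed window (100 episodes is a common choice, though the assignment does not mandate one). A minimal helper:

```python
import numpy as np

def moving_average(rewards, window=100):
    """Mean reward over a sliding window of episodes, computed with a
    cumulative sum so it stays O(n) for long training runs."""
    rewards = np.asarray(rewards, dtype=float)
    c = np.cumsum(np.insert(rewards, 0, 0.0))
    return (c[window:] - c[:-window]) / window
```

Plotting this curve for both agents on shared axes (e.g. with `matplotlib.pyplot.plot`) gives the comparison the task asks for; make sure both agents are evaluated on the same environment and the x-axis is comparable (episodes or environment steps for both).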
