Question

Reinforcement Learning (RL) & Machine Learning

In the arcade game "Pac-Man," the titular character navigates a maze, collecting small pellets to accumulate score while evading pursuing ghosts.

As in Assignment 1, this assignment aims to develop an intelligent automated planner, this time based on reinforcement learning and machine learning. The AI planner's objective is to maximise the score when playing the game. We intend to draw on specific theoretical constructs: Markov decision processes, Q-learning, and supervised learning.

There are two parts to this assignment:
• Part 1 covers reinforcement learning techniques, namely value iteration and Q-learning. You will tackle a number of maze navigation challenges that ask you to find the best policy for Pac-Man to maximise its score.
• Part 2 covers supervised learning. You will train a supervised model based on a simplified neural network (a single-layer perceptron). The model is used to suggest the next action for Pac-Man while playing a classic version of the game.

This assignment is worth 32% of your total mark. You will be evaluated based on:
1. The performance of your AI model on the various game challenges.
2. The quality of your report, which describes and justifies your implemented techniques.

For this assignment we work with a simulated version of the Pac-Man game, written entirely in Python and developed at the University of California, Berkeley. Instructions for how to download and install this software can be found in our Getting Started Guide (available on Moodle). In brief, the software works as follows:
1. The state of the game is tracked and advanced by a Pac-Man game simulator.
2. The simulator divides games into a series of discrete iterations or timesteps.
3. In each iteration a controller object must specify a valid action for each agent in the game. The possible actions are Up, Down, Left, Right and STOP (except for Question 1, which has only Up, Down, Left and Right).
4. Given a valid action, the game simulator updates the position of each agent and, if necessary, updates the score (invalid actions have no effect).
5. The game simulator continues iterating until one of the following termination conditions is reached:
   o For Part 1 (Questions 1 & 2) only: Pac-Man eats a single food.
   o For Part 2 (Question 3) only: The game board has been cleared of all food dots (i.e., the problem is solved).
   o For all questions: Pac-Man is caught by a ghost (i.e., the game is over; Fire Pit Exit terminal).

You will implement a series of different Pac-Man controllers using reinforcement learning or machine learning, each one solving a different kind of problem. Solution quality is measured in terms of total score/rewards. Each time you solve a problem, the following are saved to model files in the logs directory:
1) Part 1 - Question 1: the corresponding values "V"
2) Part 1 - Question 2: the corresponding Q-values "Q"
3) Part 2 - Question 3: the single-layer perceptron's weights, as a vector

You must submit a collection of such files, together with your report and code. The rest of this document explains the challenge problems and the outputs you need to generate as part of your assignment submission.

If you have questions about any part of this assignment specification, the starter code or its supporting documentation, please reach out to the teaching team. We suggest Ed as a first stop, followed by consultations and then email. You can also raise a bug in our Bitbucket Issue Tracker if you believe you have discovered a fault in the software. We will do our best to respond to your questions as effectively as possible.

PAY CAREFUL ATTENTION TO THE INSTRUCTIONS IN THIS SPECIFICATION
• Remember to submit all required files and documents!
• Do not modify any code that we do not ask you to!
• Each submission must be your own original work!
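
For readers unfamiliar with the Q-learning technique named in Part 1, the core update can be sketched in a few lines of Python. This is a minimal illustration only and is not taken from the assignment's starter code: the function names `q_update` and `greedy_action`, the grid-cell states, the reward values, and the `alpha`/`gamma` settings are all assumptions chosen for the example.

```python
from collections import defaultdict

# Action set for the Part 1 questions (assumption: no STOP action here).
ACTIONS = ["Up", "Down", "Left", "Right"]


def q_update(q, state, action, reward, next_state, done, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new Q-value.

    q maps (state, action) pairs to estimated returns; alpha is the learning
    rate and gamma the discount factor (both hypothetical values).
    """
    # A terminal transition has no future value to bootstrap from.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def greedy_action(q, state):
    """Return the action with the highest current Q-value in this state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


# Unseen (state, action) pairs default to a Q-value of 0.0.
q = defaultdict(float)

# Hypothetical experience: stepping Right from cell (1, 0) eats the food
# (reward +10, episode ends); stepping Right from (0, 0) costs -1.
q_update(q, (1, 0), "Right", 10.0, (2, 0), done=True)
q_update(q, (0, 0), "Right", -1.0, (1, 0), done=False)

print(greedy_action(q, (0, 0)))
```

In this sketch the value learned for the food-adjacent cell propagates one step back to its neighbour through the `max` over next-state Q-values, which is the mechanism that lets a learned policy steer Pac-Man toward food over repeated episodes.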

Fig: 1

Fig: 2