DQN-RL DONE BY LIU DONGYANG

Demo Gif

Background Research

BipedalWalker-v3 is a very hard environment in the Gym. The agent should run very fast, should not trip himself off, should use as little energy as possible.

The environment requires an average total reward of over 300 over 100 consecutive episodes to be considered as finished, which is incredibly difficult (less than 10 people solved it on Gym).

Environment and walking strategies

Reward is given for moving forward, total 300+ points up to the far end.

If the robot falls, it gets -100. Applying motor torque costs a small amount of points, more optimal agent will get better score.

State consists of hull angle speed, angular velocity, horizontal speed, vertical speed, position of joints and joints angular speed, legs contact with ground, and 10 lidar rangefinder measurements.

There's no coordinates in the state vector.

Usually, when we are programming a standard Reinforcement Learning agent, we don't care what is the observation space, our agent should fit to whatever it is, but it's better to know what are the inputs in case if our agent is not learning.

Also, you may have noticed that there is no information about the terrain in the state. This means that our agent, don't know anything about the way where he is running. He must use lidar to scan the terrain (I think so).

Action Space

BipedalWalker has 2 legs. Each leg has 2 joints. We have to teach the Bipedal-walker to walk by applying the torque on these joints. Therefore the size of our action space is 4 which is the torque applied on 4 joints. You can apply the torque in the range of (-1, 1), as shown in the following table:

Reward

  • The agent gets a positive reward proportional to the distance walked on the terrain. It can get a total of 300+ reward all the way up to the end.
  • If the agent tumbles, it gets a negative reward of -100.
  • There is some negative reward proportional to the torque applied on the joint. So that agent learns to walk smoothly with minimal torque.

Also, I must mention that there are 2 versions of the Bipedal environment based on terrain type

  • Slightly uneven terrain (BipedalWalker-v3);
  • Hardcore terrain with ladders, stumps, and pitfalls (BipedalWalkerHardcore-v3).

Walking Strategies

There are 4 major strategies to walk, usually, our agent tries all of them during the training process

Importing Libraries

In [ ]:
import time
from multiprocessing import Process, Pipe
from threading import Thread, Lock
import copy
from tensorflow.keras import backend as K
from tensorflow.keras.optimizers import Adam, RMSprop, Adagrad, Adadelta
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model, load_model
from tensorboardX import SummaryWriter
import tensorflow as tf
import numpy as np
import gym
import random
import imageio
import glob
import os
import matplotlib.pyplot as plt
import pyvirtualdisplay
from IPython.core.display import Video
In [ ]:
_display = pyvirtualdisplay.Display(visible=False,  # use False with Xvfb
                                    size=(1400, 900))
_ = _display.start()

System Environments & Packages Version

In [ ]:
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'  # -1:cpu, 0:first gpu
# tf.config.experimental_run_functions_eagerly(True) # used for debuging and development
In [ ]:
# list of available gpus
gpus = tf.config.experimental.list_physical_devices('GPU')
# usually using this for fastest performance
tf.compat.v1.disable_eager_execution()
In [ ]:
print('Tensorflow Version: '+tf.__version__)
print('Gym Version: '+gym.__version__)
Tensorflow Version: 2.4.1
Gym Version: 0.17.3

Get a list of available GPUs, if more than one gpu, then just use the first gpu, so as not to exhaust all computational power on the laptop.

In [ ]:
if len(gpus) > 0:
    print(f'GPUs {gpus}')
    try:
        # Only the first gpu
        tf.config.experimental.set_memory_growth(gpus[0], True)
    except RuntimeError:
        pass
In [ ]:
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

Comment out this block of codes to run multiple workers to train the agent asynchronously.

In [ ]:
# class Environment(Process):
#     def __init__(self, env_idx, child_conn, env_name, state_size, action_size, visualize=False):
#         super(Environment, self).__init__()
#         self.env = gym.make(env_name)
#         self.is_render = visualize
#         self.env_idx = env_idx
#         self.child_conn = child_conn
#         self.state_size = state_size
#         self.action_size = action_size

#     def run(self):
#         super(Environment, self).run()
#         state = self.env.reset()
#         state = np.reshape(state, [1, self.state_size])
#         self.child_conn.send(state)
#         while True:
#             action = self.child_conn.recv()
#             # if self.is_render and self.env_idx == 0:
#             # self.env.render()

#             state, reward, done, info = self.env.step(action)
#             state = np.reshape(state, [1, self.state_size])

#             if done:
#                 state = self.env.reset()
#                 state = np.reshape(state, [1, self.state_size])

#             self.child_conn.send([state, reward, done, info])

Run the game with random actions.

In [ ]:
env = gym.make("BipedalWalker-v3")
env = gym.wrappers.Monitor(env, '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/train_video/random/', force=True)

reward_list=[]
def Random_games():
    # Each of this episode is its own game.
    action_size = env.action_space.shape[0]
    for episode in range(2):
        print('Episode: {}'.format(episode))
        env.reset()
        # this is each frame, up to 500...but we wont make it that far with random.
        while True:
            # This will display the environment
            # Only display if you really want to see it.
            # Takes much longer to display it.
            env.render()
            
            # This will just create a sample action in any environment.
            # In this environment, the action can be any of one how in list on 4, for example [0 1 0 0]
            action = np.random.uniform(-1.0, 1.0, size=action_size)

            # this executes the environment with an action, 
            # and returns the observation of the environment, 
            # the reward, if the env is over, and other info.
            next_state, reward, done, info = env.step(action)
            
            # lets print everything in one line:
            print('Reward: {}'.format(reward))
            print('Done: {}'.format(done))
            print('Info: {}'.format(info))
            print('Actions: {}'.format(action))
            # Append the reward to the list
            reward_list.append(reward)
            if done:
                break
/usr/local/lib/python3.6/dist-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))

In the above code, the main code line is: action = np.random.uniform(-1.0, 1.0, size=action_size), where four random numbers between -1 and 1 are generated in NumPy list form:

In [ ]:
Random_games()
Episode: 0
Reward: -0.07159557025012221
Done: False
Info: {}
Actions: [ 0.37093781  0.04467148 -0.52047019  0.42449219]
Reward: -0.06777469860820882
Done: False
Info: {}
Actions: [ 0.86797327 -0.86810834 -0.28427233 -0.06327547]
Reward: -0.06167830697187383
Done: False
Info: {}
Actions: [-0.56712858 -0.85322579  0.39751426 -0.03528485]
Reward: -0.126249152974594
Done: False
Info: {}
Actions: [-0.93248714 -0.42735792  0.52235897  0.89253594]
Reward: -0.16580603269937577
Done: False
Info: {}
Actions: [-0.26363885  0.02073754 -0.70147969  0.75105589]
Reward: -0.18502122210893812
Done: False
Info: {}
Actions: [ 0.45691425  0.84745735 -0.59098827 -0.10441873]
Reward: -0.17148390710086514
Done: False
Info: {}
Actions: [ 0.29642464  0.08579142 -0.26455209  0.92479206]
Reward: -0.1068889037115354
Done: False
Info: {}
Actions: [0.87328353 0.25753478 0.34624331 0.73468157]
Reward: -0.08705175390385832
Done: False
Info: {}
Actions: [-0.22941384  0.64019083  0.26243311 -0.69267867]
Reward: -0.043627789938637765
Done: False
Info: {}
Actions: [ 0.69456595 -0.53574757  0.57539877  0.73731105]
Reward: -0.12595810228381957
Done: False
Info: {}
Actions: [-0.7459106  -0.39926929 -0.87709608 -0.06217508]
Reward: -0.05835975440433122
Done: False
Info: {}
Actions: [0.96623693 0.04242701 0.1817048  0.33070774]
Reward: -0.10524388163845647
Done: False
Info: {}
Actions: [-0.9348322  -0.80089655  0.90647221 -0.38956739]
Reward: -0.12093070927945658
Done: False
Info: {}
Actions: [ 0.14748526  0.91920978 -0.72659194  0.11994515]
Reward: -0.06487726780320188
Done: False
Info: {}
Actions: [0.0418942  0.52340586 0.53295364 0.39852132]
Reward: -0.03302795914740059
Done: False
Info: {}
Actions: [0.27263018 0.32862022 0.17600096 0.74407425]
Reward: -0.14944637975653816
Done: False
Info: {}
Actions: [-0.10083959 -0.94730832 -0.86727585 -0.84349721]
Reward: -0.2475810096502743
Done: False
Info: {}
Actions: [-0.85661856 -0.47950486 -0.83167672  0.32241467]
Reward: -0.1714466022525554
Done: False
Info: {}
Actions: [0.30381447 0.12412524 0.18037169 0.35405707]
Reward: -0.12048387282208183
Done: False
Info: {}
Actions: [ 0.97524912 -0.00514396  0.02463026 -0.94516817]
Reward: -0.17732746724026624
Done: False
Info: {}
Actions: [-0.72050072  0.4980953  -0.80395496  0.54375159]
Reward: -0.10134070460590976
Done: False
Info: {}
Actions: [-0.18312004 -0.00966661  0.49059912 -0.11477607]
Reward: -0.2021776806385088
Done: False
Info: {}
Actions: [-0.2968205  -0.44425764 -0.7122217  -0.28256151]
Reward: -0.179092119393278
Done: False
Info: {}
Actions: [-0.12325055 -0.94192941  0.55244229  0.51690914]
Reward: -0.16903509655736715
Done: False
Info: {}
Actions: [-0.98901886  0.44485619 -0.04326776 -0.68290395]
Reward: -0.15666763186635294
Done: False
Info: {}
Actions: [ 0.51998422  0.26616838 -0.42902671  0.95688922]
Reward: -0.1267686834893049
Done: False
Info: {}
Actions: [-0.04049474 -0.61156495 -0.94820874  0.31196407]
Reward: -0.0940773462680094
Done: False
Info: {}
Actions: [ 0.30916521  0.64384804  0.24625745 -0.90436765]
Reward: -0.10125801189610335
Done: False
Info: {}
Actions: [ 0.48071099  0.61028396 -0.51135872 -0.68430375]
Reward: -0.07248654015441743
Done: False
Info: {}
Actions: [-0.5542927   0.0631047   0.67846434  0.31928224]
Reward: -0.08136640866036862
Done: False
Info: {}
Actions: [-0.71258627  0.14525891  0.4103094   0.69592911]
Reward: -0.03218778392954618
Done: False
Info: {}
Actions: [-0.0441442  -0.5374028   0.92744058 -0.33873421]
Reward: -0.1487676074030807
Done: False
Info: {}
Actions: [-0.78540944  0.47460871 -0.89767736  0.8330022 ]
Reward: -0.20363795020333014
Done: False
Info: {}
Actions: [-0.73070175  0.98930104 -0.80267808 -0.36762295]
Reward: -0.16741576170488306
Done: False
Info: {}
Actions: [-0.53314831  0.94340377  0.69608028 -0.49827923]
Reward: -0.13847680703923199
Done: False
Info: {}
Actions: [ 0.16262755 -0.7383715  -0.10684908  0.33465099]
Reward: -0.11626637180940963
Done: False
Info: {}
Actions: [-0.12970326 -0.15677236  0.94763109  0.8876871 ]
Reward: -0.06152383886533361
Done: False
Info: {}
Actions: [ 0.38262224 -0.47885003  0.27811299 -0.77041025]
Reward: -0.04616966592468465
Done: False
Info: {}
Actions: [ 0.01687994  0.14857205 -0.12434724 -0.81319847]
Reward: 0.010316102588997351
Done: False
Info: {}
Actions: [ 0.62536294 -0.47325978  0.85211767  0.13696377]
Reward: 0.02519760055385068
Done: False
Info: {}
Actions: [-0.01853971  0.64695691  0.07238013 -0.97314287]
Reward: 0.003729897520764928
Done: False
Info: {}
Actions: [-0.48454349  0.6049499   0.62549561 -0.00799616]
Reward: 0.020665410227516102
Done: False
Info: {}
Actions: [ 0.80948664 -0.08675749 -0.49795224 -0.0685235 ]
Reward: -0.0059744479706876265
Done: False
Info: {}
Actions: [ 0.87362692 -0.59504439 -0.24264784  0.53601743]
Reward: 0.04722368834861138
Done: False
Info: {}
Actions: [ 0.71896144  0.24481869 -0.07735875 -0.07455821]
Reward: -0.009507371255724885
Done: False
Info: {}
Actions: [ 0.75485622  0.08284638 -0.42589775  0.85303627]
Reward: -0.07707655374600553
Done: False
Info: {}
Actions: [-0.49594074 -0.10368466 -0.53180432 -0.65839166]
Reward: -0.13379097813557644
Done: False
Info: {}
Actions: [-0.86923634 -0.89383198  0.6801008   0.78415588]
Reward: -0.20008885255593725
Done: False
Info: {}
Actions: [-0.13307957  0.80870691 -0.93309969  0.64510716]
Reward: -0.05369461820135479
Done: False
Info: {}
Actions: [0.99218466 0.21167624 0.78544739 0.16152967]
Reward: -0.01492250080160714
Done: False
Info: {}
Actions: [ 0.38375856  0.87421769  0.13302816 -0.46290268]
Reward: -0.06764333761927506
Done: False
Info: {}
Actions: [-0.435585   -0.03175851 -0.2341553   0.95571976]
Reward: -0.11099938355677333
Done: False
Info: {}
Actions: [-0.95977142 -0.56542082  0.44751885  0.02320572]
Reward: -0.024121250204909753
Done: False
Info: {}
Actions: [ 0.79674037 -0.46249221  0.67545304 -0.14550463]
Reward: -0.03628980487224828
Done: False
Info: {}
Actions: [-0.0114813   0.40717439 -0.34743503  0.42052257]
Reward: -0.08918203267147018
Done: False
Info: {}
Actions: [ 0.72054235  0.98757074 -0.52204475  0.81191541]
Reward: -0.14016223189715318
Done: False
Info: {}
Actions: [ 0.3661635   0.0836694  -0.98992525  0.86165958]
Reward: -0.04225468568574643
Done: False
Info: {}
Actions: [ 0.82651374  0.78032923  0.69929422 -0.58983463]
Reward: -0.007543798631992863
Done: False
Info: {}
Actions: [-0.38621737 -0.49022157  0.82784061 -0.05322218]
Reward: -0.016502865462360575
Done: False
Info: {}
Actions: [-0.3951165  -0.1725359   0.32309112 -0.62619995]
Reward: -0.05173434184284332
Done: False
Info: {}
Actions: [ 0.95650379  0.6590396  -0.89213046 -0.10418301]
Reward: -0.06416796768767595
Done: False
Info: {}
Actions: [ 0.78055022 -0.83796107 -0.89467384 -0.42262998]
Reward: 0.04479345928009307
Done: False
Info: {}
Actions: [0.0015356  0.18692923 0.82027323 0.01311928]
Reward: -0.04985377867804382
Done: False
Info: {}
Actions: [-0.44530661 -0.36237738 -0.49148277 -0.70288361]
Reward: -0.01256985906282953
Done: False
Info: {}
Actions: [ 0.46998329 -0.76933106 -0.01161509  0.46058013]
Reward: -0.11821967067263944
Done: False
Info: {}
Actions: [-0.88700564 -0.83436654 -0.3112775   0.83437833]
Reward: -0.12503392027849852
Done: False
Info: {}
Actions: [-0.96905545 -0.15151307  0.60688771 -0.61427066]
Reward: -0.07050526301102734
Done: False
Info: {}
Actions: [ 0.27689893  0.93099198  0.4768508  -0.44322648]
Reward: -0.027006615130110986
Done: False
Info: {}
Actions: [-0.0495649   0.11966882  0.88142659 -0.73146709]
Reward: -0.07078449037610321
Done: False
Info: {}
Actions: [-0.54347809 -0.96149268  0.55976478  0.92967483]
Reward: -0.0709194663108154
Done: False
Info: {}
Actions: [ 0.25429984  0.40356713  0.52491407 -0.14128337]
Reward: -0.10146105801983471
Done: False
Info: {}
Actions: [-0.66434032  0.42568487 -0.20405569 -0.40142381]
Reward: -0.09479048888892208
Done: False
Info: {}
Actions: [-0.46717883  0.87547016  0.49446425 -0.46788248]
Reward: -0.14026689662596467
Done: False
Info: {}
Actions: [-0.14171683 -0.67736354 -0.72539196 -0.76478195]
Reward: -0.0748570920633449
Done: False
Info: {}
Actions: [ 0.4708606   0.21093868  0.34243105 -0.98152189]
Reward: -0.039461345154777554
Done: False
Info: {}
Actions: [ 0.36434973 -0.68126036  0.05276555 -0.20594254]
Reward: -0.059941964206386214
Done: False
Info: {}
Actions: [-0.57231342 -0.04359168  0.56992047 -0.6731176 ]
Reward: 0.002142219782734437
Done: False
Info: {}
Actions: [ 0.69442615 -0.87122396  0.55648301  0.08056685]
Reward: -0.04583548225364855
Done: False
Info: {}
Actions: [-0.55935987 -0.29914087 -0.4298731  -0.14330444]
Reward: -0.08721701676546613
Done: False
Info: {}
Actions: [-0.31795415  0.69043595 -0.59534862  0.09689262]
Reward: -0.13662717031270483
Done: False
Info: {}
Actions: [-0.88532712 -0.78305339 -0.62694762  0.85097091]
Reward: -0.06347336171218348
Done: False
Info: {}
Actions: [ 0.01566908 -0.44461915  0.84287813  0.91793112]
Reward: -0.017951703117822626
Done: False
Info: {}
Actions: [-0.66551836  0.53967386  0.81444021  0.67167396]
Reward: -0.026549585557757237
Done: False
Info: {}
Actions: [ 0.74389688 -0.59846646 -0.86802261 -0.3752922 ]
Reward: -0.04100481910500971
Done: False
Info: {}
Actions: [-0.31650075  0.97390844 -0.70711589 -0.18606704]
Reward: 0.009501626224813764
Done: False
Info: {}
Actions: [ 0.58532877  0.6107692  -0.12803289  0.6538255 ]
Reward: 0.03897579592672237
Done: False
Info: {}
Actions: [ 0.88216094 -0.2894364  -0.27028373 -0.42340226]
Reward: 0.029094049521830624
Done: False
Info: {}
Actions: [ 0.9956676  -0.81044545  0.79202653 -0.87389664]
Reward: 0.10851333001032959
Done: False
Info: {}
Actions: [ 0.99778724 -0.26028636  0.46173499 -0.06231585]
Reward: 0.05317798301019504
Done: False
Info: {}
Actions: [-0.44235811 -0.18183365 -0.17512488 -0.2760189 ]
Reward: 0.0666463077960184
Done: False
Info: {}
Actions: [ 0.57696648  0.82954008 -0.28326566 -0.18382351]
Reward: 0.10318073095891907
Done: False
Info: {}
Actions: [ 0.31945811  0.30568037 -0.19311423  0.95512581]
Reward: 0.12143240928911639
Done: False
Info: {}
Actions: [ 0.56657904 -0.01761731 -0.37079559 -0.77931072]
Reward: 0.1591880333000507
Done: False
Info: {}
Actions: [ 0.12427991 -0.40782282  0.29305121  0.82978672]
Reward: 0.18282584817125616
Done: False
Info: {}
Actions: [-0.25732006 -0.76912375  0.97897042  0.67046943]
Reward: 0.21669186785844097
Done: False
Info: {}
Actions: [ 0.69765345 -0.77733373 -0.26696162 -0.87005814]
Reward: 0.30411564226674065
Done: False
Info: {}
Actions: [0.5761661  0.42401816 0.78846975 0.41168377]
Reward: 0.2998731851425731
Done: False
Info: {}
Actions: [-0.24855536 -0.99934991  0.82569174 -0.32564989]
Reward: 0.25671838174360895
Done: False
Info: {}
Actions: [ 0.357924   -0.27596636 -0.99139165 -0.69626978]
Reward: 0.2120711743306908
Done: False
Info: {}
Actions: [-0.08979926 -0.6031107  -0.88865738 -0.05253821]
Reward: 0.17463428223874014
Done: False
Info: {}
Actions: [ 0.07218987 -0.52606941 -0.50066234 -0.99776455]
Reward: 0.22381651974562694
Done: False
Info: {}
Actions: [-0.74629689  0.75308012  0.31198232  0.30810429]
Reward: 0.21196309813958006
Done: False
Info: {}
Actions: [-0.71073582  0.72957926 -0.91307589 -0.21568556]
Reward: 0.23958624138195292
Done: False
Info: {}
Actions: [-0.47678041  0.10721134  0.33207552 -0.00402879]
Reward: 0.30635824857568833
Done: False
Info: {}
Actions: [ 0.55234344 -0.00069238  0.62736669  0.06616013]
Reward: 0.2474176670219874
Done: False
Info: {}
Actions: [-0.96743348  0.04460718  0.32834389 -0.44393305]
Reward: 0.27858442408923917
Done: False
Info: {}
Actions: [0.54260728 0.85414978 0.22523361 0.9213465 ]
Reward: 0.3659402528444757
Done: False
Info: {}
Actions: [ 0.72246218  0.26902778  0.4078113  -0.55528974]
Reward: 0.3138989038695721
Done: False
Info: {}
Actions: [-0.40158345 -0.02437596 -0.44159932  0.78270255]
Reward: 0.3461036974807663
Done: False
Info: {}
Actions: [ 0.88200688  0.16473154 -0.41699649  0.28324176]
Reward: 0.4241943183176007
Done: False
Info: {}
Actions: [0.77965205 0.84541206 0.8770646  0.8239924 ]
Reward: 0.32450319145241974
Done: False
Info: {}
Actions: [-0.51099384  0.81545143 -0.89180291 -0.97937327]
Reward: 0.3682839307027473
Done: False
Info: {}
Actions: [ 0.95236947  0.55015021 -0.57161468  0.72316295]
Reward: 0.43684649017187627
Done: False
Info: {}
Actions: [-0.69872049  0.24984131  0.85455346 -0.00926559]
Reward: 0.46336447571506045
Done: False
Info: {}
Actions: [ 0.55733125 -0.34392364 -0.12495819 -0.02394   ]
Reward: 0.4511432546937581
Done: False
Info: {}
Actions: [-0.31027138  0.44093586 -0.65727296  0.37011533]
Reward: 0.07780435750952922
Done: False
Info: {}
Actions: [-0.82905891  0.78255777  0.44457703 -0.40130698]
Reward: -0.08164979192841629
Done: False
Info: {}
Actions: [-0.6308359  -0.44267362 -0.41076396  0.869649  ]
Reward: -0.22676478369400588
Done: False
Info: {}
Actions: [ 0.02086279  0.79220945  0.14235779 -0.27370726]
Reward: -0.32638087692731943
Done: False
Info: {}
Actions: [ 0.93591754 -0.53940422  0.93756679  0.36672433]
Reward: -0.23101641730325173
Done: False
Info: {}
Actions: [-0.38662046 -0.72093874 -0.58501353 -0.3369468 ]
Reward: -0.2806665742553592
Done: False
Info: {}
Actions: [ 0.88895587  0.97299031 -0.41845624 -0.59371809]
Reward: -0.31093595946159136
Done: False
Info: {}
Actions: [ 0.18695139 -0.35713605  0.73593212  0.99231618]
Reward: -0.22910861505340774
Done: False
Info: {}
Actions: [-0.83690418 -0.3729515   0.11831622  0.13052737]
Reward: -0.23870199277040702
Done: False
Info: {}
Actions: [ 0.77811957  0.76100129 -0.10536308  0.05859225]
Reward: -0.215409901635533
Done: False
Info: {}
Actions: [-0.72552888 -0.38655259  0.57215957  0.27923767]
Reward: -100
Done: True
Info: {}
Actions: [-0.54131367 -0.95852452  0.85783114  0.72194234]
Episode: 1
Reward: -0.10338991883755279
Done: False
Info: {}
Actions: [-0.86445287  0.48953121  0.89212867  0.94203682]
Reward: -0.08077176220036239
Done: False
Info: {}
Actions: [ 0.07791344  0.05448123 -0.47139035  0.61612436]
Reward: -0.10198583479294285
Done: False
Info: {}
Actions: [-0.09935486 -0.81621023 -0.07169101 -0.88691178]
Reward: -0.16084228438727344
Done: False
Info: {}
Actions: [ 0.16431755  0.9180528  -0.87544632  0.77109687]
Reward: -0.039385414888743
Done: False
Info: {}
Actions: [0.73855795 0.63565983 0.63873405 0.4310252 ]
Reward: -0.030294588771119776
Done: False
Info: {}
Actions: [ 0.74829554 -0.09550847 -0.61927627  0.99482209]
Reward: 0.05999549053484117
Done: False
Info: {}
Actions: [0.16219376 0.82369116 0.98826179 0.43379131]
Reward: 0.024636841331387847
Done: False
Info: {}
Actions: [ 0.19988403 -0.64848391 -0.12655365  0.68492477]
Reward: -0.2174766917467734
Done: False
Info: {}
Actions: [-0.49638562  0.50942762  0.95175324 -0.68390354]
Reward: -0.11367777830341941
Done: False
Info: {}
Actions: [-0.79190385 -0.23932623 -0.09106769 -0.91293541]
Reward: -0.16986779887900313
Done: False
Info: {}
Actions: [ 0.7476304  -0.77747639 -0.25560893 -0.81419241]
Reward: -0.21042364917110695
Done: False
Info: {}
Actions: [ 0.39947054 -0.74874746  0.96016923 -0.90536005]
Reward: -0.14453495576733233
Done: False
Info: {}
Actions: [-0.59992796  0.40698109  0.98453253  0.10887914]
Reward: -0.31263584848038634
Done: False
Info: {}
Actions: [0.91151554 0.92327902 0.66360329 0.4383457 ]
Reward: -0.18792504097342358
Done: False
Info: {}
Actions: [-0.1061868   0.33966042 -0.41741437 -0.04977709]
Reward: -0.16394348033384246
Done: False
Info: {}
Actions: [-0.53914881  0.486969    0.42859926 -0.21402906]
Reward: -0.2256046817991672
Done: False
Info: {}
Actions: [ 0.6775358  -0.41939944  0.98823275 -0.65070378]
Reward: -0.19506754228070847
Done: False
Info: {}
Actions: [ 0.78957253 -0.58467083 -0.15589944 -0.48519108]
Reward: -0.20383749597909673
Done: False
Info: {}
Actions: [-0.54954386  0.39422204  0.78265031  0.5254821 ]
Reward: -0.2973350240643107
Done: False
Info: {}
Actions: [ 0.66704979 -0.09819766  0.81547405  0.22854859]
Reward: -0.24188701943285257
Done: False
Info: {}
Actions: [-0.89129362  0.43814935  0.71620597  0.25689503]
Reward: -0.2629179883244906
Done: False
Info: {}
Actions: [ 0.42371834  0.20891391 -0.06758589  0.95576504]
Reward: -0.29471826682882973
Done: False
Info: {}
Actions: [ 0.37549182 -0.57113304  0.09131485 -0.99475328]
Reward: -0.301949244408366
Done: False
Info: {}
Actions: [ 0.61823423  0.83486889 -0.70366015 -0.85253233]
Reward: -0.32107251082780613
Done: False
Info: {}
Actions: [ 0.34369125  0.9373038   0.09273156 -0.51044191]
Reward: -0.369241450417043
Done: False
Info: {}
Actions: [ 0.12052877  0.15876842  0.9914293  -0.72644046]
Reward: -0.3217415880660914
Done: False
Info: {}
Actions: [ 0.56186066 -0.68375168 -0.52118097  0.80328259]
Reward: -0.3297329259147604
Done: False
Info: {}
Actions: [-0.20082929  0.01803954  0.64377751  0.4082379 ]
Reward: -0.31137392357732807
Done: False
Info: {}
Actions: [-0.52398725 -0.7569606   0.10551421 -0.4867684 ]
Reward: -0.2335746273830624
Done: False
Info: {}
Actions: [-0.44212537 -0.50718052 -0.2428935   0.78880777]
Reward: -0.25229771582085087
Done: False
Info: {}
Actions: [-0.4896192  -0.11523653  0.70726043  0.57884746]
Reward: -0.1530444834247403
Done: False
Info: {}
Actions: [ 6.60593521e-05 -2.51645734e-01 -7.51672262e-01 -2.11714153e-01]
Reward: -0.2302314755098603
Done: False
Info: {}
Actions: [-0.79657837  0.09115219  0.79975066 -0.91573787]
Reward: -0.20635625106683947
Done: False
Info: {}
Actions: [ 0.63484657  0.3687399  -0.8003358  -0.69808856]
Reward: -0.24816311323487708
Done: False
Info: {}
Actions: [-0.69813341 -0.56769845  0.92560213 -0.69664979]
Reward: -0.13098891066267376
Done: False
Info: {}
Actions: [-0.81039313 -0.7113996  -0.80456302 -0.5404359 ]
Reward: -0.1306175589235794
Done: False
Info: {}
Actions: [-0.83050926  0.07155871  0.05503547 -0.77629968]
Reward: -0.10796019331258057
Done: False
Info: {}
Actions: [-0.29310423 -0.27554054  0.02378786  0.79796382]
Reward: -0.14233905775056668
Done: False
Info: {}
Actions: [-0.28820609 -0.5242216   0.10702782 -0.65384127]
Reward: -0.18076228583148118
Done: False
Info: {}
Actions: [ 0.53982608 -0.36378889 -0.17280403  0.47965374]
Reward: -0.2862379661136153
Done: False
Info: {}
Actions: [0.1517556  0.96662885 0.85258273 0.51758248]
Reward: -0.15407702545409716
Done: False
Info: {}
Actions: [ 0.35570364 -0.28564818  0.45689975  0.99353982]
Reward: -0.09262536140153896
Done: False
Info: {}
Actions: [ 0.36366169  0.44589005 -0.99879642  0.82528242]
Reward: 0.010742488792629602
Done: False
Info: {}
Actions: [-0.99086725 -0.39057711 -0.24421447  0.59566069]
Reward: -0.010832307802430063
Done: False
Info: {}
Actions: [ 0.89774707  0.43210598 -0.83232283 -0.95655194]
Reward: -0.052697811732023134
Done: False
Info: {}
Actions: [ 0.00980128 -0.52876334 -0.84436616  0.73721616]
Reward: -0.12834352571056856
Done: False
Info: {}
Actions: [ 0.20109317  0.90290142  0.78855005 -0.02293233]
Reward: -0.2236018243800169
Done: False
Info: {}
Actions: [ 0.72875223 -0.24265105  0.3577217   0.86163546]
Reward: -0.2589383017207967
Done: False
Info: {}
Actions: [ 0.98062911 -0.6917452   0.51702688 -0.77200411]
Reward: -0.18091884560474064
Done: False
Info: {}
Actions: [ 0.7613845  -0.33633396 -0.71366713 -0.17600758]
Reward: -0.22868620025621303
Done: False
Info: {}
Actions: [0.86517145 0.68363325 0.09933059 0.37734218]
Reward: -0.24207400352783032
Done: False
Info: {}
Actions: [-0.81943066  0.81377656  0.63298676 -0.31235514]
Reward: -0.1985464668466052
Done: False
Info: {}
Actions: [-0.27085904 -0.02154145  0.51652363 -0.65453515]
Reward: -0.15396581873781665
Done: False
Info: {}
Actions: [-0.51336963  0.6239747   0.77271032  0.94364217]
Reward: -0.14950700441669962
Done: False
Info: {}
Actions: [ 0.67569198  0.97167752 -0.9250606  -0.32553064]
Reward: -100
Done: True
Info: {}
Actions: [-0.23706275  0.17968761  0.41037182  0.62414336]

Display the video in the google drive, random games is being play by the agent,

In [ ]:
Video("/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/train_video/random/openaigym.video.0.7332.video000064.mp4",embed=True)
Out[ ]:

The reward can give a huge different for each action.

For example, when the walker is not moving but not falling, it will have a reward of about 0

But when it falls, it will have a negative reward of -100.

So the plot is constantly around zero while the walker is balanced, but drops immediately when it falls, this is just random actions so it is not improving.

In [ ]:
plt.plot(range(len(reward_list)),reward_list)
plt.title('Reward for each actions in the 10 episode.')
plt.show()

The Actor Model

The Actor model performs the task of learning what action to take under a particular observed state of the environment.

In the BipedalWalker-v3 case, it takes 24 values list (mentioned before) of the game as input which represents the current state of our walker and gives particular actions what legs to move

PPO is a policy gradients method that makes policy updates using a surrogate loss function to avoid catastrophic drops in performance.

The algorithm is robust in that hyperparameter initializations are a bit more forgiving and it can work out of the box on a wide variety of RL tasks.

Reference: www.medium.com

In [ ]:
class Actor_Model:
    def __init__(self, input_shape, action_space, lr, optimizer):
        # Take inputs
        X_input = Input(input_shape)
        self.action_space = action_space
        # Build network model
        X = Dense(512, activation="relu", kernel_initializer=tf.random_normal_initializer(
            stddev=0.01))(X_input)
        X = Dense(256, activation="relu",
                  kernel_initializer=tf.random_normal_initializer(stddev=0.01))(X)
        X = Dense(64, activation="relu",
                  kernel_initializer=tf.random_normal_initializer(stddev=0.01))(X)
        # Output a probability
        output = Dense(self.action_space, activation="tanh")(X)
        # Compile model
        self.Actor = Model(inputs=X_input, outputs=output)
        self.Actor.compile(loss=self.ppo_loss_continuous,
                           optimizer=optimizer(lr=lr))
        print(self.Actor.summary())
    """This function uses ppo to calculate the loss"""
    def ppo_loss_continuous(self, y_true, y_pred):
        advantages, actions, logp_old_ph, = y_true[:, :1], y_true[:,
                                                                  1:1+self.action_space], y_true[:, 1+self.action_space]
        LOSS_CLIPPING = 0.2
        logp = self.gaussian_likelihood(actions, y_pred)

        ratio = K.exp(logp - logp_old_ph)

        p1 = ratio * advantages
        p2 = tf.where(advantages > 0, (1.0 + LOSS_CLIPPING)*advantages,
                      (1.0 - LOSS_CLIPPING)*advantages)  # minimum advantage

        actor_loss = -K.mean(K.minimum(p1, p2))

        return actor_loss

    def gaussian_likelihood(self, actions, pred):  # for keras custom loss
        log_std = -0.5 * np.ones(self.action_space, dtype=np.float32)
        pre_sum = -0.5 * (((actions-pred)/(K.exp(log_std)+1e-8))
                          ** 2 + 2*log_std + K.log(2*np.pi))
        return K.sum(pre_sum, axis=1)
    """This function calls the model itself to make predictions and return probabilities"""
    def predict(self, state):
        return self.Actor.predict(state)

The Critic model

The main role of the Critic model is learning to evaluate if the action taken by the Actor led our environment to be in a better state or not and give its feedback to the Actor.

Then we send the action predicted by the Actor to our environment and observe what happens in the game. If something positive happens as a result of our action, the environment sends back a positive response in the form of a reward and vice versa if we receive a negative reward. These rewards are taken in by training our Critic model.

In [ ]:
class Critic_Model:
    def __init__(self, input_shape, action_space, lr, optimizer):
        X_input = Input(input_shape)
        old_values = Input(shape=(1,))
        # Define neural network model
        V = Dense(512, activation="relu", kernel_initializer=tf.random_normal_initializer(
            stddev=0.01))(X_input)
        V = Dense(256, activation="relu",
                  kernel_initializer=tf.random_normal_initializer(stddev=0.01))(V)
        V = Dense(64, activation="relu",
                  kernel_initializer=tf.random_normal_initializer(stddev=0.01))(V)
        value = Dense(1, activation=None)(V)

        self.Critic = Model(inputs=[X_input, old_values], outputs=value)
        self.Critic.compile(loss=[self.critic_PPO2_loss(
            old_values)], optimizer=optimizer(lr=lr))
    # Calculating loss
    def critic_PPO2_loss(self, values):
        def loss(y_true, y_pred):
            LOSS_CLIPPING = 0.2
            clipped_value_loss = values + \
                K.clip(y_pred - values, -LOSS_CLIPPING, LOSS_CLIPPING)
            v_loss1 = (y_true - clipped_value_loss) ** 2
            v_loss2 = (y_true - y_pred) ** 2

            value_loss = 0.5 * K.mean(K.maximum(v_loss1, v_loss2))
            # value_loss = K.mean((y_true - y_pred) ** 2) # standard PPO loss
            return value_loss
        return loss
    # Predict actions to take
    def predict(self, state):
        return self.Critic.predict([state, np.zeros((state.shape[0], 1))])

Model Training

This PPOAgent class integrates the two models and other functions such as saving models and figures.

In [ ]:
class PPOAgent:
    # PPO Main Optimization Algorithm
    def __init__(self, env_name, model_name=""):
        # Initialization
        # Environment and PPO parameters
        self.env_name = env_name
        self.env = gym.make(env_name)
        self.env = gym.wrappers.Monitor(env, "/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/train_video/real/", force=True)

        self.action_size = self.env.action_space.shape[0]
        self.state_size = self.env.observation_space.shape
        self.EPISODES = 150  # total episodes to train through all environments
        self.episode = 1  # used to track the episodes total count of episodes played through all thread environments
        self.max_average = 0  # when average score is above 0 model will be saved
        self.lr = 0.00025
        self.epochs = 10  # training epochs
        self.shuffle = True
        self.Training_batch = 512
        # self.optimizer = RMSprop
        self.optimizer = Adam
        self.score_list = []
        self.avg_score_list = []

        self.replay_count = 0
        self.writer = SummaryWriter(
            comment="_"+self.env_name+"_"+self.optimizer.__name__+"_"+str(self.lr))

        # Instantiate plot memory
        self.scores_, self.episodes_, self.average_ = [], [], []  # used in matplotlib plots

        # Create Actor-Critic network models
        self.Actor = Actor_Model(
            input_shape=self.state_size, action_space=self.action_size, lr=self.lr, optimizer=self.optimizer)
        self.Critic = Critic_Model(
            input_shape=self.state_size, action_space=self.action_size, lr=self.lr, optimizer=self.optimizer)

        self.Actor_name = f"{self.env_name}_PPO_Actor.h5"
        self.Critic_name = f"{self.env_name}_PPO_Critic.h5"
        # self.load() # uncomment to continue training from old weights

        # do not change bellow
        self.log_std = -0.5 * np.ones(self.action_size, dtype=np.float32)
        self.std = np.exp(self.log_std)

    def act(self, state):
        # Use the network to predict the next action to take, using the model
        pred = self.Actor.predict(state)

        low, high = -1.0, 1.0  # -1 and 1 are boundaries of tanh
        action = pred + \
            np.random.uniform(low, high, size=pred.shape) * self.std
        action = np.clip(action, low, high)

        logp_t = self.gaussian_likelihood(action, pred, self.log_std)

        return action, logp_t

    def gaussian_likelihood(self, action, pred, log_std):
        # https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/sac/policies.py
        pre_sum = -0.5 * (((action-pred)/(np.exp(log_std)+1e-8))
                          ** 2 + 2*log_std + np.log(2*np.pi))
        return np.sum(pre_sum, axis=1)

    def discount_rewards(self, reward):  # gaes is better
        # Compute the gamma-discounted rewards over an episode
        # We apply the discount and normalize it to avoid big variability of rewards
        gamma = 0.99    # discount rate
        running_add = 0
        discounted_r = np.zeros_like(reward)
        for i in reversed(range(0, len(reward))):
            running_add = running_add * gamma + reward[i]
            discounted_r[i] = running_add

        discounted_r -= np.mean(discounted_r)  # normalizing the result
        # divide by standard deviation
        discounted_r /= (np.std(discounted_r) + 1e-8)
        return discounted_r

    def get_gaes(self, rewards, dones, values, next_values, gamma=0.99, lamda=0.90, normalize=True):
        deltas = [r + gamma * (1 - d) * nv - v for r, d,
                  nv, v in zip(rewards, dones, next_values, values)]
        deltas = np.stack(deltas)
        gaes = copy.deepcopy(deltas)
        for t in reversed(range(len(deltas) - 1)):
            gaes[t] = gaes[t] + (1 - dones[t]) * gamma * lamda * gaes[t + 1]

        target = gaes + values
        if normalize:
            gaes = (gaes - gaes.mean()) / (gaes.std() + 1e-8)
        return np.vstack(gaes), np.vstack(target)

    def replay(self, states, actions, rewards, dones, next_states, logp_ts):
        # reshape memory to appropriate shape for training
        states = np.vstack(states)
        next_states = np.vstack(next_states)
        actions = np.vstack(actions)
        logp_ts = np.vstack(logp_ts)

        # Get Critic network predictions
        values = self.Critic.predict(states)
        next_values = self.Critic.predict(next_states)

        # Compute discounted rewards and advantages
        # discounted_r = self.discount_rewards(rewards)
        # advantages = np.vstack(discounted_r - values)
        advantages, target = self.get_gaes(
            rewards, dones, np.squeeze(values), np.squeeze(next_values))

        # pylab.plot(advantages, '.')
        # pylab.plot(target, '-')
        # ax = pylab.gca()
        # ax.grid(True)
        # pylab.subplots_adjust(left=0.05, right=0.98, top=0.96, bottom=0.06)
        # pylab.savefig('trainning_process/'+self.env_name +
        #               "_"+str(self.episode)+".png")
        plt.plot(range(len(advantages)), advantages)
        plt.plot(range(len(target)), target)
        plt.savefig('/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/'+self.env_name +
                    "_"+str(self.episode)+".png")
        # stack everything to numpy array
        # pack all advantages, predictions and actions to y_true and when they are received
        # in custom loss function we unpack it
        y_true = np.hstack([advantages, actions, logp_ts])

        # training Actor and Critic networks
        a_loss = self.Actor.Actor.fit(
            states, y_true, epochs=self.epochs, verbose=0, shuffle=self.shuffle)
        c_loss = self.Critic.Critic.fit(
            [states, values], target, epochs=self.epochs, verbose=0, shuffle=self.shuffle)

        # calculate loss parameters (should be done in loss, but couldn't find working way how to do that with disabled eager execution)
        pred = self.Actor.predict(states)
        log_std = -0.5 * np.ones(self.action_size, dtype=np.float32)
        logp = self.gaussian_likelihood(actions, pred, log_std)
        approx_kl = np.mean(logp_ts - logp)
        approx_ent = np.mean(-logp)

        self.writer.add_scalar('Data/actor_loss_per_replay',
                               np.sum(a_loss.history['loss']), self.replay_count)
        self.writer.add_scalar('Data/critic_loss_per_replay',
                               np.sum(c_loss.history['loss']), self.replay_count)
        self.writer.add_scalar('Data/approx_kl_per_replay',
                               approx_kl, self.replay_count)
        self.writer.add_scalar('Data/approx_ent_per_replay',
                               approx_ent, self.replay_count)
        self.replay_count += 1

    def load(self, model_name):
        self.Actor.Actor.load_weights('/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/'+model_name+'Actor.h5')
        self.Critic.Critic.load_weights(
           '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/'+ model_name+'Critic.h5')

    def save(self, name):
        self.Actor.Actor.save_weights(name+'Actor.h5')
        self.Critic.Critic.save_weights(name+'Critic.h5')

    # pylab.figure(figsize=(18, 9))
    # pylab.subplots_adjust(left=0.05, right=0.98, top=0.96, bottom=0.06)

    def PlotModel(self, score, episode, save=True,test=False):
        if test==True and episode==1:
          self.episodes_=[]

        self.scores_.append(score)
        self.episodes_.append(episode)
        self.average_.append(sum(self.scores_[-50:]) / len(self.scores_[-50:]))
        # pylab.plot(self.episodes_, self.scores_, 'b')
        # pylab.plot(self.episodes_, self.average_, 'r')
        # pylab.ylabel('Score', fontsize=18)
        # pylab.xlabel('Steps', fontsize=18)

        # plt.plot(self.episodes_,self.scores_)
        # plt.plot(self.episodes_,self.average_)
        # plt.ylabel('Score', fontsize=18)
        # plt.xlabel('Steps', fontsize=18)
        # plt.savefig('plot_model/'+self.env_name+".png")

        # Save model every 5 epoch
        if self.episode % 100 == 0 and save:
            self.save('/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/model_progress/'+str(episode))
        # saving best models
        if self.average_[-1] >= self.max_average and save:
            self.max_average = self.average_[-1]
            SAVING = "SAVING"
        # decreaate learning rate every saved model
        # self.lr *= 0.99
        # K.set_value(self.Actor.Actor.optimizer.learning_rate, self.lr)
        # K.set_value(self.Critic.Critic.optimizer.learning_rate, self.lr)
        else:
            SAVING = ""

        return self.average_[-1], SAVING

    def run_batch(self):
        state = self.env.reset()
        state = np.reshape(state, [1, self.state_size[0]])
        done, score, SAVING = False, 0, ''
        while True:
            # Instantiate or reset games memory
            states, next_states, actions, rewards, dones, logp_ts = [], [], [], [], [], []
            for t in range(self.Training_batch):
                self.env.render()
                # Actor picks an action
                action, logp_t = self.act(state)
                # Retrieve new state, reward, and whether the state is terminal
                next_state, reward, done, _ = self.env.step(action[0])
                # Memorize (state, next_states, action, reward, done, logp_ts) for training
                states.append(state)
                next_states.append(np.reshape(
                    next_state, [1, self.state_size[0]]))

                actions.append(action)
                rewards.append(reward)
                dones.append(done)
                logp_ts.append(logp_t[0])

                # Update current state shape
                state = np.reshape(next_state, [1, self.state_size[0]])
                score += reward
                if done:
                    average, SAVING = self.PlotModel(score, self.episode)
                    self.score_list.append(score)
                    self.avg_score_list.append(average)
                    print("episode: {}/{}, score: {}, average: {:.2f} {}".format(
                        self.episode, self.EPISODES, score, average, SAVING))
                    self.episode += 1
                    self.writer.add_scalar(
                        f'Workers:{1}/score_per_episode', score, self.episode)
                    self.writer.add_scalar(
                        f'Workers:{1}/learning_rate', self.lr, self.episode)
                    self.writer.add_scalar(
                        f'Workers:{1}/average_score',  average, self.episode)

                    state, done, score, SAVING = self.env.reset(), False, 0, ''
                    state = np.reshape(state, [1, self.state_size[0]])

            self.replay(states, actions, rewards, dones, next_states, logp_ts)
            if self.episode >= self.EPISODES:
                break

        self.env.close()
    """This functions will train the agent using multiple workers."""
    def run_multiprocesses(self, num_worker=4):
        works, parent_conns, child_conns = [], [], []
        for idx in range(num_worker):
            parent_conn, child_conn = Pipe()
            work = Environment(idx, child_conn, self.env_name,
                               self.state_size[0], self.action_size, True)
            work.start()
            works.append(work)
            parent_conns.append(parent_conn)
            child_conns.append(child_conn)

        states = [[] for _ in range(num_worker)]
        next_states = [[] for _ in range(num_worker)]
        actions = [[] for _ in range(num_worker)]
        rewards = [[] for _ in range(num_worker)]
        dones = [[] for _ in range(num_worker)]
        logp_ts = [[] for _ in range(num_worker)]
        score = [0 for _ in range(num_worker)]

        state = [0 for _ in range(num_worker)]
        for worker_id, parent_conn in enumerate(parent_conns):
            state[worker_id] = parent_conn.recv()

        while self.episode < self.EPISODES:
            # get batch of action's and log_pi's
            action, logp_pi = self.act(np.reshape(
                state, [num_worker, self.state_size[0]]))

            for worker_id, parent_conn in enumerate(parent_conns):
                parent_conn.send(action[worker_id])
                actions[worker_id].append(action[worker_id])
                logp_ts[worker_id].append(logp_pi[worker_id])

            for worker_id, parent_conn in enumerate(parent_conns):
                next_state, reward, done, _ = parent_conn.recv()

                states[worker_id].append(state[worker_id])
                next_states[worker_id].append(next_state)
                rewards[worker_id].append(reward)
                dones[worker_id].append(done)
                state[worker_id] = next_state
                score[worker_id] += reward

                if done:
                    average, SAVING = self.PlotModel(
                        score[worker_id], self.episode)
                    # Append the socres to a list for plots
                    self.score_list.append(score)
                    self.avg_score_list.append(average)
                    print("episode: {}/{}, worker: {}, score: {}, average: {:.2f} {}".format(
                        self.episode, self.EPISODES, worker_id, score[worker_id], average, SAVING))
                    self.writer.add_scalar(
                        f'Workers:{num_worker}/score_per_episode', score[worker_id], self.episode)
                    self.writer.add_scalar(
                        f'Workers:{num_worker}/learning_rate', self.lr, self.episode)
                    self.writer.add_scalar(
                        f'Workers:{num_worker}/average_score',  average, self.episode)
                    score[worker_id] = 0
                    if(self.episode < self.EPISODES):
                        self.episode += 1

            for worker_id in range(num_worker):
                if len(states[worker_id]) >= self.Training_batch:
                    self.replay(states[worker_id], actions[worker_id], rewards[worker_id],
                                dones[worker_id], next_states[worker_id], logp_ts[worker_id])

                    states[worker_id] = []
                    next_states[worker_id] = []
                    actions[worker_id] = []
                    rewards[worker_id] = []
                    dones[worker_id] = []
                    logp_ts[worker_id] = []

        # terminating processes after a while loop
        works.append(work)
        for work in works:
            work.terminate()
            print('TERMINATED:', work)
            work.join()
    """This function is to test the agent by loading an trained model"""
    def test(self, model_name=None, test_episodes=10):  # evaluate

        test_score=[]
        test_avg_score=[]
        self.load(model_name)
        for e in range(1,11,1):
            state = self.env.reset()
            state = np.reshape(state, [1, self.state_size[0]])
            done = False
            score = 0
            while not done:
                self.env.render()
                action = self.Actor.predict(state)[0]
                state, reward, done, _ = self.env.step(action)
                state = np.reshape(state, [1, self.state_size[0]])
                score += reward
                if done:
                 
                    average, SAVING = self.PlotModel(score, e, save=False,test=True)

                    test_score.append(score)
                    test_avg_score.append(average)
                    print("episode: {}/{}, score: {}, average{}".format(e,
                                                                        test_episodes, score, average))
                    break
        return test_score,test_avg_score
        self.env.close()

Input 24 environment factors as shown above. Output 4 actions of the joint of the leg.

In [ ]:
# newest gym fixed bugs in 'BipedalWalker-v2' and now it's called 'BipedalWalker-v3'
env_name = 'BipedalWalker-v3'
agent = PPOAgent(env_name)
# agent.run_multiprocesses(num_worker = 16)  # train PPO multiprocessed (fastest)
/usr/local/lib/python3.6/dist-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 24)]              0         
_________________________________________________________________
dense (Dense)                (None, 512)               12800     
_________________________________________________________________
dense_1 (Dense)              (None, 256)               131328    
_________________________________________________________________
dense_2 (Dense)              (None, 64)                16448     
_________________________________________________________________
dense_3 (Dense)              (None, 4)                 260       
=================================================================
Total params: 160,836
Trainable params: 160,836
Non-trainable params: 0
_________________________________________________________________
None
In [ ]:
agent.run_batch()  # train as PPO
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
episode: 1/150, score: -96.70841142549942, average: -96.71 
episode: 2/150, score: -96.18213878019124, average: -96.45 
episode: 3/150, score: -113.60274577007492, average: -102.16 
episode: 4/150, score: -101.64186234751968, average: -102.03 
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:2325: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  warnings.warn('`Model.state_updates` will be removed in a future version. '
episode: 5/150, score: -99.29335295846565, average: -101.49 
episode: 6/150, score: -54.289581144686785, average: -93.62 
episode: 7/150, score: -111.1148422555082, average: -96.12 
episode: 8/150, score: -100.79285623404569, average: -96.70 
episode: 9/150, score: -118.49688563175502, average: -99.12 
episode: 10/150, score: -116.84897950473615, average: -100.90 
episode: 11/150, score: -102.83115818841067, average: -101.07 
episode: 12/150, score: -99.35755676393403, average: -100.93 
episode: 13/150, score: -98.00540781214987, average: -100.71 
episode: 14/150, score: -98.81116460797628, average: -100.57 
episode: 15/150, score: -94.31076760484083, average: -100.15 
episode: 16/150, score: -114.88040928867733, average: -101.07 
episode: 17/150, score: -117.1851804896435, average: -102.02 
episode: 18/150, score: -113.51343967227676, average: -102.66 
episode: 19/150, score: -96.21292738178242, average: -102.32 
episode: 20/150, score: -96.6538930725354, average: -102.04 
episode: 21/150, score: -117.72958373634913, average: -102.78 
episode: 22/150, score: -101.54474026153552, average: -102.73 
episode: 23/150, score: -110.18934037026344, average: -103.05 
episode: 24/150, score: -98.7447069133161, average: -102.87 
episode: 25/150, score: -99.16410339545213, average: -102.72 
episode: 26/150, score: -41.72864683174557, average: -100.38 
episode: 27/150, score: -116.5500237663118, average: -100.98 
episode: 28/150, score: -120.53932362096175, average: -101.68 
episode: 29/150, score: -47.71365561965456, average: -99.82 
episode: 30/150, score: -97.2008611457684, average: -99.73 
episode: 31/150, score: -96.6904639768208, average: -99.63 
episode: 32/150, score: -43.00103037482685, average: -97.86 
episode: 33/150, score: -101.5663062657105, average: -97.97 
episode: 34/150, score: -49.153986008293636, average: -96.54 
episode: 35/150, score: -45.85089670497923, average: -95.09 
episode: 36/150, score: -118.63094899045595, average: -95.74 
episode: 37/150, score: -120.49061176457201, average: -96.41 
episode: 38/150, score: -47.354970735713806, average: -95.12 
episode: 39/150, score: -101.38259739623862, average: -95.28 
episode: 40/150, score: -104.18275495562312, average: -95.50 
episode: 41/150, score: -95.22715396012494, average: -95.50 
episode: 42/150, score: -41.76074662540236, average: -94.22 
episode: 43/150, score: -40.21312322501921, average: -92.96 
episode: 44/150, score: -39.018339532597174, average: -91.74 
episode: 45/150, score: -117.62723398064443, average: -92.31 
episode: 46/150, score: -111.2118221047278, average: -92.72 
episode: 47/150, score: -43.453362871378594, average: -91.67 
episode: 48/150, score: -112.21684993336821, average: -92.10 
episode: 49/150, score: -109.47352497494856, average: -92.46 
episode: 50/150, score: -49.164164830551236, average: -91.59 
episode: 51/150, score: -42.064308911314086, average: -90.50 
episode: 52/150, score: -115.63458838593381, average: -90.89 
episode: 53/150, score: -36.85976239351413, average: -89.35 
episode: 54/150, score: -115.02918565940256, average: -89.62 
episode: 55/150, score: -38.50652895110032, average: -88.40 
episode: 56/150, score: -104.2008215817748, average: -89.40 
episode: 57/150, score: -104.228954791218, average: -89.26 
episode: 58/150, score: -97.02094415140718, average: -89.19 
episode: 59/150, score: -45.23199659353394, average: -87.72 
episode: 60/150, score: -96.23112798249724, average: -87.31 
episode: 61/150, score: -37.61888752207884, average: -86.01 
episode: 62/150, score: -110.70428077869882, average: -86.23 
episode: 63/150, score: -99.24699722987859, average: -86.26 
episode: 64/150, score: -49.50889109305066, average: -85.27 
episode: 65/150, score: -118.76896291552752, average: -85.76 
episode: 66/150, score: -119.07713186652823, average: -85.85 
episode: 67/150, score: -37.32554036355647, average: -84.25 
episode: 68/150, score: -96.87761101820882, average: -83.92 
episode: 69/150, score: -39.3864695971982, average: -82.78 
episode: 70/150, score: -97.29812077142276, average: -82.79 
episode: 71/150, score: -39.36934934105836, average: -81.22 
episode: 72/150, score: -32.69496490019245, average: -79.85 
episode: 73/150, score: -119.0452131734378, average: -80.02 
episode: 74/150, score: -98.980967322988, average: -80.03 
episode: 75/150, score: -35.177354813434214, average: -78.75 
episode: 76/150, score: -113.98942794442966, average: -80.20 
episode: 77/150, score: -45.34790752278158, average: -78.77 
episode: 78/150, score: -100.04588648006018, average: -78.36 
episode: 79/150, score: -41.47041120191249, average: -78.24 
episode: 80/150, score: -93.7047017647494, average: -78.17 
episode: 81/150, score: -97.03527183564768, average: -78.17 
episode: 82/150, score: -41.411318335954974, average: -78.14 
episode: 83/150, score: -96.15832290010428, average: -78.03 
episode: 84/150, score: -114.29385686089928, average: -79.34 
episode: 85/150, score: -99.0547230617013, average: -80.40 
episode: 86/150, score: -119.13795189839054, average: -80.41 
episode: 87/150, score: -103.26713849332566, average: -80.07 
episode: 88/150, score: -46.22125914818613, average: -80.04 
episode: 89/150, score: -110.27894172466561, average: -80.22 
episode: 90/150, score: -98.96477691487357, average: -80.12 
episode: 91/150, score: -97.20052025992126, average: -80.16 
episode: 92/150, score: -105.94575085022356, average: -81.44 
episode: 93/150, score: -115.85148270723191, average: -82.95 
episode: 94/150, score: -111.1536197435532, average: -84.40 
episode: 95/150, score: -38.46289075359567, average: -82.81 
episode: 96/150, score: -112.74146090703186, average: -82.84 
episode: 97/150, score: -111.8406425027287, average: -84.21 
episode: 98/150, score: -111.39535205116383, average: -84.19 
episode: 99/150, score: -39.19556386695167, average: -82.79 
episode: 100/150, score: -114.44126641299457, average: -84.09 
episode: 101/150, score: -102.14005835054287, average: -85.30 
episode: 102/150, score: -44.51500879150836, average: -83.87 
episode: 103/150, score: -40.19666506362845, average: -83.94 
episode: 104/150, score: -106.8481798934904, average: -83.78 
episode: 105/150, score: -44.49808489750161, average: -83.90 
episode: 106/150, score: -99.8341793603597, average: -83.81 
episode: 107/150, score: -93.92901936662139, average: -83.60 
episode: 108/150, score: -42.14336574377958, average: -82.51 
episode: 109/150, score: -113.09653661923912, average: -83.86 
episode: 110/150, score: -40.887024032281126, average: -82.76 
episode: 111/150, score: -102.11927353567881, average: -84.05 
episode: 112/150, score: -103.20148206397238, average: -83.90 
episode: 113/150, score: -111.34206312279413, average: -84.14 
episode: 114/150, score: -45.68279900964141, average: -84.06 
episode: 115/150, score: -37.84229149659506, average: -82.44 
episode: 116/150, score: -39.97656299342217, average: -80.86 
episode: 117/150, score: -44.44555974831816, average: -81.00 
episode: 118/150, score: -96.17620413790459, average: -80.99 
episode: 119/150, score: -37.34592598138078, average: -80.95 
episode: 120/150, score: -99.37186245210863, average: -80.99 
episode: 121/150, score: -119.86163675811495, average: -82.60 
episode: 122/150, score: -96.34688917088654, average: -83.87 
episode: 123/150, score: -95.98478638466523, average: -83.41 
episode: 124/150, score: -35.497720083355766, average: -82.14 
episode: 125/150, score: -97.97935588072923, average: -83.40 
episode: 126/150, score: -95.97526399111294, average: -83.04 
episode: 127/150, score: -37.01589624143312, average: -82.87 
episode: 128/150, score: -37.99071343159799, average: -81.63 
episode: 129/150, score: -35.16838369399628, average: -81.50 
episode: 130/150, score: -118.48329753338874, average: -82.00 
episode: 131/150, score: -97.28903147912601, average: -82.00 
episode: 132/150, score: -96.0992229729475, average: -83.10 
episode: 133/150, score: -104.06724106860496, average: -83.26 
episode: 134/150, score: -112.46043810785135, average: -83.22 
episode: 135/150, score: -36.125422102450536, average: -81.96 
episode: 136/150, score: -111.93682200295808, average: -81.82 
episode: 137/150, score: -98.15126568539344, average: -81.71 
episode: 138/150, score: -117.89061539936398, average: -83.15 
episode: 139/150, score: -105.63353398450879, average: -83.05 
episode: 140/150, score: -35.84701997660616, average: -81.79 
episode: 141/150, score: -102.51391573378356, average: -81.90 
episode: 142/150, score: -112.01738551718329, average: -82.02 
episode: 143/150, score: -35.69791055090391, average: -80.42 
episode: 144/150, score: -36.26693301993069, average: -78.92 
episode: 145/150, score: -45.920831834100255, average: -79.07 
episode: 146/150, score: -43.78148473434997, average: -77.69 
episode: 147/150, score: -37.45830625431095, average: -76.20 
episode: 148/150, score: -49.35629485654965, average: -74.96 
episode: 149/150, score: -40.82912168027792, average: -74.99 

Plot of the scores of the agent overtime. The average score of the agent is increasing overtime.

In [ ]:
plt.figure(1,figsize=(15,5))
plt.subplot(121)
plt.title('Line Plot of the Agent\'s Score of Each Episode')
plt.plot(range(len(agent.score_list)),agent.score_list)
# plt.annotate(f'Max Score: ({max(agent.score_list):.3f})', xy=(10,max(agent.score_list)), xytext=(10,max(agent.avg_score_list)), fontweight='bold',
#                 fontsize=15, arrowprops=dict(facecolor='black', width=3.5, headwidth=15, shrink=0))
plt.subplot(122)
plt.plot(range(len(agent.avg_score_list)),agent.avg_score_list)
plt.title('Line Plot of the Agent\'s Average Score Over Time')
plt.annotate(f'Max Average Score: ({max(agent.avg_score_list):.3f})', xy=(140,max(agent.avg_score_list)), xytext=(10,max(agent.avg_score_list)), fontweight='bold',
                fontsize=15, arrowprops=dict(facecolor='black', width=3.5, headwidth=15, shrink=0))
plt.show()
In [ ]:
# Generating the gif
anim_file = 'rl.gif'

with imageio.get_writer(anim_file, mode='I') as writer:
  filenames = glob.glob('/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/*.png')
  filenames = sorted(filenames)
  print(filenames)
  last = -1
  for i,filename in enumerate(filenames):
    frame = 2*(i**0.5)
    if round(frame) > round(last):
      last = frame
    else:
      continue
    image = imageio.imread(filename)
    writer.append_data(image)
['/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_0.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_1.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_10.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_100.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_101.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_102.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_103.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_104.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_105.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_106.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_107.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_108.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_109.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_11.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_110.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_111.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_112.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_113.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_114.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_115.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_116.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_117.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_118.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_119.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_12.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_120.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_121.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_123.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_124.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_125.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_126.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_127.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_128.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_129.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_13.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_130.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_131.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_132.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_133.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_134.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_135.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_136.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_137.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_138.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_139.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_14.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_140.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_141.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_142.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_143.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_144.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_145.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_146.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_147.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_148.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_149.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_15.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_150.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_151.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_153.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_154.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_155.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_156.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_157.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_158.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_16.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_160.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_161.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_164.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_165.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_166.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_167.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_168.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_169.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_17.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_171.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_173.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_174.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_175.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_176.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_177.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_178.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_179.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_18.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_180.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_181.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_182.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_183.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_184.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_185.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_186.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_187.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_188.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_189.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_19.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_190.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_191.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_192.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_193.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_194.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_195.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_196.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_197.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_198.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_2.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_20.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_200.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_201.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_202.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_203.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_204.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_205.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_206.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_207.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_208.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_209.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_21.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_210.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_211.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_212.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_213.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_214.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_215.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_216.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_217.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_219.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_22.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_220.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_222.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_223.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_225.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_226.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_227.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_228.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_229.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_23.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_230.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_231.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_232.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_234.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_235.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_236.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_238.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_239.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_24.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_240.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_241.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_242.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_243.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_245.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_246.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_247.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_248.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_249.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_25.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_250.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_251.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_252.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_253.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_254.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_255.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_257.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_258.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_259.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_26.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_260.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_261.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_262.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_263.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_264.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_265.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_266.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_267.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_268.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_269.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_27.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_270.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_271.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_272.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_273.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_274.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_275.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_276.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_277.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_278.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_279.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_28.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_280.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_281.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_282.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_283.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_284.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_285.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_286.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_287.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_288.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_289.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_29.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_290.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_291.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_292.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_293.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_294.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_295.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_296.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_297.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_298.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_299.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_3.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_30.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_300.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_301.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_302.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_303.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_304.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_305.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_306.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_307.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_308.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_309.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_31.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_310.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_311.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_312.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_313.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_314.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_315.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_316.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_317.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_318.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_319.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_32.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_320.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_321.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_322.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_323.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_324.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_325.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_328.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_329.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_33.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_330.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_331.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_333.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_335.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_336.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_337.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_338.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_34.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_341.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_342.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_343.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_344.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_346.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_347.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_349.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_35.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_350.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_352.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_354.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_355.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_356.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_357.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_358.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_359.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_36.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_360.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_361.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_362.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_363.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_364.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_366.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_367.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_368.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_37.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_370.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_371.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_372.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_373.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_374.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_376.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_377.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_378.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_379.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_38.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_380.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_381.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_382.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_383.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_384.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_385.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_386.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_387.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_388.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_389.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_39.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_391.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_392.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_393.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_394.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_395.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_396.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_397.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_398.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_399.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_4.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_40.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_400.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_401.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_402.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_403.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_404.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_405.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_406.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_407.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_408.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_409.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_41.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_411.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_412.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_414.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_415.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_417.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_418.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_419.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_42.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_420.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_422.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_423.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_424.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_425.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_426.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_427.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_428.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_429.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_43.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_431.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_432.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_433.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_434.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_435.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_437.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_438.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_439.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_44.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_440.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_443.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_444.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_446.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_447.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_449.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_45.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_450.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_451.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_452.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_453.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_454.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_455.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_456.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_457.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_458.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_46.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_460.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_461.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_462.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_464.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_467.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_468.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_469.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_47.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_471.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_472.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_473.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_475.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_477.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_478.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_479.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_48.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_481.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_482.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_484.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_485.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_486.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_487.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_488.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_489.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_49.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_490.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_491.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_492.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_493.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_494.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_495.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_496.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_497.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_499.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_5.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_50.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_500.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_501.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_502.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_503.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_504.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_505.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_506.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_508.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_509.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_51.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_510.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_511.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_512.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_513.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_514.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_516.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_517.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_518.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_519.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_52.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_520.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_521.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_523.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_524.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_525.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_526.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_527.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_529.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_53.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_530.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_531.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_532.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_533.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_534.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_535.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_537.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_538.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_539.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_54.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_540.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_541.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_543.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_544.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_545.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_546.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_548.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_549.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_55.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_552.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_553.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_554.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_556.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_557.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_558.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_559.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_56.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_560.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_561.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_562.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_564.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_565.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_566.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_567.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_568.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_569.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_57.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_570.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_571.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_572.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_573.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_574.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_575.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_576.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_577.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_578.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_579.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_58.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_580.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_581.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_582.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_583.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_584.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_585.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_586.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_587.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_588.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_589.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_59.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_590.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_591.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_592.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_593.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_594.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_595.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_596.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_597.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_598.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_599.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_6.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_60.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_600.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_601.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_602.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_604.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_605.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_606.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_607.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_608.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_609.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_61.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_610.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_611.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_612.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_613.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_614.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_615.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_616.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_617.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_618.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_619.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_62.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_620.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_621.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_622.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_623.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_624.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_625.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_626.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_627.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_628.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_629.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_63.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_630.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_631.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_632.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_633.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_634.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_636.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_64.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_641.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_646.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_649.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_65.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_651.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_653.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_655.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_656.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_659.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_66.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_660.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_662.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_663.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_664.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_669.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_67.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_671.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_673.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_675.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_679.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_68.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_684.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_685.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_686.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_689.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_69.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_690.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_696.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_697.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_7.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_70.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_701.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_702.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_704.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_705.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_707.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_709.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_71.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_712.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_714.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_715.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_717.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_719.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_72.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_721.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_722.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_723.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_724.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_726.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_727.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_728.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_729.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_73.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_732.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_733.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_736.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_738.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_739.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_74.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_741.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_747.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_75.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_750.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_751.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_752.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_753.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_754.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_758.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_76.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_760.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_761.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_762.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_763.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_765.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_768.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_77.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_770.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_772.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_773.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_775.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_776.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_777.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_78.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_780.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_783.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_784.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_785.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_786.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_787.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_788.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_79.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_790.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_791.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_792.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_794.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_796.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_798.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_799.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_8.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_80.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_800.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_801.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_803.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_805.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_808.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_81.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_810.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_813.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_814.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_816.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_819.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_82.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_823.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_824.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_826.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_828.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_829.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_83.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_831.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_832.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_836.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_84.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_840.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_841.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_844.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_845.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_85.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_86.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_87.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_88.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_89.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_9.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_90.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_91.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_92.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_93.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_94.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_95.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_96.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_97.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_98.png', '/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/trainning_process/BipedalWalker-v3_99.png']

Display the scores over training time

Visualize Model Performance Overtime

This function plots the testing scores for different models passed in.

In [ ]:
def test_model_performance(model_name):
    test_score,test_avg_score=agent.test(model_name)
    plt.figure(1,figsize=(20,5))
    plt.subplot(121)
    plt.title('Testing Scores of the Agent Over Epochs')
    plt.plot(range(len(test_score)),test_score,label='Score Per Epoch')
    plt.legend()
    plt.subplot(122)
    plt.title('Average Testing Scores of the Agent Over Epochs')
    plt.plot(range(len(test_avg_score)),test_avg_score,label='Avg Score Over Epochs')
    plt.legend()
    plt.show()

At 5th epoch.

In [ ]:
test_model_performance('model_progress/50')
episode: 1/10, score: -100.16129921944625, average-82.5226103131212
episode: 2/10, score: -100.17864369016576, average-82.48379771621096
episode: 3/10, score: -100.2906102154969, average-82.42558027924144
episode: 4/10, score: -112.74838402273879, average-82.45370669724034
episode: 5/10, score: -112.63375520121306, average-83.79272582107177
episode: 6/10, score: -112.19177047617362, average-85.27971540066333
episode: 7/10, score: -112.20927829441925, average-86.72436970668326
episode: 8/10, score: -112.15187351739779, average-88.07849598206487
episode: 9/10, score: -112.05237711293995, average-88.39601944156558
episode: 10/10, score: -111.92986941251966, average-89.88769831018838
In [ ]:
test_model_performance('model_progress/125')
episode: 1/10, score: -93.42149028310862, average-89.7686908668084
episode: 2/10, score: -93.328944255072, average-89.23803701674751
episode: 3/10, score: -93.41191388111748, average-89.17933751095214
episode: 4/10, score: -93.45567024194511, average-89.12875518809773
episode: 5/10, score: -93.46355074169611, average-90.28807180126452
episode: 6/10, score: -93.46217831168697, average-90.1977282498837
episode: 7/10, score: -93.38528482103597, average-90.14592866648216
episode: 8/10, score: -93.47303060712169, average-91.27507135379594
episode: 9/10, score: -93.45067157081887, average-92.38427051658034
episode: 10/10, score: -93.45510286536688, average-93.55000490000775

Display the game envirnment for visualizations. While training, the model tries to get maximum award (balance himself and move).

In [ ]:
Video("/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/train_video/real/openaigym.video.1.7332.video000000.mp4",embed=True)
Out[ ]:
In [ ]:
Video("/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/train_video/real/openaigym.video.1.7332.video000008.mp4",embed=True)
Out[ ]:
In [ ]:
Video("/content/drive/MyDrive/Deep-Learning-CA2/RL/RL/train_video/real/openaigym.video.1.7332.video000064.mp4",embed=True)
Out[ ]: