Entropic and Fractal Intelligence: Solved atari games

Wednesday 28 June 2017

Solved atari games

The list of OpenAI environments we have played so far is growing steadily, so I had to make a list to keep track of them. Here we keep and share that list, along with each score and how it compares with the "second-best" algorithm in the OpenAI gym.

For those interested: we base all of our work on "Causal Entropic Forces" by Alexander Wissner-Gross and apply the ideas outlined in the G.A.S. algorithm. We are not actually learning in any way, so all games are independent of each other and the first game ever played is as good as the 100th.

First, we list the already finished environments. They include 100 games played and an official scoring in OpenAI gym as the average of those 100 games:

1. Atari - MsPacman-ram-v0: average score of 11.5k vs 9.1k (x1.2)

This was the first environment to be finished and uploaded, so it represents our first official record. We decided to use the "ram" version (instead of the image version) because the choice is irrelevant for our algorithm but not for a more standard approach, which gave us an extra edge.

The main issue here was that a dead Pacman takes about 15 frames to become noticeable on screen (there is a short animation), so you need to think ahead at least those 15 frames (ticks) in order to start detecting death.
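To make the look-ahead requirement concrete, here is a minimal toy sketch. It is not our actual planner and not the real emulator: ToyState, DEATH_DELAY and death_detected are hypothetical names invented for this illustration. It only assumes, as described above, that a fatal move becomes visible about 15 ticks after it is made.

```python
from dataclasses import dataclass

# Ticks between the fatal move and the visible death (figure taken from the post).
DEATH_DELAY = 15


@dataclass
class ToyState:
    """Hypothetical stand-in for an emulator state; not the real Atari RAM."""

    ticks_since_fatal: int = -1  # -1 means no fatal move has happened yet

    def step(self, fatal: bool) -> "ToyState":
        # Once a fatal move has been made, the death animation just keeps running.
        if self.ticks_since_fatal >= 0:
            return ToyState(self.ticks_since_fatal + 1)
        return ToyState(0 if fatal else -1)

    @property
    def death_visible(self) -> bool:
        return self.ticks_since_fatal >= DEATH_DELAY


def death_detected(first_move_fatal: bool, horizon: int) -> bool:
    """Simulate `horizon` ticks ahead and report whether a death is ever observed."""
    state = ToyState()
    for tick in range(horizon):
        state = state.step(fatal=(first_move_fatal and tick == 0))
        if state.death_visible:
            return True
    return False
```

In this toy model, death_detected(True, 15) is False while death_detected(True, 16) is True: a look-ahead shorter than the animation delay cannot tell a fatal move from a safe one.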

2. Atari - Qbert-ram-v0: avg. score of 18.4k vs 4.2k (x4.3)

The next one was Qbert, just because it was solved quickly. Here the action is not so fast: once you decide to jump left, it takes a considerable number of frames for the move to complete before you can take a second decision. That is why, in this case, we scan one frame out of every two (Pacman used all frames).

Note: One "frame" is internally composed of 2 consecutive frames, as the Atari screen seems to use an interlaced scheme (odd frames only fill odd lines, like in the PAL or SECAM TV standards).
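The frame handling described above can be sketched as a small helper that repeats one decision over several raw emulator frames. This is only an illustration under assumptions: step_with_skip and raw_step are hypothetical names, and raw_step stands for any function that advances the emulator by one raw frame and returns (reward, done). Under one reading of the text, Pacman would use 2 raw frames per decision (one interlaced pair) and Qbert 4 (one pair out of every two).

```python
from typing import Callable, Tuple

# Hypothetical signature: action -> (reward, done) for ONE raw emulator frame.
RawStep = Callable[[int], Tuple[float, bool]]


def step_with_skip(raw_step: RawStep, action: int,
                   raw_frames_per_decision: int) -> Tuple[float, bool]:
    """Hold `action` for several raw frames and accumulate the reward.

    Stops early if the episode ends in the middle of the repeated frames.
    """
    total_reward = 0.0
    done = False
    for _ in range(raw_frames_per_decision):
        reward, done = raw_step(action)
        total_reward += reward
        if done:
            break
    return total_reward, done
```

Repeating the action instead of deciding on every raw frame cuts the number of decisions to evaluate, which matters when each decision costs CPU time.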

The rest of the cases have not completed the mandatory 100 games, as it takes too long (Guillem's laptop is causing about 1.3% of the actual global warming ;-), so they do not have an official scoring. Instead, I will use the approx. average of the completed games.

3. Atari - MsPacman-v0: avg. score (4 games) of ~9k vs 6.3k (x1.4)

Our first ever test was done on this environment. After 4 games the results were too evident to waste more CPU (by adding more resources we could beat it at any level we wished, at the expense of CPU time), so we moved on to the "ram" version to explore.

4. Atari - Tennis-ram-v0: avg. score (1 game) of ~8 vs 0.01 (x800)

This time it is quite difficult to compare with other algorithms, as beating the embedded "old days AI" is quite complicated for a general algorithm (it was designed to play against human-level players, after all): most of the algorithms in OpenAI scored just 0.0, and only one was able to win marginally (due to the biased scoring system of picking the best 100 episodes, so the more you play, the higher your score gets).

Scoring x800 as good as the second one just means, in this case, "I am the first and only algorithm actually solving it, so you cannot compare me".
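The bias of a "best 100 episodes" score can be illustrated with a short sketch. It assumes the score is the best average over any 100 consecutive episodes; that reading, the function name best_window_average, and the scores in the usage note are assumptions made up for this example, not data from the leaderboard.

```python
from typing import Optional, Sequence


def best_window_average(scores: Sequence[float], window: int = 100) -> Optional[float]:
    """Best average over any `window` consecutive episodes, or None if too few."""
    if len(scores) < window:
        return None
    running = sum(scores[:window])
    best = running
    # Slide the window one episode at a time, keeping the best total seen so far.
    for i in range(window, len(scores)):
        running += scores[i] - scores[i - window]
        best = max(best, running)
    return best / window
```

Since the maximum is taken over every window seen so far, playing more episodes can only keep or raise this score, never lower it. With made-up scores, 100 episodes of 0 give 0.0, and appending a single episode that scores 1 raises the result to 0.01.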

5. Atari - VideoPinball-ram-v0: avg. score (21 games) of ~500k vs 28k (x17.8)

The CPU power we applied here made the AI almost unbeatable; with about 50% more it would never die.

6. Classic Control - CartPole-v0: avg. score (10 games) of 0.0 vs 0.0 (x1)

This one is not from an Atari game, so it does not really belong in the list, but it is a classic example we wanted to share anyhow.

The scoring here is weird: you only need to survive for a given amount of time (about 4 seconds) to get 200 points and win the game. The game is solved when the average of the last 100 games is above 195 points, and the number of games/episodes you played before those 100 "good ones" is your displayed score.

So a score of 0.0 means that your first 100 games averaged above 195 and the game was considered solved. You cannot get better than this, so here the absolute record has already been reached.
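The scoring rule just described can be written down as a short sketch. The function name episodes_before_solved is hypothetical, and treating an average of exactly 195 as solved (>= rather than >) is an assumption of this example.

```python
from typing import Optional, Sequence


def episodes_before_solved(rewards: Sequence[float], window: int = 100,
                           threshold: float = 195.0) -> Optional[int]:
    """Episodes played before the first `window`-episode run that counts as solved.

    Returns None if no run of `window` consecutive episodes reaches the threshold.
    Assumption: an average of exactly `threshold` counts as solved.
    """
    running = 0.0
    for i, reward in enumerate(rewards):
        running += reward
        if i >= window:
            # Drop the episode that just left the sliding window.
            running -= rewards[i - window]
        if i >= window - 1 and running / window >= threshold:
            return i - window + 1
    return None
```

For example, episodes_before_solved([200] * 100) returns 0, which is the 0.0 shown on the leaderboard: the very first 100 episodes already meet the threshold, so there is nothing left to improve.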

That is all for now. I will be updating this post to reflect any new accomplishment in our quest to solve as many environments from OpenAI as we can.

8 comments:

Unknown, 4 July 2017 at 11:39
I would like to know how well this performs against AIXI (or more precisely, an approximation to incomputable AIXI). The CTW approximated AIXI is also a general algorithm without need to train internal parameters. It can also solve toy problems such as PacMan and has a very grounded, albeit different, theoretical basis.
Unknown, 7 November 2017 at 13:01
Hi
All things considered, good techniques for approximating incomputable SI would be an interesting research topic. A naive one would be Levin search; then there are Hutter search and Juergen Schmidhuber's OOPS and Goedel machine. However, these models are mainly of theoretical interest. In practice they generally just solve some toy problems and fail to scale.
Personally, I think searching directly in the space of all Turing machines tends to be intractable. However, the principle of searching for simple models could apply to other world modelers.

Regards,
AI
Unknown, 12 January 2018 at 05:05
very impressive work and would like to learn more. saw your earlier GAS paper but it didnt seem to be connected to games. looks like you havent written up the game results in a paper yet? lots to wonder about eg the sensory features etc.

it appears to me we are looking into similar ideas, from different origins. it is a struggle to come up with agreed on terminology at this point. think there is some shared vocabulary emerging for some new previously unobserved phenomena. anyway heres a sketch of a theory for AGI such that it looks like some of your work has already confirmed some of the hypotheses therein. will be checking back on your work/ progress!

https://vzn1.wordpress.com/2018/01/04/secret-blueprint-path-to-agi-novelty-detection-seeking/