For those interested: we base all of our work on "Causal Entropic Forces" by Alexander Wissner-Gross and apply the ideas outlined in the G.A.S. algorithm. We are not actually learning in any way, so all games are independent of each other and the first-ever played game is as good as the 100th game.
First, we list the already finished environments. Each includes 100 games played and an official score in OpenAI Gym, computed as the average of those 100 games:
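To give a feel for the idea (this is a hedged sketch of the general principle, not the actual G.A.S. implementation), action selection based on causal entropic forces can be illustrated on a toy problem: for each candidate action, simulate a bunch of random futures and prefer the action whose futures are most diverse. The names `WalkEnv`, `path_entropy` and `entropic_action` are our own, and the environment (a 1D walk with an absorbing "death" pit at position 0) is invented purely for the example:

```python
import random
from collections import Counter
from math import log

class WalkEnv:
    """Toy 1D world: positions 0..10, with an absorbing 'death' pit at 0."""
    def __init__(self, pos=2):
        self.pos = pos

    def clone(self):
        return WalkEnv(self.pos)

    def step(self, action):          # action is -1 or +1
        if self.pos > 0:             # position 0 is terminal: once dead, stay dead
            self.pos = max(0, min(10, self.pos + action))
        return self.pos

def path_entropy(env, first_action, horizon=5, rollouts=100):
    """Estimate the entropy of the final states reachable after first_action."""
    finals = Counter()
    for _ in range(rollouts):
        sim = env.clone()
        sim.step(first_action)
        for _ in range(horizon - 1):
            sim.step(random.choice((-1, +1)))
        finals[sim.pos] += 1
    total = sum(finals.values())
    return -sum(c / total * log(c / total) for c in finals.values())

def entropic_action(env, actions=(-1, +1)):
    """Pick the action that keeps the most diverse set of futures open."""
    return max(actions, key=lambda a: path_entropy(env, a))
```

Standing next to the pit (`pos=1`), stepping left kills every rollout (entropy 0), so the agent walks right, even though no reward signal was ever defined. No learning is involved: each decision only uses simulated futures from the current state.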
1. Atari - MsPacman-ram-v0: average score of 11.5k vs 9.1k (x1.2)
This was the first environment to be finished and uploaded, so it represents our first official record. We decided to use the "ram" version (instead of the image version) because the choice is irrelevant for our algorithm but not for more standard approaches, so we had an extra punch.
The main issue here was that a dead Pacman takes about 15 frames to become noticeable on screen (there is a short animation), so you need to think ahead at least those 15 frames (ticks) in order to start detecting death.
2. Atari - Qbert-ram-v0: avg. score of 18.4k vs 4.2k (x4.3)
The next one was Qbert, simply because it solved quickly. Here the action is not so fast: once you decide to jump left, it takes a considerable number of frames for the decision to complete before you can take a second decision. That is why, in this case, we scan one frame out of every two (Pacman used all frames).
Note: one "frame" is internally composed of 2 consecutive frames, as the Atari screen seems to use an interlaced scheme (odd frames only fill odd lines, like in the PAL or SECAM TV standards).
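Both tricks (deciding only once every two frames, and fusing the two interlaced half-frames into one) are easy to express against the 4-tuple `env.step()` interface of that Gym era. The helpers `merge_interlaced` and `step_with_skip` below are our own minimal sketches, not the actual implementation:

```python
import numpy as np

def merge_interlaced(frame_a, frame_b):
    """Fuse two consecutive half-frames into one logical frame.

    A pixel-wise maximum keeps sprites that are only drawn on the odd
    or the even lines, as in common Atari preprocessing pipelines."""
    return np.maximum(frame_a, frame_b)

def step_with_skip(env, action, skip=2):
    """Repeat `action` for `skip` frames, summing the reward.

    Deciding only once every `skip` frames suits slow-acting games
    like Qbert, where a jump takes many frames to complete."""
    total = 0.0
    for _ in range(skip):
        obs, reward, done, info = env.step(action)
        total += reward
        if done:
            break
    return obs, total, done, info
```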
The rest of the cases have not completed the mandatory 100 games; it takes too long (Guillem's laptop is causing about 1.3% of the actual global warming ;-), so they do not have an official score. Instead, I will use the approximate average of the completed games.
3. Atari - MsPacman-v0: avg. score (4 games) of ~9k vs 6.3k (x1.4)
This time it is quite difficult to compare with other algorithms, as beating the embedded "old days AI" is quite complicated for a general algorithm (it was designed to play against human-level players, after all): most of the algorithms on OpenAI scored just 0.0, and only one was able to win marginally (due to the biased scoring system of picking the best 100 episodes, so the more you play, the higher your score gets).
Scoring x800 times better than the second one, in this case, just means "I am the first and only algorithm actually solving it, so you cannot compare me".
5. Atari - VideoPinball-ram-v0: avg. score (21 games) of ~500k vs 28k (x17.8)
The CPU power we applied here made the AI almost unbeatable; with about 50% more it would never die.
6. Classic Control - CartPole-v0: avg. score (10 games) of 0.0 vs 0.0 (x1)
This one is not from an Atari game, so it is not really in the list, but it is a classic example we wanted to share anyhow.
The scoring here is weird: you only need to survive for a given amount of time (about 4 seconds) to get 200 points and win the game. The environment counts as solved when the average of the last 100 games is above 195 points, and the number of games/episodes you played before those 100 "good ones" is your displayed score.
So a score of 0.0 means that your first 100 games averaged above 195 and the game was considered solved. You cannot get better than this, so here the absolute record was already reached.
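The scoring rule above can be written down explicitly. `solved_after` is our own illustrative helper (not Gym's actual evaluation code), assuming a plain list of per-episode scores:

```python
def solved_after(episode_scores, window=100, threshold=195.0):
    """Return how many episodes were played before the first window of
    `window` consecutive episodes whose average reaches `threshold`,
    or None if the environment was never solved.

    Solving within the very first 100 episodes yields the best
    possible score of 0, which is why 0.0 is an absolute record."""
    for end in range(window, len(episode_scores) + 1):
        if sum(episode_scores[end - window:end]) / window >= threshold:
            return end - window  # episodes played before the winning window
    return None
```

For example, 100 straight wins of 200 points gives a score of 0, while 50 failed warm-up episodes push the score up accordingly.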
That is all for now. I will be updating this post to reflect any new accomplishments in our quest to solve as many environments from OpenAI as we can.