Wednesday 27 May 2015

4 Fractal Ninjas

Tonight four of my new "fractal minded" rockets have been playing "Ninja fight". As before, you can compare it with the "linear/entropic" version of the algorithm here.

The rules are simple: if rocket "A" touches rocket "B" with the tip of its main thruster flame, it takes energy from it.
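In code, the rule could look something like this toy sketch (the names, the drain rate and the circle-overlap test are my assumptions for illustration, not the actual simulation code):

```python
# Toy sketch of the "Ninja fight" rule: if A's flame tip overlaps B's hull,
# A drains energy from B. All names and values here are hypothetical.
from dataclasses import dataclass

@dataclass
class Rocket:
    x: float
    y: float
    flame_tip_x: float
    flame_tip_y: float
    radius: float
    energy: float

def apply_ninja_rule(a: Rocket, b: Rocket, drain: float = 1.0) -> None:
    """If A's flame tip is inside B's hull circle, transfer 'drain' energy from B to A."""
    dx = a.flame_tip_x - b.x
    dy = a.flame_tip_y - b.y
    if dx * dx + dy * dy <= b.radius * b.radius:
        stolen = min(drain, b.energy)
        b.energy -= stolen
        a.energy += stolen

a = Rocket(x=0.0, y=0.0, flame_tip_x=4.8, flame_tip_y=0.0, radius=1.0, energy=10.0)
b = Rocket(x=5.0, y=0.0, flame_tip_x=9.0, flame_tip_y=0.0, radius=1.0, energy=10.0)
apply_ninja_rule(a, b)   # A's flame tip lies inside B's hull, so A steals energy from B
```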



There are no new "goals": nothing instructed them to avoid others' flames or to use their own flame to fight and gain energy, in the same way that no one told them how to land to refill the tank. They decided what to do and how to behave on their own, based only on how good or bad gaining or losing energy is (a basic goal all players have).

The resulting behaviour, in my highly biased opinion (IMHBO?), is near perfect. The parameters were: 300 futures to scan the consequences, thinking 20 seconds ahead. It was not excessive at all: the yellow one needed a little more to survive a hard situation, and when, at the end of the video, I relaxed it to 200 futures and 15 seconds, the winner ended up crashing. My fault!
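As a reference, a hypothetical parameter set mirroring the values quoted above could be written like this (the real code may name these differently):

```python
# Hypothetical parameter set; names are assumptions, values are the ones quoted above.
from dataclasses import dataclass

@dataclass
class FractalParams:
    n_futures: int      # number of simultaneous futures (paths) scanned
    horizon_s: float    # how many seconds ahead each future is simulated

strong = FractalParams(n_futures=300, horizon_s=20.0)   # setting used for most of the video
relaxed = FractalParams(n_futures=200, horizon_s=15.0)  # relaxed setting that led to the final crash
```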

I will try again today with more futures; I expect 500 to be a sweet spot, so I can check the behaviour "visually" and decide whether it "looks optimal" to me or not.

This is a very hard problem for any AI: they follow two really conflicting goals, taking energy from others to avoid starving while keeping those same others from killing them.

As they all share exactly the same parameters, no one is better than any other. I think it is an extreme MCDM (Multiple Criteria Decision Making) problem, so solving it well is a way to show an effective, general way to crack this family of (basically open) problems.

In the last third of the video, I switch on the visual debugging options for the white rocket, showing, superimposed, the traces of the paths it is thinking about (fractal paths) and red and green spots where the AI predicts good or bad outcomes.
Good outcomes correspond to paths in which it can take energy from others, as gaining energy is a basic goal, while red dots correspond to events in which the rocket loses energy (another rocket hits it) or to places where it crashes into the walls.
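A toy scoring rule along these lines might look like the sketch below (the function name and the crash penalty are assumptions for illustration only):

```python
# Rough sketch of how one predicted event could be scored as a "gain" (green spot)
# or a "loss" (red spot). Names and weights are hypothetical.
def feeling(energy_delta: float, crashed: bool) -> float:
    """Positive score = green spot (gain feeling), negative = red spot (loss feeling)."""
    if crashed:
        return -100.0        # crashing into a wall is the worst possible outcome
    return energy_delta      # stealing energy scores positive, being hit scores negative
```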

It was a pity I relaxed the AI parameters once only one rocket was left. I wanted it to land just to have a "nice ending" for the video, but I shouldn't have relaxed them at all, as the new setting was slightly too low for it to land. My condolences to the rocket family.

I have also kept some frames of the video so you can inspect in some detail what is going on under the hood.

Image 1 - This first image shows the white rocket's lose/gain feelings. As mentioned, red spots correspond to expected future losses, so it will actively avoid them, while green ones are gain feelings (getting energy is always a gain for it), so it will actively try to reach the green zones.

Image 1 - Gain and Lose Feelings
Image 2 - This one is quite a mesh! On top of the red/green feelings sopts, you see fractal paths the intelligence is following. The most important thingto notice here is fractal gets more dense in green areas, so better options are scanned more deeply in a very natural way.

Image 2 - Fractal paths

Image 3 - This one is interesting, as it shows how the fractal shape adapts in the hardest situations, like here, where the rocket will most probably crash into the ground if it doesn't react quickly. Compare it with the previous "relaxed" frame to spot the differences.

Basically, the branching/cloning process has sped up a lot, meaning the fractal bifurcates many more times per second. Each bifurcation is marked with a red and black dot, and this is what makes the paths in this frame so dense: they are bifurcating at almost every pixel.
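The idea can be illustrated with a toy rule like the one below: the closer a path is to a predicted loss, the shorter the time between its bifurcations. The scaling law and names are my own assumptions, not the exact rule used in the video:

```python
# Toy illustration of adaptive branching: bifurcation frequency grows with "danger",
# so threatened regions of the future are scanned at a finer time resolution.
def clone_interval(base_interval: float, danger: float, min_interval: float = 0.01) -> float:
    """Seconds between bifurcations; 'danger' is 0 when relaxed and grows as a crash gets nearer."""
    return max(min_interval, base_interval / (1.0 + danger))

print(clone_interval(0.5, 0.0))    # 0.5 s between bifurcations in a relaxed frame
print(clone_interval(0.5, 50.0))   # 0.01 s (clamped at min_interval) when close to crashing
```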

These "adaptative" behaviour of the fractal itself is key to make a fractal algorithm do someting useful, but not only the cloning proccess needs to be dynamically balanced, there are other parameters that need something similar.
 
Image 3 - Stressing the rocket
Before going on with the algorithm development (I have some ideas to expand it a little further into... consciousness?), I plan to make some more videos: a better fight scenario (dropping bombs, maybe) and a cooperative scenario (a hive of bees working for the community and fighting the hive's enemies on their own).

Ah! And I still have to show you all a little about the "quantum physics" fractal, one that tries to mimic the QED of our friend Feynman. It is quite beyond my level of understanding of quantum physics, so don't expect "real physical behaviour", just something close to it (watching Feynman's lectures did help, just a little).

4 comments:

  1. In this example, how are the different rockets modelling the future states of their competitors? That is, how are they predicting what their competitors will do? Do they just take the competitor's current action state as fixed when predicting? Clearly you can't have an infinite regress where one ship predicts another ship which is predicting the first ship and they are all taking actions based on this stack of predictions.

    Replies
1. They cheat. Well, I do, I suppose!

A and B imagine their sets of n next states, forming their n future paths, in a set of n "shared envs", so each sees what the other is doing in that particular future path.

Then, after each step of the paths, each rocket decides in turns whether to clone states or not, so when A clones its future i onto future j, so does B, but for B it is just a random clone, noise.

Actually, it means A is seeing B perform what is best for B, so A decides its best move knowing B will do the same, and vice versa. The standard MinMax approach, but easier.

Without cheating, A would need to predict B's moves, so A would need an NN or equivalent to learn to do that in a standard RL scheme, but this way I can take the best actions for A and B simultaneously, even when they fight.

It is even more beneficial when A and B want the same thing and cooperate, as simulations where they collaborate become more relevant and yield way better results.

Btw, making future i of agent A shared with B makes detecting future collisions in a probabilistic way as easy as detecting a simple pair collision and killing both. They then clone to different futures where the collision never happened and, voilà, they repel and avoid colliding for free.
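A condensed, self-contained sketch of that "shared futures" trick could look like the toy below, where each future holds both rockets, colliding futures are killed and survivors are cloned back in. The 1-D positions, random actions and names are simplified assumptions, not the actual code:

```python
# Toy sketch of shared futures: kill any imagined future where A and B collide,
# then refill the population by cloning survivors, so collisions are avoided "for free".
import copy
import random

class SharedFuture:
    """One imagined future holding the state of both rockets A and B."""
    def __init__(self, a_pos: float, b_pos: float):
        self.a_pos = a_pos
        self.b_pos = b_pos

    def step(self) -> None:
        # Both agents act inside the same copy of the environment.
        self.a_pos += random.uniform(-1.0, 1.0)
        self.b_pos += random.uniform(-1.0, 1.0)

    def pair_collision(self, min_dist: float = 0.5) -> bool:
        return abs(self.a_pos - self.b_pos) < min_dist

def step_futures(futures: list) -> list:
    """Advance all shared futures one tick, kill colliding ones, clone survivors back."""
    n = len(futures)
    for f in futures:
        f.step()
    alive = [f for f in futures if not f.pair_collision()]
    while alive and len(alive) < n:
        alive.append(copy.deepcopy(random.choice(alive)))
    return alive

futures = [SharedFuture(0.0, 5.0) for _ in range(300)]
for _ in range(20):
    futures = step_futures(futures)
```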

  2. I see, thanks for replying so fast.

    So let me see if I understand what is going on. Essentially, the different ships are all being controlled by one single instance of the fractal algorithm, which just changes its reward function to that of the given ship when it is deciding an action for that ship, taking turns?

    I'm trying to adapt the github code to play a board game (ultimate tic tac toe), and the principal difficulty is in modelling what my opponent does after FAI takes an action. Right now I have it taking a random action (basically the traditional MCTS approach), but I'm wondering if the approach used here can be adapted. If you have any ideas, let me know! Have you ever tried to use FAI to play board games? Or, rather, I already have a MCTS (in fact, UCT) that plays this game. In what way can I use the principles of fractal monte carlo to improve on MCTS?

    Replies
1. Exactly as you said. When it is B's turn, B moves and A stays, then B clones the whole states, including A's positions; then A does the same, and so on. A single FMC plays both sides, cloning based on B's rewards or A's rewards depending on who moved last.
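For a board game, a rough, self-contained sketch of that turn-taking idea might look like the toy below, using simple Nim instead of ultimate tic tac toe. The game, the names and the resampling rule are simplified assumptions of mine, not the actual FMC code:

```python
# Toy sketch: one scanner plays both sides of Nim (take 1-3 sticks, last stick wins);
# after each ply, futures are cloned in favour of the player who just moved.
import random

def nim_result(sticks, moves):
    """Replay 'moves' from 'sticks' remaining; return the winner (0 or 1) or None if unfinished."""
    player = 0
    for take in moves:
        sticks -= take
        if sticks <= 0:
            return player          # normal play: whoever takes the last stick wins
        player = 1 - player
    return None

def choose_move(sticks, n_futures=200, depth=10):
    futures = [[] for _ in range(n_futures)]             # each future = the list of moves played so far
    for ply in range(depth):
        mover = ply % 2                                   # player 0 is "us", player 1 the opponent
        for f in futures:
            if nim_result(sticks, f) is None:             # only extend futures where the game goes on
                f.append(random.randint(1, 3))
        # Clone step: resample futures, favouring those already won by whoever just moved.
        weights = [2.0 if nim_result(sticks, f) == mover else 1.0 for f in futures]
        futures = [list(f) for f in random.choices(futures, weights=weights, k=n_futures)]
    # Our actual move is the first move of a future that ends in a win for us, if any exists.
    winning = [f for f in futures if f and nim_result(sticks, f) == 0]
    pool = winning or [f for f in futures if f]
    return random.choice(pool)[0]

print(choose_move(10))   # a sampled move for the player facing 10 sticks
```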

      Let me know how it works!
