Growing an AI with NEAT

Let's grow an artificial brain! The idea came after watching this video by Seth Bling [1], and another one by Code Bullet [2]. They both implement a solution called NeuroEvolution of Augmenting Topologies, or NEAT for short. It is a genetic algorithm for the generation of evolving artificial neural networks [3].

Unlike them, I went with an existing NEAT implementation, and since my main language is C#, I chose SharpNEAT [4]. It provides a nice framework and handles all the neuro-evolution parts, so I only have to tweak a few parameters and I can focus on implementing my simulation.

Let's Get Started

My ultimate goal is to grow an artificial brain that can play Wolfenstein 3D [5]. For that, I intend to recycle my old Codename Z code and add a parser for actual Wolf3D maps. But I figured I should start with something simple, so I can have some sense of achievement before I rage-quit. So I made a simple tile-based maze using Randomized Prim's algorithm [6], where the "player" (or rather, the agent) can only step one tile at a time, with no in-between positions.
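
For reference, here is roughly how such a maze can be generated with Randomized Prim's algorithm. This is a minimal sketch with my own illustrative names, not the project's actual code:

    using System;
    using System.Collections.Generic;

    static class MazeGen
    {
        // Generates a "perfect" maze (exactly one path between any two cells)
        // with Randomized Prim's algorithm. true = wall, false = passage.
        // Width and height should be odd so the maze keeps a solid border.
        public static bool[,] Generate(int width, int height, int seed)
        {
            var rng = new Random(seed);
            var maze = new bool[width, height];
            for (int x = 0; x < width; x++)
                for (int y = 0; y < height; y++)
                    maze[x, y] = true;                    // start with solid walls

            // Frontier entries: an unvisited cell plus the wall leading to it.
            var frontier = new List<(int x, int y, int wx, int wy)>();

            void AddFrontier(int x, int y)
            {
                foreach (var (dx, dy) in new[] { (2, 0), (-2, 0), (0, 2), (0, -2) })
                {
                    int nx = x + dx, ny = y + dy;
                    if (nx > 0 && ny > 0 && nx < width - 1 && ny < height - 1 && maze[nx, ny])
                        frontier.Add((nx, ny, x + dx / 2, y + dy / 2));
                }
            }

            maze[1, 1] = false;                           // carve the start cell
            AddFrontier(1, 1);

            while (frontier.Count > 0)
            {
                int i = rng.Next(frontier.Count);
                var (x, y, wx, wy) = frontier[i];
                frontier.RemoveAt(i);
                if (!maze[x, y]) continue;                // already part of the maze
                maze[wx, wy] = false;                     // knock down the wall
                maze[x, y] = false;                       // carve the new cell
                AddFrontier(x, y);
            }
            return maze;
        }
    }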

A tile-based maze generated with Randomized Prim's algorithm [6]. "S" and "E" mark the Start and Exit. This is a "perfect" maze, meaning there is always exactly one path between any two points.

I/O Layers

Since NEAT is all about figuring out the inner connections on its own through neuro-evolution, I only need to provide the inputs, outputs, and a fitness evaluation.

The input is what the brain senses: in this case, the state of the four neighboring tiles, i.e. whether each one is a wall (1) or a free space (0) the agent can step into. The output is what comes out of the neural network; it is the consequence of both the input and the network's internal state, and it tells which step the brain chooses to take next.
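
As a concrete sketch, filling the four inputs might look like this (illustrative code, assuming the maze array from the earlier sketch, where true marks a wall and the maze has a solid border):

    // Encodes the agent's senses: one input per neighboring tile,
    // 1.0 for a wall, 0.0 for free space. Order: up, right, down, left.
    static double[] Sense(bool[,] maze, int x, int y)
    {
        return new[]
        {
            maze[x, y - 1] ? 1.0 : 0.0, // up
            maze[x + 1, y] ? 1.0 : 0.0, // right
            maze[x, y + 1] ? 1.0 : 0.0, // down
            maze[x - 1, y] ? 1.0 : 0.0, // left
        };
    }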

Only the input and output layers are defined. The inner connections and additional neurons will grow on their own by random chance and positive reinforcement.

For the outputs, each value is a floating-point number between 0 and 1, interpreted as true if greater than 0.5. By design, the agent does not accept conflicting commands: only one direction can be active at a time, otherwise no movement is made.
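
A minimal sketch of that output interpretation (again illustrative, not the project's actual implementation):

    // Decodes the four outputs: a value above 0.5 counts as "pressed".
    // The agent only moves when exactly one direction is active;
    // conflicting commands result in no movement. Order matches Sense.
    static (int dx, int dy) Decode(double[] outputs)
    {
        var moves = new (int dx, int dy)[] { (0, -1), (1, 0), (0, 1), (-1, 0) };
        int active = -1, count = 0;
        for (int i = 0; i < outputs.Length; i++)
            if (outputs[i] > 0.5) { active = i; count++; }
        return count == 1 ? moves[active] : (0, 0);
    }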

The Simulation

During the simulation, each agent performs the same tasks, and in the end a fitness score is assigned based on its performance. The agent has no idea about its fitness during the simulation. When a whole generation of agents has been evaluated, the ones with the best scores are used to create offspring for the next generation. This happens by copying the neural network of the parent, then randomly adding or removing neurons and connections, and mutating existing ones. This way by random chance they might become better or worse than their parents.
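
SharpNEAT implements this internally with a far more sophisticated genome model; conceptually, though, the reproduction step looks something like this purely illustrative sketch:

    using System;
    using System.Collections.Generic;

    // Illustrative genome: just weighted connections between numbered
    // neurons. Not SharpNEAT's actual data model.
    class Genome
    {
        public List<(int src, int dst, double weight)> Connections = new List<(int, int, double)>();
        public int NeuronCount;

        public Genome Clone() => new Genome
        {
            Connections = new List<(int, int, double)>(Connections),
            NeuronCount = NeuronCount
        };
    }

    static class Evolution
    {
        // Creates one offspring: copy the parent, then apply random weight
        // and structural mutations. All probabilities are illustrative.
        public static Genome CreateOffspring(Genome parent, Random rng)
        {
            var child = parent.Clone();

            // Mutate existing connection weights.
            for (int i = 0; i < child.Connections.Count; i++)
                if (rng.NextDouble() < 0.7)
                {
                    var (src, dst, w) = child.Connections[i];
                    child.Connections[i] = (src, dst, w + (rng.NextDouble() - 0.5));
                }

            // Rarely add a neuron by splitting an existing connection in two.
            if (rng.NextDouble() < 0.002 && child.Connections.Count > 0)
            {
                int i = rng.Next(child.Connections.Count);
                var (src, dst, w) = child.Connections[i];
                int neuron = child.NeuronCount++;
                child.Connections.RemoveAt(i);
                child.Connections.Add((src, neuron, 1.0));
                child.Connections.Add((neuron, dst, w));
            }

            // Rarely add or remove a random connection.
            if (rng.NextDouble() < 0.002)
                child.Connections.Add((rng.Next(child.NeuronCount),
                                       rng.Next(child.NeuronCount),
                                       rng.NextDouble() * 2 - 1));
            if (rng.NextDouble() < 0.001 && child.Connections.Count > 0)
                child.Connections.RemoveAt(rng.Next(child.Connections.Count));

            return child;
        }
    }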

NEAT can also handle multiple simultaneous species. These are separate populations that develop independently of each other, but in each generation a certain number of offspring can inherit properties from two different species. This way they have a chance to combine beneficial features from both parents and become more successful than either of them alone.

To measure fitness I used a single metric: the distance to the exit. The closer the agent gets, the better the fitness. I scaled it so the starting tile is 0 and the exit is 1. Some dead ends can be further from the exit than the start, which would produce a negative value, so those are clipped to 0; a fitness score is therefore always between 0 and 1.
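
In code, the scoring boils down to something like this (a sketch; distToExit stands for whatever distance measure is used):

    // Fitness from a tile's distance to the exit, scaled so the start
    // tile scores 0 and the exit scores 1, then clipped at 0 so dead
    // ends beyond the start can't go negative. (Needs using System.)
    static double Fitness(double distToExit, double distStartToExit)
    {
        return Math.Max(0.0, 1.0 - distToExit / distStartToExit);
    }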

Since most agents will not make it out, I have to cut the simulation short. Each agent has 750 steps to finish the maze, otherwise it is terminated and its fitness score is assigned based on its position at termination.
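
Putting the pieces together, evaluating one agent on one maze might look roughly like this. The IBlackBox member names assume SharpNEAT 2.x (SharpNeat.Phenomes); Sense, Decode, and the per-tile fitness map are the illustrative helpers sketched elsewhere in this post:

    // Runs one agent through one maze for at most 750 steps and returns
    // the fitness of the tile it ends up on, whether it reached the exit
    // or was terminated. (A sketch, assuming SharpNEAT 2.x's IBlackBox.)
    static double Evaluate(IBlackBox brain, bool[,] maze, double[,] fitnessMap,
                           (int x, int y) start, (int x, int y) exit)
    {
        brain.ResetState();
        var (x, y) = start;
        for (int step = 0; step < 750 && (x, y) != exit; step++)
        {
            double[] senses = Sense(maze, x, y);
            for (int i = 0; i < senses.Length; i++)
                brain.InputSignalArray[i] = senses[i];
            brain.Activate();

            var outputs = new double[4];
            for (int i = 0; i < outputs.Length; i++)
                outputs[i] = brain.OutputSignalArray[i];

            var (dx, dy) = Decode(outputs);
            if (!maze[x + dx, y + dy]) { x += dx; y += dy; } // step if not a wall
        }
        return fitnessMap[x, y]; // position at the exit or at termination
    }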

First Generations

I started with a population of 1000 and default genome parameters (mutation rates). The very first generation figured out that it needs to go right and down to advance, and then got stuck there for the next few thousand generations.

The exit is placed at the bottom-right corner and the agent starts in the opposite corner. The first generation could only go in the general direction of the exit: down and right. The light green tint marks the fitness value of each tile; the greener, the better.

1st Milestone: Discovering Up and Left

The first evolutionary milestone came several hours later, when the agent figured out how to go in the other two directions and started finishing most of the mazes.

It goes in the general direction of the exit and turns back when hitting a dead end, but it just cannot figure out how to continue on the right path.

At this point, each agent was evaluated against 20 mazes that were randomized each time (not just for each generation, but for each agent). I realized this was not doing much good: some mazes are just a straight line across, while others require backtracking, so the fitness reflected the probability of drawing simpler mazes rather than actual performance. To fix this, I specified a fixed seed for each maze, and also raised the number of mazes from 20 to 100.
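
With fixed seeds, the evaluation set becomes deterministic, something along these lines (reusing the Generate sketch from earlier; the size and seed range are illustrative):

    // Build a fixed, deterministic evaluation set: the same 100 mazes
    // for every agent in every generation.
    var mazes = new List<bool[,]>();
    for (int seed = 0; seed < 100; seed++)
        mazes.Add(MazeGen.Generate(31, 31, seed));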

While I was at it, I figured I might as well cache the mazes too, along with the per-tile fitness values (represented by the green tiles). That sped up the simulation significantly.
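
The per-tile fitness values can be precomputed once per maze, for example with a breadth-first flood fill from the exit. A sketch under the same illustrative conventions as above (needs using System and System.Collections.Generic):

    // Precomputes the fitness of every tile with a breadth-first flood
    // fill from the exit, so the simulation can just look values up.
    static double[,] BuildFitnessMap(bool[,] maze, (int x, int y) start, (int x, int y) exit)
    {
        int w = maze.GetLength(0), h = maze.GetLength(1);
        var dist = new int[w, h];
        for (int x = 0; x < w; x++)
            for (int y = 0; y < h; y++)
                dist[x, y] = -1;                       // -1 = not reached yet

        var queue = new Queue<(int x, int y)>();
        dist[exit.x, exit.y] = 0;
        queue.Enqueue(exit);
        while (queue.Count > 0)
        {
            var (x, y) = queue.Dequeue();
            foreach (var (dx, dy) in new[] { (0, -1), (1, 0), (0, 1), (-1, 0) })
            {
                int nx = x + dx, ny = y + dy;
                if (nx >= 0 && ny >= 0 && nx < w && ny < h && !maze[nx, ny] && dist[nx, ny] < 0)
                {
                    dist[nx, ny] = dist[x, y] + 1;
                    queue.Enqueue((nx, ny));
                }
            }
        }

        // Scale so the start tile is 0 and the exit is 1, clipped at 0.
        var fitness = new double[w, h];
        double startDist = dist[start.x, start.y];
        for (int x = 0; x < w; x++)
            for (int y = 0; y < h; y++)
                if (dist[x, y] >= 0)
                    fitness[x, y] = Math.Max(0.0, 1.0 - dist[x, y] / startDist);
        return fitness;
    }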

This change also meant I had to start over with a fresh population, but this time around it was much faster because of the caching. With the discovery of up and left, the best agents in each generation could finish about 70% of the mazes.

2nd Milestone: Full Backtracking

That 70% completion rate went up by about 1% every 100 to 200 generations until it reached 100%. Mind you, that does not mean it can solve every possible maze, only that it can solve this particular set of 100. When tested on a million random mazes, I got a 98.52% success rate.

Here are some of the results:

Wonderful backtracking there. I deliberately picked the runs with the most backtracking, since these scenarios took the algorithm the longest to learn: each generation was able to backtrack a little bit further, while still being able to finish the mazes it could solve before.

Another one with backtracking.

Conclusions

Backtracking was ultimately the most time-consuming feature to learn, slowly extending how far the agent could spring back after hitting a dead end. It was interesting to see that whenever the process was struggling to develop the next milestone, the number of extra neurons and connections went up, and once the problem was solved, the network gradually got simpler again. This complexification and simplification is a key feature of NEAT: instead of having a predefined number of neurons and connections, the network "grows" to match the complexity of the task. Take a look at these two examples:

The neural network while struggling to figure out backtracking. (Greens are inputs, pinks are outputs. The 5th input is the bias.)

Final simplified neural network with working solution.

Interestingly, it did not figure out the easiest solution, which is to stick to either the left or the right wall and simply track along it. My theory is that this would require a different input/output principle, where the agent has a looking direction and every sense and movement is relative to that.

SharpNEAT provides a few parameters to tweak regarding the rate of mutation, the number of simultaneous species, and the number of cross-species offspring in each generation. Keeping the mutation rate low risks the offspring not mutating enough in one generation to make a difference in the fitness score, while a higher mutation rate risks generating more defective offspring and possibly devolving the species. During the simulations I constantly changed the values, but these are the ones I generally used (a rough mapping onto SharpNEAT's parameter objects is sketched after the list):

  • Number of species: 30
  • Interspecies mating: 0.01% - 0.1%
  • Mutate connection weights: 60% - 80% (chance of mutating connections between neurons)
  • Mutate add neuron: 0.01% - 0.25% (chance of adding a new neuron)
  • Mutate add connection: 0.01% - 0.25% (chance of adding a new connection between two neurons)
  • Mutate delete connection: around half of the previous value
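
For reference, these roughly correspond to the following settings. The class and property names assume SharpNEAT 2.x and the values are illustrative midpoints of the ranges above:

    using SharpNeat.EvolutionAlgorithms; // NeatEvolutionAlgorithmParameters
    using SharpNeat.Genomes.Neat;        // NeatGenomeParameters

    // A sketch, assuming SharpNEAT 2.x parameter names.
    var eaParams = new NeatEvolutionAlgorithmParameters
    {
        SpecieCount = 30,
        InterspeciesMatingProportion = 0.0005    // 0.01% - 0.1%
    };
    var genomeParams = new NeatGenomeParameters
    {
        ConnectionWeightMutationProbability = 0.7,   // 60% - 80%
        AddNodeMutationProbability = 0.001,          // 0.01% - 0.25%
        AddConnectionMutationProbability = 0.001,    // 0.01% - 0.25%
        DeleteConnectionMutationProbability = 0.0005 // ~half of "add connection"
    };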

It's hard to tell how many generations it ultimately took, because I stopped, tweaked, and resumed the process so many times, but it's somewhere around 5,000. Maybe for the next one I'll add a persistent generation counter.

Source Code

You can download the code for the entire project here, as well as the genomes for the final population.

Links

  1. MarI/O - Machine Learning for Video Games by SethBling
  2. A.I. LEARNS to Play Hill Climb Racing by Code Bullet
  3. NeuroEvolution of Augmenting Topologies (NEAT) on Wikipedia
  4. SharpNEAT on sourceforge.net
  5. Wolfenstein 3D on Steam
  6. Randomized Prim's algorithm on Wikipedia