Saturday, 8 August 2020

AI playing battleship.

 The game battleship is a two-player game where each player places 5 ships (a size 2 ship, two size three ships, a size 4 ship and a size 5 ship) on a 10 by 10 grid, without showing their configuration to the other players. Then, players take turns each selecting a grid cell, and the other player communicates back if this is a miss or a hit, and if it is a hit, if it fully sinks a ship. The first player that fully sinks all other ships wins.

In this blog post, we will only consider the method to sink the other player's ships as fast as possible, we do not consider strategies to keep our own ships alive.

Intuitively, it might make sense to go for the tiles that our AI thinks have the highest chance to contain a battleship, but this is actually a version of the Multi-armed Bandit problem. Early in the game, we would want to go for squares that give us the most information, not squares that have the highest chance of hitting.

But in practice, this problem does not actually arise for two reasons. Firstly, the chance to hit a tile will be lower than one half in most cases, which means that the squares that give the most information and the squares that have the highest chance to hit align. And secondly, even if a certain tile gives us more information, we will still have to take an extra shot to score that hit later, so unless this information can save us two misses later, it will not be worth it.

In other words, one can see the action of minimising the number of shots as minimizing the number of misses, as the number of hits will always be the same (the number of ship tiles), so every time we need to make the action that minimises the chance on a miss.

Our AI will make use of the greedy strategy, which means that on every turn we will maximise the chance to hit on that turn, we will not try to take actions that maximise further chances to hit. This strategy will still be effective, because scoring hits gives us way more information then scoring misses, so hitting now will increase our chances to hit in the future.

To calculate the chance to hit, one uses the following approach. We take the set of all possible configurations of the enemy ships that have not been ruled out and count for each square how many configurations have a ship on that square divided by the total number of configurations.

It is not possible to calculate all possible states, so we will use Monte Carlo. we simulate 1000 possible configurations and calculate the chances given these simulated configurations. 

In the following video, you can see the AI in action. Each blue square is a miss, each red square is a hit, and the other squares are marked from white to green, where white is low chance to hit and green is a high chance.