18 Jan 2018 |
World innovation news |
Information and Communications Technologies
Measuring the Historical Impact of AlphaGo vs the World Champion
The ancient Chinese game of Go has simple rules: two players take turns placing black or white stones on a 19 x 19 board, trying to capture the opponent’s stones or surround empty space to score points. The game is considered one of the most challenging for artificial intelligence (AI) because of its massive search space: both the number of possible board positions and the number of moves available at each turn are enormous. In March 2016, for the first time in history, a top professional Go player, Lee Sedol, was defeated by a computer agent called AlphaGo, created by Google’s DeepMind.
To play effectively, AlphaGo needs some knowledge of the game tree: given a particular position on the board, it analyzes the legal moves available and the positions they lead to. In other words, if the game agent can search the game tree efficiently, it can “decide” which move is most likely to win the game. Figure 1 below is an example of a game tree for the first 2 plays of tic-tac-toe.
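To make the idea of a game tree concrete, here is a minimal Python sketch that builds the tree for the first 2 plays of tic-tac-toe and counts its positions. The names `legal_moves`, `play`, `expand` and `count_leaves` are illustrative choices, not part of AlphaGo, and the sketch ignores winning positions because nobody can win within two plays.

```python
from typing import List, Tuple

Board = Tuple[str, ...]  # 9 cells, each "X", "O" or " " (empty)


def legal_moves(board: Board) -> List[int]:
    """Indices of the empty cells, i.e. the moves available from this position."""
    return [i for i, cell in enumerate(board) if cell == " "]


def play(board: Board, move: int, player: str) -> Board:
    """Return the new position after `player` marks cell `move`."""
    return board[:move] + (player,) + board[move + 1:]


def expand(board: Board, player: str, depth: int) -> dict:
    """Build the game tree below `board`, `depth` plays deep.

    Each key is a move; each value is the subtree reached by that move.
    """
    if depth == 0:
        return {}
    other = "O" if player == "X" else "X"
    return {
        move: expand(play(board, move, player), other, depth - 1)
        for move in legal_moves(board)
    }


def count_leaves(tree: dict) -> int:
    """Number of positions at the bottom level of the tree."""
    if not tree:
        return 1
    return sum(count_leaves(child) for child in tree.values())


empty: Board = (" ",) * 9
tree = expand(empty, "X", depth=2)
first_ply = len(tree)             # choices for the first player
second_ply = count_leaves(tree)   # positions after both players have moved
print(first_ply, second_ply)      # 9 72
```

Even in this tiny game the tree grows quickly: 9 positions after one play, 72 after two. The same construction on a Go board would start with 361 branches instead of 9.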
At every stage of the game, a player can choose between many possible moves (approximately 250), and a typical game is completed in around 150 moves. So what makes Go challenging for AI? The number of possible configurations of the Go board is massive: the number of legal board positions has been estimated at approximately 10^170, far greater than the number of atoms in the universe (approximately 10^80). Therefore, it is impossible for the agent to analyze all of the available plays in a reasonable time.
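The figures above can be turned into a rough back-of-the-envelope estimate. The sketch below assumes every turn offers about 250 moves and every game lasts about 150 moves, so the number of distinct move sequences is roughly 250^150. This is a simplification for illustration, not an exact count.

```python
import math

BRANCHING_FACTOR = 250  # approximate legal moves per turn (figure from the text)
GAME_LENGTH = 150       # approximate moves in a typical game (figure from the text)
ATOMS_EXPONENT = 80     # atoms in the universe, about 10^80 (figure from the text)

# 250^150 is too large to read comfortably, so work with its base-10 exponent:
# log10(250^150) = 150 * log10(250)
game_tree_exponent = GAME_LENGTH * math.log10(BRANCHING_FACTOR)
exponent_gap = game_tree_exponent - ATOMS_EXPONENT

print(f"move sequences: about 10^{game_tree_exponent:.0f}")         # about 10^360
print(f"that is about 10^{exponent_gap:.0f} times the atom count")  # about 10^280
```

Under these assumptions, an exhaustive search would have to visit on the order of 10^360 move sequences, which is why AlphaGo cannot simply try every line of play.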
Google DeepMind: Ground-breaking AlphaGo masters the game of Go
The DeepMind team created a new search algorithm by combining a tree search technique called Monte Carlo Tree Search (MCTS) with two deep neural networks. MCTS is a heuristic search algorithm that guides the game agent’s decision process: it focuses only on the branches that are most promising for achieving a victory, so the agent does not have to go through every branch of the search tree. The neural networks take the pattern of stones on the Go board as input and process it through a number of network layers. One network, called the “policy network”, selects the next move to play, whereas the second, called the “value network”, evaluates board positions to predict the winner of the game.
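The core loop of MCTS can be sketched in a few lines of Python. This is not AlphaGo's implementation: it is the plain textbook variant (UCT with random playouts) applied to a toy game chosen for illustration, in which players alternately remove 1 to 3 stones and whoever takes the last stone wins. In AlphaGo, the policy network would guide which moves get expanded and the value network would help evaluate positions; here uniform random choices stand in for both. All names (`Node`, `uct_child`, `rollout`, `mcts`) are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional

MAX_TAKE = 3  # toy game: remove 1-3 stones; whoever takes the last stone wins


def legal_moves(stones: int) -> List[int]:
    return list(range(1, min(MAX_TAKE, stones) + 1))


class Node:
    def __init__(self, stones: int, parent: Optional["Node"] = None):
        self.stones = stones                  # stones left for the player to move
        self.parent = parent
        self.children: Dict[int, "Node"] = {}
        self.visits = 0
        self.wins = 0.0                       # wins for the player who moved INTO this node

    def untried_moves(self) -> List[int]:
        return [m for m in legal_moves(self.stones) if m not in self.children]


def uct_child(node: Node, c: float = 1.4) -> Node:
    """Pick the child balancing win rate (exploitation) and rarity (exploration)."""
    return max(
        node.children.values(),
        key=lambda ch: ch.wins / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(stones: int, rng: random.Random) -> bool:
    """Play random moves to the end; True if the player to move at the start wins."""
    mover_is_start_player = True
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return mover_is_start_player
        mover_is_start_player = not mover_is_start_player
    return False  # no stones left at the start: the previous player already won


def mcts(root_stones: int, iterations: int, seed: int = 0) -> int:
    """Return the most-visited move from a position with `root_stones` stones."""
    rng = random.Random(seed)
    root = Node(root_stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: follow promising branches while nodes are fully expanded.
        while node.stones > 0 and not node.untried_moves():
            node = uct_child(node)
        # 2. Expansion: add one new child unless the game is over here.
        if node.stones > 0:
            move = rng.choice(node.untried_moves())
            child = Node(node.stones - move, parent=node)
            node.children[move] = child
            node = child
        # 3. Simulation: estimate the outcome with a random playout.
        mover_won = not rollout(node.stones, rng)
        # 4. Backpropagation: update statistics, flipping perspective each level.
        while node is not None:
            node.visits += 1
            if mover_won:
                node.wins += 1
            mover_won = not mover_won
            node = node.parent
    return max(root.children, key=lambda m: root.children[m].visits)


print(mcts(5, iterations=2000))
```

With 5 stones, the winning move is to take 1 and leave the opponent with 4, and the search should concentrate most of its visits on that branch without ever enumerating the full tree. The same selective behaviour is what makes MCTS usable on a game as large as Go.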
More specifically, the DeepMind team fed AlphaGo a large number of strong amateur games to help it develop its own understanding of how humans play the game. The agent then played against different versions of itself a multitude of times, learning from its mistakes and becoming stronger each time. This self-play process is called reinforcement learning.
The DeepMind team published a paper explaining this original approach in greater technical detail.
To conclude, Lee Sedol’s 1-4 defeat against AlphaGo in March 2016 was watched by over 200 million people worldwide. Experts agreed that the event was a landmark in artificial intelligence, since the achievement came a decade earlier than expected.
Marie-Anne Valiquette obtained a Bachelor's degree in Mechanical Engineering at the École de technologie supérieure (ÉTS) in Montreal. She lives in Silicon Valley, California where she studies artificial intelligence through online platforms like Udacity and deeplearning.ai.
Program : Mechanical Engineering