22 Jan 2018 | World innovation news | Intelligent and Autonomous Systems
AlphaGo Zero – Discovering Knowledge Without Human Intervention
The previous article, “Measuring the Historical Impact of AlphaGo vs the World Champion,” explains why the victory of DeepMind’s AlphaGo game agent was so important. On October 18, 2017, Google’s DeepMind team unveiled AlphaGo Zero, the latest version of the program. The new version is a significantly better player than the one that beat the game’s world champion in March 2016 and, more importantly, it is entirely self-taught.
The DeepMind team trained the original game agent, AlphaGo, on data from hundreds of thousands of games played by human experts. In contrast, AlphaGo Zero started with nothing but a blank board and the rules of the game. For three days, it learned “simply” by playing millions of games against itself, using what it learned in each game to improve. In games against the 2015 version, which famously beat South Korean grandmaster Lee Sedol the following year, AlphaGo Zero won 100 to 0.
According to DeepMind, after three hours the program played like a human beginner, following a greedy strategy of capturing as many stones as possible. After 19 hours, it had learned the fundamentals of more advanced Go strategies, and after 70 hours, it played at a superhuman level.
The program used a form of reinforcement learning in which it was its own teacher. It started with a neural network that knew nothing about the game of Go. It then played against itself, and the neural network was tuned and updated to predict moves. In each iteration, the updated neural network was recombined with the search algorithm to create a stronger version of the program, so performance improved with every cycle. As mentioned, it started as an amateur player and, as it played against itself, it learned strategic moves used by expert players.
After three days, the program had rediscovered Go strategies that human players developed over more than 1,000 years, and it had also discovered brand new moves: unconventional strategies that today’s players are now studying. This approach might prove to be a more powerful way of learning than approaches that depend on human expertise or on finding patterns in large data sets. It suggests that developing such algorithms need not depend on a huge amount of human data. Moreover, by not using human expertise, the algorithm does not harbour the constraints of human knowledge.
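To make the self-play loop described above more concrete, here is a minimal sketch in Python. It is not DeepMind's implementation and it simplifies heavily. It assumes a toy game (players alternately take 1, 2 or 3 stones, and whoever takes the last stone wins) instead of Go. A plain lookup table stands in for the neural network, and a one-step lookahead stands in for the search algorithm. All function names and parameters are hypothetical and chosen only for illustration.

```python
import random

def legal_moves(stones):
    """Toy game: take 1, 2 or 3 stones; whoever takes the last stone wins."""
    return [m for m in (1, 2, 3) if m <= stones]

def greedy_move(value, stones):
    """One-step lookahead (a stand-in for search, not the real algorithm).

    value[s] estimates how often the player to move with s stones wins,
    so the best move leaves the opponent in the lowest-valued position.
    """
    return min(legal_moves(stones), key=lambda m: value[stones - m])

def self_play_training(start=10, games=5000, lr=0.1, epsilon=0.2, seed=0):
    """Learn a value table purely from games the program plays against itself."""
    rng = random.Random(seed)
    # Blank slate: no human games, only the rules. Every position starts at 0.5.
    value = [0.5] * (start + 1)
    value[0] = 0.0  # no stones left: the player to move has already lost
    for _ in range(games):
        stones, history = start, []
        while stones > 0:
            history.append(stones)
            if rng.random() < epsilon:   # occasional random move to explore
                move = rng.choice(legal_moves(stones))
            else:
                move = greedy_move(value, stones)
            stones -= move
        # The player who moved last won. Walk back through the game and
        # alternate the result between winner (1.0) and loser (0.0).
        outcome = 1.0
        for s in reversed(history):
            value[s] += lr * (outcome - value[s])  # nudge estimate toward result
            outcome = 1.0 - outcome
    return value

if __name__ == "__main__":
    table = self_play_training()
    for s in range(1, 11):
        print(s, round(table[s], 2), "best move:", greedy_move(table, s))
```

With enough games, the table tends to recover the known winning rule for this toy game (leave the opponent a multiple of four stones), even though that rule was never given to the program. The same idea of improving only from one's own games is what the article describes, at a vastly larger scale and with far more sophisticated components.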
For the most part, artificial intelligence (AI) software on the market today relies on real-world data, usually generated by humans. Large amounts of such data can be expensive to obtain, unavailable or biased.
DeepMind’s mission is to maximize the positive and transformative impact of AI on society. The development of AlphaGo Zero is a critical step towards this goal. AlphaGo Zero was designed not only to understand and play the game of Go, but also so that its approach could be used in other fields, such as developing drugs to cure diseases, protein folding, and the design of new materials.
Marie-Anne Valiquette obtained a Bachelor's degree in Mechanical Engineering at the École de technologie supérieure (ÉTS) in Montreal. She lives in Silicon Valley, California where she studies artificial intelligence through online platforms like Udacity and deeplearning.ai.