New level of intelligence: A new AI can teach itself with a “reinforcement learning algorithm” resulting in “superhuman” abilities within hours
Sunday, December 24, 2017 by: Zoey Sky
Even though DeepMind’s AlphaZero played against itself for only four hours, it managed to “synthesize the chess knowledge of one and a half millennium.” AlphaZero surpassed both human players and the reigning World Computer Champion Stockfish, winning 28 games and losing none in a 100-game match.
Demis Hassabis, DeepMind’s co-founder, is a former chess prodigy. Along with his team, Hassabis first set out to conquer Go, a game in which humans long held the edge over artificial intelligence (AI). Little did they know that the same self-teaching approach would soon learn fast enough to beat the strongest engines at chess as well.
AlphaZero’s superhuman abilities were documented in the academic paper Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, which was published on December 5, 2017. The paper revealed that the DeepMind team successfully confirmed that a generic version of their algorithm, which had no specific knowledge other than the rules of the game, “could train itself for four hours at chess, two hours in shogi (Japanese chess) or eight hours in Go and then beat the reigning computer champions – i.e. the strongest known players of those games.”
Stockfish, the reigning TCEC computer chess champion, failed to make the TCEC final this year even though it won 51 games. Against the chess-trained AlphaZero, Stockfish lost 28 games and won none, while the remaining 72 games were drawn.
Playing White, AlphaZero scored an outstanding 25 wins and 25 draws. Playing Black, it scored 3 wins and 47 draws.
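The match arithmetic can be checked with a few lines of Python. This is only a tally of the figures reported above, using standard chess scoring (one point per win, half a point per draw); the variable and function names are illustrative.

```python
# Match results as reported above, from AlphaZero's point of view.
results = {
    "white": {"wins": 25, "draws": 25, "losses": 0},
    "black": {"wins": 3, "draws": 47, "losses": 0},
}

def score(record):
    """Standard chess scoring: 1 point per win, half a point per draw."""
    return record["wins"] + 0.5 * record["draws"]

total_games = sum(sum(record.values()) for record in results.values())
total_wins = sum(record["wins"] for record in results.values())
total_draws = sum(record["draws"] for record in results.values())
total_score = sum(score(record) for record in results.values())

print(f"{total_wins} wins, {total_draws} draws in {total_games} games")
print(f"AlphaZero scored {total_score} points out of {total_games}")
```

Under that scoring, the two halves of the match add up to 64 points out of 100 for AlphaZero.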
Aside from the rules of chess, AlphaZero was a blank slate. It then played games against itself using a Monte Carlo tree search, where “initially random moves would be tried out until a neural network began to learn which options were likely to be more promising.” As it trained, AlphaZero had access to “5,000 first generation TPUs to generate self-play games and 64 2nd-generation TPUs to train the neural networks.”
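To give a feel for what "starting from random moves and learning which options are promising" means, here is a deliberately tiny sketch. It is not DeepMind's method: it assumes a toy game (Nim, where players take one to three stones and whoever takes the last stone wins), and it replaces the neural network and tree search with a simple lookup table of average outcomes. The function names, the exploration rate `epsilon`, and the episode count are all hypothetical choices made for illustration.

```python
import random

def legal_moves(stones):
    """In this toy version of Nim a player may take 1, 2 or 3 stones."""
    return [m for m in (1, 2, 3) if m <= stones]

def self_play_nim(pile=10, episodes=5000, epsilon=0.2, seed=0):
    """Learn move values for Nim purely from games played against itself.

    A lookup table stands in for the neural network (an assumption made
    for brevity). Each entry is the average result, from the mover's
    point of view, of playing `move` with `stones` left on the table.
    """
    rng = random.Random(seed)
    value, counts = {}, {}

    def choose(stones):
        moves = legal_moves(stones)
        if rng.random() < epsilon:
            return rng.choice(moves)  # keep trying random moves sometimes
        # otherwise play the move that has looked most promising so far
        return max(moves, key=lambda m: value.get((stones, m), 0.0))

    for _ in range(episodes):
        stones, history = pile, []
        while stones > 0:
            move = choose(stones)
            history.append((stones, move))
            stones -= move
        # The player who made the last move won (+1); walking backwards,
        # the result flips sign each ply because the players alternate.
        outcome = 1.0
        for key in reversed(history):
            counts[key] = counts.get(key, 0) + 1
            old = value.get(key, 0.0)
            value[key] = old + (outcome - old) / counts[key]
            outcome = -outcome
    return value

values = self_play_nim()
for stones in range(1, 11):
    best = max(legal_moves(stones), key=lambda m: values.get((stones, m), 0.0))
    print(f"{stones} stones left: currently preferred move is to take {best}")
```

The only knowledge built in is the rules (`legal_moves`) and who won at the end. Everything else comes from the games the program plays against itself, which is the core idea the paper scales up with a deep neural network and far more computing power.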
TPUs are tensor processing units, chips that Google developed specifically to handle the calculations required by machine learning. They aren’t publicly available yet. The trained algorithm, on the other hand, ran on a single machine that had four TPUs. DeepMind explained that their approach was efficient: AlphaZero searched only 80,000 positions per second, compared with 70 million positions per second for Stockfish, or roughly 875 times fewer.
Even though AlphaZero based its computations on a far smaller number of evaluations, it compensated for the difference by using its deep neural network to home in on the variations that held more promise, which is arguably a more “human-like” approach to search.
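One way to picture this selective search is the move-selection rule described in DeepMind's earlier AlphaGo Zero work: each candidate move is scored by its average result so far plus a bonus that is large when the network rates the move highly and it has been visited rarely. The sketch below assumes that form of the rule. The function name, the constant `c_puct=1.5`, and all the numbers in the example are made up for illustration and are not taken from the paper.

```python
import math

def select_move(priors, visits, total_value, c_puct=1.5):
    """Pick the move with the highest Q + U score.

    priors      -- the network's prior probability for each move
    visits      -- how many times the search has tried each move
    total_value -- sum of the results seen after each move
    c_puct      -- exploration constant (hypothetical value)
    """
    total_visits = sum(visits.values())

    def puct(move):
        n = visits[move]
        q = total_value[move] / n if n > 0 else 0.0  # average result so far
        # bonus: large for moves the network likes but has rarely visited
        u = c_puct * priors[move] * math.sqrt(total_visits) / (1 + n)
        return q + u

    return max(priors, key=puct)

# Hypothetical statistics for three opening moves.
priors = {"e4": 0.60, "d4": 0.35, "a3": 0.05}
visits = {"e4": 10, "d4": 2, "a3": 0}
total_value = {"e4": 6.0, "d4": 1.0, "a3": 0.0}

print(select_move(priors, visits, total_value))
```

With these made-up numbers the search turns next to `d4`, a move the network rates well but has barely explored, while the low-prior `a3` gets almost no attention. That is how a search can afford to examine far fewer positions than a brute-force engine.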
Generic machine-learning algorithms are game-changers, not just for the chess world but for the world around us. While there are many roadblocks ahead, some believe they could eventually lead to a kind of basic consciousness and intelligence, or what is called true AI. It’s also possible that reinforcement learning will one day shape AI into “the most intelligent entity in our known universe.”
Future developments hinge on how strongly DeepMind wants to keep its chess-trained algorithm active.
The paper is further detailed on CDN.Chess24.com.
The pros and cons of AI
Whether you’re for it or against it, there’s no denying that AI has various pros and cons:
- Pros – Early diagnosis of diseases, driverless cars, and voice-operated assistants.
- Cons – Rise of overdeveloped AI (e.g. sports or video games), job automation, and security issues.