AlphaGo AI from inside

By now everyone knows that DeepMind's AlphaGo defeated 18-time world champion Lee Sedol on March 9, 2016, at the ancient Chinese game of Go. The game of Go has more possible board positions than there are atoms in the universe. DeepMind is a British Artificial Intelligence (AI) company, founded in September 2010 as DeepMind Technologies and acquired by Google in 2014. DeepMind's goal is to solve intelligence. You can check out more at their website: https://deepmind.com

Coming back to AlphaGo: its defeat of a professional Go champion is considered HUGE for AI. Like, REALLY HUGE. It shocked scientists and experts in the Artificial Intelligence community alike, who had thought that something like this wouldn't happen for at least another decade. A machine that learns on its own is a huge leap for technology.

The way DeepMind started off is that they showed AlphaGo a hundred thousand games, downloaded from the internet, that strong amateurs had played. This first version of AlphaGo was designed to mimic those players. The goal, though, was to make AlphaGo stronger and have it compete with top professionals. So they took this version, which had already learnt to mimic human play, and made it play itself 30 million times. It used Reinforcement Learning, meaning it is not preprogrammed and learns from experience.
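To make that concrete, here is a tiny, self-contained sketch of the self-play idea: an agent plays games against itself and nudges up the moves that led to wins. The toy game, the update rule, and every name here are made-up illustrations, not DeepMind's actual code.

    import random

    class Agent:
        def __init__(self):
            # A preference score per move; self-play nudges these up or down.
            self.scores = [0.0, 0.0, 0.0]

        def choose_move(self, legal_moves):
            # Mostly exploit the best-scoring move, sometimes explore.
            if random.random() < 0.1:
                return random.choice(list(legal_moves))
            return max(legal_moves, key=lambda m: self.scores[m])

        def update(self, record, winner):
            # Reinforce the winner's moves, penalise the loser's.
            for player, move in record:
                self.scores[move] += 0.01 if player == winner else -0.01

    def play_one_game(agent):
        # A stand-in "game": nine turns, two players alternating,
        # with a random outcome instead of real Go rules.
        record = [(turn % 2, agent.choose_move(range(3))) for turn in range(9)]
        return record, random.randint(0, 1)

    agent = Agent()
    for _ in range(30):              # AlphaGo did this ~30 million times
        record, winner = play_one_game(agent)
        agent.update(record, winner)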
Using Reinforcement Learning, the system learnt to improve incrementally by avoiding errors. By the end of this, they had a new version that could beat the old one. The reinforcement learning is model-free, which means it doesn't need a structure or rules to work. The interesting part is that after gaining knowledge of a few games, it is able to transfer that knowledge across more games.

The first version

The first version of AlphaGo used two neural networks that cooperated to choose its moves. Both are Convolutional Neural Networks (CNNs) with 12 layers. A CNN is the kind of network used for classifying images: it takes images as inputs and, after being trained on a labeled image dataset, outputs class probabilities, learning the mapping between inputs and outputs. The first network is called the policy network; its job is to take board positions as inputs and decide the next best move to make. DeepMind trained the policy network on millions of examples of moves made by strong human players, with the goal of replicating the choices of strong human players. The second, the value network, takes a board position and predicts which side will win.
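Here is a rough sketch, in PyTorch, of what such a 12-layer policy network could look like: a stack of convolutions over the 19x19 board that ends with a probability for every point. The layer widths and the number of input feature planes are illustrative assumptions, not DeepMind's exact architecture.

    import torch
    import torch.nn as nn

    class PolicyNetwork(nn.Module):
        def __init__(self, in_planes=48, width=192):   # assumed sizes
            super().__init__()
            layers = [nn.Conv2d(in_planes, width, 5, padding=2), nn.ReLU()]
            for _ in range(10):                        # ten hidden conv layers
                layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU()]
            layers += [nn.Conv2d(width, 1, 1)]         # 12th layer: a score per point
            self.net = nn.Sequential(*layers)

        def forward(self, board):
            # board: (batch, in_planes, 19, 19) -> move probabilities (batch, 361)
            scores = self.net(board).flatten(1)
            return torch.softmax(scores, dim=1)

    net = PolicyNetwork()
    probs = net(torch.zeros(1, 48, 19, 19))            # 361 probabilities summing to 1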
To pick a move, AlphaGo also searches a tree of possible continuations, and a plain breadth-first search of that tree was too memory-intensive. So what Monte Carlo tree search does instead is scatter the order in which the tree is searched, to minimise the chance that a very promising part of the tree stays undiscovered while the search slogs through some prescribed ordering.

The latest version, AlphaGo Zero, still uses Monte Carlo tree search, but instead of using a separate policy network to select the next move to play and a value network to predict the winner of the game, it integrates both into a single neural network that evaluates positions. Unlike the previous versions, which were trained on human games, Zero skips that step and learns by playing against itself, starting from completely random play. And you know what? After three days of training, Zero beat the previous version of AlphaGo, the one that defeated the 18-time world champion, by 100 games to zero, and after 40 days it outperformed a later version, AlphaGo Master, which had defeated the world number one. Many question this by asking: is this an alarm? I guess only the future can answer that. The makers aim to use the algorithms behind the software in healthcare and science, to improve the speed of breakthroughs in those areas by helping human experts achieve more.
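To round things off, here is a compressed sketch of the Monte Carlo tree search idea described above, steered by a single evaluate() function that, like AlphaGo Zero's combined network, returns both move priors and a position value. Everything here, including the PUCT-style selection rule and the trivial game stub, is an illustrative assumption, not DeepMind's implementation.

    import math

    class Node:
        def __init__(self, prior):
            self.prior = prior          # the network's probability for this move
            self.visits = 0
            self.value_sum = 0.0
            self.children = {}          # move -> Node

        def value(self):
            return self.value_sum / self.visits if self.visits else 0.0

    def select_child(node, c_puct=1.5):
        # Favour branches that look good so far (value) or that the
        # network likes but we have barely visited (prior / visits).
        total = math.sqrt(node.visits + 1)
        def score(item):
            _, child = item
            return child.value() + c_puct * child.prior * total / (1 + child.visits)
        return max(node.children.items(), key=score)

    def search(root, game, evaluate, simulations=100):
        for _ in range(simulations):
            node, state, path = root, game.copy(), [root]
            while node.children:                 # walk down the promising moves
                move, node = select_child(node)
                state.play(move)
                path.append(node)
            priors, value = evaluate(state)      # one net: priors AND a value
            for move, p in priors.items():       # expand the leaf
                node.children[move] = Node(p)
            for n in path:                       # back the evaluation up the path
                n.visits += 1
                n.value_sum += value

    class Game:                                  # trivial stub standing in for Go
        def copy(self): return Game()
        def play(self, move): pass

    def evaluate(state):                         # stub for the neural network
        return {0: 0.5, 1: 0.3, 2: 0.2}, 0.0

    search(Node(prior=1.0), Game(), evaluate)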