Essay Example on AlphaGo AI from Inside


AlphaGo AI from inside

All right, all right. By now everyone knows that DeepMind's AlphaGo defeated 18-time world champion Lee Sedol on March 9, 2016, at the ancient Chinese game of Go, a game with more possible board configurations than there are atoms in the universe. DeepMind is a British artificial intelligence (AI) company, founded in September 2010 as DeepMind Technologies and acquired by Google in 2014. DeepMind's goal is to "solve intelligence"; you can read more at their website, https://deepmind.com.

Coming back to AlphaGo: defeating a professional Go champion is considered HUGE for AI. Like, REALLY HUGE. It shocked scientists who had been thinking that something like this wouldn't happen for at least another decade, and it equally shocked experts in the artificial intelligence community. A machine that learns on its own is a huge leap for technology. The way DeepMind started off is that they showed AlphaGo about a hundred thousand games, downloaded from the internet, that strong amateurs had played. The first version of AlphaGo was designed to mimic those players. The goal, though, was to make AlphaGo stronger, able to compete with top professionals, so they took this version that had already learnt to mimic human play and made it play against itself 30 million times. For this they used reinforcement learning, which means the program is not preprogrammed with strategy; it learns from experience.
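
To make "learns from experience" a bit more concrete, here is a tiny sketch of the self-play idea. It is emphatically not DeepMind's code and not Go: a toy take-1-to-3-stones game stands in for Go, a simple table of move preferences stands in for the neural network, and the names (N_STONES, prefs, pick_move and so on) are all made up for this illustration. The only thing it shares with AlphaGo is the principle that moves appearing in won games get reinforced and moves from lost games get discouraged.

```python
import numpy as np

# Toy self-play learner: "take 1-3 stones, whoever takes the last stone wins".
N_STONES = 15
prefs = np.zeros((N_STONES + 1, 3))        # preference for taking 1, 2 or 3 stones
rng = np.random.default_rng(0)

def pick_move(stones):
    """Sample a move from a softmax over the current preferences."""
    legal = min(3, stones)
    logits = prefs[stones, :legal]
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(legal, p=probs)) + 1   # number of stones to take

def self_play_game():
    """Play one game against itself; return each player's moves and the winner."""
    stones, player = N_STONES, 0
    history = {0: [], 1: []}
    while stones > 0:
        move = pick_move(stones)
        history[player].append((stones, move))
        stones -= move
        player = 1 - player
    return history, 1 - player                   # the player who just moved took the last stone

# Nothing about good play is programmed in: moves that occurred in won games
# are reinforced, moves from lost games are discouraged.
LEARNING_RATE = 0.1
for _ in range(20000):
    history, winner = self_play_game()
    for player, moves in history.items():
        sign = 1.0 if player == winner else -1.0
        for stones, move in moves:
            prefs[stones, move - 1] += LEARNING_RATE * sign

# Optimal play leaves the opponent on a multiple of 4; after enough self-play
# the learned preferences tend to agree (this typically prints [1 2 3]).
print(np.argmax(prefs[5:8], axis=1) + 1)
```

The real system replaces the lookup table with a deep network and the crude win/lose update with proper reinforcement learning, but the loop, play yourself, see who won, nudge the policy, has the same shape.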



Using reinforcement learning, the system learnt to improve incrementally by avoiding its own earlier mistakes, and by the end of this process they had a new version that could beat the old one. This reinforcement learning is model-free, which means it doesn't need a hand-crafted structure or set of rules to work. The interesting part is that after learning a few games, such a system is able to transfer that knowledge across more games.

The first version of AlphaGo used two neural networks that cooperated to choose its moves. Both are convolutional neural networks (CNNs) with 12 layers. A CNN is the kind of network used for image classification: it can take images as inputs and output class probabilities after being trained on a labelled image dataset, learning the mapping between inputs and outputs. The first network is called the policy network. Its job is to take board positions as inputs and decide the next best move to make. DeepMind trained the policy network on millions of examples of moves made by strong human players; the goal was to replicate the choices of strong human players.
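
To picture what a policy network might look like, here is a miniature stand-in. PyTorch is my assumption (the essay doesn't say what tools DeepMind used), the layer sizes are invented, and TinyPolicyNet is obviously far shallower than the 12-layer network described above, but the interface is the point: board position in, a probability for every point on the board out.

```python
import torch
import torch.nn as nn

class TinyPolicyNet(nn.Module):
    """Maps a 19x19 board, encoded as feature planes, to a probability per point."""
    def __init__(self, planes=3, filters=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(planes, filters, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(filters, filters, kernel_size=3, padding=1), nn.ReLU(),
            # ...the network described above stacks about 12 such layers...
            nn.Conv2d(filters, 1, kernel_size=1),   # one raw score per board point
        )

    def forward(self, x):
        scores = self.conv(x).flatten(1)            # (batch, 19*19)
        return torch.softmax(scores, dim=1)         # probability of playing each point

# A hypothetical input: 3 planes marking black stones, white stones, empty points.
board = torch.zeros(1, 3, 19, 19)
move_probs = TinyPolicyNet()(board)
print(move_probs.shape, float(move_probs.sum()))    # torch.Size([1, 361]) 1.0
```

Training it to replicate the choices of strong human players is then ordinary supervised learning: feed in recorded positions, compare the predicted distribution with the move the human actually played, and adjust the weights.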



After training, it was able to match the moves that strong human Go players would make about 57% of the time. To improve on this, they used reinforcement learning again. The network was fast enough to pick one good move, but it needed to check thousands of possible moves before making a decision. So they modified the network: instead of looking at the entire 19x19 board, it looked at a smaller window around the opponent's previous move and the new move it is considering. This helped it compute the next best move about a thousand times faster.

The second network is called the value network, and it answers a different question. Instead of suggesting the next move, it estimates the chance of each player winning the game given a board position. It provides an overall, essentially binary positional judgement: it classifies potential future positions as either good or bad. If the value network says a particular variation looks bad, the AI can skip reading any more moves along that line of play.
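
The value network can be sketched in the same hedged way (again PyTorch, again invented sizes): a convolutional trunk like the one above, but ending in a single number between 0 and 1, the estimated chance that the side to move goes on to win from the given position.

```python
import torch
import torch.nn as nn

class TinyValueNet(nn.Module):
    """Maps a board position to a single estimated win probability."""
    def __init__(self, planes=3, filters=32, board=19):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(planes, filters, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(filters, filters, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(filters * board * board, 128), nn.ReLU(),
            nn.Linear(128, 1),
            nn.Sigmoid(),                 # squash the score into a 0..1 win probability
        )

    def forward(self, x):
        return self.head(self.trunk(x))

position = torch.zeros(1, 3, 19, 19)      # a hypothetical encoded board position
win_prob = TinyValueNet()(position).item()
# During search, lines of play whose positions score badly can simply be pruned.
print(f"estimated win probability: {win_prob:.2f}")
```

Producing one scalar per position is exactly what makes the shortcut above possible: a bad score lets the search drop the whole branch.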

In addition to the two networks mentioned above, AlphaGo uses an algorithm called Monte Carlo tree search to help it read sequences of future moves effectively. If we attempt a tree search, one way to do it is depth-first: follow one branch all the way to the end of the tree before backtracking and trying the next. Another way is breadth-first, which is very memory-intensive. What Monte Carlo tree search does instead is scatter the order in which the tree is searched, to minimise the chance that a very promising part of the tree, one we could have discovered early, only turns up after we have slogged through the search in a prescribed order.

The latest version, AlphaGo Zero, still uses Monte Carlo tree search, but instead of a separate policy network to select the next move and a value network to predict the winner of the game, it integrates both into a single neural network that evaluates positions. Unlike previous versions, which were trained on human games, Zero skips that step and learns by playing against itself, starting from completely random play. And you know what? After three days of training, Zero beat the previous version of AlphaGo, the one that defeated the 18-time world champion, by 100 games to 0, and after 40 days it outperformed the later version that defeated the world's number-one player. Many question this by asking: is this an alarm? I guess only the future can answer that. The makers aim to use the algorithms behind the software in healthcare and science, to improve the speed of breakthroughs in those areas by helping human experts achieve more.
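
Finally, to make the Monte Carlo tree search loop above concrete, here is a compact sketch on the same toy stones game used earlier. It shows the four standard steps, selection, expansion, random rollout, and backing the result up the tree, but none of AlphaGo's refinements: the real search consults the policy network to decide which branches to try and the value network to score positions, rather than relying on purely random rollouts.

```python
import math
import random

class Node:
    def __init__(self, stones, parent=None):
        self.stones = stones      # stones left; the player to move at this node acts next
        self.parent = parent
        self.children = {}        # move -> child Node
        self.visits = 0
        self.wins = 0.0           # counted from the point of view of the player to move here

def ucb(child, parent_visits, c=1.4):
    """Upper-confidence score: favour moves that look good or are barely explored."""
    if child.visits == 0:
        return float("inf")
    # child.wins is from the child's point of view, so flip it for the parent.
    return (1 - child.wins / child.visits) + c * math.sqrt(math.log(parent_visits) / child.visits)

def rollout(stones):
    """Finish the game with random moves; return 1.0 if the player to move here wins."""
    turn = 0
    while stones > 0:
        stones -= random.randint(1, min(3, stones))
        if stones == 0:
            return 1.0 if turn == 0 else 0.0
        turn = 1 - turn
    return 0.0

def mcts(root_stones, iterations=5000):
    root = Node(root_stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: while every move has been tried, descend along the best-looking one.
        while node.stones > 0 and len(node.children) == min(3, node.stones):
            node = max(node.children.values(), key=lambda ch: ucb(ch, node.visits))
        # 2. Expansion: add one move that has not been tried from this position yet.
        if node.stones > 0:
            move = random.choice([m for m in range(1, min(3, node.stones) + 1)
                                  if m not in node.children])
            node.children[move] = Node(node.stones - move, parent=node)
            node = node.children[move]
        # 3. Rollout: estimate the new position by playing randomly to the end.
        result = rollout(node.stones) if node.stones > 0 else 0.0
        # 4. Backup: push the result up the tree, flipping perspective at every level.
        while node is not None:
            node.visits += 1
            node.wins += result
            result = 1.0 - result
            node = node.parent
    return max(root.children, key=lambda m: root.children[m].visits)

print(mcts(10))   # optimal play at 10 stones takes 2, leaving a multiple of 4
```

AlphaGo Zero keeps this general search skeleton but lets a single network supply both the move probabilities that guide selection and the position evaluation that replaces the random rollout, which is the integration described above.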


