Not sure I understand the Simulation step. Presumably each environment/game requires that step to be custom tailored to generate some outcome based on a game state. And for many environment/games, the outcome is dependent upon a long chain of steps (which, at this point, aren't executed). Does it just randomly pick steps after that until an end state is reached? That doesn't seem like it should work. Thanks for the talk!
From my understanding it seems to be so, it will randomly pick steps until all resources are exhausted. For systems that have infinite resources, I have no idea, I guess the programmer would have to explicitly state the maximum amount of steps the simulated tree should take before it stops.
For Go this would mean simulating random legal moves for both sides until the game ends and then registering the result of the game. But I could be wrong.
jeffbradberry.com/posts/2015/09/intro-to-monte-carlo-tree-search/ ^ this article is a good summary which also links to his Python code examples. Multiple games including Connect4 and Reversi.
Selection and Expansion follow a "Tree Policy" which deterministically chose the nodes with the best values. Simulation follows a "Rollout Policy" which you can design yourself using tools of choice/heuristics. If it's selecting actions randomly you get a really robust but slow learning Tree Policy. If you replace it with a function approximator you don't even have to do full rollouts (the results of those are approximated) so faster learning Tree Policy but also more biased.
In the first example I see each node is a 3*3 square, so does it mean the next move of AI is only limited in the 3*3 area? If we enlarge the square, it will increase the time complexity right?
Great explanation of how the MCTS works. I have one question: At 10:34, what is meant by "after enough simulations the MCTS converges to look like values in the minimax"? Suppose minimax returns "1" for winning positions and "0" for loosing positions. Suppose for two moves "a" and "b" the MCTS returns the values "9990/10000" and "10/1000000" respectively. Does "9990/10000" correspond to "1" in the minimax tree? Should we choose "b" because it has more simulations? Or will MCTS never output such values?
Like he said the tree node with more simulations will be the one that is picked. A node that is clearly losing won’t have that many simulations either as shown by the formula. All he means by converging is that the algorithms will eventually agree on what the best move is and the second and third best etc. they will converge to the same move ranking from best to worst.
One question. If there is no eval function, to be able to say "good or bad" after doing simulation, do we have to arrive at a final state so that we can say it's a win or loss?
assume you dont have any knowledge of the "promising" evaluation function, only the win/lose ending condition, you have to randomly put somehow, only recording the position placed, diff chains, game recordings
Interesting video but the presenter does not know much about games and makes a number of inaccuracies. 1. Kasparov beat Deep Blue in 1996 but presenter says the computer won 2. Chess cannot be, and will never be, brute-forced; there are too many possible combinations 3. It's not true that it took longer for a Go computer to beat the top human player (presenter says "world champion", but Go does not have official world championships like in chess), one can only say it happened later. But that was because there was not much interest in making a Go computer, whereas Deep Blue was the culmination of 40 years of making chess computers. Deep Blue was a significant investment for IBM, including one of the fastest supercomputers of its time, a top and large programming team, hiring of a full-time chess grandmaster, hiring several other grandmasters as consultants, etc. No one expended any resources into a Go computer until Deepmind. AlphaGo was also magnitudes faster than Deep Blue, so if talking about computing speed (presenter's reference to brute force), it applies much more so to AlphaGo. There certainly wasn't anything close to 40 years of development in Go computers.
First two points are true, but the third one simply wrong. The Computer Go research is almost as long as for Chess, and it was a popular research theme, exactly because it couldn't be solved with the same approaches as in chess. The Ing foundation even offered a $1.4M dollar price till the year 2000 for anyone who could write a strong Go program, which went unclaimed (senseis.xmp.net/?IngPrize). So it's not like there was no incentive to invest many resources into the problem.
You're missing a lot of information. The method used for chess was only tractable because of intelligent pruning. For Go, you cannot prune intelligently using hand-made methods due to the state space being so much larger as well as Go not having a well-defined way to determine who's winning (outside of the obvious cases). Go required a whole new way of approaching the search space to become tractable. Consequently, the newer methods that are used for Go are also orders of magnitudes better at Chess.
Your point 3 is a gross misstatement. Chess got lots of development and funding because it was on the frontier of feasibility. Go was not. It actually had lots of CS departments chugging along at it for a very long time. The approaches to Chess utterly failed at Go and thus CS departments used it for a research problem.