Go is a considerably more difficult game to master than chess. A typical Go position offers about 250 legal moves, compared with about 35 in chess. A Go board can be arranged in more ways than there are atoms in the observable universe. As a result, playing Go well takes a blend of critical thinking, strategy, imagination, and intellect. For this reason, AlphaGo’s victory over Lee Sedol in 2016 is seen as a significant turning point in the development of artificial intelligence.
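The size comparison above can be checked with a few lines of arithmetic. This is a rough sketch, not an exact count: it assumes a standard 19x19 board, uses the loose upper bound of three states per point (which overcounts because it includes illegal positions), and takes 10^80 as a commonly quoted order-of-magnitude estimate for atoms in the observable universe. The names below are illustrative.

```python
# Rough size comparison under stated assumptions (see the paragraph above).
BOARD_POINTS = 19 * 19         # 361 intersections on a standard board
ATOMS_IN_UNIVERSE = 10 ** 80   # assumption: rough, commonly quoted estimate

# Each point is empty, black, or white. This is only an upper bound,
# because many of these arrangements are not legal positions.
arrangements_upper_bound = 3 ** BOARD_POINTS


def sequences(branching_factor, depth):
    """Number of move sequences of a given depth for a fixed branching factor."""
    return branching_factor ** depth


if __name__ == "__main__":
    print("digits in 3^361:", len(str(arrangements_upper_bound)))
    print("exceeds atom estimate:", arrangements_upper_bound > ATOMS_IN_UNIVERSE)
    # How many times more 4-move sequences Go has than chess,
    # using the average branching factors of 250 and 35.
    print("Go/chess ratio at depth 4:", sequences(250, 4) // sequences(35, 4))
```

Even four moves deep, the gap in branching factor already multiplies the number of sequences by a factor in the thousands, which is why brute-force search alone does not go far in Go.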
Since 2021, KataGo has gained popularity as an open-source AI capable of defeating the best human Go players. KataGo was trained using a method similar to AlphaZero’s, with several upgrades and enhancements. It learns entirely from scratch through self-play, with no outside data, and reaches top-level play quickly.
Last week, a group of AI researchers from MIT, UC Berkeley, and FAR AI published a paper outlining a strategy to defeat KataGo using adversarial techniques that exploit its blind spots. By making unexpected plays that fall outside KataGo’s training set, a far weaker adversarial Go-playing program can cause KataGo to lose.
The primary disadvantage of deep learning-based algorithms is that they are only as good as the data they are trained on. Consequently, feeding a deep learning model false or unfamiliar data can cause it to malfunction. An adversarial attack can target a model in two ways: by supplying false or deceptive data while it is being trained, or by supplying inputs purposefully crafted to fool a model that has already been trained. In their latest work, the researchers looked for and found a vulnerability in KataGo.
Because KataGo is trained on “standard” ways of playing Go, it can struggle against opponents who play in unfamiliar or unusual ways. One such adversarial approach, the researchers explained, is to stake out a small corner of the board. KataGo then controls the whole rest of the board, which deceives it into believing it has already won the game, so it passes. Under the rules of Go, if one player passes and the other follows suit, the game is over, and the winner is determined by counting each player’s points.
The adversary scores more points and wins because it receives all the points for its small corner territory, while KataGo receives no points for its unsecured territory, which still contains adversarial stones. In this manner, the adversary wins by fooling KataGo into ending the game prematurely at a position favorable to the adversary. The researchers reported that the attack defeats KataGo with a win rate of >99% when KataGo uses no search and >50% when KataGo uses enough search to be almost superhuman. They also point out that the trick only works against KataGo; attempting it against humans (even amateurs) would quickly fail, since human players instinctively understand what is going on.
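The scoring outcome described above can be illustrated with a toy example. The sketch below assumes area scoring in the style of the Tromp-Taylor rules, where no dead stones are removed at the end: a player gets one point per stone, plus one point per empty intersection in any empty region that touches only that player's stones. The 7x7 board, the `area_score` function, and the position are hypothetical illustrations, not positions from the paper.

```python
from collections import deque


def area_score(board):
    """Score a position under Tromp-Taylor-style area scoring (an assumption).

    board is a list of equal-length strings using 'B', 'W', and '.'.
    An empty region counts for a color only if it borders that color alone.
    """
    rows, cols = len(board), len(board[0])
    score = {"B": 0, "W": 0}
    seen = set()
    for r in range(rows):
        for c in range(cols):
            cell = board[r][c]
            if cell in score:
                score[cell] += 1  # every stone on the board is worth a point
            elif (r, c) not in seen:
                # Flood-fill this empty region and record which colors it touches.
                size, borders = 0, set()
                queue = deque([(r, c)])
                seen.add((r, c))
                while queue:
                    y, x = queue.popleft()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols:
                            neighbor = board[ny][nx]
                            if neighbor == ".":
                                if (ny, nx) not in seen:
                                    seen.add((ny, nx))
                                    queue.append((ny, nx))
                            else:
                                borders.add(neighbor)
                if len(borders) == 1:
                    score[borders.pop()] += size
    return score


# Hypothetical position: Black (the adversary) holds a small top-left corner,
# White (the victim) walls off the rest, but one stray black stone remains
# inside White's area when both players pass.
attacked = [
    "..BW...",
    "..BW...",
    "BBBW...",
    "WWWW...",
    ".......",
    ".....B.",
    ".......",
]

# The same position if White had captured the stray stone before passing.
cleaned_up = attacked[:5] + ["......."] + attacked[6:]

if __name__ == "__main__":
    print(area_score(attacked))    # {'B': 10, 'W': 7}
    print(area_score(cleaned_up))  # {'B': 9, 'W': 40}
```

In the first position the large empty region touches both colors, so it counts for neither player, and Black wins 10 to 7 with only a corner. Once the stray stone is gone, the same region belongs to White, who wins 40 to 9. A human player would remove the stray stone before passing; the attack works by getting the victim to pass first.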
The key takeaway is that the adversary does not win by learning to play Go better than KataGo, nor is it the stronger player. Its sole aim is to exploit an unforeseen weakness in KataGo, which it does easily. This discovery has much wider ramifications, because practically every deep-learning AI system may have similar blind spots.
Adam Gleave, a doctoral student at UC Berkeley and one of the paper’s co-authors, explains that the research demonstrates that AI systems that appear to perform at a human level often do so in a very ‘alien’ way and, as a result, can fail in ways that humans do not expect. Gleave says that while this result in Go is amusing, similar failures in safety-critical systems could be catastrophic.
For instance, imagine a self-driving vehicle AI that encounters a highly unusual situation it did not anticipate, allowing a human to manipulate it into dangerous behavior. According to Gleave, the study underscores the need for better automated testing of AI systems to uncover worst-case failure modes, not just to assess average-case performance.