{"id":13509,"date":"2025-06-18T03:47:39","date_gmt":"2025-06-18T07:47:39","guid":{"rendered":"https:\/\/spinor.info\/weblog\/?p=13509"},"modified":"2025-07-14T21:41:34","modified_gmt":"2025-07-15T01:41:34","slug":"yes-gpt-can-play-chess","status":"publish","type":"post","link":"https:\/\/spinor.info\/weblog\/?p=13509","title":{"rendered":"Yes, GPT can play chess"},"content":{"rendered":"<p>The other day, I <a href=\"https:\/\/www.tomshardware.com\/tech-industry\/artificial-intelligence\/chatgpt-got-absolutely-wrecked-by-atari-2600-in-beginners-chess-match-openais-newest-model-bamboozled-by-1970s-logic\">came across a post<\/a>, one that has since appeared on several media sites: An Atari 2600 from 1977, running a chess game, managed to beat ChatGPT in a game of chess.<\/p>\n<p>Oh my, I thought. Yet another example of people misconstruing a language model&#8217;s capabilities. Of course the Atari beat ChatGPT in a game of chess. Poor ChatGPT was likely asked to keep track of the board&#8217;s state in its &#8220;head&#8221;, and accurately track that state across several moves. That is not what an LLM is designed to do. It is fundamentally a token generator: You feed it text (such as the transcript of the conversation up to the latest prompt) and it generates additional text.<\/p>\n<p>The fact that the text it generates is coherent, relevant, even creative and information-rich is a minor miracle on its own right, but it can be quite misleading. It is easy to sense a personality behind the words, even without the deceptively fine-tuned &#8220;alignment&#8221; features of ChatGPT. But personality traits notwithstanding, GPT would not be hiding a secret chessboard somewhere, one that it could use to keep track of, and replay, the moves.<\/p>\n<p>But that does not mean GPT cannot play chess, at least at an amateur level. All it needs is a chessboard.<\/p>\n<p>So I ran an experiment: I supplied GPT with a chessboard. 
Or to be more precise, I wrote a front-end that fed GPT the current state of the board in a recognized notation (FEN, the Forsyth-Edwards Notation). Furthermore, I invoked GPT with only minimal prompting: the current state of the board and up to two recent moves, instead of the entire chat history.<\/p>\n<p>I used OpenAI&#8217;s o3 reasoning model for this purpose, which seemed like a bit of overkill; GPT pondered some of the moves for several minutes, even exceeding five minutes in one case. Although it lost in the end, it delivered a credible game against its opponent, GNU Chess playing at &#8220;level 2&#8221;.<\/p>\n<p>In fact, for a while, GPT seemed to be ahead: I was expecting it to win when it finally made a rather colossal blunder, sacrificing its queen for no good reason. That particular move was a profound outlier: Whereas GPT prefaced all its other moves with commentary and analysis, this move came with no commentary whatsoever. It&#8217;s almost as if it had simply decided to sabotage its own game. 
Or perhaps it was simply bad luck of the draw from the RNG in what is fundamentally a stochastic reasoning process?<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-13510\" src=\"https:\/\/spinor.info\/weblog\/wp-content\/uploads\/2025\/06\/gpt-vs-gnu-chess.png\" alt=\"\" width=\"400\" height=\"399\" srcset=\"https:\/\/spinor.info\/weblog\/wp-content\/uploads\/2025\/06\/gpt-vs-gnu-chess.png 400w, https:\/\/spinor.info\/weblog\/wp-content\/uploads\/2025\/06\/gpt-vs-gnu-chess-300x300.png 300w, https:\/\/spinor.info\/weblog\/wp-content\/uploads\/2025\/06\/gpt-vs-gnu-chess-150x150.png 150w, https:\/\/spinor.info\/weblog\/wp-content\/uploads\/2025\/06\/gpt-vs-gnu-chess-96x96.png 96w, https:\/\/spinor.info\/weblog\/wp-content\/uploads\/2025\/06\/gpt-vs-gnu-chess-24x24.png 24w, https:\/\/spinor.info\/weblog\/wp-content\/uploads\/2025\/06\/gpt-vs-gnu-chess-36x36.png 36w, https:\/\/spinor.info\/weblog\/wp-content\/uploads\/2025\/06\/gpt-vs-gnu-chess-48x48.png 48w, https:\/\/spinor.info\/weblog\/wp-content\/uploads\/2025\/06\/gpt-vs-gnu-chess-64x64.png 64w\" sizes=\"(max-width: 400px) 100vw, 400px\" \/><\/p>\n<p>Still, it managed to last for 56 moves against capable chess software. The only other blunder it made during the game was a single attempted illegal move that would have left its king in check. A retry yielded a valid move.<\/p>\n<p>I am including a transcript of the game below, as recorded by my interface software. The queen was lost in turn 44.<\/p>\n<p>As I mentioned, I thought that using the o3 model and its reasoning capability was excessive. Since then, I ran another experiment, this time using plain GPT4.1. The result was far less impressive. The model attempted several illegal moves, and the legal moves it made were not exactly great; it lost its queen early on, and lost the game in 13 moves. 
Beginner level, I guess.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-13511\" src=\"https:\/\/spinor.info\/weblog\/wp-content\/uploads\/2025\/06\/gpt41-vs-gnu-chess.png\" alt=\"\" width=\"587\" height=\"430\" srcset=\"https:\/\/spinor.info\/weblog\/wp-content\/uploads\/2025\/06\/gpt41-vs-gnu-chess.png 587w, https:\/\/spinor.info\/weblog\/wp-content\/uploads\/2025\/06\/gpt41-vs-gnu-chess-300x220.png 300w, https:\/\/spinor.info\/weblog\/wp-content\/uploads\/2025\/06\/gpt41-vs-gnu-chess-150x110.png 150w\" sizes=\"(max-width: 587px) 100vw, 587px\" \/><\/p>\n<p>Oh well. So perhaps the reasoning model is required to make GPT play credibly.<\/p>\n<p>Nonetheless, I think these examples demonstrate that while these models are no chess grandmasters, they are not stochastic parrots either. That a language model can offer a reasonable game against a dedicated chess software opponent speaks for itself.<\/p>\n<p>Here is the transcript of the 56-move game that the o3 model would likely have won, had it not squandered its queen.<\/p>\n<pre style=\"margin-left: 1in; margin-right: 1.5in;\">1. e4 e6\r\n2. d4 d5\r\n3. Nc3 Nf6\r\n4. Bg5 Be7\r\n5. e5 Nd7\r\n6. h4 Bxg5\r\n7. hxg5 Qxg5\r\n8. Nf3 Qd8\r\n9. Bd3 h6\r\n10. O-O a5\r\n11. Nb5 Nb6\r\n12. Rc1 O-O\r\n13. c4 dxc4\r\n14. Bxc4 Bd7\r\n15. Bd3 Bxb5\r\n16. Bxb5 f6\r\n17. Qb3 Qe7\r\n18. exf6 Rxf6\r\n19. Ne5 Qd6\r\n20. Rd1 Kf8\r\n21. Nc4 Nxc4\r\n22. Qxc4 c6\r\n23. Rd3 Rf7\r\n24. Ba4 b5\r\n25. Bxb5 Qd7\r\n26. Qc5+ Ke8\r\n27. d5 cxb5\r\n28. dxe6 Qxe6\r\n29. Re3 Qxe3\r\n30. Qxe3+ Re7\r\n31. Qf3 Ra7\r\n32. Qh5+ Kd8\r\n33. Rd1+ Rd7\r\n34. Qf3 Rxd1+\r\n35. Qxd1+ Nd7\r\n36. Qd2 b4\r\n37. a3 Re4\r\n38. f3 Rc4\r\n39. b3 Rc3\r\n40. axb4 axb4\r\n41. Qd5 Rc1+\r\n42. Kh2 g5\r\n43. Qg8+ Kc7\r\n44. Qc8+ Kxc8\r\n45. Kg3 Rc3\r\n46. Kg4 Rxb3\r\n47. Kh5 Rxf3\r\n48. gxf3 b3\r\n49. Kxh6 b2\r\n50. Kxg5 b1=Q\r\n51. Kh6 Qh1+\r\n52. Kg7 Qxf3\r\n53. Kg8 Ne5\r\n54. Kh8 Qb7\r\n55. 
Kg8 Qf7+\r\n56. Kh8 Ng6#<\/pre>\n<p>I can almost hear a character, from one of the old Simpsons episodes, in a scene set in Springfield&#8217;s Russian district, yelling loudly as he upturns the board: &#8220;\u0425\u043e\u0440\u043e\u0448\u0430\u044f \u0438\u0433\u0440\u0430!&#8221; (&#8220;Good game!&#8221;)<\/p>","protected":false},"excerpt":{"rendered":"<p>The other day, I came across a post, one that has since appeared on several media sites: An Atari 2600 from 1977, running a chess game, managed to beat ChatGPT in a game of chess. Oh my, I thought. Yet another example of people misconstruing a language model&#8217;s capabilities. Of course the Atari beat ChatGPT <a href='https:\/\/spinor.info\/weblog\/?p=13509' class='excerpt-more'>[&#8230;]<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[58],"tags":[],"class_list":["post-13509","post","type-post","status-publish","format-standard","hentry","category-cybernetics"],"_links":{"self":[{"href":"https:\/\/spinor.info\/weblog\/index.php?rest_route=\/wp\/v2\/posts\/13509","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/spinor.info\/weblog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/spinor.info\/weblog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/spinor.info\/weblog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/spinor.info\/weblog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=13509"}],"version-history":[{"count":6,"href":"https:\/\/spinor.info\/weblog\/index.php?rest_route=\/wp\/v2\/posts\/13509\/revisions"}],"predecessor-version":[{"id":13587,"href":"https:\/\/spinor.info\/weblog\/index.php?rest_route=\/wp\/v2\/posts\/13509\/revisions\/13587"}],"wp:attachment":[{"href":"https:\/\/spinor.info\/weblog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=13509"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/spinor.info\/weblog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=13509"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/spinor.info\/weblog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=13509"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}