While I was working on my minimalist but full implementation of a GPT, I also thought of a game that can help participants better understand how language models really work. Here are the rules:
- Someone asks a question.
- Participants take turns, making a best effort to contribute to the answer, ONE WORD AT A TIME.
- The round is finished when someone ends it with a period.
Say there are three participants, Alice, Bob and Christine, trying to answer the question, “What was the most significant geopolitical event of the 20th century?”

Did Alice really want to talk about the atomic bomb? Perhaps she was thinking of the Sarajevo assassination and the start of WW1. Or the collapse of the USSR.
Did Bob really mean to talk about the bomb? Perhaps he was thinking about the discovery of the atomic nature of matter and how it shaped society. Or maybe something about the atomic chain reaction?
Did Christine really mean to talk about the first atomic test, the Trinity test in New Mexico? Maybe she had in mind Hiroshima and Nagasaki.
The answer we got is an entirely sensible one. But none of the participants knew that this would be the actual answer. There was no “mind” conceiving this specific answer. Yet the “latent knowledge” was present in the “network” of the three players. At each turn, there were high probability and lower probability variants. Participants typically, but not necessarily, picked the highest probability “next word”; sometimes they opted for a lower probability alternative on a whim, as when Bob used “TESTED” instead of “DROPPED”.
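To make the mechanics concrete, here is a minimal sketch of a round in code. It is my own illustration: the word table and probabilities in `toy_player` are invented for the sake of the example, not a transcript of the actual round. At each turn the current player proposes a distribution over candidate next words, and usually, but not always, the likeliest one gets picked.

```python
import random

def toy_player(answer_so_far):
    # Hypothetical, hard-coded distributions over the next word; a real player
    # draws on context and prior world knowledge rather than a lookup table.
    table = {
        (): {"THE": 1.0},
        ("THE",): {"ATOMIC": 0.8, "COLD": 0.2},
        ("THE", "COLD"): {"WAR.": 1.0},
        ("THE", "ATOMIC"): {"BOMB": 1.0},
        ("THE", "ATOMIC", "BOMB"): {"WAS": 1.0},
        ("THE", "ATOMIC", "BOMB", "WAS"): {"DROPPED.": 0.7, "TESTED.": 0.3},
    }
    return table.get(tuple(answer_so_far), {"HAPPENED.": 1.0})

def play_round(players, max_words=20):
    answer = []
    for turn in range(max_words):
        candidates = players[turn % len(players)](answer)   # round-robin turns
        words, weights = zip(*candidates.items())
        word = random.choices(words, weights=weights)[0]    # usually, not always, the likeliest
        answer.append(word)
        if word.endswith("."):                              # a period ends the round
            break
    return " ".join(answer)

print(play_round([toy_player, toy_player, toy_player]))     # e.g. "THE ATOMIC BOMB WAS TESTED."
```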
Language models do precisely this, except that in most cases what they predict next is not a full word (though it might be) but a fragment, a token. There is no advance knowledge of what the model will say, but the latent knowledge is present as a result of the model’s training.
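For completeness, here is an equally small sketch of the model-side step, again my own illustration rather than code from my GPT implementation: the model assigns a score (a logit) to every token in its vocabulary, a softmax turns those scores into probabilities, and the next token is sampled from them. The tiny “vocabulary” and logits below are invented; the temperature parameter controls how often a lower probability token (“TESTED” rather than “DROPPED”) gets picked.

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    # Softmax over the temperature-scaled logits, then sample one token.
    # Lower temperature concentrates probability on the top token; higher
    # temperature makes the "TESTED instead of DROPPED" picks more common.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    m = max(scaled.values())                        # subtract max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in scaled.items()}
    total = sum(exps.values())
    tokens, probs = zip(*((tok, e / total) for tok, e in exps.items()))
    return random.choices(tokens, weights=probs)[0]

# Invented logits for a few candidate tokens after "THE ATOMIC BOMB WAS":
logits = {" DROPPED": 4.1, " TESTED": 3.2, " BUILT": 1.5, " BANNED": 0.3}
print(sample_next_token(logits, temperature=0.8))
```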
In 1980, Searle argued, in the form of his famous Chinese Room thought experiment, that algorithmic symbol manipulation does not imply understanding. In his thought experiment, a person who does not speak Chinese manipulates Chinese symbols according to preset rules, conveying the illusion of comprehension without actual understanding. I think my little game offers a perfect counterexample: a non-algorithmic game demonstrating the emergence of disembodied intelligence, based on the prior world knowledge of its participants but not directly associated with any specific player.
My wife and I just played two rounds of this game. It was a fascinating experience for both of us.