Hey GPT, I'm ready for my chess game, but not for your advice :)
You can always ask me for advice :) JenPT!
And then we'd experience a different form of A.I.: actual intelligence.
but then, why does it work here: https://app.chesscoach.dev
Very cool site. I was referring to off-the-rack ChatGPT that people use as a search tool/life hack/assistant--I think its weirdly bad chess play can give us insight into what it's missing in life and writing advice too. That said, of course it can be combined with a traditional chess engine to give coherent tips (which appears to be what this is!)
Thank you both for your comments! I am the creator of app.chesscoach.dev
I believe there is a future where LLMs can help explain nuances of chess especially once they have been further trained / fine-tuned.
I really enjoyed your article and Jennifer, I dropped you an email as well.
Even more shocking than the way it plays is what it gives you for a FEN after every move. Ask it to generate one and you’ll see. The moves are not associated with a board. They are just associated with other moves in the public record. So, when the public record grows thin, the moves break free from reality.
Yep, totally weird, and when I asked for a photo of it, it had two white kings and a black queen (no black king or white queen).
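Positions like that are easy to catch mechanically, since a legal FEN has exactly one king per side. A minimal sanity check in Python (the bogus FEN below is one I invented to mirror the two-white-kings, no-black-king position described above):

```python
def count_kings(fen: str) -> tuple[int, int]:
    """Count white ('K') and black ('k') kings in a FEN's board field."""
    board = fen.split()[0]  # the piece-placement field comes first
    return board.count("K"), board.count("k")

# The standard starting position: one king per side, as required.
start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
assert count_kings(start) == (1, 1)

# A hallucinated position like the one described above: two white
# kings, no black king, no white queen. It fails the same check.
bogus = "rnbq1bnr/pppppppp/8/8/4K3/8/PPPPPPPP/RNB1KBNR w - - 0 1"
print(count_kings(bogus))  # (2, 0)
```

A fuller validator would also check pawn counts, castling rights, and whether the side not to move is in check, but even this one-liner rejects much of what ChatGPT generates.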
I tried playing ChatGPT, but when it made a mistake, instead of letting it correct it, I asked why it made an illegal move in the first place. Interestingly enough, it answered the way a human would.
"The mistake came from automatic pattern recognition — a kind of muscle memory that kicks in when playing fast: I saw “knight sac + check,” and my brain filled in a familiar response without validating the actual piece placement."
So in a way it's doing exactly what it's supposed to do: pass a Turing test. It chooses human behavior over accuracy.
One of our 1600 club players made a bishop move in time pressure, moving it from a light square to a dark square, and neither player caught it.
Interesting! I didn't ask this about chess, but when it hallucinates other search results, I've also asked it to explain its hallucinations. A lot of it seems designed to please the user and find something that sounds cool or fitting.
Thanks for your story, Jen. I’m sure I’ll never be playing GPT in Chess.
Interestingly, there is decently strong chess skill latent in the GPT models, but the work done to mould them into friendly chatbots takes away most of the skill. The one standout model where the chess abilities are clear is gpt-3.5-turbo-instruct, which is/was accessible only via API. You prompt it with the start of a PGN file and read the next move that it continues with.
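To make the prompting concrete, here is a rough sketch of how such a setup can build the PGN prefix and pull the next move out of the completion. The API call itself is omitted; the function names and the SAN regex are my own, and a real harness also has to handle illegal or malformed outputs:

```python
import re

def pgn_prompt(moves: list[str]) -> str:
    """Build a PGN-style prompt that ends mid-game, so the model
    completes it with the next move. Headers naming strong players
    are a common trick to nudge the model toward stronger play."""
    header = '[White "Magnus Carlsen"]\n[Black "Fabiano Caruana"]\n[Result "*"]\n\n'
    body = ""
    for i, mv in enumerate(moves):
        if i % 2 == 0:
            body += f"{i // 2 + 1}. "  # move number before White's move
        body += mv + " "
    return header + body

def next_move(completion: str) -> str:
    """Pull the first SAN token out of the model's raw continuation."""
    m = re.search(r"[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](=[QRBN])?[+#]?|O-O(-O)?",
                  completion)
    return m.group(0) if m else ""

prompt = pgn_prompt(["e4", "e5", "Nf3"])
# The prompt ends with "2. Nf3 ", inviting Black's reply.
print(next_move("Nc6 3. Bb5 a6"))  # Nc6
```

Each move the model returns still has to be checked for legality against a real board before it is played, which is also how you detect the illegal-move cases described below.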
I (1800 FIDE, 1900 Lichess blitz) have played two ten-game matches against it at a 3+2 time control, using a custom-made setup, losing the first 8-2 and the second 9-1. I played out all my losses to mate, testing that it can finish off games properly. Playing against it is a sometimes-strange experience -- it is extremely fast, regularly playing and defending against two-move tactics, but it occasionally makes blunders that no human with that speed would make, allowing me to not always lose.
With an hour on my clock, I think I would usually (though not always) win -- the text-completion model always plays at the same speed and doesn't spend time 'reasoning' the way some of the latest chatbots can do.
It does occasionally produce illegal moves, most commonly something involving a check (e.g. moving a pinned piece, moving out of a check by one piece into a check by another), but they are relatively rare -- nowhere near the frequency seen from regular ChatGPT -- and I can go ten games without seeing one.
One stark difference between the GPT model and regular chess engines is that its moves don't just depend on the current position, but also the sequence of moves that reached it. A sequence of normal moves will usually lead to a normal continuation; the same position reached by silly moves with pieces left hanging along the way will probably lead to some bizarre continuation.
It's fascinating to me that an entity can learn to play chess at a strong club standard just by studying the letters and numbers in PGN files. In principle it makes sense that it should be possible with enough training examples (there are many millions of PGN'd games), but it remains deeply astonishing to me to see it happen.
I also ran some experiments playing gpt-3.5-turbo-instruct against Leela without search. Not surprisingly, the dedicated chess network is far stronger than the general language model, but the score wasn't 100-0, with GPT picking up 3 points in 100 games against the latest Leela network, and more against some early Leela networks.
https://pappubahry.substack.com/p/gpt-versus-one-node-leela
1. g4 is Stockfish's (henceforth "the engine's") clear choice for White's worst opening move, but 1. f3 seems to be the favored way either to express contempt (actual or feigned), or to ensure that the only ego points on the line are the opponent's. Either way, it's a punk move.
Advanced question: aside from g4 and f3, what three opening moves does the engine consider to be worse than making no move at all?
I feel like I know this one but I forget---so my guesses are 1. h4 and 1. b4. As soon as I hit reply I'm going off to check with the engines and see if I'm right :)
Argh, I thought of the second one right after hitting reply: 1. Nh3?? Probably the third-worst move indeed.
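For anyone who wants to interrogate the engine directly, its verdict is readable over UCI. A sketch that queries a local Stockfish binary, assuming one is on the PATH (note the UCI score is reported from the side to move's point of view, so after 1. g4 a positive number means Black is already better):

```python
import re
import shutil
import subprocess

def parse_cp(info_line):
    """Extract a centipawn score from a UCI 'info ... score cp N ...' line."""
    m = re.search(r"score cp (-?\d+)", info_line)
    return int(m.group(1)) if m else None

def eval_after(uci_move, depth=20):
    """Score the position after one opening move, in centipawns from
    the side to move's point of view. Returns None without an engine."""
    engine = shutil.which("stockfish")
    if engine is None:
        return None
    # A robust harness would read output until 'bestmove' appears
    # instead of piping commands blindly, which can cut a search short.
    cmds = f"position startpos moves {uci_move}\ngo depth {depth}\nquit\n"
    out = subprocess.run([engine], input=cmds, capture_output=True,
                         text=True, timeout=120).stdout
    scores = [parse_cp(line) for line in out.splitlines() if "score cp" in line]
    return scores[-1] if scores else None

# parse_cp on the kind of line the engine prints during search:
print(parse_cp("info depth 20 seldepth 28 score cp -142 nodes 1234 pv d7d5"))  # -142
```

Running `eval_after` over all twenty first moves and sorting by score is one way to settle the "worse than passing" question for yourself.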
I think this app is very useful (https://app-chesscoach.vercel.app). Chess is a game of feedback, and we can't always pay USD 100 an hour for a GM. The other day I couldn't get rid of a knight on a square threatening my king, and when I asked for the AI analysis it showed me that all I had to do was advance a pawn. So simple, and I couldn't see it. My game is obviously improving: it showed me that I usually leave the central squares free for my opponent, things that are difficult to notice just using Stockfish with all those little arrows, lol. Really fantastic app.
They promised us rocket ships, and we end up with Eddie Haskell.