Monday, March 15, 2010

Chess Query Language

It's amazing how many tools are available on the web for seemingly obscure tasks. Recently, a friend of mine was writing a short story, and he needed an answer to this question: in high-level chess games, how often does each piece survive the whole game without being captured (ignoring kings)? In the context of this question, each of the 30 starting pieces is treated as distinct; we want to know how often the pawn that starts on the a2 square survives, how often the b2-pawn survives, etc., rather than how often pawns in general survive.

I think this qualifies as an obscure question. In principle it's easy to answer: just get a database of games, and write something to play through each game, tracking which pieces survive. Easy in principle, but a fair bit of work in practice. Luckily there's a tool that will do this type of thing: Chess Query Language.
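For a sense of what the do-it-yourself route involves, here's a rough Python sketch. It's only an illustration, not something I ran against the database: it assumes each game has already been reduced to a list of legal (from-square, to-square) moves, and the function names are made up for this post. Parsing a real game file into that form is the part that adds up to a fair bit of work.

```python
FILES = "abcdefgh"
BACK_RANK = "RNBQKBNR"  # piece types on files a..h of the first and eighth ranks


def initial_board():
    """Map each occupied square to (piece_type, starting_square)."""
    board = {}
    for i, f in enumerate(FILES):
        for rank, kind in (("1", BACK_RANK[i]), ("2", "P"),
                           ("7", "P"), ("8", BACK_RANK[i])):
            board[f + rank] = (kind, f + rank)
    return board


def survivors(moves):
    """Return the starting squares of the non-king pieces left at the end.

    `moves` is a list of (from_square, to_square) pairs such as ("e2", "e4").
    Assumption: every move is legal; nothing is validated here.
    """
    board = initial_board()
    for src, dst in moves:
        kind, origin = board.pop(src)
        if kind == "K" and abs(FILES.index(src[0]) - FILES.index(dst[0])) == 2:
            # Castling: the king moved two files, so the rook moves as well.
            rank = src[1]
            if dst[0] == "g":
                board["f" + rank] = board.pop("h" + rank)
            else:
                board["d" + rank] = board.pop("a" + rank)
        elif kind == "P" and src[0] != dst[0] and dst not in board:
            # En passant: the captured pawn sits beside the capturing pawn.
            del board[dst[0] + src[1]]
        if kind == "P" and dst[1] in "18":
            # Promotion: same starting identity, but no longer moves like a pawn.
            kind = "promoted"
        board[dst] = (kind, origin)  # overwrites any piece captured on dst
    return {origin for kind, origin in board.values() if kind != "K"}


# A short game in which the white queen takes the pawn on f7.
game = [("e2", "e4"), ("e7", "e5"), ("d1", "h5"), ("b8", "c6"),
        ("f1", "c4"), ("g8", "f6"), ("h5", "f7")]
print(len(survivors(game)))  # 29 of the 30 non-king pieces remain
```

Run that over every game, count how often each starting square shows up in the result, and you have the answer.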

CQL is quite powerful, and it's pretty straightforward to set up a CQL query. For example, to answer the above question about survival rates, I started by creating a query to see how often the white queen's rook survives:

:forany Rook R
(:position :initial $Rook[a1])
(:position :terminal $Rook[a-h1-8])

That's it. The first line creates a Rook piece designator; the second and third lines specify positions that have to exist in a game for the game to match the query. Thus the query will match any game where a rook is on the a1 square in the initial game position, and that same rook is somewhere on the board in the terminal game position.

This query took about 45 minutes to run through a database of about 2.5 million games, and found that this rook survived in about 1.4 million of them. I then repeated the query for every other piece and pawn to generate the final answer.
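Turning those counts into a survival rate is simple division. Here's a tiny Python sketch using the rounded figures above; the function name is just something I made up for illustration.

```python
def survival_rate(survived, total):
    """Fraction of games in which the piece was still on the board at the end."""
    if total <= 0:
        raise ValueError("total must be positive")
    return survived / total


# Rounded figures from the a1-rook query: about 1.4 million of 2.5 million games.
rate = survival_rate(1_400_000, 2_500_000)
print(f"{rate:.0%}")  # 56%
```

Since both inputs are rounded, the result is only a ballpark figure.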

So, I can advise that if you're ever involved in a Harry Potter-style human chess game, you should volunteer to be one of the wing pawns. Don't allow yourself to play as a knight, whatever you do.