line

The line filter lets CQL look ahead or look backwards to determine what follows or precedes the current position.

Regular expressions

The syntax for line is based on the familiar regular expression syntax.

Regular expressions are usually used as a way to search for strings of characters inside text. For example, the regular expression

That was a (good )+game
could be used to find instances of "That was a good game" or "That was a good good game" or "That was a good good good game" and so on.

To convert the concept of ordinary regular expressions to CQL and apply it on chess positions, two changes need to be made.

Regular expressions on trees

Usually regular expressions are applied to a linear string of characters, like a text file. However, we can imagine a tree in which each node is labeled with a single character. Each path from the root to a leaf of the tree denotes a particular linear string. We can match a regular expression to the tree by matching the expression to one of the strings from the root to a leaf.
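As a rough illustration of this idea, here is a small Python sketch (not part of CQL). The nested-tuple tree encoding and the function names are assumptions made up for the example. It spells out every root-to-leaf string of a character-labeled tree and asks whether a regular expression matches any of them:

```python
import re

# A tree node is (character, [children]); this encoding is hypothetical.
def root_to_leaf_strings(node):
    """Yield the string spelled by each path from the root to a leaf."""
    char, children = node
    if not children:
        yield char
        return
    for child in children:
        for suffix in root_to_leaf_strings(child):
            yield char + suffix

def tree_matches(pattern, root):
    """True if the pattern matches at least one complete root-to-leaf string."""
    return any(re.fullmatch(pattern, s) for s in root_to_leaf_strings(root))

# This tree has two root-to-leaf paths, spelling "abd" and "acc".
tree = ("a", [("b", [("d", [])]), ("c", [("c", [])])])

print(sorted(root_to_leaf_strings(tree)))
print(tree_matches(r"ac+", tree))   # matched by the path "acc"
print(tree_matches(r"ab+", tree))   # no path spells "a" followed only by b's
```

The sketch requires the whole path to match; that choice is only for simplicity and says nothing about how CQL itself searches.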

Regular expressions in chess

In normal regular expressions, we match a string made up of characters. In chess, each "character" is instead a chess position. So a "string" of chess positions is just a linear sequence of chess positions, arranged to make a legal game.

The regular expression itself now uses filters to match a particular linear sequence of chess positions. For example, the regular expression

check*
will match any sequence of 0 or more positions in which one side is in check. Likewise
check+ --> stalemate
would match any sequence of 1 or more positions of check, followed by a stalemate. (The --> is used to separate different regular expressions.)
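One way to picture the analogy is to replace each position by a letter and then use an ordinary character regular expression. The Python sketch below does this under stated assumptions: the dictionary records standing in for positions and the letters 'c', 's' and '-' are invented for the illustration and have nothing to do with how CQL represents positions.

```python
import re

def encode(positions):
    """Map each position to one letter: 's' stalemate, 'c' check, '-' otherwise."""
    letters = []
    for pos in positions:
        if pos.get("stalemate"):
            letters.append("s")
        elif pos.get("check"):
            letters.append("c")
        else:
            letters.append("-")
    return "".join(letters)

# Hypothetical position records; real positions carry far more information.
game = [{"check": True}, {"check": True}, {"check": True}, {"stalemate": True}]

# check+ --> stalemate, rewritten over the encoded letters as c+s
print(encode(game))
print(bool(re.match(r"c+s", encode(game))))
print(bool(re.match(r"c+s", encode(game[3:]))))  # starts at the stalemate: no check first
```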

Regular expressions in chess trees

Now we combine the concepts of regular expressions on trees and of regular expressions in chess to get what we want: regular expressions on chess trees. A chess tree is just a game tree: each node is a position, and a node is connected to its children by chess moves, according to the PGN file.

Thus, a chess regular expression matches a chess tree if there is a string of chess positions from the root of the tree to a leaf of the tree that is matched by the regular expression.

Regular expression chess syntax

CQL supports most of the usual character regular expression syntax, applied to filters instead of to characters:

regexp   meaning                    character example   chess example
.        any                        .z                  . -->
*        zero or more repetitions   z*                  check*
+        one or more repetitions    z+                  check+
?        optional                   z?                  check?
{}       repetition                 z{2,3}              check{2 3}
()       grouping                   (yz)*               (check --> move from k)*

line syntax

A line filter consists of
  1. line
  2. optional parameters
  3. <-- or -->
  4. constituents separated by <-- or -->
The optional parameters are:
  • a range indicating that the length of the sequence of positions found must be in the range to match;
  • firstmatch indicating search should stop after the first sequence of positions matching the constituents is found;
  • lastposition indicating that the position at the end of the sequence of positions that is found is to be returned (rather than the length of the sequence);
  • singlecolor indicating that only positions whose side to move is the same as the side to move of the position at the start of the line are to be considered;
  • nestban indicating that a position cannot start a line if that position was already part of a matching sequence of the same line filter (from an earlier position);
  • primary indicating that, when moving to the next position, only positions resulting from primary moves are considered;
  • secondary indicating that, when moving to the next position, only positions resulting from secondary moves are considered.
  line --> check //current position is a check
  line --> check*>5 // at least 5 checks
  line 5 100 nestban --> check*
  line --> check
       --> move from k
       --> R attacks _
  line --> (move from k
            --> move from K)+

<-- and -->

The arrows <-- and --> are used inside of a line filter to denote the direction of motion through the game tree and to separate individual constituents.

A right arrow --> denotes that the direction is towards future moves: the line is looking forward. The --> comes after the line keyword and also between entries in parentheses.

If the --> is replaced by the left arrow <-- then the line will look backwards, towards the past and previous moves.

A given line filter can use only one type of arrow: either <-- or -->.

A constituent is either a filter, or it is formed from another constituent by using the special regular expression characters: +, *, (), {}, or ?.

line filter semantics

We begin by discussing semantics when all the arguments are filters.
line --> filter1 
     --> filter2
     --> filter3
        ...
     --> filtern
matches the current position if:
  • filter1 matches the current position;
  • filter2 matches the next position following the current position
  • filter3 matches the next position after the next position following the current position
  • ... and so on
Before an argument filter is evaluated to determine whether it matches a position, the current position is set to that position.

If variations is set, then line will also consider positions that move into variations. Otherwise, only mainline positions are considered.
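The mainline case can be sketched in a few lines of Python. This is only a model of the rule just described, not CQL's implementation: positions are stand-in strings, filters are plain predicates, and all names are hypothetical.

```python
def line_matches(positions, start, filters):
    """Check filters[i] against positions[start + i] for every i."""
    if start + len(filters) > len(positions):
        return False  # the filters would run past the end of the game
    return all(f(positions[start + i]) for i, f in enumerate(filters))

# Stand-in filters over stand-in positions (plain strings).
anything = lambda pos: True
check = lambda pos: pos == "check"
stalemate = lambda pos: pos == "stalemate"

mainline = ["quiet", "quiet", "check", "check", "check", "stalemate"]

# Models a line filter made of '.', three checks and a stalemate.
filters = [anything, check, check, check, stalemate]

starts = [i for i in range(len(mainline)) if line_matches(mainline, i, filters)]
print(starts)  # only the second position (index 1) starts a matching sequence
```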

For example, the diagrammed position:

D. Gurgenidze 1985, start of sequence
(found from CQL file: checkcheckcheckstalemate.cql)

is matched by the CQL file checkcheckcheckstalemate.cql (when run on the sample database). The relevant line filter is:

  line --> . 
     --> check 
     --> check 
     --> check
     --> stalemate

regular expression special characters

A regular expression always matches the longest possible sequence of positions. The specific characters in detail:

The '*' symbol

'*' means "repeat 0 or more times".

Thus, consider the filter

  line --> not check
       -->  check*
The last 'check' is modified by the '*' so it is repeated 0 or more times. Thus, the expression is equivalent to:

  line --> not check // check repeated 0 times
  or line --> not check 
          --> check // check repeated 1 time
  or line --> not check 
          --> check
          --> check // check repeated 2 times
  or line --> not check 
          --> check
          --> check
          --> check // check repeated 3 times
  ....; and so on forever
Therefore, the line filter matches a position if either:
  • The current position is not a check, or
  • The current position is not a check and the next position is a check, or
  • The current position is not a check and the next position is a check and the following position is a check... and so on.

Because of the rule above about matching the longest sequence of positions, the actual filter will match the longest possible sequence of checks that it can, starting from a position that is not a check.
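A minimal Python sketch of this longest-match behavior follows. It assumes a mainline reduced to one boolean per position, and the function name is invented for the example. The number it returns is the count of positions in the matched sequence.

```python
def longest_not_check_then_checks(in_check, start):
    """Length of the longest match of `not check --> check*` beginning at
    index `start`, or 0 if the position at `start` is itself a check."""
    if start >= len(in_check) or in_check[start]:
        return 0
    end = start + 1
    # Greedily absorb as many consecutive checks as possible.
    while end < len(in_check) and in_check[end]:
        end += 1
    return end - start

# One boolean per mainline position: True means that position is a check.
in_check = [False, True, True, True, False, True]

print(longest_not_check_then_checks(in_check, 0))  # not check + three checks
print(longest_not_check_then_checks(in_check, 4))  # not check + one check
print(longest_not_check_then_checks(in_check, 1))  # starts on a check: no match
```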

The '+' symbol

The '+' symbol following a constituent means "repeat 1 or more times". Thus,
  line --> move from Q
       --> move from [Kk]+
       --> mate
	
will match any position from which the next move is a move by the White queen; following which is a sequence of one or more positions from which a King moves; following which there is a position that is mate.

{ range } repetition symbol

When the braces '{' and '}' enclose a range they denote repetition of the preceding constituent a number of times that lies inside the range. For example,
  line --> move from Q
       --> not move from Q {10 20}
       --> move from Q

will match a sequence of moves that begins with a move of a white queen, then has between 10 and 20 consecutive moves of any piece other than a white queen, and ends with a move of a white queen.

(Note: The repetition symbol is not designed for very high numbers of repetitions; don't use it for 1000 repetitions, for example, or there may be performance degradation.)
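The bounds of the range behave like the {m,n} quantifier of character regular expressions. The Python sketch below checks the boundary cases on an encoded sequence; the letters 'Q' (a position from which a white queen moves) and 'x' (any other position) are an assumption made for the illustration.

```python
import re

# Q, then 10 to 20 non-queen moves, then Q again.
pattern = re.compile(r"Qx{10,20}Q")

def matches(encoded):
    """True if the encoded sequence starts with a match of the pattern."""
    return pattern.match(encoded) is not None

print(matches("Q" + "x" * 10 + "Q"))  # lower bound of the range
print(matches("Q" + "x" * 20 + "Q"))  # upper bound of the range
print(matches("Q" + "x" * 9 + "Q"))   # too few intervening moves
print(matches("Q" + "x" * 21 + "Q"))  # too many intervening moves
```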

The '?' symbol

The '?' following a constituent means "repeat 0 or 1 times". For example,
	line --> move from Q
	     --> move from k?
	     --> mate
means
	line --> move from Q
	     -->  mate
	or
	line --> move from Q
	     --> move from k
	     --> mate
That is, either White delivers mate with the Queen, or, after White's Queen move, Black delivers mate by a King move.

The '()' wildcard symbol

A sequence of filters separated by arrows inside parentheses matches a sequence of consecutive positions that match the filters respectively. This construct is used exclusively with wildcards.

For example, suppose you want to match a white queen move followed by a black move followed by a sequence of White checks by a pawn followed by black king moves followed by mate. You can use this:

  line --> move from Q
       --> btm
       --> (move from P
            --> check)+
       --> wtm
       --> mate

value of line filter

By default the line filter has as value the number of positions in the longest sequence of positions that was matched. Thus, a line filter can be sorted by putting it in a sort filter:
  sort
    {line --> check*}
    >= 5
or equivalently
   sort line 5 1000 --> check*

{+} and {*}

Sometimes it can be confusing to clearly distinguish between '+' and '*' as arithmetic operators and as wildcards. If you want to be absolutely sure these symbols are interpreted as wildcards, enclose them in braces.

Linearization: using move inside line when variations is set

When variations is set in the CQL header, line evaluates its constituent move filters differently from usual. (The rules below seem complicated, but they give the intuitive behavior).

The problem these rules are designed to address is that the move filter matches a position X based on every possible move that arises from the position. However, as the line filter goes down the game tree, it only selects a single move at a time from X to evaluate. This can mean that the line filter winds up following a move from X that the move filter did not match.

Suppose for example that a user wants to match positions from which a pawn promotes to a bishop, and from which the game lasts at least 10 moves. A natural way to write this is:

cql (input test.pgn variations)
 line 10 1000 
     --> move promote B
     --> .*
Here, the user is saying that in the current position, one side promotes to a bishop, and that following that move, there are 0 or more other moves. The range 10 1000 in the line filter says there must be at least 10 positions contained in the matched sequence.

Now consider the following excerpt from a PGN game:

  
23. e8Q (23. e8B? Rf4+! =) 
23. ...Ng1
24. {many moves omitted}
...
44. Rf3 #
Call the position before White's 23rd move above X. The user does not want the CQL file to match X, because there is only one move following the bishop promotion. However, the above CQL would match the position without the rule specified below:
  • When X is evaluated, the move filter evaluates to true, because there is a matching move (a bishop promotion) from X.
  • From X, the line filter then evaluates the position after 23. e8Q. This position certainly matches the next constituent of the line filter, namely .*. The line filter goes on to match the position after 23...Ng1 and so on, all the way until 44. Rf3#
  • Since this represents at least 40 positions, the whole line filter matches X.

To prevent this behavior CQL uses a rule called linearization. Under linearization, before evaluating any filters, the line filter chooses a single "line" in the game tree, from the root to a terminal position. All the move filters are then evaluated as if that "line" gave the only valid moves from a position. The remaining moves are discarded.

For example, in the PGN excerpt above, there are two lines:

  • the first line is 23. e8Q Ng1 ... 44. Rf3#. When the line filter chooses this line, the bishop underpromotion disappears from the game. Thus, the move filter in the CQL file, move promote B, no longer matches the position X.
  • the second line is 23. e8B Rf4+. When the line filter selects this line, the position X does match the move filter, since there is a bishop promotion at move 23. However, this line only has a length of three positions (X, and the positions after the next two moves). Since this length does not fall within the 10 1000 range of the line filter, the line filter does not match X.
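The difference linearization makes on this excerpt can be modelled in Python. Everything here is a simplified sketch under stated assumptions: the dictionary-based game tree, the move labels (including the "=B" promotion suffix and the placeholder labels for the omitted moves) and the function names are hypothetical, and the matching of .* is reduced to counting positions.

```python
def chain(labels):
    """Build a continuation without variations from a list of move labels."""
    node = {"moves": []}
    for label in reversed(labels):
        node = {"moves": [(label, node)]}
    return node

def lines_from(node):
    """Yield every line (list of move labels) from node to a terminal position."""
    if not node["moves"]:
        yield []
        return
    for label, child in node["moves"]:
        for rest in lines_from(child):
            yield [label] + rest

def is_bishop_promotion(label):
    return label.endswith("=B")

def matches_linearized(node, lo, hi):
    """Model of `line lo hi --> move promote B --> .*` with linearization:
    the move filter only sees the move that lies on the chosen line."""
    for line in lines_from(node):
        positions = len(line) + 1  # the starting position plus one per move
        if line and is_bishop_promotion(line[0]) and lo <= positions <= hi:
            return True
    return False

def matches_unlinearized(node, lo, hi):
    """Same query without linearization: the move filter sees every move
    from the starting position, whichever line is then followed."""
    promotes = any(is_bishop_promotion(label) for label, _ in node["moves"])
    longest = max(len(line) + 1 for line in lines_from(node))
    return promotes and lo <= longest <= hi

# Position X: the mainline 23. e8Q Ng1 ... 44. Rf3# and the variation 23. e8B Rf4+.
mainline = chain(["Ng1"] + ["move%d" % i for i in range(40)] + ["Rf3#"])
variation = chain(["Rf4+"])
X = {"moves": [("e8=Q", mainline), ("e8=B", variation)]}

print(matches_unlinearized(X, 10, 1000))  # X would match, which is unwanted
print(matches_linearized(X, 10, 1000))    # X does not match
```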

Exceptions to linearization

Linearization is not applied in two cases. These rarely arise and this section can be safely skipped on first reading:
  • If the line filter never visits or needs to visit any positions following the current position, then there is nothing to linearize. For example, the move filter in
        line --> move to _
    is not linearized, since there is only one filter in the line filter, so the line filter never visits any other positions.
  • If a line filter (or any filter that can modify the current position) is inside another line filter, the linearization of the outer line filter does not affect the linearization of the inner line filter.
Combining these two principles, linearization may be disabled for a particular move filter inside a line filter by wrapping the move filter in another line filter.

To turn off linearization completely, use the -nolinearize option to cql (this option is unsupported).

Using wildcards to look for maneuvers

Wildcards are useful for isolating the particular pieces you're interested in within a maneuver. For example, in turton.cql, there is a line filter with these subfilters:
  line --> ...
        --> {not move from Front or Side}*
        --> move from Side to Criticalsquare
        ...
       
In this particular theme, without going into details, we are particularly interested in the movements of the pieces Front and Side. We want to track where they go. So the first wildcard operator ensures that we ignore movements of pieces other than those, and just focus on those pieces.

Examples

The line filter is used throughout the examples: bristol-universal.cql, bristol1.cql, bristol2.cql, clearance-delayed.cql, consecutive-checks-by-one-color.cql, consecutive-checks.cql, excelsior-multiple.cql, forced-moves-both-sides.cql, forced-moves-either-side.cql, forced-moves-white.cql, parallelpaths-simple.cql, queentriangulation.cql, staircase-sort.cql, staircase.cql, turton.cql, white-try.cql.

Let's look in detail at how the line filter is used in turton.cql, which expresses the Turton theme. In the Turton theme, two line pieces switch places so that one can support the other. The first line piece has to cross a "critical square" to allow the second line piece to reach that critical square so that the pieces are correctly positioned.

The following study illustrates the theme. In the position below:

Costeff 2015, after 4...Rf4
(found from CQL file: turton.cql)

The first line piece (called Front in the CQL code) is the Qe3. The second line piece (called Side in the CQL code) is the Ra2. The critical square is e2.

The obvious thing for White to do is to move 4.Re2?, preparing a back rank mate:

after 4. Re2? (variation)
(found from CQL file: turton.cql)

Unfortunately, this maneuver fails to 4...R:f7 (all these lines can be seen in the comments in the turton-out.pgn of course).

White needs to get the queen behind the rook. White does this using the Turton maneuver.

White first executes the move 4. Qe7!, preparing the Turton but allowing the queen check 4...Qd1+, leading to the following position:

Costeff 2015, after 4...Qd1+
(found from CQL file: turton.cql)

This is where the Turton maneuver comes into play. The line filter of turton.cql begins

 line -->  move from Front
            comment ("Front moves to " Front
	              " crossing critical square " 
                      CriticalSquare)
The comment here is not a filter: it is a parameter to the move filter. Thus, this line constituent consists of a single move filter with two parameters: from and comment.

Note that this particular constituent will be evaluated for many possible different CriticalSquare values. Due to smart comments however, the actual comment is output only if all the constituents of the line match (and are the longest possible match).

It turns out that when the current position is the one in the diagram above, Front is Qe7, Side is Ra2 and CriticalSquare is e2, all the rest of the line constituents match. That is why the comment is output to the critical move

 5. Qe1! Front moves to Qe1 crossing critical square e2 

reaching

after 5.Qe1
(found from CQL file: turton.cql)

Now, rather surprisingly, the white queen cannot be taken, as White wins after 5...Q:e1+ 6. K:g2. Thus, Black retreats the queen with 5...Qd8 in response. Next (glossing over the xray constituent, which eliminates certain pathological cases), we see an idiom that comes up a lot in these kinds of themes:

 -->not move from (Front|Side)*
The subfilter Front|Side denotes the squares on which the pieces Front and Side currently stand (namely a2 and e1 in the current position, but that can change as the current position changes). Thus, not move from (Front|Side) matches a move by some piece other than Front or Side. Finally, the wildcard specifier * means that the filter is repeated 0 or more times. In other words, this constituent has the effect of ignoring all the moves other than those of the pieces we are interested in, Front and Side.

We now have matched (0 moves of) the wildcard constituent, and we reach the position before the rook move:

 -->  move from Side to CriticalSquare
       comment ("Side moves to the critical square: " CriticalSquare)

As before, the comment here is a parameter to the move filter, and due to smart comments is only output when it is part of a successful Turton maneuver. Here, the output is the comment shown to 6.Re2!, namely Side moves to the critical square e2. (Recall that we are assuming CriticalSquare is bound to e2. Otherwise the move filter will not match, and so neither will the line filter, and so a new CriticalSquare value will be tried.)

We then reach the position:

after 6. Re2
(found from CQL file: turton.cql)

which is the same as the earlier diagram showing the position after 4. Re2? in the variation. Thus, White has successfully switched around the rook (Side) and the queen (Front).

The line filter continues by ignoring any sequence of moves not from Side or Front, as before:

 -->  not move from (Front | Side) *

and reaches the final move of the Turton: the rook moves to e8, winning:

-->move from Side
reaching the position
after 7. Re8+ 1-0
(found from CQL file: turton.cql)

The final constituent of the filter is the point of the Turton: the queen supports the rook through the critical square:

--> xray (Front CriticalSquare Side)