The ~~ operator
The ~~ binary operator is used to determine whether a string matches a regular expression.
The left-hand side of the ~~ filter is a string filter whose value is the string to search within, called the target.
The right-hand side of the ~~ filter is a quoted regular expression, called the pattern.
The ~~ operator matches the position if the pattern matches some part of the target; the pattern need not match the whole target. For example, each of the following filters matches the current position:
"football" ~~ "f"
"football" ~~ "f.*l"
"football" ~~ "[otba]+ll"
Suppose the player playing White in the current game is Kasparov. Then both of the following filters match:
player white ~~ "Kasparov"
player white ~~ "K.*ov"
To check whether either Kotov or Kasparov is playing White or Black, one could use:
flipcolor player white ~~ "K(ot|aspar)ov"
or, more simply:
player ~~ "K(ot|aspar)ov"
The ~~ operator can be used to query the result of any filter that returns a string:
event ~~ "Wijk .* Zee"
date ~~ "2004\.03\."
site ~~ "Bel.*m"
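For readers who know Python, the matching rule can be sketched with Python's `re.search`, which succeeds if the pattern matches anywhere in the target rather than requiring the whole string to match. This is only an analogy, not CQL: it assumes CQL's regular expressions behave like Python's `re` module for simple patterns such as these, and the helper name `tilde_matches` is hypothetical.

```python
import re


def tilde_matches(target: str, pattern: str) -> bool:
    """Hypothetical analogue of the CQL filter `target ~~ pattern`.

    Succeeds if the pattern matches some substring of the target.
    """
    return re.search(pattern, target) is not None


# Each of these patterns matches somewhere inside "football".
for pattern in ["f", "f.*l", "[otba]+ll"]:
    print(pattern, tilde_matches("football", pattern))
```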
Value of the ~~ operator
The value of the ~~ operator is the matched string, that is, the sequence of characters in the target that matched the regular expression:
Result = "football" ~~ ".*"
Result == "football"
Result2 = "football" ~~ "otb"
Result2 == "otb"
Result3 = "football" ~~ "[otba]+"
Result3 == "ootba"
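As a Python analogy (assuming CQL and Python's `re` agree on these patterns), the value of `~~` corresponds to `group(0)` of the first match found by `re.search`. The helper `tilde_value` is hypothetical and returns `None` where the CQL filter would fail to match.

```python
import re


def tilde_value(target, pattern):
    """Hypothetical analogue of the value of `target ~~ pattern`.

    Returns the matched substring, or None if the pattern does not match.
    """
    m = re.search(pattern, target)
    return m.group(0) if m is not None else None


print(tilde_value("football", ".*"))       # football
print(tilde_value("football", "otb"))      # otb
print(tilde_value("football", "[otba]+"))  # ootba
```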
Note that the value can be the empty string, which is different from failing to match. Thus:
RR = "hello" ~~ "z*" // this matches
RR == "" // the value of RR is the empty string
"hello" ~~ "z+" // this filter fails to match
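The same distinction exists in Python's `re` module, shown here purely as an analogy: an empty match still returns a match object whose `group(0)` is `''`, while a failed search returns `None`.

```python
import re

# "z*" can match zero characters, so the search succeeds with an empty match.
empty = re.search("z*", "hello")
print(empty is not None, repr(empty.group(0)))  # True ''

# "z+" needs at least one "z", so the search fails and returns None.
failed = re.search("z+", "hello")
print(failed)  # None
```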
Group captures
The ~~ filter sets the values \0, \1, \2, and so on to the values of the corresponding regex capturing groups, if any. \0 is the entire matched string, \1 is the value of the first capturing group, and so on:
"football" ~~ "(o+)tba(l+)"
\0 == "ootball"
\1 == "oo"
\2 == "ll"
Index of a capturing group
If \i is a capturing group, then \-i is the zero-based index within the target string at which that capturing group begins:
"football" ~~ "(o+)tba(l+)"
\-0 == 1
\-1 == 1
\-2 == 6
More generally, to get the index of one string inside another, use indexof.
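For comparison, Python's match objects expose the same information: `group(i)` roughly corresponds to CQL's `\i`, and `start(i)`, the zero-based index where group `i` begins, roughly corresponds to `\-i`. This sketch assumes the two regex dialects agree on this pattern; it is an analogy, not CQL code.

```python
import re

m = re.search("(o+)tba(l+)", "football")

# m.group(i) plays the role of CQL's \i ...
print(m.group(0), m.group(1), m.group(2))  # ootball oo ll

# ... and m.start(i), the zero-based index where group i begins,
# plays the role of \-i.
print(m.start(0), m.start(1), m.start(2))  # 1 1 6
```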
Extracting numbers using ~~
You can use ~~ together with the int filter to extract numbers from strings.
For example, suppose you have a string that contains, among other things, a substring "Eval: 43", where 43 may be any number. You can get that value as follows:
Target = "Blunder: Eval: 43"
Target ~~ "Eval: (\d+)"
Val = int \1
The variable Val will then have the value 43. If Target had no such matching substring, the ~~ would not have matched and Val would not be changed.
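A Python analogy of the same extraction: `re.search` finds the `Eval: (\d+)` substring and `int` converts the first capturing group. The helper `extract_eval` is hypothetical; it returns the old value unchanged when there is no match, mirroring how Val is left unchanged when `~~` fails.

```python
import re


def extract_eval(target, val):
    """Hypothetical helper mirroring the CQL example.

    Returns the number following "Eval: " if present; otherwise returns
    val unchanged, just as Val is left unchanged when ~~ fails.
    """
    m = re.search(r"Eval: (\d+)", target)
    if m is None:
        return val
    return int(m.group(1))  # like `Val = int \1`


print(extract_eval("Blunder: Eval: 43", None))  # 43
print(extract_eval("Blunder", 7))               # 7
```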
Using ~~ with while
The ~~ operator is treated specially when used as the test of a while filter (using a syntax borrowed from Perl):
while (lhs ~~ regex) body
Here, lhs is a string filter, regex is a quoted string, and body is any filter.
First, lhs is evaluated to get a string, the target. The regular expression regex is then matched successively from left to right across the target, and body is evaluated after each match.
This kind of while filter matches any position unless lhs fails to match.
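The left-to-right iteration resembles looping over Python's `re.finditer`, sketched below as an analogy. The function `tilde_while` is hypothetical, and the sketch assumes CQL advances through the target the way `finditer` does; the two may differ in how they handle empty matches.

```python
import re


def tilde_while(target, pattern, body):
    """Hypothetical analogue of CQL's `while (lhs ~~ regex) body`.

    The pattern is matched successively from left to right across the
    target, and body is called with each match object in turn.
    """
    for m in re.finditer(pattern, target):
        body(m)


found = []
tilde_while("e4 e5 Nf3 Nc6", "[a-h][1-8]", lambda m: found.append(m.group(0)))
print(found)  # ['e4', 'e5', 'f3', 'c6']
```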
Let's call a "square string" a two-character string denoting a square, such as "a4".
For example, this function counts the number of square strings in a string:
function CountSquares(Arg){
  NumSquares = 0
  while (Arg ~~ "[a-h][1-8]")
    NumSquares += 1
  NumSquares // return the number of square strings
}
We could apply this function to different strings:
CountSquares("No squares") == 0
CountSquares("One c6 square") == 1
CountSquares("Foura1d3squae8c7") == 4
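A Python version of CountSquares, given only as an analogy, counts the non-overlapping matches that `re.finditer` produces. It assumes the pattern `[a-h][1-8]` behaves identically in both regex dialects, which it should for a character-class pattern this simple.

```python
import re

SQUARE_STRING = "[a-h][1-8]"  # same pattern as the CQL example


def count_squares(arg: str) -> int:
    """Count square strings in arg, mirroring the CQL CountSquares function."""
    num_squares = 0
    for _ in re.finditer(SQUARE_STRING, arg):
        num_squares += 1  # body runs once per match
    return num_squares


print(count_squares("No squares"))        # 0
print(count_squares("One c6 square"))     # 1
print(count_squares("Foura1d3squae8c7"))  # 4
```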
Suppose we wanted to count the number of distinct square strings in a string.
The makesquare filter can take a single string as its argument and return the corresponding square.
If we take the union (|) of all these squares and count the number of squares in the result, we get the number of distinct squares:
function CountDistinctSquares(Arg){
  Squares = ~. // the empty set
  while (Arg ~~ "[a-h][1-8]")
    Squares = Squares ∪ makesquare \0
  #Squares
}
Note how \0 above refers to the currently matched string, in this case the two-character string denoting a single square.
CountDistinctSquares("Two: a2a1a1a2") == 2
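The same idea in Python, as an analogy: accumulate each match into a set by union and return the set's size. Here a set of two-character strings stands in for CQL's set of squares, since Python has no counterpart to makesquare; the function name is hypothetical.

```python
import re


def count_distinct_squares(arg: str) -> int:
    """Count distinct square strings, mirroring CountDistinctSquares.

    A Python set of two-character strings stands in for CQL's set of squares.
    """
    squares = set()  # like the empty square set ~.
    for m in re.finditer("[a-h][1-8]", arg):
        squares |= {m.group(0)}  # union, like Squares ∪ makesquare \0
    return len(squares)  # like #Squares


print(count_distinct_squares("Two: a2a1a1a2"))  # 2
```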
For another example of using while with ~~, see ~~ form of while.
Precedence
The ~~ filter has lower precedence than +, so X + Y ~~ "tba" is parsed as (X + Y) ~~ "tba":
X = "foot"
Y = "ball"
X + (Y ~~ "tba") == "tba" // false
(X + Y) ~~ "tba" == "tba" // true
X + Y ~~ "tba" == "tba" // true, same as above
(As usual, we recommend using parentheses or braces to clarify the meaning when in doubt about precedence.)
Matching multiline targets
There are a few special considerations involved in matching multiline strings.
If the target does not contain a newline character, then ^ matches the beginning of the target and $ matches the end of the target. If the target does contain a newline character, then on some platforms ^ matches the beginning of a line and $ matches the end of a line. Unfortunately, we do not know when this inconsistency will be fixed.
Note that . in the pattern never matches a newline. Generally, to match a line of characters in a platform-independent way, one can use something like:
Lines = "pin" + \n + "mate" + \n + "1-0" + \n
while (Lines ~~ ".*"){
  CurrentLine = \0
  // Now the variable CurrentLine holds the current line,
  // without the trailing \n
}
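Python's `re` module makes the same two points explicit through flags, shown here only as an analogy to the platform-dependent CQL behavior: `.` skips `\n` unless `re.DOTALL` is set, and `^` matches after each newline only when `re.MULTILINE` is set. The sketch uses `.+` rather than `.*` because Python's `findall` would also report empty matches at each line end.

```python
import re

lines = "pin\n" + "mate\n" + "1-0\n"

# Without re.DOTALL, "." never matches a newline, so each match stops at a line end.
print(re.findall(".+", lines))  # ['pin', 'mate', '1-0']

# Without re.MULTILINE, "^" anchors only at the start of the string ...
print(re.findall("^.", lines))  # ['p']

# ... but with re.MULTILINE it also matches right after each newline.
print(re.findall("^.", lines, re.MULTILINE))  # ['p', 'm', '1']
```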
Also note that on Windows, line endings are typically indicated by the two characters \r and \n (on Linux and Mac, just \n is used). This is unlikely to cause much confusion in practice, but Windows users should be aware of the issue when parsing multiline strings.
Matching quotation mark characters
To search for a regular expression that contains the character ", use \x22:
Target = "Tal said: " + \" + "mate" + \"
Target ~~ "\x22mate\x22"
In the above example, the string Target has the value
Tal said: "mate"
This is because, standing alone, the two-character sequence \" stands for a quotation mark in CQL. However, that sequence cannot currently be embedded inside a longer string literal. Therefore, the hexadecimal value of the quotation mark must be used to search for it in a regular expression.
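The `\x22` escape is standard regex syntax, so the same pattern works in Python's `re` module, shown here as an analogy: the raw string passes `\x22` through unchanged and the regex engine interprets it as a quotation mark.

```python
import re

target = 'Tal said: "mate"'

# \x22 is the hexadecimal escape for the quotation mark; the raw string
# passes the escape through to the regex engine, which interprets it.
m = re.search(r"\x22mate\x22", target)
print(m.group(0))  # "mate"
```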
Example
The fen filter documentation shows how to use the ~~ filter to parse FEN strings.