Strings
A string is a sequence of characters. A string filter is a filter whose value is a string. A literal string is a string enclosed in quotation marks. For example, "rooks" is a literal string with 5 characters.
As of CQL 6.1, strings are first-class datatypes. They can be assigned to variables, returned as the result of functions, compared, and used as arguments to sort.
Strings can be compared for equality using == and != just like other data types.
If x and y are strings, then x + y is their concatenation:
"pin" + "mate" == "pinmate"
Strings can also be compared using <=, <, >= and >, which use alphabetical order:
"The file h1" > "The file H1"
"" < "a"
"A" < "a"
Strings can be stored in variables like integers or sets of squares; they can be the result of another CQL expression; they can be passed to functions:
Y=1
X= if (Y>0) "check" else "mate"
X ≡ "check"
Table of filters manipulating strings
The following filters handle strings specifically:
Name | Use | Example |
---|---|---|
~~ | regular expression matching | player ~~ "Ka.*ov" |
\i | get a capturing group | \2 =="4a" |
\-i | index of a capturing group | \-2 ==4 |
# | length of a string | #"pin"==3 |
+ | concatenate strings | "x"+"y"=="xy" |
ascii | conversion to and from ASCII | ascii "A"==65 ascii 65=="A" |
currenttransform | current transform | message currenttransform |
date player event eventdate site eco | specified PGN field | player white == "Kasparov" player~~"K.*v" date~~"1943\.0[1-6]" sort event |
originalcomment | the comment in the PGN file | originalcomment ~~ "Eval: (\d+)" |
dictionary | store and retrieve strings | dictionary D["hi"]="bye" |
fen | get FEN of current position as a string | Y=fen |
in | substring | "et" in "Reti" |
indexof | index of a substring | indexof ("n" "pin")==2 |
int | convert string to int | int "23"==23 |
lowercase | convert string to lowercase | lowercase "Tal"=="tal" |
makesquare | convert string to square | a3==makesquare "a3" |
max min | max or min of its arguments | x=max("a" "b") |
readfile | read a string from a file | X=readfile "cook.cqo" |
settag | set value of PGN tag | settag("CustomTag" "Troitzky") |
sort | sort string filters | sort player white |
sort by string | sort by a string value | sort date |
str | convert arguments to string | str("X is: " X) |
tag | get a PGN tag value | tag "CustomTag"=="value" |
uppercase | convert string to uppercase | uppercase "Tal"=="TAL" |
writefile | write a string to a file | writefile("cook.cqo" X) |
Predefined strings
There are special predefined strings: \n is the string consisting of the linefeed character; \t is the tab character; \" is the quote character; \r is the carriage return character; \\ is the backslash character. Note that these predefined strings are not specially interpreted inside quoted strings (although they may be specially interpreted when used inside regular expressions with ~~ ):
message ("The value of x is: " \n x)
y = "pin" + \n + "mate"
#("pin" + \n) ≡ 4
#"pin\n" ≡ 5
"pin\n"[3]== \\
"pin\n"[4]=="n"
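The length examples can be mirrored in Python as a rough analogy (this is not CQL). A Python raw string keeps the backslash as an ordinary character, much as a CQL quoted string does, while concatenating a real linefeed plays the role of "pin" + \n. The variable names are illustrative assumptions.

```python
# Python analogy only; no CQL is executed here.
LINEFEED = "\n"            # stands in for CQL's predefined string \n

joined = "pin" + LINEFEED  # like CQL: "pin" + \n   (3 letters + 1 linefeed)
quoted = r"pin\n"          # like CQL: "pin\n"      (backslash and n kept as-is)

print(len(joined))  # 4
print(len(quoted))  # 5
print(quoted[3] == "\\", quoted[4] == "n")  # True True
```

The raw string is used only because it reproduces the "no escape processing inside quotes" behavior described above.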
These predefined strings are treated the same as quoted strings and are considered to be string literals.
Capturing groups
If i is a literal nonnegative integer, then \i refers to the i'th capturing group after a ~~ operation; \0 refers to the entire matched sequence of characters. See capturing groups in ~~ for more information. For example:
"hello23"~~"ello(\d+)"
\0 ≡ "ello23"
\1 ≡ "23"
Indexing into strings
Strings are zero-indexed: the first character is at character position 0. Suppose i is a non-negative integer and x is a string.
If i < #x then x[i] is the character (that is, the length-1 string) at index i. If i >= #x then x[i] fails to match the position.
If i is negative, then it is first converted into #x + i and then the above rules are used. Thus, x[-1] is the last character of x (or it fails to match if x has length 0). Similarly, x[-2] is the next-to-last character of x unless x has fewer than 2 characters, in which case it fails to match.
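The rules above can be modeled in a few lines of Python. This is a sketch of the described behavior, not CQL's implementation: the function name cql_index is hypothetical, and None is used here to stand for "fails to match the position".

```python
from typing import Optional


def cql_index(x: str, i: int) -> Optional[str]:
    """Model of x[i]: a length-1 string, or None when the filter would not match."""
    # Try i itself, then the single (non-recursive) conversion #x + i.
    for candidate in (i, len(x) + i):
        if 0 <= candidate < len(x):
            return x[candidate]
    return None


print(cql_index("hello", 0))     # h
print(cql_index("hello", -1))    # o
print(cql_index("hello", 5))     # None
print(cql_index("hello", -100))  # None
print(cql_index("", -1))         # None
```

Unlike plain Python indexing, the model never raises an exception; an out-of-range index is simply a non-match.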
More formally, an expression of the form x[i] for a string x simply matches if x and i each match the current position and i is nonnegative and less than #x. An expression x[i] matches the current position whenever either x[i] simply matches or x[#x+i] simply matches.
An expression of the form x[m : n] matches the position whenever x, m and n match the position.
"hello"[0]=="h"
"hello"[4]=="o"
"hello"[-1]=="o"
"hello"[-2]=="l"
"hello"[5] // false; does not match position
"hello"[-100] // false; does not match
("hello" + "goodbye")[5]=="g"
("hello" + "goodbye")[#"hello"+3]=="d"
"filename.cql"[-4:] == ".cql"
If m and n are nonnegative integers, and x is a string, then x[m:n] is the string consisting of those characters of x whose indices lie between m and n-1 inclusive.
If m is missing, it is taken to be 0. If n is missing, it is taken to be #x. If m is negative, it is replaced (non-recursively) by #x + m; likewise, a negative n is replaced by #x + n. Thus, x[:5] is the first 5 characters of x:
"mate"[0:2] == "ma"
"mate"[1:2] == "a"
"mate"[1:100]== "ate"
"mate"[1:1]== ""
"mate" [1:-1]== "at"
"mate" [-2:-1] == "t"
"mate" [2:1]== ""
Assignment of strings
Strings can be assigned just like numbers:
x="a"
x ≡ "a"
x+="b"
x ≡ "ab"
Indexed strings (when the string being indexed is a variable) can also be assigned.
x="a" // x is "a"
x[0]="b" // x is now "b"
x[0]="hello" // x is now "hello"
x[-2]="c" // x is "helco"
x[5]="z" // expression fails to match; x is still "helco"
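Indexed assignment can be modeled in Python as replacing one character by an arbitrary string. This is a sketch, not CQL's implementation: the name cql_assign_index is hypothetical, and None stands for "the expression fails to match", in which case the caller keeps the old value.

```python
from typing import Optional


def cql_assign_index(x: str, i: int, value: str) -> Optional[str]:
    """Model of x[i]=value: the new string, or None when x[i] would not match."""
    # Same index resolution as reading x[i]: try i, then #x + i.
    for candidate in (i, len(x) + i):
        if 0 <= candidate < len(x):
            return x[:candidate] + value + x[candidate + 1:]
    return None


x = "a"
x = cql_assign_index(x, 0, "b")      # "b"
x = cql_assign_index(x, 0, "hello")  # "hello": one character replaced by five
x = cql_assign_index(x, -2, "c")     # "helco"
failed = cql_assign_index(x, 5, "z") # None: index 5 is out of range
print(x, failed)                     # helco None
```

Because Python strings are immutable, the model returns a new string rather than modifying x in place.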
String ranges (when the index expression contains :) can similarly be assigned, and can be used to prepend or append to strings:
x="a" // x is "a"
x[0:0]="b" // x is "ba"
x[2:2]="his" // x is "bahis"
x[-3:-1]="HEY" // x is "baHEYs"
x[2:4]="Z" // x is "baZYs"
x[:2]="VV" // x is "VVZYs"
x[2:]="" // x is "VV"
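Range assignment can be modeled in Python as splicing a replacement string into the selected range. This is a sketch under assumptions, not CQL's implementation: the name cql_assign_range is hypothetical, None stands for a missing bound, and out-of-range bounds are clamped, which is an assumption.

```python
from typing import Optional


def cql_assign_range(x: str, m: Optional[int], n: Optional[int], value: str) -> str:
    """Model of x[m:n]=value: replace the characters at indices m..n-1 by value."""
    length = len(x)
    m = 0 if m is None else m
    n = length if n is None else n
    if m < 0:
        m += length  # single, non-recursive conversion
    if n < 0:
        n += length
    # Assumption: clamp both bounds into 0..length and treat n < m as an empty range.
    m = min(max(m, 0), length)
    n = min(max(n, m), length)
    return x[:m] + value + x[n:]


print(cql_assign_range("a", 0, 0, "b"))          # ba     (prepend)
print(cql_assign_range("ba", 2, 2, "!"))         # ba!    (append)
print(cql_assign_range("baHEYs", 2, 4, "Z"))     # baZYs  (replace two characters)
print(cql_assign_range("VVZYs", 2, None, ""))    # VV     (delete the tail)
```

An empty range such as [0:0] or [#x:#x] replaces nothing, which is why it prepends or appends.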
Performance notes when dealing with long strings
CQL is not particularly efficient when dealing with long strings, and does not generally support strings of more than a billion characters at all. CQL 6.1 sometimes makes unnecessary copies of string subexpressions, which can hurt performance when dealing with long strings. To avoid extra copies, use += rather than + for appending to a variable, and in general try to keep strings in variables.
(CQL can manipulate strings longer than a billion characters so long as the length of the string is never evaluated and the string is never indexed into; this technique, however, is not supported. For example, a multigigabyte file can be read using readfile and then parsed line by line:
BigFile=readfile "bigfile.pgn"
while (BigFile~~".*"){ Line=\0 ...})
The warnings in this section apply only to long strings that are either generated in a CQL loop or read with readfile. For the kinds of strings typically found in PGN files (comments, tag values and so on), the issues discussed in this section do not arise.