Strings

A string is a sequence of characters. A string filter is a filter whose value is a string. A literal string is a string enclosed in quotation marks. For example, "rooks" is a literal string with 5 characters.

As of CQL 6.1, strings are first-class datatypes. They can be assigned to variables, returned as the result of functions, compared, and used as arguments to sort.

Strings can be compared for equality using == and != just like other data types.

If x and y are strings, then x + y is their concatenation:

	"pin" + "mate" == "pinmate"

Strings can be compared with <, <=, >, and >= using alphabetical order, in which uppercase letters come before lowercase letters:

      "The file h1" > "The file H1"
      "" < "a"
      "A" < "a"  
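The three comparisons above are consistent with ordering strings character by character by character code, which is how Python compares its str values. The short sketch below is Python, not CQL. It assumes CQL uses that same ordering, checks the three comparisons, and shows what the ordering means for sorting.

```python
# Python compares str values character by character using code points.
# Assumption: this matches the ordering CQL uses for <, <=, >, >=.
assert "The file h1" > "The file H1"   # 'h' (104) > 'H' (72)
assert "" < "a"                        # the empty string comes first
assert "A" < "a"                       # uppercase precedes lowercase

# Consequence: every uppercase initial sorts before every lowercase one.
words = ["bishop", "Bishop", "apple", "Apple"]
print(sorted(words))  # ['Apple', 'Bishop', 'apple', 'bishop']
```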

Like integers and sets of squares, strings can be stored in variables, can be the result of another CQL expression, and can be passed to functions:

      Y=1
      X= if (Y>0) "check" else "mate"
      X == "check"

Table of filters manipulating strings

The following filters handle strings specifically:
Name              Use                                      Example
~~                regular expression matching              player ~~ "Ka.*ov"
\i                get a capturing group                    \2 == "4a"
\-i               index of a capturing group               \-2 == 4
#                 length of a string                       #"pin" == 3
+                 concatenate strings                      "x" + "y" == "xy"
ascii             conversion to and from ASCII             ascii "A" == 65
                                                           ascii 65 == "A"
currenttransform  current transform                        message currenttransform
date, player,     specified PGN field                      player white == "Kasparov"
event, eventdate,                                          player ~~ "K.*v"
site, eco                                                  date ~~ "1943\.0[1-6]"
                                                           sort event
originalcomment   the comment in the PGN file              originalcomment ~~ "Eval: (\d+)"
dictionary        store and retrieve strings               dictionary D["hi"] = "bye"
fen               get FEN of current position as a string  Y = fen
in                substring test                           "et" in "Reti"
indexof           index of a substring                     indexof ("n" "pin") == 2
int               convert string to int                    int "23" == 23
lowercase         convert string to lowercase              lowercase "Tal" == "tal"
makesquare        convert string to square                 a3 == makesquare "a3"
max, min          max or min of its arguments              x = max("a" "b")
                                                           y = min("a" "b")
readfile          read a string from a file                X = readfile "cook.cqo"
settag            set value of PGN tag                     settag("CustomTag" "Troitzky")
sort              sort string filters                      sort player white
sort by string    sort by a string value                   sort date
str               convert arguments to string              str("X is: " X)
tag               get a PGN tag value                      tag "CustomTag" == "value"
uppercase         convert string to uppercase              uppercase "Tal" == "TAL"
writefile         write a string to a file                 writefile("cook.cqo" X)
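For readers who know Python, several rows of the table have close Python analogues. The sketch below pairs each analogue with the corresponding CQL example in a comment. It is only an analogy: edge cases, such as what indexof does when the substring is absent, or differences between regular expression dialects, are not modeled. It also assumes that max and min on strings use the same ordering as < and >.

```python
import re

# Rough Python analogues of some CQL string filters (illustration only).
assert re.search("Ka.*ov", "Kasparov")                 # player ~~ "Ka.*ov"
assert len("pin") == 3                                 # #"pin" == 3
assert "x" + "y" == "xy"                               # "x" + "y" == "xy"
assert ord("A") == 65 and chr(65) == "A"               # ascii "A" == 65, ascii 65 == "A"
assert "et" in "Reti"                                  # "et" in "Reti"
assert "pin".find("n") == 2                            # indexof ("n" "pin") == 2
assert int("23") == 23                                 # int "23" == 23
assert "Tal".lower() == "tal"                          # lowercase "Tal" == "tal"
assert "Tal".upper() == "TAL"                          # uppercase "Tal" == "TAL"
assert max("a", "b") == "b" and min("a", "b") == "a"   # max("a" "b"), min("a" "b")
```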

Predefined strings

There are special predefined strings: \n is the string consisting of the linefeed character; \t is the tab character; \" is the quote character; \r is the carriage return character; and \\ is the backslash character. These predefined strings are not specially interpreted inside quoted strings (although they may be specially interpreted inside regular expressions used with ~~):
      message ("The value of x is: " \n x)
      y = "pin" + \n + "mate"
      #("pin" + \n) == 4
      #"pin\n" == 5
      "pin\n"[3] == \\
      "pin\n"[4] == "n"

These predefined strings are treated the same as quoted strings and are considered to be string literals.
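Because a backslash inside a CQL quoted string is an ordinary character, a CQL literal such as "pin\n" behaves like a Python raw string r"pin\n", while the predefined string \n behaves like Python's ordinary "\n". The Python sketch below uses that analogy (an illustration, not CQL code) to reproduce the length and indexing examples above.

```python
# Model the CQL literal "pin\n" with a Python raw string: the backslash
# and the letter n are two separate, ordinary characters.
cql_literal = r"pin\n"
assert len(cql_literal) == 5    # #"pin\n" == 5
assert cql_literal[3] == "\\"   # "pin\n"[3] == \\  (one backslash)
assert cql_literal[4] == "n"    # "pin\n"[4] == "n"

# The predefined string \n is a single linefeed, like Python's "\n".
assert len("pin" + "\n") == 4   # #("pin" + \n) == 4
```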

Capturing groups

If i is a literal nonnegative integer, then \i refers to the i'th capturing group after a ~~ operation; \0 refers to the entire matched sequence of characters. See capturing groups in ~~ for more information. For example:
    "hello23"~~"ello(\d+)"
    \0 == "ello23"
    \1 == "23"
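Python's re module behaves analogously: the pattern is found anywhere in the string, as "hello23" ~~ "ello(\d+)" is, and group 0 is the whole match while group 1 is the first capturing group, returned as a string. The sketch below is Python, not CQL. It also assumes that \-i ("index of a capturing group" in the table above) corresponds to the zero-based start offset of group i, which Python exposes as Match.start(i).

```python
import re

# re.search finds the pattern anywhere in the string.
match = re.search(r"ello(\d+)", "hello23")
assert match is not None
assert match.group(0) == "ello23"   # analogue of \0
assert match.group(1) == "23"       # analogue of \1 (a string, not an int)

# Assumption: \-1 is the zero-based start of group 1 ("23" starts at 5).
assert match.start(1) == 5
```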

Indexing into strings

Strings are zero-indexed: the first character is at character position 0. Suppose i is a non-negative integer and x is a string.

If i < #x then x[i] is the character (that is, the length-1 string) at index i. If i >= #x then x[i] fails to match the position.

If i is negative, then it is first converted into #x + i and then the above rules are used. Thus, x[-1] is the last character of x (or it fails to match if x has length 0). Similarly, x[-2] is the next-to-last character of x unless x has fewer than 2 characters, in which case it fails to match.

More formally, an expression of the form x[i] for a string x simply matches if x and i each match the current position and i is nonnegative and less than #x.

An expression x[i] matches the current position whenever either x[i] simply matches or x[#x+i] simply matches.

An expression of the form x[m : n] matches the position whenever x, m and n match the position.

 "hello"[0]=="h"
 "hello"[4]=="o"
 "hello"[-1]=="o"
 "hello"[-2]=="l"
 "hello"[5] // false; does not match position
 "hello"[-100] // false; does not match
 ("hello" + "goodbye")[5]=="g"
 ("hello" + "goodbye")[#"hello"+3]=="d"
 "filename.cql"[-4:] == ".cql"
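The indexing rules can be modeled in a few lines of Python. This is a sketch for illustration, not CQL's implementation: the helper name cql_index is hypothetical, and None stands in for "fails to match". A wrapper is needed because Python's own x[i] raises IndexError instead of failing quietly.

```python
from typing import Optional


def cql_index(x: str, i: int) -> Optional[str]:
    """Model of the CQL expression x[i]; None stands for "fails to match"."""
    # A negative index is first converted into #x + i.
    if i < 0:
        i += len(x)
    # After conversion, the result exists only if 0 <= i < #x.
    if 0 <= i < len(x):
        return x[i]
    return None


print(cql_index("hello", -2))    # l
print(cql_index("hello", -100))  # None
```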

If m and n are nonnegative integers and x is a string, then x[m:n] is the string consisting of those characters of x whose indices lie between m and n-1 inclusive. Indices at or beyond #x are therefore ignored, and x[m:n] is the empty string when m >= n.

If m is missing, it is taken to be 0. If n is missing, it is taken to be #x. If m is negative, it is replaced (once, not recursively) by #x + m; likewise for n. Thus x[:5] consists of the first 5 characters of x:

 "mate"[0:2] == "ma"
 "mate"[1:2] == "a"
 "mate"[1:100] == "ate"
 "mate"[1:1] == ""
 "mate"[1:-1] == "at"
 "mate"[-2:-1] == "t"
 "mate"[2:1] == ""
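The range rules can also be sketched in Python. The helper name cql_slice is hypothetical, and None for a bound means "omitted". The text above does not say what happens when a bound is still negative after the one-time conversion (for example "mate"[-100:2]), so the sketch raises ValueError in that case rather than guess.

```python
from typing import Optional


def cql_slice(x: str, m: Optional[int] = None, n: Optional[int] = None) -> str:
    """Model of the CQL expression x[m:n]; None means the bound was omitted."""
    # A missing m defaults to 0 and a missing n defaults to #x.
    if m is None:
        m = 0
    if n is None:
        n = len(x)
    # A negative bound is replaced once (not recursively) by #x + bound.
    if m < 0:
        m += len(x)
    if n < 0:
        n += len(x)
    # Assumption: a bound that is still negative is outside the documented rules.
    if m < 0 or n < 0:
        raise ValueError("bound still negative after conversion; not modeled")
    # With nonnegative bounds, Python's slice keeps indices m .. n-1 that exist
    # and returns "" when m >= n, matching the examples above.
    return x[m:n]


print(cql_slice("mate", 1, -1))       # at
print(cql_slice("filename.cql", -4))  # .cql
```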

Assignment of strings

Strings can be assigned just like numbers:
  x="a"
  x == "a"
  x+="b"
  x == "ab"

Indexed strings (when the string being indexed is a variable) can also be assigned.

  x="a" // x is "a"
  x[0]="b" // x is now "b"
  x[0]="hello" // x is now "hello"
  x[-2]="c" // x is "helco"
  x[5]="z" // expression fails to match; x is still "helco"

String ranges (when the index expression contains :) can similarly be assigned, and can be used to prepend or append to strings:

  x="a" // x is "a"
  x[0:0]="b" // x is "ba"
  x[2:2]="his" // x is "bahis"
  x[-3:-1]="HEY" // x is "baHEYs"
  x[2:4]="Z" // x is "baZYs"
  x[:2]="VV" // x is "VVZYs"
  x[2:]="" // x is "VV"
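Both kinds of assignment amount to splicing a new string into the old one: x[i]=s replaces one character, and x[m:n]=s replaces the characters at indices m through n-1 (possibly none, which gives an insertion). The Python sketch below models this and replays both example sequences above. The helper names assign_index and assign_range are hypothetical. A failed index assignment returns None so the caller can keep the old value. Ranges with m > n after conversion are not covered by the text, so the sketch rejects them.

```python
from typing import Optional


def assign_index(x: str, i: int, s: str) -> Optional[str]:
    """Model of x[i]=s; returns the new value, or None if it fails to match."""
    if i < 0:
        i += len(x)
    if not 0 <= i < len(x):
        return None
    # Replace the single character at index i with the whole string s.
    return x[:i] + s + x[i + 1:]


def assign_range(x: str, m: Optional[int], n: Optional[int], s: str) -> str:
    """Model of x[m:n]=s: the characters at indices m .. n-1 are replaced by s."""
    m = 0 if m is None else m
    n = len(x) if n is None else n
    if m < 0:
        m += len(x)
    if n < 0:
        n += len(x)
    # Assumption: only 0 <= m <= n is modeled; the text does not cover the rest.
    if not 0 <= m <= n:
        raise ValueError("range not modeled")
    return x[:m] + s + x[n:]


x = "a"
for i, s in [(0, "b"), (0, "hello"), (-2, "c"), (5, "z")]:
    result = assign_index(x, i, s)
    if result is not None:  # a failed assignment leaves x unchanged
        x = result
    print(x)  # b, hello, helco, helco

y = "a"
for m, n, s in [(0, 0, "b"), (2, 2, "his"), (-3, -1, "HEY"),
                (2, 4, "Z"), (None, 2, "VV"), (2, None, "")]:
    y = assign_range(y, m, n, s)
    print(y)  # ba, bahis, baHEYs, baZYs, VVZYs, VV
```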

Performance notes when dealing with long strings

CQL is not particularly efficient when dealing with long strings, and in general it does not support strings of more than a billion characters. CQL 6.1 sometimes makes unnecessary copies of string subexpressions, which can hurt performance with long strings. To avoid extra copies, use += rather than + to append to a variable, and in general keep strings in variables.

CQL can manipulate strings longer than a billion characters as long as the length of the string is never evaluated and the string is never indexed into; this technique, however, is not supported. For example, a multigigabyte file can be read using readfile and then each line parsed using:

    BigFile=readfile "bigfile.pgn"
    while (BigFile ~~ ".*") {
      Line=\0
      ...
    }

The warnings in this section apply only to long strings, either generated in a CQL loop or read using readfile. For the kinds of strings typically found in PGN files, such as comments and tag values, these issues do not arise.

Acknowledgment

Most of the string features, the scheme for integrating them into CQL 6.1 while maintaining back-compatibility with CQL 6.0, and considerable implementation assistance, are due to Robert Gamble.