Standard escape is \, but it may be changed with the -d option.

v: literal escape
/: other half of expression
": begin of literal symbols (anything except '"'). In half-literal mode,
   a literal '"'.
.: any symbol except newline
,: any symbol, even newline and the nul character
': any symbol except the nul character
(space): blank, tab, Windows newline (0x0D == ^M), and the delete
   character (0x7F)
*: greedy match of the previous group, zero or more times.
   When used as the second character of a repetition, this means
   "at least one".
+: possessive match of the previous group, zero or more times.
   When used as the second character of a repetition, this means that
   for each length only one combination is attempted, which makes
   things faster.
?: non-greedy match of the previous group, zero or more times.
   When used as the second character of a repetition, this means
   "at most one time".
0: null character 0x00. Inside a selection sequence, use z for the null
   character, as 0 means 0-0xF there.
n: character 0x0A (newline)
b: character 0x08 (backspace)
r: character 0x0D
   Note that this program assumes strict unix encoding. With windows/dos
   encoding, be prepared to use this character at the end of each line
   (it is contained in '.', so that should be fine).
t: tab character 0x09
e: character 0x1B (ESC, the start of an escape sequence)
a: lowercase letters
A: uppercase letters
_: all special symbols that can be part of a valid variable name in the
   given language, except unicode characters. Usually only '_' is
   contained. In C, '$' should be added, and in kubernetes only '-'
   should be contained.
u: any byte that actually CAN be part of a valid unicode character, so
   no 0xF8-0xFF, but also no 0xC0 and no 0xC1
U: does some fuss to decide whether a character is larger than 0x7F and
   IS part of a valid unicode sequence. It searches the adjacent bytes
   to the left for a starter, interprets the whole number from there,
   and checks that the resulting code point is bigger than 0x7F, smaller
   than 16777216, and not a UTF-16 surrogate.
   NOT IMPLEMENTED
x: start of a 1-byte hex sequence that denotes a character or part of
   one. The following two characters do not need to be escaped.
X: start of a 1-byte hex sequence that denotes an escaped symbol, which
   is subsequently interpreted as an operator. The following two
   characters do not need to be escaped.
[: start of a match list, to be ended with ] (which must be escaped in
   half-literal mode). This does not start half-literal mode, so use
   [""] in non-literal mode.
(: start of a group, to be ended with ) (which must be escaped in
   half-literal mode). Does not cancel half-literal mode.
{: start of a count, to be ended with } (which must be escaped in
   half-literal mode)
^: start of line. Negation when inside [] and not escaped with "".
-: inside [] and not escaped, this means a range between the previous
   and the next symbol. Outside of [], it subtracts n characters from
   the preceding reference, with curly brackets enclosing the numerical
   value of n.
=: insert number. The formula to calculate this number must follow in
   brackets. Inside curly brackets, the brackets can be omitted, but
   then only a formula is allowed.
$: end of line
g: start of the whole input space
G: end of the whole input space
z: followed by s or e, it denotes the official beginning or end of the
   pattern in the search part, to select the part that is actually
   replaced. That part goes from the first \zs to the last \ze.
   Without these designators, the found positions may not overlap.
   With these designators they may, but the first \zs must be after the
   first \ze of the last expression.
   In the replace part, ze means that the next search should continue
   from there rather than from the \ze of the search part, or the end
   thereof.
   Note that \ze before the first \zs means that the next search may
   continue before the last one if a replace was undertaken, which may
   result in an infinite loop, or

The system is transparent towards encoding: it sees only single bytes.
So note that, on UTF-8 input, [äµ] will match the bytes of ä, µ, õ and
¤, and some UTF-8-invalid junk.

#: start of a reference.
   In rex, this needs a simple number to follow, followed by another #.
   0 stands for the whole match pattern. Nothing in the contents
   requires escaping.
   In subs, it refers to references. Also followed by a number that
   requires no escaping, it refers to the string substituted with that
   index. It refers to all occurrences, in that order. To select a
   particular occurrence, add a '.' followed by its index, starting
   at 0. Negative numbers are counted from the end.
   Round brackets in these constructs are parsed as numbers.
   Round brackets in () expressions stand for a calculation, which may
   contain more '#'-sequences (whose results are interpreted as
   numbers), allowing nesting of '#'s for that purpose.
   If a # expression starts with a dot, the absolute index of the group
   is addressed, starting with 0.
   A question mark in a '#'-expression returns only its length, in
   bytes, as a number.
   If a # expression ends with a dot, the absolute index of that is
   returned as a number. If it ends with .?, it tells how many absolute
   indexes it has.
   If a # expression starts with a '~', it refers to the last match.
   #~.# returns which match (the nth) it was.
   There also must be some sequence that converts the resulting number
   into a character. No, we better make '#' the terminator of a '='
   sequence. When such a sequence ends resulting in a number, that will
   give the corresponding substitute.

Named variables: in expressions, the following variable names are
defined (theoretical):
cmd      command-line parameter.
         Must be followed by an index.
LAST     the last match, from beginning to end
last     the last match, from formal beginning to formal end
LAst     the last match, from beginning to formal end
laST     the last match, from formal beginning to end
LINE     the line number, starting from 1
line     the line number, starting from 0
tabline  the line number of the expression that matches, starting from 0
name     the file name of the input
tabname  the file name of the table that matches
zS       the start of the current match. Integer.
zE       the end of the current match
zs       the start, as set with the zs command
ze       the end, as set with the ze command

Named functions: put them there like operators, separate them with
spaces or brackets.
E: return the empty string
d: decimal number, signed
D: decimal number, unsigned
   //sad that there are no signed hexadecimal numbers, but with the
   //limited namespace and the very low usage of signs amongst
   //hexadecimal numbers, I decided to leave them out
X: hexadecimal number, unsigned
x: lowercase hexadecimal number, unsigned
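
The named functions d, D, X and x read like number-to-text conversions. Below is a minimal Python sketch of that reading, not the program's actual code. It assumes, which the text above does not state, that the unsigned variants wrap negative values at 32 bits. The helper names fmt_d, fmt_D, fmt_X and fmt_x are hypothetical and exist only for this illustration.

```python
# Sketch only: the 32-bit width and all function names are assumptions,
# chosen to illustrate signed vs. unsigned and upper vs. lower case output.
ASSUMED_BITS = 32


def _unsigned(n: int, bits: int = ASSUMED_BITS) -> int:
    """Wrap n into the range 0 .. 2**bits - 1 (two's complement view)."""
    return n % (1 << bits)


def fmt_d(n: int) -> str:
    """d: decimal number, signed."""
    return str(n)


def fmt_D(n: int) -> str:
    """D: decimal number, unsigned."""
    return str(_unsigned(n))


def fmt_X(n: int) -> str:
    """X: hexadecimal number, unsigned, uppercase digits."""
    return format(_unsigned(n), "X")


def fmt_x(n: int) -> str:
    """x: hexadecimal number, unsigned, lowercase digits."""
    return format(_unsigned(n), "x")


if __name__ == "__main__":
    for value in (255, -1):
        print(value, fmt_d(value), fmt_D(value), fmt_X(value), fmt_x(value))
```

Under these assumptions, only d keeps the sign. D, X and x show the same wrapped value in different notations, which matches the remark that signed hexadecimal output was left out.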