PCRE
A regular expression, regex or regexp is a sequence of characters that
defines a search pattern. Since the 1980s, different syntaxes for
writing regular expressions have existed, the two most widely used
being the POSIX syntax and the similar but more advanced Perl standard.
TinTin++ supports the Perl standard known as PCRE (Perl Compatible
Regular Expressions).
Regular expressions are an integral part of TinTin++, but keep in mind
that tintin doesn't allow you to use regular expressions directly.
Instead, it uses a simpler intermediate syntax that still allows more
complex expressions when needed.
Commands that utilize regular expressions are: action, alias, elseif,
gag, grep, highlight, if, kill, local, math, prompt, regexp, replace,
substitute, switch, variable and while. Several other commands use
regular expressions in minor ways. Fortunately the basics are very
easy to learn.
TinTin++ Regular Expression
The following support is available for regular expressions.
^ match start of line.
$ match end of line.
\ escape one character.
%1-%99 match any text, stored in the corresponding index.
%0 should be avoided in the regex, contains all matched text.
{ } embed a perl compatible regular expression, matches are stored.
%!{ } embed a perl compatible regular expression, matches are not stored.
[ ] . + | ( ) ? * are treated as normal text unless used within braces.
Keep in mind that { } is replaced with ( ) automatically unless %!{ }
is used.
TinTin++  Description                                       PCRE
%a        Match zero or more characters including newlines  ([^\0]*?)
%A        Match zero or more newlines                       ([\n]*?)
%c        Match zero or more ansi color codes               ((?:\e\[[0-9;]*m)*?)
%d        Match zero or more digits                         ([0-9]*?)
%D        Match zero or more non-digits                     ([^0-9]*?)
%i        Matches become case insensitive                   (?i)
%I        Matches become case sensitive (default)           (?-i)
%s        Match zero or more spaces                         ([\r\n\t ]*?)
%S        Match zero or more non-spaces                     ([^\r\n\t ]*?)
%w        Match zero or more word characters                ([A-Za-z0-9_]*?)
%W        Match zero or more non-word characters            ([^A-Za-z0-9_]*?)
%?        Match zero or one character                       (.??)
%.        Match one character                               (.)
%+        Match one or more characters                      (.+?)
%*        Match zero or more characters excluding newlines  (.*?)
Ranges
If you want to match 1 digit use %+1d, if you want to match between 3
and 5 spaces use %+3..5s, if you want to match 1 or more word
characters use %+1..w, etc.
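The range forms correspond to counted quantifiers in PCRE. The sketch
below uses Python's re module as a stand-in, since it shares this
quantifier syntax with PCRE. The translated patterns in RANGES are an
assumption derived from the equivalents in the table above, not taken
from the TinTin++ source, and the helper name matches is hypothetical.

```python
import re

# Assumed PCRE equivalents of the TinTin++ range forms (illustrative only).
RANGES = {
    "%+1d":    r"([0-9]{1})",
    "%+3..5s": r"([\r\n\t ]{3,5})",
    "%+1..w":  r"([A-Za-z0-9_]{1,})",
}

def matches(tt_pattern, text):
    """Return True if the whole text matches the assumed translation."""
    return re.fullmatch(RANGES[tt_pattern], text) is not None

print(matches("%+1d", "7"))         # exactly one digit
print(matches("%+1d", "42"))        # two digits is one too many
print(matches("%+3..5s", "    "))   # four spaces lies within 3..5
print(matches("%+3..5s", "  "))     # two spaces is too few
print(matches("%+1..w", "orc_99"))  # one or more word characters
```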
Variables
If you use %1 in an action to perform a match the matched string is
stored in the %1 variable which can be used in the action body.
Example: #act {%1 says 'Tickle me'} {tickle %1}
If you use %2 the match is stored in %2, etc. If you use an unnumbered
match like %* or %S the match is stored at the last used index
incremented by one.
Example: #act {%3 says '%*'} {#if {"%4" == "Tickle me"} {tickle %3}}
The maximum variable index is 99. If you begin an action with %* the
match is stored in %1. You should never use %0 in the trigger part of
an action. When used in the body of an action, %0 contains all the
parts of the string that were matched.
To prevent a match from being stored use %!*, %!w, etc.
Perl Compatible Regular Expressions
You can embed a PCRE (Perl Compatible Regular Expression) using curly
braces { }. These braces are replaced with parentheses ( ) unless you
use %!{ }.
Or
You can separate alternatives within a PCRE using the | character.
Example: #act {%* raises {his|her|its} eyebrows.} {say 42..}
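As a rough illustration of what the trigger above matches, the sketch
below writes the assumed PCRE form by hand and tries it in Python,
whose re module handles | the same way. The function name pronoun is
hypothetical.

```python
import re

# %* at the start of the line becomes (.*); {his|her|its} becomes
# the capture group (his|her|its). The final dot is escaped here
# because this is a raw regex rather than a TinTin++ pattern.
pattern = re.compile(r"(.*) raises (his|her|its) eyebrows\.")

def pronoun(line):
    """Return the matched alternative, or None if the line does not match."""
    m = pattern.search(line)
    return m.group(2) if m else None

print(pronoun("Bob raises his eyebrows."))
print(pronoun("The orc raises its eyebrows."))
print(pronoun("Bob raises their eyebrows."))
```

Only the three listed alternatives match, so the last line yields None.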
Brackets
You can group alternatives and ranges within a PCRE using brackets.
Example: #act {%* says 'Who is number {[1-9]}?} {say $number[%2] is number %2}
The example only triggers if someone provides a number between 1 and
9. Any other character will cause the action to not trigger.
Example: #act {%* says 'Set password to {[^0-9]*}$} {say The password must
contain at least one number, not for security reasons, but just to
annoy you.} {4}
When the ^ character is used within brackets it creates an inverse
search, [^0-9] matches every character except for a number between 0
and 9.
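The two bracket expressions can be tried outside tintin as well. The
sketch below uses Python's re module, which treats [1-9] and [^0-9]
the same way as PCRE. The surrounding patterns are hand-written
approximations of the triggers above, and the function names are
hypothetical.

```python
import re

# {[1-9]} becomes the capture group ([1-9]): one digit from 1 to 9.
one_digit = re.compile(r"Who is number ([1-9])\?")

# {[^0-9]*}$ becomes ([^0-9]*)$: only non-digits up to the end of the line.
no_digits = re.compile(r"Set password to ([^0-9]*)$")

def number_asked(line):
    """Return the digit that was asked for, or None if there is no match."""
    m = one_digit.search(line)
    return m.group(1) if m else None

def password_lacks_digit(line):
    """Return True if the proposed password contains no digit."""
    return no_digits.search(line) is not None

print(number_asked("Bob says 'Who is number 5?"))
print(number_asked("Bob says 'Who is number 0?"))
print(password_lacks_digit("Bob says 'Set password to hunter"))
print(password_lacks_digit("Bob says 'Set password to hunter2"))
```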
Quantification
A quantifier placed after a match specifies how often the match is
allowed to occur.
? repeat zero or one time.
* repeat zero or multiple times.
+ repeat once or multiple times.
{n} repeat exactly n times, n must be a number.
{n,} repeat at least n times, n must be a number.
{n,o} repeat between n and o times, n and o must be numbers.
Example: #act {%* says 'Who is number {[1-9][0-9]{0,2}}?} {say $number[%2] is
number %2}
The example only triggers if someone provides a number between 1 and
999.
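To check the claim about the range 1 to 999, the embedded expression
can be tested on its own. This sketch uses Python's re module, which
supports the same {n,o} quantifier as PCRE. The function name
in_range is hypothetical.

```python
import re

# [1-9] followed by zero to two more digits: 1 through 999,
# never starting with a zero.
number = re.compile(r"[1-9][0-9]{0,2}")

def in_range(text):
    """Return True if the whole text is a number from 1 to 999."""
    return number.fullmatch(text) is not None

for candidate in ["1", "42", "999", "0", "007", "1000"]:
    print(candidate, in_range(candidate))
```

The first three candidates match. The last three do not, because they
start with a zero or have four digits.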
Parentheses
TinTin Regular Expressions automatically add parentheses, for example
%* translates to (.*?) in PCRE unless the %* is found at the start or
end of the line, in which case it translates to (.*). Parentheses in
PCRE cause a change in execution priority similar to mathematical
expressions, but parentheses also cause the match to be stored to a
variable.
When nesting multiple sets of parentheses each nest is assigned its
numerical variable in order of appearance.
Example: #act {%* chats '{Mu(ha)+}'} {chat %2ha!}
If someone chats Muha you will chat Muhaha! If someone chats Muhaha
you will chat Muhahaha!
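The numbering of nested groups can be seen by writing out the assumed
PCRE form of the trigger and inspecting the groups. The sketch uses
Python's re module, which numbers groups by their opening parenthesis
just as PCRE does. The function name reply is hypothetical.

```python
import re

# Group 1 is the leading %*, group 2 is {Mu(ha)+}, and group 3 is the
# nested (ha), numbered in order of their opening parentheses.
pattern = re.compile(r"(.*) chats '(Mu(ha)+)'")

def reply(line):
    """Build the command the action body {chat %2ha!} would send."""
    m = pattern.search(line)
    if m is None:
        return None
    return "chat " + m.group(2) + "ha!"

m = pattern.search("Bob chats 'Muhaha'")
print(m.group(1), m.group(2), m.group(3))
print(reply("Bob chats 'Muha'"))
print(reply("Bob chats 'Muhaha'"))
```

A repeated group such as (ha)+ only keeps its last repetition, so
group 3 holds a single ha no matter how long the laugh is.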
Lazy vs Greedy
By default regex matches are greedy, meaning {.*} will capture as much
text as possible.
Example: #regex {bli bla blo} {^{.*} {.*}$} {#show Arg1=(&1) Arg2=(&2)}
This will display: Arg1=(bli bla) Arg2=(blo)
By appending a ? to a quantifier it becomes lazy, meaning {.*?} will
capture as little text as possible.
Example: #regex {bli bla blo} {^{.*?} {.*?}$} {#show Arg1=(&1) Arg2=(&2)}
This will display: Arg1=(bli) Arg2=(bla blo).
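The same two patterns give the same split in any engine with Perl
style quantifiers. As a cross-check, this sketch runs them through
Python's re module, with { } written as ( ) by hand.

```python
import re

text = "bli bla blo"

# Greedy: the first group takes as much as it can and still lets
# the rest of the pattern match.
greedy = re.match(r"^(.*) (.*)$", text)

# Lazy: the first group takes as little as it can.
lazy = re.match(r"^(.*?) (.*?)$", text)

print(greedy.groups())  # ('bli bla', 'blo')
print(lazy.groups())    # ('bli', 'bla blo')
```

In the lazy case the second group still has to reach the end of the
line because of the $ anchor, which is why it captures bla blo.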
Escape Codes
PCRE supports the following escape codes.
PCRE  Description                                  POSIX
\A    Match start of string                        ^
\b    Match word boundaries                        (^|\r|\n|\t| |$)
\B    Match non-word boundaries                    [^\r\n\t ]
\c    Insert control character                     \c
\d    Match digits                                 [0-9]
\D    Match non-digits                             [^0-9]
\e    Insert escape character                      \e
\f    Insert form feed character                   \f
\n    Insert line feed character                   \n
\r    Insert carriage return character             \r
\s    Match spaces                                 [\r\n\t ]
\S    Match non-spaces                             [^\r\n\t ]
\t    Insert tab character                         \t
\w    Match letters, numbers, and underscores      [A-Za-z0-9_]
\W    Match non-letters, numbers, and underscores  [^A-Za-z0-9_]
\x    Insert hex character                         \x
\Z    Match end of string                          $
\s matches one space and \s+ matches one or more spaces. Because + is
treated as normal text outside braces, {\s+} is required for this
sequence to work in tintin. \s by itself will work outside of a set
of braces.
Color triggers
To make matching easier, text triggers (Actions, Gags, Highlights,
Prompts, and Substitutes) have their color codes stripped. If you
want to create a color trigger you must start the trigger with a ~
(tilde). To make escape codes visible use #config {convert meta} on.
Example: #action {~\e[1;37m%1} {#var roomname %1}
If the room name is the only line the server sends in bright white,
this color trigger will save the room name.
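The sketch below illustrates why the tilde is needed, using Python's
re module as a stand-in for PCRE. Python has no \e escape, so the
escape character is written as \x1b. The stripping pattern follows
the %c equivalent listed earlier, and the function name strip_colors
and the sample room name are hypothetical.

```python
import re

ESC = "\x1b"  # the escape character, written \e in tintin

# One ansi color code: escape, [, digits and semicolons, then m.
color_code = re.compile(r"\x1b\[[0-9;]*m")

def strip_colors(line):
    """Remove all ansi color codes, as done for ordinary text triggers."""
    return color_code.sub("", line)

# Hand-written form of the color trigger: bright white, then the text.
bright_white = re.compile(r"\x1b\[1;37m(.*)")

line = ESC + "[1;37mTemple Square"

print(strip_colors(line))
print(bright_white.match(line).group(1))
print(bright_white.match(strip_colors(line)))
```

Once the color codes are stripped the bright white pattern can no
longer match, which is why a color trigger has to see the raw line.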
This covers the basics. PCRE has more options, most of which are
somewhat obscure, so you'll have to read a PCRE manual for additional
information.
Related: map and path.