Usage
Installing
You can choose one of two methods to get the Convenient Regular Expressions.
They are described below.
One is with the npm
and the second one is with the <script>
tag.
Install with npm
The package name is con-reg-exp
, so just run:
npm install con-reg-exp
Now, you can import it with the import
keyword:
import cre from "con-reg-exp";
Alternatively, if you are using CommonJS, you can write:
const cre = require("con-reg-exp");
The cre
symbol is available now and you can use it.
The package contains also TypeScript definitions for it.
Use <script>
tag
You can make cre
available globally in your HTML page by using <script>
tag:
<script type="text/javascript" src="con-reg-exp.min.js"></script>
You can download the script from the releases page. You can also download latest version directly.
If you are using TypeScript, the release archive also contains TypeScript definitions. You can import them using triple-slash directive:
/// <reference path="./con-reg-exp.d.ts" />
Using in the code
The only symbol that the module exports is cre
(short for "Convenient Regular Expression").
Use it in the tagged templates:
const myRegExp = cre`... regular expression goes here ...`;
The cre
tag function returns a standard RegExp
object.
CautionThe input string for the
cre
tag function must be trusted. Don't pass unescaped user input to it.This also applies to interpolation except string interpolation.
Flags
After the cre
tag, you can specify one or more flags.
The flags and tag are separated with a dot .
.
For example:
let items = input.split(cre.ignoreCase.unicode`"and"`);
Flag name | RegExp equivalent |
Description |
---|---|---|
indices |
d |
Generate indices. |
global |
g |
Do global search. |
ignoreCase |
i |
Case-insensitive. |
legacy |
opposite of u |
CRE support unicode surrogate pairs by default. This flag brings up old behavior which interprets surrogate pair as two separate characters and also disables \u{} and \p{} . |
unicode |
v |
Upgraded unicode (v-mode). |
sticky |
y |
Sticky search. |
cache |
Use CRE cache, see below. |
You may notice that some RegExp flags are not listed above.
The m
and s
flags are handled automatically by the CRE and you don't need to know
about their status.
NoteThe
unicode
(v
in RegExp) flag is a recent addition to JavaScript's regular expressions, so it may not be available on some older platforms.
Cache
The regular expression can be cached, so once it get translated
into RegExp, the CRE will return copy of same object for the same
input string and interpolated values. It is done using cache
flag that must be the,
for example:
let match = input.match(cre.cache`repeat digit`);
If high performance is your goal, better solution would be to put the expression
in a variable just once and reuse it, for example in global const
.
WarningBe careful with the interpolation and cache. If interpolated values are changing, cache will keep regular expression for each value. Frequently changing values may lead to memory leaks.
Syntax
NoteIf you are new to Convenient Regular Expressions, look at the tutorial first.
Comments
You can use comments inside CRE. They are the same as JavaScript's comments,
but you have to remember to escape `
and ${
since you are in tagged template.
let hasMath = cre`
/* The following expression will match
* a simple math with two integers.
*/
at-least-1 digit; // First integer
repeat whitespace;
[*/+-]; // Operator
repeat whitespace;
at-least-1 digit; // Second integer
`.test(input);
String literal
It string literal matches a string literally.
Its syntax is the same as JavaScript string literals with one exception.
You have to escape backticks `
and interpolation begin sequence ${
.
For example:
let containsWarning = cre.ignoreCase`"warning"`.test(input);
let numberOfLines = input.match(cre`"\n"`).length + 1;
let hasBacktick = cre`"\`"`.test(input);
String literal alias
Some special characters are available with a keyword. You can use it instead of writing it literally in a string.
Value | Aliases |
---|---|
"\n" |
new-line , nl , \n , line-feed , lf |
"\r" |
carriage-return , cr , \r |
"\t" |
tabulation , tab , \t |
"\0" |
null , nul , \0 |
" " |
space , sp |
"\xA0" |
nbsp |
Character class
Character class matches one character in set of characters.
Character class in CRE works the same as
RegExp character class with one exception.
You have to escape backticks `
and interpolation begin sequence ${
.
For example:
let isNotHexString = cre`[^0-9a-fA-F]`.test(input);
Complement of character class can also be expressed using not
keyword.
The example above can be rewritten as follows:
let isNotHexString = cre`not [0-9a-fA-F]`.test(input);
Character class keyword
Some character classes can be expressed with the keyword, for example any
matches
any character.
Keyword | RegExp equivalent | Complement RegExp | Description | Aliases |
---|---|---|---|---|
any |
. with s flag |
[] |
Matches any character | |
digit |
\d |
\D |
Digit | |
white‑space |
\s |
\S |
Whitespace | whitespace |
word‑character |
\w |
\W |
Word characters (A to Z , a to z , 0 to 9 , _ ) |
word‑char |
line‑terminator |
[\r\n\u2028\u2029] |
. without s flag |
Character that terminates the line | line‑term terminator term |
let hasDigit = cre`digit`.test(input);
Character class keyword can also be complemented with the not
operator:
let onlyDigits = input.replace(cre.global`not digit`, '');
Unicode Property
Character class that matches specific unicode property uses property
(alias prop
) keyword followed by actual property enclosed in < >
.
let hasNonEnglishLetters = cre`lookahead not [A-Z], prop<Letter>`.test(input);
You can also specify unicode properties in character class [\p{...}]
.
It is not available if legacy
flag is set.
If the unicode
flag is set, it can be used to match properties of strings, instead of just characters.
Quantifiers
// TODO: Maybe stop using node, but instead maybe simply "expression"?
Quantifiers tells how many times the following node must match.
Those are equivalents of ?
, *
, +
, {n,m}
in RegExp notation.
For example:
let numberLessThan100 = cre`optional [1-9], digit`;
Quantifier is a keyword with flexible syntax. Some parts of it are optional, so it gives you a flexibility in writing quantifiers.
Keyword | RegExp equivalent |
Min | Max | Comment |
---|---|---|---|---|
optional |
? |
0 | 1 | |
at‑least‑1 |
+ |
1 | ∞ | You can skip at‑ and optionally add repeat‑ prefix or ‑times suffix. |
at‑least‑N |
{N,} |
N | ∞ | You can skip at‑ and optionally add repeat‑ prefix or ‑times suffix. |
at‑most‑N |
{0,N} |
0 | N | You can skip at‑ and optionally add repeat‑ prefix or ‑times suffix. |
repeat |
* |
0 | ∞ | |
N‑times |
{N} |
N | N | You can skip ‑times and optionally add repeat‑ prefix. |
N‑to‑M‑times |
{N,M} |
N | M | You can skip ‑times and optionally add repeat‑ prefix. |
Lazy quantifiers
The above "greedy" quantifiers can be prefixed with lazy‑
or non‑greedy‑
.
It will make the quantifier lazy (non-greedy) which is equivalent of adding ?
sign in the standard RegExp.
For example:
let matchQuotedText = cre`["], lazy-repeat any, ["]`;
Flow and grouping
Sequence
// TODO: maybe call it differently?
Literals, character classes, and also other nodes described below can be
separated by comma or semicolon (,
or ;
) to provide sequence of matches. For example:
let hasGrayInAnyForm = cre`"gr", [ae], "y"`.test(input);
The comma and semicolon have the same meaning, but for better readability:
use comma to separate expressions in a single line, for example:
let matchNumber = cre`optional [+-], at-least-1 digit`;
use semicolon to separate expressions between lines by placing it at the end of a line, for example:
let matchNumber = cre` optional [+-]; at-least-1 digit; `;
Empty expression between separators are ignored. Separator after closing bracket or parenthesis is optional, for example:
let addressWithUserName = cre`
optional {
lazy-repeat any;
"@"; // here we have an empty expression which will be ignored
} // no need to for a semicolon
repeat any;
`
Operator or
Use or
operator to specify two or more alternatives.
It makes an alternative of expressions on the left and right.
Multiple or
s can be used to make an alternative of more expressions.
The same example as above, but using or
:
let hasGrayInAnyForm = cre`"gray" or "grey"`.test(input);
All prefix (unary) operators have higher precedence, so or
operator is applied after all
prefixes on both sides. If left or right hand expression contains multiple prefix operators surround it with
the parentheses for better readability. For example:
// This is more readable:
let pattern = cre`(other: lazy-repeat not digit) or (optional digit)`;
// than that:
let pattern = cre`other: lazy-repeat not digit or optional digit`;
// but both generates exactly the same result.
Parentheses ( )
and braces { }
Parentheses ( )
and braces { }
works exactly the same.
They controls the precedence of evaluation.
In theory, you can exchange ( )
and { }
however you want,
but for clarity, you should use ( )
to group nodes in single line, and
{ }
for groups that span multiple lines.
The same example as above, but with parentheses ( )
:
let hasDarkGrayInAnyForm = cre`(optional "dark ", "gray") or (optional "dark ", "grey")`.test(input);
Or, multi-line variant:
let hasDarkGrayInAnyForm = cre`
{
optional "dark ", "gray";
} or {
optional "dark ", "grey"
}
`.test(input);
Operator not
You can place not
operator before some atoms to achieve opposite meaning.
Operand | Matching with not operator |
Examples |
---|---|---|
Any character class | Complement of the class | not [abc] , not digit , not property<Letter> |
One character | Anything other than that character | not \n , not "x" |
Boundary assertion | Negative boundary assertion | not end‑of‑line , not word‑boundary |
Look ahead and look behind assertion | Inverse of that assertion | not look-ahead "x" |
Assertions
Boundary assertion
Checks if the current position is a specific boundary. There are several boundaries:
Keyword | RegExp equivalent | Description | Aliases |
---|---|---|---|
begin‑of‑text |
^ without m flag |
Start of input | start‑of‑text |
end‑of‑text |
$ without m flag |
End of input | |
begin‑of‑line |
^ with m flag |
Start of line or input | start‑of‑line |
end‑of‑line |
$ with m flag |
End of line or input | |
word‑boundary |
\b |
Word boundary |
For example:
let isJPEG = cre.ignoreCase`(".jpeg" or ".jpg"), end-of-text`.test(fileName);
Boundary assertion can be complemented with the not
operator, for example:
let tabInTheMiddleOfLine = cre`not begin-of-line, "\t"`.test(fileName);
Look assertion
This kind of assertion allows you to match given pattern without consuming anything.
Keyword | Description | Alias |
---|---|---|
lookahead |
Matches the following atom against text after current position | look‑ahead |
lookbehind |
Matches the following atom against text before current position | look‑behind |
You can add not
keyword before or after lookahead
and lookbehind
keyword
to get Negated assertion.
Capturing
Named capturing group
The named capturing group looks similar to the labels in JavaScript.
It is a name with colon at the end name:
.
let match = input.match(cre`
user: at-least-1 [a-zA-Z_.-];
"@";
address: at-least-1 [a-zA-Z_.-];
`);
Positional capturing group
NoteFor better readability and maintenance, prefer named groups over positional.
The positional capturing group uses integer as a capturing group name.
The capturing group counts from 1
.
It counts up for every capturing group (both named and positional) as in classic RegExp patterns.
let match = input.match(cre`
1: at-least-1 [a-zA-Z_.-];
"@";
2: at-least-1 [a-zA-Z_.-];
`);
The capturing group numbers are validated, so counting it incorrectly will cause an error.
Backreference
You can match previously captured string using match
operator followed by
capturing group name or number enclosed by < >
.
let hasRepeatingWords = cre`
firstWord: at-least-1 word-char;
at-least-1 not word-char;
match<firstWord>;
`.test(input);
Interpolation
Other values and expressions can be added to the CRE using interpolation. There are tree kinds of interpolations: string interpolation, character class interpolation, expression interpolation.
String interpolation
You can place arbitrary string into the string literal. You don't need to escape it before using, the CRE will handle that for you.
function hasVerbInAnyForm(text, verb) {
return cre.ignoreCase`
word-boundary;
"${verb}" or "${verb}s" or "${verb}es";
word-boundary;
`.test(text);
}
If the interpolated value is CRE, error will be thrown. If it is not a string, it will be converted to string.
Character class interpolation
You can place arbitrary string into the character class. Each character from the string will be added to matching set of that class. You don't need to escape it before using, the CRE will handle that for you.
Characters from interpolated value are added with escaping,
so value a-f
will be interpreted as a\-f
. If this is not your intention,
you must specify each character in the interpolated string - abcdef
.
function isInteger(text, isHex) {
const additionalCharacters = isHex ? "abcdef" : "";
return cre.ignoreCase`
start-of-text;
at-least-1 [0-9${additionalCharacters}];
end-of-text;
`.test(text);
}
If the interpolated value is CRE, error will be thrown. If it is not a string, it will be converted to string.
Expression interpolation
You can reuse CRE expression in another expression.
The input expression must be a CRE expression created with the cre
tag or a string.
There is a slide difference between interpolating CRE expression and a string:
- Interpolated CRE expression becomes an atom, so, for example, adding
repeat
quantifier will affect entire interpolated expression. TheignoreCase
,unicode
andlegacy
flags must be the same in both expressions. JavaScript does not allow to change RegExp flags in the middle of the expression. - Interpolated string is simply placed token by token, so, for example,
adding
repeat
quantifier will affect only the first atom in the interpolated expression.
const number = cre`
optional [+-]; // Sign
{
at-least-1 digit; // Integral part
optional (".", repeat digit);// Optional factional part
or
"."; // Variant with only fractional part
at-least-1 digit;
}
optional { // Optional exponent part
[eE];
optional [+-];
at-least-1 digit;
}
`;
const ws = cre`repeat whitespace`;
function validateJSONArrayOfNumbers(text) {
return cre.cache`
begin-of-text, ${ws}; // Trim leading whitespaces
"[", ${ws}; // Begin of array
optional { // Array can be empty
repeat { // Numbers with trailing comma
${number}, ${ws};
",", ${ws};
}
${number}, ${ws}; // Last number has no comma
}
"]", ${ws};
end-of-text;
`.test(text);
}
If it is possible, try to interpolate CRE expression instead of strings. Use string only when interpolated expression is not a full CRE expression.
For example:
function validateWord(text, allowEmpty) {
let quantifier = allowEmpty ? 'repeat' : 'at-least-1';
return cre.cache`
begin-of-text;
${quantifier} word-character;
end-of-text;
`.test(text);
}