"pâté".normalize("NFD").replace(/[\u0300-\u036f]/g, "") // "pate"
'lea@verou.me'.match(/(\w+)@/)[1] // "lea"
code.replace(/ {4}/g, '\t'); // fix broken indentation
<input name="zip" pattern="\d{5}" />
Also: text editors, IDEs, command line tools (grep, etc.), databases, and more
What we learned
- Regexes match anywhere, unless restricted
- Matches cannot intersect
- Case sensitive by default, use the i flag to change
- Unicode characters by \uXXXX
\u2665 for ♥, emojis, rainbow matches rainbow flag emoji
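The four points above can be checked directly in a console. This is a minimal sketch; the sample strings are made up for illustration and are not from the slides.

```javascript
// Unanchored patterns match anywhere in the string.
const anywhere = /cat/.test("concatenate"); // true

// Matches cannot intersect: "aaa" contains "aa" at index 0 and at index 1,
// but the second candidate overlaps the first, so only one is returned.
const nonOverlapping = "aaa".match(/aa/g); // ["aa"]

// Case sensitive by default; the i flag lifts that.
const caseSensitive = /cat/.test("CAT");    // false
const caseInsensitive = /cat/i.test("CAT"); // true

// \uXXXX addresses a character by its UTF-16 code unit.
const heart = /\u2665/.test("I ♥ regex"); // true
```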
Flag | Name | Purpose | ES?
g | Global | Get all matches | ES3
i | Case Insensitive | Ignore case when matching | ES3
m | Multiline | ^ and $ match the beginning and end of each line, not of the whole string | ES3
y | Sticky | Anchors each match of a regular expression to the end of the previous match. | ES2015
u | Unicode | Treat pattern as a sequence of unicode code points | ES2015
s | DotAll | . matches newlines as well. | ES2018? (Stage 3)
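A short sketch of what each flag changes, assuming an engine that already accepts the s flag (older engines reject it as a syntax error). The sample text and variable names are illustrative only.

```javascript
const text = "one\ntwo\nthree";

// g: return every match, not just the first.
const allOs = text.match(/o/g); // ["o", "o"]

// m: ^ and $ also match at line breaks.
const wholeLines = text.match(/^\w+$/gm); // ["one", "two", "three"]
const withoutM = text.match(/^\w+$/g);    // null: \w cannot cross "\n"

// s: the dot also matches line breaks.
const dotPlain = /one.two/.test(text); // false
const dotAll = /one.two/s.test(text);  // true

// y: the match must start exactly at lastIndex.
const sticky = /two/y;
sticky.lastIndex = 0;
const stickyAtStart = sticky.test(text); // false: "two" is not at index 0
sticky.lastIndex = 4;
const stickyAtFour = sticky.test(text);  // true: "two" starts at index 4

// u: the pattern sees code points instead of UTF-16 code units.
const oneUnit = /^.$/.test("😀");   // false: the emoji is two code units
const onePoint = /^.$/u.test("😀"); // true
```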
What we learned
- | for alternatives
- Group with parentheses (useful for alternatives or quantifiers)
- Alternatives can be empty
- Order matters!
(c|b|f|)at|dog, do(|g) for order
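To see why order matters, here is a small sketch: the engine takes the first alternative that lets the overall match succeed, not the longest one. The anchors in the last two lines are added for illustration.

```javascript
// The first alternative that works wins, even if a later one is longer.
const shortFirst = "dog".match(/do|dog/)[0]; // "do"
const longFirst = "dog".match(/dog|do/)[0];  // "dog"

// An empty alternative makes the group optional.
const emptyAlternative = /^(c|b|f|)at$/.test("at"); // true

// Two empty alternatives let the whole pattern match the empty string.
const matchesNothing = /^(fizz|)(buzz|)$/.test(""); // true
```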
Test Challenge
fizz
buzz
fizzbuzz
Fizzbuzz
// Matches everything, including ""
/(fizz|)(buzz|)/
// Correct but inelegant
/(fizzbuzz|fizz|buzz)/
/(fizz(buzz|)|buzz)/
Emojis with woman
👩
👩💼
👪
👨👩👦👦
👩👦👦
👩👧
👩👦
👩👩👦👦
👩👩👦
👩👩👧
💏
👩❤️💋👩
not
👨👧
👨👨👧
👨👨👦
👨❤️👨
Emojis with woman
// Ewww, just stop
/👩|👩💼|👪|👩👦👦|👩👧|👩👦|👩👩👦👦|👩👩👦|👩👩👧|👨👩👦👦|.../
/👩/ // Wrong, doesn’t include 👪 and 💏!
/👩|👪|💏/
/1.5/ show it matches 21056 etc, show two are needed to match emojis
What we learned
- Dot (.) matches anything, except line breaks
- The experimental s flag makes it match line breaks
- Escape metacharacters with a backslash
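A quick sketch of the dot and of escaping, using the 21056 example from the note above. The emoji lines assume the pattern is used without the u flag.

```javascript
// An unescaped dot matches any character, so "105" satisfies /1.5/.
const loose = /1.5/.test("21056");   // true
// Escaped, the dot only matches a literal ".".
const strict = /1\.5/.test("21056"); // false
const literal = /1\.5/.test("1.5");  // true

// The dot does not match line breaks.
const lineBreak = /a.b/.test("a\nb"); // false

// Without the u flag an emoji is two UTF-16 code units, so it needs two dots.
const oneDot = /^.$/.test("😀");   // false
const twoDots = /^..$/.test("😀"); // true
```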
What we learned
- {n} = n times
- {m,n} = at least m times but no more than n times
- {m,} = at least m times
(ab){n}, a{2,5} and caaaaaaaaaat, parentheses always needed for emoji
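The counted quantifiers can be tried out as follows. This is a sketch: the long "caaa…t" string is rebuilt with repeat() so the number of a's (10, an arbitrary choice) is explicit, and the emoji lines assume no u flag.

```javascript
const long = "c" + "a".repeat(10) + "t";

const exact = /^a{3}$/.test("aaa");       // true
const bounded = /^ca{2,5}t$/.test(long);  // false: 10 a's exceed the upper bound
const open = /^ca{2,}t$/.test(long);      // true
const firstRun = long.match(/a{2,5}/)[0]; // "aaaaa": greedy, but capped at 5

// A quantifier applies to the single token before it, so groups need parentheses.
const grouped = /^(ab){2}$/.test("abab");  // true
const ungrouped = /^ab{2}$/.test("abab");  // false: this means "a" followed by "bb"

// Without the u flag an emoji is two code units; the quantifier binds to the second.
const emojiBare = /^😀{2}$/.test("😀😀");      // false
const emojiGrouped = /^(😀){2}$/.test("😀😀"); // true
```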
What we learned
- * = {0,} , + = {1,} , ? = {0,1}
- Careful of accidental zero length matches!
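The zero-length pitfall is easiest to see with a pattern that can match nothing at all. A small sketch with made-up inputs:

```javascript
const star = /^ca*t$/.test("ct"); // true: * allows zero a's
const plus = /^ca+t$/.test("ct"); // false: + needs at least one
const optional = /^colou?r$/.test("color") && /^colou?r$/.test("colour"); // true

// x* happily matches the empty string at every position.
const empties = "abc".match(/x*/g);         // ["", "", "", ""]
const replaced = "abc".replace(/x*/g, "-"); // "-a-b-c-"
```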
What we learned
- Quantifiers are greedy
- Lazify them by adding a ? after them
Delete last > to show what happens
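Greedy versus lazy matching, sketched on a made-up HTML fragment. The last line shows what happens when the final > is deleted: the greedy quantifier backtracks to the last > that still exists.

```javascript
const html = "<b>bold</b>";

const greedy = html.match(/<.+>/)[0]; // "<b>bold</b>": as much as possible
const lazy = html.match(/<.+?>/)[0];  // "<b>": as little as possible

// With the last ">" removed, the greedy match backtracks to the remaining one.
const truncated = "<b>bold</b".match(/<.+>/)[0]; // "<b>"
```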
What we learned
- Brackets = set, range or a combination of both
- Concatenate multiple ranges for a union
- Most metacharacters don’t need escaping in []
in Unicode order
Hex color
#abc, #f00, #BADA55, #C0FFEE
Hex color
/#[a-f0-9]{3,6}/i // Wrong!
/#([a-f0-9]{3}){1,2}/i
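A sketch of why {3,6} is wrong: it also accepts 4 and 5 hex digits. The ^ and $ anchors are an addition for whole-string testing and are not part of the patterns on the slide.

```javascript
const lax = /^#[a-f0-9]{3,6}$/i;
const strict = /^#([a-f0-9]{3}){1,2}$/i;

const samples = ["#abc", "#f00", "#BADA55", "#C0FFEE"];
const allValid = samples.every((color) => strict.test(color)); // true

const laxFourDigits = lax.test("#abcd");       // true, but should be rejected
const strictFourDigits = strict.test("#abcd"); // false: only 3 or 6 digits pass
```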
What we learned
- \w = [a-zA-Z0-9_], \d = [0-9], \s ≈ [\t\r\n ]
- Combine to form more complex character classes
Counting words (roughly)
function wordCount(text){
return (text.match(/\w+/g) || []).length; // match() returns null when nothing matches
}
function wordCount(text){
return text.split(/\s+/).length;
}
Number
Without exponent or digit separators
-1
.05
+1000
3.1415926535
42.
Number
/^[-+]?[\d.]+$/ // Too lax
/^[-+]?\d*\.?\d+$/ // False negatives: 5.
/^[-+]?\d*\.?\d*$/ // False positives: ., +., + etc
// Accurate, but is it worth it?
/^[-+]?(\d*\.?\d+|\d+\.)$/
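The final pattern can be checked against the sample numbers and against the false positives listed for the earlier attempts. A minimal sketch; the helper name isNumber is made up.

```javascript
const numberPattern = /^[-+]?(\d*\.?\d+|\d+\.)$/;

// Hypothetical helper wrapping the pattern.
function isNumber(str) {
  return numberPattern.test(str);
}

const valid = ["-1", ".05", "+1000", "3.1415926535", "42."];
const invalid = [".", "+.", "+", "", "1.2.3"];

const allValidPass = valid.every(isNumber);                 // true
const allInvalidFail = invalid.every((s) => !isNumber(s));  // true
```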
What we learned
- ^ negates a character class
- \W = [^\w], \D = [^\d], \S = [^\s]
- Even the dot is a character class: . = [^\r\n\u2028\u2029]
- DotAll alternatives: [^], [\S\s], [\W\w], [\D\d]
Strip HTML
function stripHTML(str){
return str.replace(/<.+?>/g, '');
}
function stripHTML(str){
return str.replace(/<[^>]+>/g, '');
}
Warning: Will fail in edge cases
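Two such edge cases, sketched with made-up inputs: a > inside an attribute value, and plain text that happens to contain < and >.

```javascript
function stripHTML(str) {
  return str.replace(/<[^>]+>/g, "");
}

const simple = stripHTML("<b>bold</b> text"); // "bold text"

// A ">" inside an attribute ends the "tag" too early.
const attribute = stripHTML('<a title="a > b">link</a>'); // ' b">link'

// Comparison operators in plain text look like a tag.
const comparison = stripHTML("1 < 2 and 3 > 2"); // "1  2"
```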
Credit card numbers
4060 1234 5678 9000
4060-1234-5678-3457
1230123456789123
4.060/123456-78 90-00
not
hello
4060
12345678901234567890
Credit card numbers
// Never do this!
/\d{16}/
// Limited grouping + allows 19 digit numbers
/(\d{4}.?){3}\d{4}/
/(\d\D*){16}/
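A sketch of the last pattern run against the samples. The ^ and $ anchors are an assumption added here; without them the 20-digit string would also match, because any 16 digits inside it suffice.

```javascript
const cardPattern = /^(\d\D*){16}$/;

const accepted = [
  "4060 1234 5678 9000",
  "4060-1234-5678-3457",
  "1230123456789123",
  "4.060/123456-78 90-00",
];
const rejected = ["hello", "4060", "12345678901234567890"];

const allAccepted = accepted.every((s) => cardPattern.test(s));  // true
const allRejected = rejected.every((s) => !cardPattern.test(s)); // true

// Unanchored, the 20-digit number slips through.
const unanchored = /(\d\D*){16}/.test("12345678901234567890"); // true
```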
What we learned
- Syntax: \p{UNICODE_PROPERTY=VALUE}. Needs u flag.
- When Unicode Property is General_Category it can be omitted.
- Experimental, proposal in Stage 3, implemented in Chrome & Safari
Script=Cyrillic
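A minimal sketch of property escapes, assuming an engine that supports \p{…} (in engines without support these literals throw a syntax error). The sample words are illustrative.

```javascript
// Script property: any Cyrillic letter.
const cyrillic = /\p{Script=Cyrillic}/u.test("Привет"); // true
const latin = /\p{Script=Cyrillic}/u.test("Hello");     // false

// General_Category can be written out or omitted.
const longForm = /^\p{General_Category=Uppercase_Letter}$/u.test("Ж"); // true
const shortForm = /^\p{Lu}$/u.test("Ж");                               // true
const lowercase = /^\p{Lu}$/u.test("ж");                               // false
```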
What we learned
- Parentheses form capturing groups
- Add ?: at the beginning to avoid this
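A sketch of capturing versus non-capturing groups, reusing the email address from the opening examples. The date replacement is an extra illustration, not from the slides.

```javascript
// Each pair of parentheses adds an entry to the match array.
const withCapture = "lea@verou.me".match(/(\w+)@(\w+)/);
// withCapture: ["lea@verou", "lea", "verou"]

// (?:…) groups without capturing, so only one group is recorded.
const withoutCapture = "lea@verou.me".match(/(?:\w+)@(\w+)/);
// withoutCapture: ["lea@verou", "verou"]

// Captures can be reused in a replacement string as $1, $2, …
const reordered = "2012-12-13".replace(/(\d+)-(\d+)-(\d+)/, "$3/$2/$1"); // "13/12/2012"
```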
What we learned
- ^ = beginning of string
- $ = end of string
- beginning/end of lines, with the m flag
ISO 8601 Dates
Just dates, no time or timezone information
2012-12-12, 1986-06-13
ISO 8601 Dates
/^\d{4}-\d{2}-\d{2}$/.test(str)
/^\d{4}-(0\d|1[0-2])-([0-2]\d|3[01])$/.test(str)
Can it be improved further?
Trimming a string
if (!String.prototype.trim) {
String.prototype.trim = function(){
return this.replace(/^\s+|\s+$/g, '');
}
}
What we learned
- Assertions are zero width
- \b = word boundary = between \w and \W|^|$
- \B = non-word boundary = between \w and \w or \W and \W
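Word boundaries in practice, as a small sketch with a made-up sentence. Because \b consumes no characters, splitting on it keeps every character of the input.

```javascript
const sentence = "cat concat cats";

// Only the standalone word is surrounded by boundaries on both sides.
const wholeWord = sentence.match(/\bcat\b/g);              // ["cat"]
const replaced = sentence.replace(/\bcat\b/g, "dog");      // "dog concat cats"

// \B requires a word character on both sides (or a non-word one on both).
const inside = /\Bcat/.test("concat"); // true
const atStart = /\Bcat/.test("cat");   // false

// Zero width: nothing is removed when splitting on \b.
const pieces = "ab cd".split(/\b/); // ["ab", " ", "cd"]
```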
What we learned
- (?=a) = followed by a, which can be any regex
- (?!a) = NOT followed by a
Lookahead hacks
for fun and profit
Intersection A ∩ B
“Password must be 6 letters or longer, and must contain at least one number, one letter, and one symbol.”
password.length >= 6
&& /\d/.test(password) // has digit?
&& /[a-z]/i.test(password) // has letter?
&& /[\W_]/.test(password) // has symbol? (\W alone misses _)
// Or, with lookaheads…
/^(?=.*\d)(?=.*[a-z])(?=.*[\W_]).{6,}$/i
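The lookahead version works because each (?=…) is checked at position 0 and consumes nothing, so all three conditions apply to the same string before .{6,} measures its length. A sketch with made-up passwords; the helper name is hypothetical.

```javascript
const passwordPattern = /^(?=.*\d)(?=.*[a-z])(?=.*[\W_]).{6,}$/i;

// Hypothetical helper wrapping the pattern.
function isAcceptable(password) {
  return passwordPattern.test(password);
}

const ok = isAcceptable("abc12!");         // true: digit, letter, symbol, 6 chars
const underscore = isAcceptable("abc_12"); // true: _ counts as a symbol via [\W_]
const noSymbol = isAcceptable("abc123");   // false
const noDigit = isAcceptable("abcdef!");   // false
const noLetter = isAcceptable("12345!");   // false
const tooShort = isAcceptable("a1!");      // false
```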
Subtraction A – B
“Any integer that’s NOT divisible by 50”
/^(?!\d*[50]0$)\d+$/ // lookahead anchored with $ so only the last two digits count
Negation A̅
“Anything that doesn’t contain "foo"”
/^(?!.*foo).+$/
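Both hacks can be exercised as below. This is a sketch: the subtraction pattern here is a variant whose lookahead is anchored with $ and uses \d*, so that "50" is rejected and "1501" is accepted; it still treats a lone "0" as not divisible.

```javascript
// Subtraction: integers minus those ending in 00 or 50.
const notDivisibleBy50 = /^(?!\d*[50]0$)\d+$/;

const accepts75 = notDivisibleBy50.test("75");     // true
const accepts1501 = notDivisibleBy50.test("1501"); // true: "50" is not at the end
const rejects50 = notDivisibleBy50.test("50");     // false
const rejects100 = notDivisibleBy50.test("100");   // false
const rejects150 = notDivisibleBy50.test("150");   // false

// Negation: any non-empty single-line string without "foo".
const withoutFoo = /^(?!.*foo).+$/;

const plain = withoutFoo.test("bar");        // true
const containsFoo = withoutFoo.test("a foo b"); // false
```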
What we learned
- (?<=a) = preceded by a, which can be any regex
- (?<!a) = NOT preceded by a
- Experimental (Proposal in Stage 3), only supported in V8 behind a flag
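A sketch of both lookbehind forms, assuming an engine where lookbehind is available (elsewhere these literals are a syntax error). The price string is made up.

```javascript
const prices = "$42 and €13";

// Digits preceded by a dollar sign; the sign itself is not part of the match.
const dollars = prices.match(/(?<=\$)\d+/g); // ["42"]

// Digits NOT preceded by a dollar sign. The \d in the lookbehind stops the
// engine from matching the "2" of "42", which is preceded by "4", not "$".
const others = prices.match(/(?<![$\d])\d+/g); // ["13"]
```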
What we learned
('|")(\\\1|.)+?\1 for escaped quotes
Regex credit: Steven Levithan
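How the quoted-string pattern behaves, as a sketch: \1 refers back to whichever quote the first group captured, so the closing quote must be the same kind, and \\\1 lets an escaped quote pass. String.raw is used only to keep the backslash in the sample text.

```javascript
const quoted = /('|")(\\\1|.)+?\1/;

const sample = String.raw`say 'it\'s' now`;

// The escaped quote does not end the string.
const full = sample.match(quoted)[0]; // the text 'it\'s' including both quotes

// A naive pattern stops at the escaped quote.
const naive = sample.match(/'.+?'/)[0]; // the text 'it\'

// The closing quote has to match the opening one.
const mixed = quoted.test(`"unterminated'`); // false

// Simplest backreference: the same character twice in a row.
const doubled = "hello".match(/(\w)\1/)[0]; // "ll"
```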
Correctly Nested Parens
(), (())(), (()()()), ((((((()))))))
not ()(, ), ((((((()))
Not a regular language
Context Free Grammar
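Since arbitrarily deep nesting needs a counter, a few lines of plain code do what a JavaScript regex cannot. This is a minimal sketch; the function name isBalanced is made up, and it assumes the input contains only parentheses.

```javascript
// Returns true when every "(" has a matching ")" in the right order.
function isBalanced(str) {
  let depth = 0;
  for (const ch of str) {
    if (ch === "(") depth++;
    else if (ch === ")") depth--;
    else return false;           // assumption: only parentheses are allowed
    if (depth < 0) return false; // a ")" appeared before its "("
  }
  return depth === 0;            // nothing may be left open
}

const good = ["()", "(())()", "(()()())", "(".repeat(7) + ")".repeat(7)];
const bad = ["()(", ")", "(".repeat(7) + ")".repeat(3)];

const allGood = good.every(isBalanced);           // true
const allBad = bad.every((s) => !isBalanced(s));  // true
```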