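Regular Expressions (regex) in JavaScript

Creating a regular expression

The rule that a regular expression describes is called a pattern. In JavaScript, a pattern can be created either with a regular expression literal or with the RegExp constructor:

Method 1: Regular expression literals

A literal is compiled when the script is loaded. When the pattern will not change, defining it this way gives better performance.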

var re = /ab+c/;

Method 2: Constructor function

The pattern is compiled while the program runs, so performance is worse. This form suits cases where the pattern may change at runtime.

var re = new RegExp('ab+c');
var myRe = new RegExp('d(b+)d', 'g');

Regular expression literals perform better and suit patterns that never change;
the constructor function performs worse and suits patterns that may change dynamically.
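As a rough illustration of when the constructor is the natural choice, the sketch below builds a pattern from a string that is only known at runtime. The helper name `escapeRegExp` and the sample input are assumptions made for this example and are not part of the original notes. Note that a backslash has to be doubled inside a string passed to `RegExp`.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// A literal and a constructor call that describe the same pattern.
// Inside a string, the backslash itself has to be escaped.
const fromLiteral = /\d+/;
const fromConstructor = new RegExp('\\d+');

// Build a pattern from a value that is only known at runtime.
const keyword = 'a+b'; // assumed user input
const dynamic = new RegExp(escapeRegExp(keyword), 'i');

console.log(fromLiteral.source === fromConstructor.source); // true
console.log(dynamic.test('A+B equals C')); // true
console.log(dynamic.test('aab')); // false, because + was escaped and is a literal character
```

Escaping the input first keeps characters such as + from being read as quantifiers, which is usually what you want when the text comes from a user.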

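Using regular expressions

The following methods work with regular expressions in JavaScript:

RegExp.prototype.test(): checks whether any part of the string matches; returns true or false.
RegExp.prototype.exec(): returns the matched part of the string as an array, or null if there is no match.
String.prototype.match(): returns the matched part(s) of the string as an array, or null if there is no match.
String.prototype.replace(): finds the matching parts of the string and replaces them.
String.prototype.search(): looks for a match in the string; returns the index of the first match, or -1 if there is none.
String.prototype.split(): splits the string into an array at each match.

In short, use test or search when you only need to know whether a string contains a pattern;
use exec or match when you need more information (at a higher performance cost).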
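The methods listed above can be compared side by side on one input. This is a minimal sketch; the sample sentence and variable names are made up for illustration.

```javascript
const sentence = 'The year 2019 and the year 2024';
const yearPattern = /\d{4}/;

// test: only answers yes or no
const hasYear = yearPattern.test(sentence); // true

// search: index of the first match, or -1
const firstIndex = sentence.search(yearPattern); // 9

// exec: array for the first match, with extra properties such as index
const execResult = yearPattern.exec(sentence); // execResult[0] is '2019', execResult.index is 9

// match with the g flag: every matched substring
const allYears = sentence.match(/\d{4}/g); // ['2019', '2024']

// replace: swap every match for another string
const masked = sentence.replace(/\d{4}/g, '####'); // 'The year #### and the year ####'

// split: cut the string wherever the pattern matches
const parts = 'a, b,c ,  d'.split(/\s*,\s*/); // ['a', 'b', 'c', 'd']

console.log(hasYear, firstIndex, execResult[0], allYears, masked, parts);
```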

Special characters

Flags

regex = /hello/; // case-sensitive: matches "hello", "hello123", "123hello123", "123hello", but not "hell0" or "Hello"
regex = /hello/i; // case-insensitive: matches "hello", "HelLo", "123HelLO"
regex = /hello/g; // global search: finds every match instead of stopping at the first
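To make the effect of the g flag more concrete, here is a small sketch with a made-up input string. It also shows one commonly cited side effect: a RegExp object with the g flag keeps its position in the `lastIndex` property, so calling `test()` repeatedly on the same object can give alternating results.

```javascript
const text = 'hello Hello HELLO';

// Without g, match returns only the first match (plus details such as index).
const firstOnly = text.match(/hello/i);

// With g, match returns every matched substring.
const everyMatch = text.match(/hello/gi); // ['hello', 'Hello', 'HELLO']

// With g, the RegExp object remembers where the previous match ended.
const globalRe = /hello/g;
const firstTest = globalRe.test('hello'); // true, lastIndex is now 5
const secondTest = globalRe.test('hello'); // false, the search started at index 5; lastIndex resets to 0

console.log(firstOnly[0], firstOnly.index); // hello 0
console.log(everyMatch);
console.log(firstTest, secondTest); // true false
```

If a pattern is only used for yes/no checks with `test()`, leaving the g flag off avoids this statefulness.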

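ES2018 added the s flag (dotAll). Without it, . matches every character except line terminators (such as \n and \r):

// Without the s flag, . matches every character except line terminators
console.log(/./.test('\n')); // → false
console.log(/./.test('\r')); // → false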

Before the s flag existed, [\w\W] could be used to match line terminators as well, but it is not the best approach:

console.log(/[\w\W]/.test('\n')); // → true
console.log(/[\w\W]/.test('\r')); // → true

Since ES2018, adding the s flag at the end of the pattern lets . match line terminators too:

console.log(/./s.test('\n')); // → true
console.log(/./s.test('\r')); // → true

Literal characters `//`

var regex = /a/;
var regex = /is/;

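Backslash `\`

/* A backslash before a non-special character gives the character after it a special meaning */
var regex = /\b/; // b is not special on its own; here \b is treated as a special token

/* A backslash before a special character makes the character after it literal */
var regex = /if\(true/; // ( is normally special, but here it is a literal character
var regex = /1\+2=3/; // + is normally special, but here it is a literal character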

Any single character `.`

Matches any character except line terminators (such as \n):

var regex = /a.man/; // "a", any one character, then "man": matches "acman" and "awman", but not "a\nman"

var regex = /.a/; // any one character followed by a

Character sets `[]`

// lowercase a or uppercase A
var regex = /[aA]/;

// any character that is not a or A
var regex = /[^aA]/;

// a, e, i, o, or u all match
var regex = /[aeiou]/;

// English letters
var regex = /[a-z]/; // any lowercase letter, from a to z
var regex = /[A-Z]/; // any uppercase letter, from A to Z
var regex = /[a-zA-Z]/; // any English letter

// digits 5 through 8
var regex = /[5-8]/;

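Parentheses `()`

Apply to all alternatives

var regex = /^a|^the|^an/; // ^ is repeated for each alternative

var regex = /^(a|the|an)/; // equivalent: the group applies ^ to every alternative inside it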
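A quick way to convince yourself that the two patterns above behave the same is to run both against a few inputs. The sketch below does that, and also shows what goes wrong when the group is left out and ^ is written only once. The sample strings and variable names are illustrative assumptions.

```javascript
const withoutGroup = /^a|^the|^an/;
const withGroup = /^(a|the|an)/;

const samples = ['the cat', 'an owl', 'a dog', 'one bird'];

// Both patterns agree on every sample.
const sameResult = samples.every(
  (sample) => withoutGroup.test(sample) === withGroup.test(sample)
);

// Without the group, ^ binds only to the first alternative.
const loose = /^a|the|an/;
const looseHit = loose.test('see the cat'); // true, "the" may appear anywhere
const groupedHit = withGroup.test('see the cat'); // false, the string does not start with a, the, or an

console.log(sameResult, looseHit, groupedHit); // true true false
```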

Negation (anything except) `^`

/* any character that is not a matches */
var regex = /[^a]/;

/* any character that is not a digit matches */
var regex = /[^0-9]/;

Character class shorthands

keywords: \d, \w, \s, \b, \D, \W, \S

\d : digit, [0-9]
\w : word character: English letters in either case, digits, and underscore, [A-Za-z0-9_]
\s : whitespace, including space, tab, form feed, and line feed, [\f\n\r\t\v\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]
\D : not a digit, same as [^\d]
\W : not a word character, same as [^\w]
\S : not whitespace, same as [^\s]

/* any word character followed by e */
var regex = /\we/;

/* any two consecutive digits */
var regex = /\d\d/;

/* an s at the end of a word */
var regex = /s\b/;

var regex = /\b[a-z]/g; // the first letter of each word in a sentence
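The shorthands above are easiest to remember when seen on real strings. The following sketch uses made-up sample text; the last line applies the `/\b[a-z]/g` pattern from the notes to capitalize each word.

```javascript
const note = 'Order 66 shipped to unit 7b';

// \d+ : runs of digits
const numbers = note.match(/\d+/g); // ['66', '7']

// \w+ : runs of word characters
const tokens = note.match(/\w+/g); // ['Order', '66', 'shipped', 'to', 'unit', '7b']

// \s+ : runs of whitespace, collapsed into a single space
const collapsed = 'a \t b\n c'.replace(/\s+/g, ' '); // 'a b c'

// \b[a-z] : the first lowercase letter of each word
const capitalized = 'hello big world'.replace(/\b[a-z]/g, (letter) => letter.toUpperCase());
// 'Hello Big World'

console.log(numbers, tokens, collapsed, capitalized);
```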

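Other special characters:

\t : tab
\b : word boundary, the position between a word character and a non-word character (or the start or end of the string); /s\b/ matches an s that is the last letter of a word

Word boundary \b, \B

\b matches a word boundary: a position that has a word character (\w) on one side and a non-word character, or the start or end of the string, on the other.

Note that \b and [\b] are different: [\b] matches a backspace character.

// Only the standalone word "is" is matched; the "is" inside "This" is not
let matchedResult = 'This is an apple.'.match(/\bis\b/);
// [ 'is', index: 5, input: 'This is an apple.' ]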

Conversely, \B matches a non-word boundary, which is any of the following positions:

Before the first character of the string, if the first character is not a word character.
After the last character of the string, if the last character is not a word character.
Between two word characters
Between two non-word characters
The empty string

// With \B, the "is" inside "This" is matched
let matchedResult = 'This is an apple.'.match(/\Bis/);
// [ 'is', index: 2, input: 'This is an apple.' ]

Quantifiers `* + ? {} {,}`

keywords: *, +, ?, {n}, {min,max}

* : any number of times, same as {0,}
+ : at least once (the preceding item must appear), same as {1,}
? : zero or one time (optional), same as {0,1}
{n} : exactly n times
{min,max} : at least min and at most max times

var regex = /abc/; // matches "abc"

var regex = /ab*c/; // * means the preceding character may appear 0 or more times, so ac, abc, and abbbbc all match

var regex = /n?a/; // n is optional

var regex = /a{2}/; // a exactly 2 times, i.e. aa

var regex = /a{2,4}/; // a between 2 and 4 times

var regex = /a{2,}/; // a 2 or more times; do not put a space inside the braces

var regex = /(hello){4}/; // hello 4 times: hellohellohellohello

var regex = /\d{3}/; // 3 digits in a row
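The quantifiers above can be checked against a handful of inputs. In this sketch the patterns are wrapped in ^ and $ (an addition for the example, not part of the patterns above) so that the whole string has to match, which makes the counts easier to see. The `colou?r` pattern is an assumed extra example of an optional character.

```javascript
// * : zero or more b between a and c
const starPattern = /^ab*c$/;
const starResults = ['ac', 'abc', 'abbbbc', 'abd'].map((s) => starPattern.test(s));
// [true, true, true, false]

// {2,4} : between two and four a
const rangePattern = /^a{2,4}$/;
const rangeResults = ['a', 'aa', 'aaaa', 'aaaaa'].map((s) => rangePattern.test(s));
// [false, true, true, false]

// ? : the u is optional, but may appear at most once
const optionalPattern = /^colou?r$/;
const optionalResults = ['color', 'colour', 'colouur'].map((s) => optionalPattern.test(s));
// [true, true, false]

console.log(starResults, rangeResults, optionalResults);
```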

Start and end

keywords: ^, $

^ : start
$ : end

/* only matches when the string (or, with the m flag, a line) starts with A */
/^A/gm.test('Abc'); // true
/^A/gm.test('bac'); // false

/* starts with He */
var regex = /^He/;

/* ends with llo */
var regex = /llo$/;

/* starts with He, ends with llo, with any number of any characters in between */
var regex = /^He.*llo$/;
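The examples above use the m flag without showing what it changes. As a hedged sketch with made-up input: without m, ^ and $ refer to the start and end of the whole string; with m, they also match at the start and end of each line.

```javascript
const lines = 'Apple\nbanana\nAvocado';

// Without m, ^ only matches at the very start of the string.
const withoutM = lines.match(/^A\w+/g); // ['Apple']

// With m, ^ also matches right after each line break.
const withM = lines.match(/^A\w+/gm); // ['Apple', 'Avocado']

// ^ and $ together force the whole string to fit the pattern.
const wholeString = /^He.*llo$/;
const exact = wholeString.test('Hello'); // true, .* may match zero characters
const trailing = wholeString.test('Hello!'); // false, the string does not end with llo
const longer = wholeString.test('He said hello'); // true

console.log(withoutM, withM, exact, trailing, longer);
```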

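Alternation (or) `|`

// and or android: alternatives are tried from left to right, so for "android" this stops at `and` and never matches the full `android`
var regex = /and|android/;

// android is tried first, and `and` is still matched when android is not present
var regex = /android|and/;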
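The ordering rule described above is easy to verify directly. A minimal sketch, with illustrative variable names:

```javascript
// The first alternative that succeeds wins, even if a later one would match more.
const shortFirst = 'android'.match(/and|android/)[0]; // 'and'

// Putting the longer alternative first lets it win when it is present.
const longFirst = 'android'.match(/android|and/)[0]; // 'android'

// The shorter alternative still works as a fallback.
const fallback = 'sand'.match(/android|and/)[0]; // 'and'

console.log(shortFirst, longFirst, fallback);
```

When one alternative is a prefix of another, listing the longer one first is usually the safer order.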

LookAround Assertions

keywords: x(?=y), x(?!y), (?<=y)x, (?<!y)x

Lookahead assertions: x(?=y), x(?!y)
Lookbehind assertions: (?<=y)x, (?<!y)x

Look Ahead

?=: must be followed by
?!: must not be followed by

// foo(?=bar): foo is matched only when it is followed by bar
const lookahead = /foo(?=bar)/;
lookahead.exec('foo'); // null
lookahead.exec('bar'); // null
lookahead.exec('foobar'); // [ 'foo', index: 0, input: 'foobar', groups: undefined ]

// foo(?!bar): foo is matched only when it is not followed by bar
const negativeLookahead = /foo(?!bar)/;
negativeLookahead.exec('foo'); // [ 'foo', index: 0, input: 'foo', groups: undefined ]
negativeLookahead.exec('foo123'); // [ 'foo', index: 0, input: 'foo123', groups: undefined ]
negativeLookahead.exec('bar'); // null
negativeLookahead.exec('foobar'); // null

Look Behind

?<=: must be preceded by
?<!: must not be preceded by

// (?<=foo)bar: bar is matched only when it is preceded by foo
const lookbehind = /(?<=foo)bar/;
lookbehind.exec('foo'); // null
lookbehind.exec('bar'); // null
lookbehind.exec('foobar'); // [ 'bar', index: 3, input: 'foobar', groups: undefined ]

// (?<!foo)bar: bar is matched only when it is not preceded by foo
const negativeLookbehind = /(?<!foo)bar/;
negativeLookbehind.exec('foo'); // null
negativeLookbehind.exec('bar'); // [ 'bar', index: 0, input: 'bar', groups: undefined ]
negativeLookbehind.exec('123bar'); // [ 'bar', index: 3, input: '123bar', groups: undefined ]
negativeLookbehind.exec('foobar'); // null

❗ Lookbehind assertions are ES2018 syntax, so check compatibility with your target environments. ❗
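Because lookaround assertions check the surrounding text without consuming it, they are handy for inserting or extracting text at specific positions. The sketch below is one possible illustration with made-up data: a lookahead that adds thousands separators, and lookbehinds that pick out numbers with or without a leading dollar sign. It needs an engine that supports lookbehind.

```javascript
// Lookahead: insert a comma at every position that is followed by
// groups of exactly three digits up to the end of the number.
const withCommas = '1234567'.replace(/\B(?=(\d{3})+(?!\d))/g, ','); // '1,234,567'

const receipt = 'cost: $42, qty: 7, tax: $3';

// Positive lookbehind: digits that come right after a dollar sign.
const prices = receipt.match(/(?<=\$)\d+/g); // ['42', '3']

// Negative lookbehind: digit runs that are not preceded by a dollar sign
// (or by another digit, so that "2" in "$42" is not picked up on its own).
const nonPrices = receipt.match(/(?<![$\d])\d+/g); // ['7']

console.log(withCommas, prices, nonPrices);
```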

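Greedy mode

Quantifiers are greedy by default: they match as much as possible. To turn greediness off, so that matching stops as soon as a match is possible, add ? after * or +, for example .*? and .+?.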
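The difference between greedy and non-greedy matching shows up clearly when a string contains several candidate endings. A minimal sketch, using a made-up HTML-like string:

```javascript
const html = '<b>one</b> and <b>two</b>';

// Greedy: .* runs as far as it can, so the match ends at the last </b>.
const greedy = html.match(/<b>.*<\/b>/)[0]; // '<b>one</b> and <b>two</b>'

// Non-greedy: .*? stops at the first </b> that completes a match.
const lazy = html.match(/<b>.*?<\/b>/)[0]; // '<b>one</b>'

// Combined with g, the non-greedy form finds each tag pair separately.
const eachTag = html.match(/<b>.*?<\/b>/g); // ['<b>one</b>', '<b>two</b>']

console.log(greedy, lazy, eachTag);
```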

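Pattern notes

export default {
    // Allows digits and English letters
    // Length 4 to 12
    account: /^[0-9A-Za-z]{4,12}$/,

    // Must contain at least one digit and one English letter; ASCII special symbols are allowed
    // Must not contain the \ character
    // Length 1 to 24
    password: /^(?=.*\d)(?=.*[a-zA-Z])[a-zA-Z0-9!@#$%^&*()_+{}:"|<>?\-=\[\]'\;,./~`]{1,24}$/,

    // Allows digits, English letters, and Chinese characters
    // Length 2 to 8
    nickName: /^[0-9A-Za-z\u4E00-\u9FFF]{2,8}$/,

    // Allows English letters, Chinese characters, hyphens, commas, and whitespace,
    // but must not start with a hyphen, comma, or whitespace,
    // and must not end with a special symbol, whitespace, or digit
    realName: /^(?!(-))(?!(\s))(?!(,))[A-Za-z\-\,\s\u4E00-\u9FFF]{0,20}[^(?=(!@#$%^&*()_+{}:"|<>?\-=\[\]'\;,./~\s\d`)))]$/,

    // Email address
    createEmail: /^([\w])([\-\._]?[\w]){0,64}\@([\w])([\-\._]?[\w]){0,64}\.([a-zA-Z]){2,6}$/,

    // Three English letters followed by three digits
    promotionCode: /^[A-Za-z]{3}[0-9]{3}$/,

    // Allows digits, English letters, Chinese characters, ASCII and full-width (Chinese) punctuation, and whitespace
    // Length 0 to 100
    remark: /^[0-9A-Za-z-_\u4E00-\u9FA5+/.*!@#$%&?()=|':;<>,~！@#￥……&*（）——|{}【】‘；：”“'。，、？%\s]{0,100}$/,
}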
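One way such a pattern table might be used is through a small validation helper. The sketch below copies two of the patterns from the notes above into a local object so that it runs on its own; the `validate` function and the sample values are hypothetical and only illustrate the idea.

```javascript
// Copies of two patterns from the notes above.
const patterns = {
  account: /^[0-9A-Za-z]{4,12}$/,
  promotionCode: /^[A-Za-z]{3}[0-9]{3}$/,
};

// Hypothetical helper: look up the pattern for a field and test the value against it.
function validate(field, value) {
  const pattern = patterns[field];
  if (!pattern) {
    throw new Error(`Unknown field: ${field}`);
  }
  return pattern.test(value);
}

console.log(validate('account', 'user01')); // true
console.log(validate('account', 'ab')); // false, shorter than 4 characters
console.log(validate('account', 'user_01')); // false, underscore is not allowed
console.log(validate('promotionCode', 'ABC123')); // true
console.log(validate('promotionCode', 'AB1234')); // false, only two leading letters
```

None of these patterns use the g flag, so calling `test()` repeatedly on the same object gives consistent results.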