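Historically, JavaScript regular expressions supported only lookahead and negative lookahead assertions, not lookbehind or negative lookbehind assertions. A proposal to add them was written by Gorkem Yakin and Nozomu Katō and published on GitHub, and it has since been accepted into the ECMAScript standard.
The original text follows: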
Authors: Gorkem Yakin, Nozomu Katō
Lookarounds are zero-width assertions that match a string without consuming anything. ECMAScript has lookahead assertions that do this in the forward direction, but the language is missing a way to do this backward, which the lookbehind assertions provide. With lookbehind assertions, one can make sure that a pattern is or isn't preceded by another, e.g. matching a dollar amount without capturing the dollar sign.
There are two versions of lookbehind assertions: positive and negative.
Positive lookbehind assertions are denoted as (?<=...) and they ensure that the pattern contained within precedes the pattern following the assertion. For example, if one wants to match a dollar amount without capturing the dollar sign, /(?<=\$)\d+(\.\d*)?/ can be used, matching '$10.53' and returning '10.53'. This, however, wouldn't match '€10.53'.
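A quick way to check this behavior in an engine that supports lookbehind is sketched below. The variable names are illustrative only and do not come from the proposal.

```javascript
// Positive lookbehind: digits must be immediately preceded by "$".
const dollarAmount = /(?<=\$)\d+(\.\d*)?/;

const usd = dollarAmount.exec('$10.53');
const eur = dollarAmount.exec('€10.53');

console.log(usd[0]); // "10.53" (the "$" is asserted, not consumed)
console.log(eur);    // null (no digit run is directly preceded by "$")
```

The dollar sign takes part in the match decision but never appears in the result, which is what "zero-width" means here.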
Negative lookbehind assertions are denoted as (?<!...) and, on the other hand, make sure that the pattern within doesn't precede the pattern following the assertion. For example, /(?<!\$)\d+(?:\.\d*)/ wouldn't match '$10.53', but would '€10.53'.
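One subtlety is worth checking when trying the negative form. With an unanchored search, a pattern such as /(?<!\$)\d+(?:\.\d*)/ rejects the digit right after the dollar sign, but the engine can still start the match one character later. The sketch below shows this and one possible way around it: adding \d to the lookbehind, which is an assumption of this example rather than part of the proposal.

```javascript
// Negative lookbehind: the match must NOT be immediately preceded by "$".
const notAfterDollar = /(?<!\$)\d+(?:\.\d*)/;

const euro = notAfterDollar.exec('€10.53');
const dollar = notAfterDollar.exec('$10.53');

console.log(euro[0]);   // "10.53"
console.log(dollar[0]); // "0.53" (the match starts at the "0", which follows "1", not "$")

// Hypothetical stricter variant: also refuse to start in the middle of a number.
const strict = /(?<![$\d])\d+(?:\.\d*)/;

const strictEuro = strict.exec('€10.53');
const strictDollar = strict.exec('$10.53');

console.log(strictEuro[0]); // "10.53"
console.log(strictDollar);  // null
```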
All regular expression patterns, even unbounded ones, are allowed as part of lookbehind assertions. Therefore, one could, for example, write /(?<=\$\d+\.)\d+/ to match a dollar amount and capture just the fraction part.
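As a small illustration of an unbounded lookbehind, the sketch below applies the pattern from the text to a sample string. The variable names are made up for the example.

```javascript
// The lookbehind contains \d+, an unbounded repetition.
const fractionOnly = /(?<=\$\d+\.)\d+/;

const fraction = fractionOnly.exec('$10.53');

console.log(fraction[0]);    // "53"
console.log(fraction.index); // 4 (the match starts after "$10.")
```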
Patterns normally match starting from the leftmost sub-pattern and move on to the sub-pattern on the right if the left sub-pattern succeeds. When contained within a lookbehind assertion, the order of matching would be reversed. Patterns would match starting from the rightmost sub-pattern and advance to the left instead. For example, given /(?<=\$\d+\.)\d+/, the pattern would first find a number and ensure first that it is preceded by . going backward, then \d+ starting from ., and lastly $ starting from where \d+ within the assertion begins. The backtracking direction would also be reversed as a result of this.
Unbounded repetitions require a pattern within a lookbehind assertion to be run backward. Running patterns backward would have two side effects regarding capture groups.
Firstly, capture groups within a lookbehind assertion would have different values compared to when they are outside if the groups have sub-patterns with greedy quantifiers. When within a lookbehind assertion, groups on the right would capture most characters rather than the ones on the left. For example, given /(?<=(\d+)(\d+))$/ and the string '1053', the second group would capture '053' and the first group would be '1'. With /^(\d+)(\d+)/, on the other hand, the first group would capture '105' and the second group would be '3'.
The other side effect is how backreferences are resolved. If a backreference together with its corresponding group is within a lookbehind assertion and the backreference is placed to the right of the group, the backreference would match the empty string given that the group wouldn't have been processed at that point. For example, normally /(a)\1-/ matches 'aa-', whereas /^(?<=(a)\1)-/ would match 'a-' instead.
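The backreference behavior can be tried out as follows. In this sketch the leading ^ is dropped from the lookbehind pattern: without the m flag, ^ matches only at index 0, where nothing precedes the position, so a lookbehind that needs an 'a' before it cannot succeed there. The variable names are illustrative.

```javascript
// Outside a lookbehind: \1 repeats what group 1 captured, so two "a"s are required.
const forward = /(a)\1-/.exec('aa-');
console.log(forward[0]); // "aa-"

// Inside a lookbehind the pattern runs right to left, so \1 is evaluated
// before group 1 has captured anything and matches the empty string.
const backward = /(?<=(a)\1)-/.exec('a-');
console.log(backward[0]); // "-"
console.log(backward[1]); // "a"

// With the leading ^ the pattern cannot match at all.
const anchored = /^(?<=(a)\1)-/.exec('a-');
console.log(anchored); // null
```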
In short: the proposal adds positive lookbehind, (?<=...), and negative lookbehind, (?<!...), to complement the existing lookahead assertions. Any pattern, including an unbounded one, may appear inside a lookbehind, and the pattern inside is matched from right to left, which also reverses the backtracking direction.
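The following is quoted from 《ES6标准入门》 (third edition):
JavaScript regular expressions support only lookahead and negative lookahead assertions; they do not support lookbehind or negative lookbehind. There is currently a proposal to introduce lookbehind assertions, and V8 4.9 already supports it.
A "lookahead" means that x matches only when it is immediately followed by y, and is written /x(?=y)/. For example, to match only the digits before a percent sign, write /\d+(?=%)/. A "negative lookahead" means that x matches only when it is not followed by y, and is written /x(?!y)/. For example, to match only digits that are not before a percent sign, write /\d+(?!%)/.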
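Since support depends on the engine, code that has to run in older environments may want to detect lookbehind before using it. A minimal sketch, assuming that an engine without support throws a SyntaxError when the pattern is compiled (supportsLookbehind is a hypothetical helper name, not a built-in):

```javascript
// Hypothetical helper: returns true if the engine can compile a lookbehind.
// The pattern is built with the RegExp constructor so that an unsupported
// engine throws at run time instead of failing to parse the whole script.
function supportsLookbehind() {
  try {
    new RegExp('(?<=a)b');
    return true;
  } catch (err) {
    return false;
  }
}

const hasLookbehind = supportsLookbehind();
console.log(hasLookbehind);
```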
/\d+(?=%)/.exec('100% of US presidents have been male') // ["100"]
/\d+(?!%)/.exec('that’s all 44 of them') // ["44"]
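The digits must come immediately before the % for the assertion to succeed. If other characters sit between the digits and the %, nothing matches: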
alert(/\d+(?=%)/g.exec('67 fgajkfa% of US presidents have been male'));
//null;
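If the regular expressions in the first two examples are swapped between the two strings, the results are no longer the same. Note also that the part inside the lookahead parentheses, (?=%), is not included in the returned result.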
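To see what swapping the two expressions actually does, the sketch below runs each pattern against the other string. The variable names are illustrative.

```javascript
const percentText = '100% of US presidents have been male';
const countText = 'that’s all 44 of them';

// Negative lookahead on the "%" string: "100" is followed by "%", so the
// engine backtracks to "10", which is followed by "0" rather than "%".
const swappedNegative = /\d+(?!%)/.exec(percentText);
console.log(swappedNegative[0]); // "10"

// Positive lookahead on the string without "%": nothing can match.
const swappedPositive = /\d+(?=%)/.exec(countText);
console.log(swappedPositive); // null
```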
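A "lookbehind" is the mirror image of a lookahead: x matches only when it comes immediately after y, and is written /(?<=y)x/. For example, to match only the digits after a dollar sign, write /(?<=\$)\d+/. A "negative lookbehind" is the mirror image of a negative lookahead: x matches only when it does not come after y, and is written /(?<!y)x/. For example, to match only digits that do not follow a dollar sign, write /(?<!\$)\d+/.
/(?<=\$)\d+/.exec('Benjamin Franklin is on the $100 bill') // ["100"]
/(?<!\$)\d+/.exec('it’s worth about €90') // ["90"]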
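In the examples above, the part inside the lookbehind parentheses, (?<=\$), is likewise not included in the returned result.
The following example uses a lookbehind for string replacement.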
const RE_DOLLAR_PREFIX = /(?<=\$)foo/g;
'$foo %foo foo'.replace(RE_DOLLAR_PREFIX, 'bar');
// '$bar %foo foo'
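In the code above, only the foo that follows a dollar sign is replaced.
To evaluate a lookbehind such as /(?<=y)x/, the engine first matches x and then goes back to the left to match y. This right-to-left order of execution is the opposite of every other regex operation, and it leads to some surprising behavior.
First, capture groups inside a lookbehind give different results than they normally would.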
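For comparison, an engine without lookbehind can usually get the same replacement by capturing the prefix and putting it back. This is a sketch of that common workaround, not something taken from the book, and the constant names are made up for the example.

```javascript
// With lookbehind: the "$" is asserted but not part of the match.
const withLookbehind = '$foo %foo foo'.replace(/(?<=\$)foo/g, 'bar');

// Without lookbehind: capture the "$" and re-insert it via $1.
const RE_DOLLAR_CAPTURE = /(\$)foo/g;
const withCapture = '$foo %foo foo'.replace(RE_DOLLAR_CAPTURE, '$1bar');

console.log(withLookbehind); // "$bar %foo foo"
console.log(withCapture);    // "$bar %foo foo"
```

The capture-group version consumes the dollar sign, so the two approaches can differ when matches are adjacent or overlap; for a simple case like this one they agree.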
/(?<=(\d+)(\d+))$/.exec('1053') // ["", "1", "053"]
/^(\d+)(\d+)$/.exec('1053') // ["1053", "105", "3"]
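The code above captures two groups. Without a lookbehind, the first group is greedy and the second can capture only one character, so the results are 105 and 3. Inside a lookbehind, execution runs from right to left, so the second group is greedy and the first can capture only one character, giving 1 and 053.
Second, backreferences inside a lookbehind also work in the opposite order from usual: the backreference must be placed before the group it refers to.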
/(?<=(o)d\1)r/.exec('hodor') // null
/(?<=\1d(o))r/.exec('hodor') // ["r", "o"]
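In the code above, placing the backreference (\1) after its group yields no match; it must come before the group. This is because the engine scans from left to right to find a candidate position (the r) and then evaluates the lookbehind from right to left, so a backreference is resolved only if its group, which lies to its right, has already been matched.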