python正则表达式(\S+)和 \d+的含义

在jenkins jaoco生成增量覆盖率功能时:

研读的一个python通过python-git获取diff时,碰到了这两个正则符号:(\S+)、(\d+)

https://github.com/raoweijian/jacoco-diff/blob/master/diff_processor.py

下面是找到的符号的说明:

\d+的出处:

https://books.google.com.hk/books?id=T5ncDgAAQBAJ&pg=PA192&lpg=PA192&dq=re.match(%27@@+-%5Cd%2B,%5Cd%2B+%5C%2B(%5Cd%2B),%5Cd%2B+@@%27,+line):&source=bl&ots=FrgzHyhsYi&sig=ACfU3U0Ex9AAolnnuDbQTzqw2mcvUHMkyQ&hl=zh-CN&sa=X&ved=2ahUKEwjgm5u7muLnAhWS7GEKHZLSCDEQ6AEwBHoECAkQAQ#v=snippet&q=%5Cd&f=false

书名:Learning R Programming

python正则表达式(\S+)和 \d+的含义_第1张图片

python正则表达式(\S+)和 \d+的含义_第2张图片

 python正则表达式(\S+)和 \d+的含义_第3张图片

 

下面是(\S+)的说明:

来源:https://www.ntu.edu.sg/home/ehchua/programming/howto/Regexe.html

摘录: 

1.10  Example: Swapping Words using Parenthesized Back-References ^(\S+)\s+(\S+)$ and $2 $1
The ^ and $ match the beginning and ending of the input string, respectively.
The \s (lowercase s) matches a whitespace (blank, tab \t, and newline \r or \n). On the other hand, the \S+ (uppercase S) matches anything that is NOT matched by \s, i.e., non-whitespace. In regex, the uppercase metacharacter denotes the inverse of the lowercase counterpart, for example, \w for word character and \W for non-word character; \d for digit and \D or non-digit.
The above regex matches two words (without white spaces) separated by one or more whitespaces.
Parentheses () have two meanings in regex:
to group sub-expressions, e.g., (abc)*
to provide a so-called back-reference for capturing and extracting matches.
The parentheses in (\S+), called parenthesized back-reference, is used to extract the matched substring from the input string. In this regex, there are two (\S+), match the first two words, separated by one or more whitespaces \s+. The two matched words are extracted from the input string and typically kept in special variables $1 and $2 (or \1 and \2 in Python), respectively.
To swap the two words, you can access the special variables, and print "$2 $1" (via a programming language); or substitute operator "s/(\S+)\s+(\S+)/$2 $1/" (in Perl).
Code Example in Python
Python keeps the parenthesized back references in \1, \2, .... Also, \0 keeps the entire match.

$ python3
>>> re.findall(r'^(\S+)\s+(\S+)$', 'apple orange')
[('apple', 'orange')]     # A list of tuples if the pattern has more than one back references
# Back references are kept in \1, \2, \3, etc.
>>> re.sub(r'^(\S+)\s+(\S+)$', r'\2 \1', 'apple orange')   # Prefix r for raw string which ignores escape
'orange apple'
>>> re.sub(r'^(\S+)\s+(\S+)$', '\\2 \\1', 'apple orange')  # Need to use \\ for \ for regular string
'orange apple'
Code Example in Java
Java keeps the parenthesized back references in $1, $2, ....

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class TestRegexSwapWords {
   public static void main(String[] args) {
      String inputStr = "apple orange";
      String regexStr = "^(\\S+)\\s+(\\S+)$";  // Regex pattern to be matched
      String replacementStr = "$2 $1";         // Replacement pattern with back references

      // Step 1: Allocate a Pattern object to compile a regex
      Pattern pattern = Pattern.compile(regexStr);

      // Step 2: Allocate a Matcher object from the Pattern, and provide the input
      Matcher matcher = pattern.matcher(inputStr);

      // Step 3: Perform the matching and process the matching result
      String outputStr = matcher.replaceFirst(replacementStr); // first match only
      System.out.println(outputStr);   // Output: orange apple
   }
}

 

你可能感兴趣的:(自己的杂记)