fzyz_sb

Python Standard Library Study Notes 1: Text

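1. string --- Text Constants and Templates

Purpose: contains constants and classes for working with text
Python version: 1.4 and later

1.1 Functions

capwords(): capitalizes the first letter of every word in a string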

>>> import string
>>> s = 'The quick brown fox jumped over the lazy dog'
>>> string.capwords(s)
'The Quick Brown Fox Jumped Over The Lazy Dog'

1. Doing the same with a list

>>> s
'The quick brown fox jumped over the lazy dog'
>>> " ".join(map(lambda x: x[0].upper() + x[1:], s.split(" ")))
'The Quick Brown Fox Jumped Over The Lazy Dog'

However, if words are separated by more than one space, the list-based code fails: split(" ") then yields empty strings, and x[0] raises an IndexError on them. A revised version that uppercases every character that starts a word:

>>> ss
'The quick brown fox jumped over the lazy   dog'
>>> for index in range(len(ss)):
	if ss[index] != " " and (index == 0 or ss[index - 1] == " "):
		ss = ss[:index] + ss[index].upper() + ss[index + 1:]

		
>>> ss
'The Quick Brown Fox Jumped Over The Lazy   Dog'
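The same whitespace-preserving capitalization can also be sketched with a regular expression. This is only an illustration written in Python 3 syntax; the helper name capitalize_words is made up for this example. Unlike string.capwords(), which collapses runs of whitespace into a single space, it leaves the spacing as it was.

```python
import re

def capitalize_words(text):
    """Uppercase the first character of each whitespace-delimited word,
    leaving the original spacing untouched."""
    def fix(match):
        word = match.group(0)
        return word[0].upper() + word[1:]
    # \S+ matches one run of non-whitespace characters, i.e. one word
    return re.sub(r'\S+', fix, text)

result = capitalize_words('The quick brown fox jumped over the lazy   dog')
print(result)
```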

maketrans(): used together with the translate() method to map one set of characters to another; this is preferable to calling replace() repeatedly

>>> import string
>>> leet = string.maketrans('abegiloprstz', '463611092572')
>>> s
'The quick brown fox jumped over the lazy dog'
>>> s.translate(leet)
'Th3 qu1ck 620wn f0x jum93d 0v32 7h3 142y d06'

1. Doing the same with repeated replace() calls

>>> s
'The quick brown fox jumped over the lazy dog'
>>> subStr = s
>>> length = len('abegiloprstz')
>>> for i in range(0, length):
	subStr = subStr.replace('abegiloprstz'[i], '463611092572'[i])

	
>>> subStr
'Th3 qu1ck 620wn f0x jum93d 0v32 7h3 142y d06'
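The sessions above are Python 2. In Python 3 the function string.maketrans() no longer exists and the table is built with str.maketrans() instead; a minimal sketch of the same translation under that assumption:

```python
# Python 3: the translation table comes from str.maketrans()
leet = str.maketrans('abegiloprstz', '463611092572')

s = 'The quick brown fox jumped over the lazy dog'
translated = s.translate(leet)
print(translated)
```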

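1.2 Templates

When building strings with string.Template, a variable is marked by prefixing its name with $ (for example $var); if the name has to be set apart from the surrounding text, it can also be wrapped in curly braces (for example ${var}).
A simple example: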

import string

values = {'var': 'foo'}

# string.Template substitution; the delimiter $ is escaped by doubling it ($$)
t = string.Template("""
Variable    : $var
Escape      : $$
Variable in text: ${var}iable
""")

print 'TEMPLATE:', t.substitute(values)

# String formatting with the % operator, matching values by keyword;
# the % sign is escaped by doubling it (%%)
s = """
Variable    : %(var)s
Escape      : %%
Variable in text: %(var)siable
"""

print 'INTERPOLATION:', s % values

Interpreter output:

>>> 
TEMPLATE: 
Variable    : foo
Escape      : $
Variable in text: fooiable

INTERPOLATION: 
Variable    : foo
Escape      : %
Variable in text: fooiable

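An important difference between templates and standard string interpolation is that templates do not take the type of the arguments into account: the values are converted to strings, and the strings are inserted into the result. No formatting options are available.
The safe_substitute() method avoids the exception that would otherwise be raised when not all the values the template needs are provided: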

import string

values = {'var': 'foo'}

t = string.Template("$var is here but $missing is not provided")

try:
    print 'substitute() :', t.substitute(values)
except KeyError, err:
    print 'ERROR:', str(err)

# A placeholder with no value supplied is left in the output as-is
print 'safe_substitute():', t.safe_substitute(values)

Interpreter output:

>>> 
substitute() : ERROR: 'missing'
safe_substitute(): foo is here but $missing is not provided
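The point that templates ignore argument types can be illustrated with a small sketch (Python 3 syntax; the placeholder names count and price are made up for this example). Every value simply goes through string conversion, whereas the % operator accepts a format specification.

```python
import string

t = string.Template('$count items cost $price')

# Values of any type are converted to strings; there is no way to ask
# for a width or precision as with '%.2f'.
rendered = t.substitute(count=3, price=2.5)
print(rendered)

# The % operator, by contrast, accepts a format specification.
formatted = '%d items cost %.2f' % (3, 2.5)
print(formatted)
```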

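1.3 Advanced Templates

The default syntax of string.Template can be changed by adjusting the regular expression patterns it uses to find variable names in the template body. A simple way to do this is to change the delimiter and idpattern class attributes.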

import string

template_text = """
Delimiter : %%
Replaced : %with_underscore
Ignored : %notunderscored
"""

d = {'with_underscore' : 'replaced',
     'notunderscored' : 'not replaced',}

# The delimiter is changed to %
# Variable names must match '[a-z]+_[a-z]+', i.e. contain an underscore in the middle
class MyTemplate(string.Template):
    delimiter = '%'
    idpattern = '[a-z]+_[a-z]+'

t = MyTemplate(template_text)
print 'Modified ID pattern'
print t.safe_substitute(d)

Interpreter output:

>>> 
Modified ID pattern

Delimiter : %
Replaced : replaced
Ignored : %notunderscored

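For more complex changes, override the pattern attribute and define an entirely new regular expression. The pattern provided must contain four named groups, which capture the escaped delimiter, the named variable, the variable name wrapped in braces, and an invalid delimiter pattern.
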
import re
import string

class MyTemplate(string.Template):
    delimiter = '{{'    # change the delimiter to '{{'
    pattern = r"""
\{\{(?:
(?P<escaped>\{\{)|
(?P<named>[_a-z][_a-z0-9]*)\}\}|
(?P<braced>[_a-z][_a-z0-9]*)\}\}|
(?P<invalid>)
)
"""

t = MyTemplate("""
{{{{
{{var}}
{{foo}}
""")
print 'MATCHES:', t.pattern.findall(t.template)
print 'SUBSTITUTED:', t.safe_substitute(var='123replacement', foo='replacement')

Interpreter output:

>>> 
MATCHES: [('{{', '', '', ''), ('', 'var', '', ''), ('', 'foo', '', '')]
SUBSTITUTED: 
{{
123replacement
replacement

Note: I do not yet understand how the four named groups of pattern are used.
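One way to see what the four named groups do is to run the default string.Template pattern over a sample string and report which group each match filled. This is a sketch in Python 3 syntax; the sample text is invented for the illustration.

```python
import string

text = 'cost: $$5 for $name, ${name}s and a stray $!'

# string.Template.pattern is the compiled default pattern. Every match
# fills exactly one of the four named groups.
roles = []
for match in string.Template.pattern.finditer(text):
    filled = [name for name, value in match.groupdict().items()
              if value is not None]
    roles.append((match.group(0), filled[0]))

for matched_text, role in roles:
    print('%-10r -> %s' % (matched_text, role))
```

The group that matched tells the template what to do with the text: escaped is replaced by a single delimiter, named and braced are looked up in the supplied values, and invalid makes substitute() raise a ValueError (safe_substitute() leaves the delimiter in place).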

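2. textwrap --- Formatting Text Paragraphs

Purpose: formats text by adjusting where line breaks occur in a paragraph
Python version: 2.5 and later
The textwrap module can be used to format text for output when pretty-printing is desired. It offers programmatic functionality similar to the paragraph wrapping or filling features found in many text editors.

2.1 Example Data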

sample_text = """
    The textwrap module can be used to format text for output in
    situations where pretty-printing is desired. It offers
    programmatic functionality similar to the paragraph wrapping
    or filling features found in many text editors
    """

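This is saved in the module textwrap_example.py so that the later examples can import it.

2.2 Filling Paragraphs

fill() wraps the text to the width it is given: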

>>> import textwrap
>>> from textwrap_example import sample_text
>>> print textwrap.fill(sample_text, width = 50)
     The textwrap module can be used to format
text for output in     situations where pretty-
printing is desired. It offers     programmatic
functionality similar to the paragraph wrapping
or filling features found in many text editors

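The result is less than ideal: only the first line is indented, and the indentation of the remaining source lines ends up embedded in the middle of the paragraph.

2.3 Removing Existing Indentation

dedent() removes the whitespace prefix that is common to all lines: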

>>> print textwrap.dedent(sample_text)

The textwrap module can be used to format text for output in
situations where pretty-printing is desired. It offers
programmatic functionality similar to the paragraph wrapping
or filling features found in many text editors

2.4 Combining dedent and fill

We can remove the indentation with dedent and then rewrap the text with fill, here at two different widths:

>>> dedented_text = textwrap.dedent(sample_text).strip()
>>> for width in [45, 70]:
	print '%d Columns:\n' % width
	print textwrap.fill(dedented_text, width=width)
	print

	
45 Columns:

The textwrap module can be used to format
text for output in situations where pretty-
printing is desired. It offers programmatic
functionality similar to the paragraph
wrapping or filling features found in many
text editors

70 Columns:

The textwrap module can be used to format text for output in
situations where pretty-printing is desired. It offers programmatic
functionality similar to the paragraph wrapping or filling features
found in many text editors

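2.5 Hanging Indents

The indentation of the first line can be set independently of the following lines. Here the first line is not indented and the remaining lines are, which produces a hanging indent: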

>>> dedented_text = textwrap.dedent(sample_text).strip()
>>> print textwrap.fill(dedented_text, initial_indent='', subsequent_indent=' ' * 4, width = 50,)
The textwrap module can be used to format text for
    output in situations where pretty-printing is
    desired. It offers programmatic functionality
    similar to the paragraph wrapping or filling
    features found in many text editors
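The dedent, strip and fill steps can be wrapped in one small helper. This is only a sketch in Python 3 syntax; the function name hanging and its default values are assumptions made for this example, not part of textwrap.

```python
import textwrap

def hanging(text, width=50, indent=4):
    """Dedent and rewrap text with every line but the first indented."""
    cleaned = textwrap.dedent(text).strip()
    return textwrap.fill(cleaned,
                         width=width,
                         initial_indent='',
                         subsequent_indent=' ' * indent)

sample = """
    The textwrap module can be used to format text for output in
    situations where pretty-printing is desired.
    """
wrapped = hanging(sample, width=40)
print(wrapped)
```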

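3. re --- Regular Expressions

3.1 Finding Patterns in Text

The search() function takes the pattern and the text to scan, and returns a Match object when the pattern is found; otherwise it returns None.
Each Match object holds information about the nature of the match, including the original input string, the regular expression used, and the location within the original string where the pattern occurs: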

>>> import re
>>> pattern = 'this'
>>> text = 'Does this text match the pattern?'
>>> match = re.search(pattern, text)
>>> dir(match)
['__class__', '__copy__', '__deepcopy__', '__delattr__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'end', 'endpos', 'expand', 'group', 'groupdict', 'groups', 'lastgroup', 'lastindex', 'pos', 're', 'regs', 'span', 'start', 'string']
>>> match.string
'Does this text match the pattern?'
>>> match.start
<built-in method start of _sre.SRE_Match object at 0x0000000002A96648>
>>> match.start()
5
>>> match.re
<_sre.SRE_Pattern object at 0x0000000002A9E258>
>>> match.re()

Traceback (most recent call last):
  File "<pyshell#24>", line 1, in <module>
    match.re()
TypeError: '_sre.SRE_Pattern' object is not callable
>>> match.re.pattern
'this'

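Note: using dir() and help() to inspect what each object offers is very useful.

3.2 Compiling Expressions

If an expression is used frequently, it is more efficient to compile it. The compile() function converts an expression string into a RegexObject.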

import re

# Precompile the patterns
regexes = [re.compile(p) for p in ['this', 'that']]

text = 'Does this text match the pattern'

print 'Text: %r\n' % text

for regex in regexes:
    print 'Seeking "%s" ->' % regex.pattern,

    if regex.search(text):
        print 'match'
    else:
        print 'no match'

Interpreter output:

>>> 
Text: 'Does this text match the pattern'

Seeking "this" -> match
Seeking "that" -> no match
>>> type(regexes)
<type 'list'>
>>> regexes
[<_sre.SRE_Pattern object at 0x0000000002BAE0E8>, <_sre.SRE_Pattern object at 0x0000000002BAE258>]

3.3 Multiple Matches

The findall() function returns all of the substrings of the input that match the pattern without overlapping.

import re

text = 'abbaaabbbbaaaaa'

pattern = 'ab'

for match in re.findall(pattern, text):
    print 'Found "%s"' % match
# finditer() is called only once and returns an iterator;
# the for loop then walks over the Match objects it produces
for match in re.finditer(pattern, text):
    s = match.start()
    e = match.end()
    print 'Found "%s" at %d:%d' % (text[s:e], s, e)

Interpreter output:

>>> 
Found "ab"
Found "ab"
Found "ab" at 0:2
Found "ab" at 5:7
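What "without overlapping" means can be seen on a string where matches could overlap. The sketch below is in Python 3 syntax; the look-ahead trick for counting overlapping occurrences is an addition for illustration and is not part of the example above.

```python
import re

text = 'aaaa'

# findall() resumes scanning after the end of the previous match,
# so 'aa' is found at 0:2 and 2:4 but not at 1:3.
non_overlapping = re.findall('aa', text)

# Assumption for illustration: wrapping the pattern in a look-ahead makes
# every match zero-width, so overlapping start positions are all reported.
overlapping_starts = [m.start() for m in re.finditer('(?=aa)', text)]

print(non_overlapping)
print(overlapping_starts)
```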

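3.4 Pattern Syntax

Regular expressions support more powerful patterns than simple literal text strings. Patterns can repeat, can be anchored to different logical locations within the input, and can be expressed in compact forms that do not require every literal character to be present in the pattern. All of these features are used by combining literal text values with metacharacters that are part of the regular expression pattern syntax implemented by re.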

import re

def test_patterns(text, patterns=[]):
    for pattern, desc in patterns:
        print 'Pattern %r (%s)\n' % (pattern, desc)
        print '     %r' % text
        for match in re.finditer(pattern, text):
            s = match.start()
            e = match.end()
            substr = text[s:e]
            n_backslashes = text[:s].count('\\')
            prefix = '.' * (s + n_backslashes)
            print '     %s%r|' % (prefix, substr),
        print
    return

if __name__ == "__main__":
    test_patterns('abbaaabbbbaaaaa',
                  [('ab', "'a' followed by 'b'"),])

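This is saved in the file re_test_patterns.py.

Repetition

There are five ways to express repetition in a pattern. A pattern followed by the metacharacter * is repeated zero or more times. With + it must appear at least once. With ? it appears zero times or once. {m} means exactly m repetitions, and {m,n} means at least m and at most n repetitions. Leaving out n ({m,}) means at least m repetitions with no upper limit.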

from re_test_patterns import test_patterns

test_patterns(
    'abbaabbba',
    [('ab*',    'a followed by zero or more b'),
     ('ab+',    'a followed by one or more b'),
     ('ab?',    'a followed by zero or one b'),
     ('ab{3}',  'a followed by three b'),
     ('ab{2,3}',   'a followed by two to three b'),
     ])

Interpreter output:

>>> 
Pattern 'ab*' (a followed by zero or more b)

     'abbaabbba'
     'abb'|      ...'a'|      ....'abbb'|      ........'a'|
Pattern 'ab+' (a followed by one or more b)

     'abbaabbba'
     'abb'|      ....'abbb'|
Pattern 'ab?' (a followed by zero or one b)

     'abbaabbba'
     'ab'|      ...'a'|      ....'ab'|      ........'a'|
Pattern 'ab{3}' (a followed by three b)

     'abbaabbba'
     ....'abbb'|
Pattern 'ab{2,3}' (a followed by two to three b)

     'abbaabbba'
     'abb'|      ....'abbb'|

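Normally, when processing a repetition instruction, re consumes as much of the input as possible while matching the pattern. This so-called greedy behavior may result in fewer individual matches, or the matches may include more of the input text than intended. Greediness can be turned off by following the repetition instruction with ?: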

from re_test_patterns import test_patterns

test_patterns(
    'abbaabbba',
    [('ab*?',    'a followed by zero or more b'),
     ('ab+?',    'a followed by one or more b'),
     ('ab??',    'a followed by zero or one b'),
     ('ab{3}?',  'a followed by three b'),
     ('ab{2,3}?',   'a followed by two to three b'),
     ])

Interpreter output:

>>> 
Pattern 'ab*?' (a followed by zero or more b)

     'abbaabbba'
     'a'|      ...'a'|      ....'a'|      ........'a'|
Pattern 'ab+?' (a followed by one or more b)

     'abbaabbba'
     'ab'|      ....'ab'|
Pattern 'ab??' (a followed by zero or one b)

     'abbaabbba'
     'a'|      ...'a'|      ....'a'|      ........'a'|
Pattern 'ab{3}?' (a followed by three b)

     'abbaabbba'
     ....'abbb'|
Pattern 'ab{2,3}?' (a followed by two to three b)

     'abbaabbba'
     'abb'|      ....'abb'|
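A case where greedy matching returns more text than intended is pulling tags out of markup. A small sketch in Python 3 syntax, with a made-up input string:

```python
import re

html = '<b>bold</b> and <i>italic</i>'

# Greedy: .* runs to the last '>' in the string, giving one long match.
greedy = re.findall(r'<.*>', html)

# Non-greedy: .*? stops at the first '>' that lets the pattern succeed.
lazy = re.findall(r'<.*?>', html)

print(greedy)
print(lazy)
```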

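Character Sets

A character set is a group of characters, any one of which can match at that point in the pattern. For example, [ab] matches either a or b: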

from re_test_patterns import test_patterns

test_patterns(
    'abbaabbba',
    [('[ab]', 'either a or b'),
     ('a[ab]+', 'a followed by 1 or more a or b'),
     ('a[ab]+?', 'a followed by 1 or more a or b, not greedy'),
     ])

Interpreter output (note the greedy behavior):

>>> 
Pattern '[ab]' (either a or b)

     'abbaabbba'
     'a'|      .'b'|      ..'b'|      ...'a'|      ....'a'|      .....'b'|      ......'b'|      .......'b'|      ........'a'|
Pattern 'a[ab]+' (a followed by 1 or more a or b)

     'abbaabbba'
     'abbaabbba'|
Pattern 'a[ab]+?' (a followed by 1 or more a or b, not greedy)

     'abbaabbba'
     'ab'|      ...'aa'|

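A character set can also be used to exclude specific characters. The caret (^) means to look for characters that are not in the set that follows it.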

from re_test_patterns import test_patterns

test_patterns(
    'This is some text -- with punctuation',
    # find every sequence of characters that contains no "-", "." or space
    [('[^-. ]+', 'sequences without -, ., or space'),
     ])

Interpreter output:

>>> 
Pattern '[^-. ]+' (sequences without -, ., or space)

     'This is some text -- with punctuation'
     'This'|      .....'is'|      ........'some'|      .............'text'|      .....................'with'|      ..........................'punctuation'|

A character range defines a character set that includes all of the contiguous characters between a start point and a stop point:

from re_test_patterns import test_patterns

test_patterns(
    'This is some text -- with punctuation',
    [('[a-z]+', 'sequences of lowercase letters'),
     ('[A-Z]+', 'sequences of uppercase letters'),
     ('[a-zA-Z]+', 'sequences of lowercase or uppercase letters'),
     ('[A-Z][a-z]+', 'one uppercase followed by lowercase'),
     ])

Interpreter output:

>>> 
Pattern '[a-z]+' (sequences of lowercase letters)

     'This is some text -- with punctuation'
     .'his'|      .....'is'|      ........'some'|      .............'text'|      .....................'with'|      ..........................'punctuation'|
Pattern '[A-Z]+' (sequences of uppercase letters)

     'This is some text -- with punctuation'
     'T'|
Pattern '[a-zA-Z]+' (sequences of lowercase or uppercase letters)

     'This is some text -- with punctuation'
     'This'|      .....'is'|      ........'some'|      .............'text'|      .....................'with'|      ..........................'punctuation'|
Pattern '[A-Z][a-z]+' (one uppercase followed by lowercase)

     'This is some text -- with punctuation'
     'This'|

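As a special case of a character set, the metacharacter dot (.) indicates that the pattern should match any single character in that position.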

from re_test_patterns import test_patterns

test_patterns(
    'abbaabbba',
    [('a.', 'a followed by any one character'),
     ('b.', 'b followed by any one character'),
     ('a.*b', 'a followed by anything, ending in b'),
     ('a.*?b', 'a followed by anything, ending in b'),
     ])

Interpreter output:

>>> 
Pattern 'a.' (a followed by any one character)

     'abbaabbba'
     'ab'|      ...'aa'|
Pattern 'b.' (b followed by any one character)

     'abbaabbba'
     .'bb'|      .....'bb'|      .......'ba'|
Pattern 'a.*b' (a followed by anything, ending in b)

     'abbaabbba'
     'abbaabbb'|
Pattern 'a.*?b' (a followed by anything, ending in b)

     'abbaabbba'
     'ab'|      ...'aab'|

Escape Codes

The escape codes recognized by re are:

Code	Meaning
\d	a digit
\D	a nondigit
\s	whitespace (tab, space, newline, etc.)
\S	non-whitespace
\w	alphanumeric
\W	non-alphanumeric

from re_test_patterns import test_patterns

test_patterns(
    'A prime #1 example!',
    [(r'\d+', 'sequence of digits'),
     (r'\D+', 'sequence of nondigits'),
     (r'\s+', 'sequence of whitespace'),
     (r'\S+', 'sequence of nonwhitespace'),
     (r'\w+', 'alphanumeric characters'),
     (r'\W+', 'nonalphanumeric')
     ])

Interpreter output:

>>> 
Pattern '\\d+' (sequence of digits)

     'A prime #1 example!'
     .........'1'|
Pattern '\\D+' (sequence of nondigits)

     'A prime #1 example!'
     'A prime #'|      ..........' example!'|
Pattern '\\s+' (sequence of whitespace)

     'A prime #1 example!'
     .' '|      .......' '|      ..........' '|
Pattern '\\S+' (sequence of nonwhitespace)

     'A prime #1 example!'
     'A'|      ..'prime'|      ........'#1'|      ...........'example!'|
Pattern '\\w+' (alphanumeric characters)

     'A prime #1 example!'
     'A'|      ..'prime'|      .........'1'|      ...........'example'|
Pattern '\\W+' (nonalphanumeric)

     'A prime #1 example!'
     .' '|      .......' #'|      ..........' '|      ..................'!'|

To match characters that are part of the regular expression syntax, escape them in the search pattern:

from re_test_patterns import test_patterns

test_patterns(
    r'\d+ \D+ \s+',
    [(r'\\.\+', 'escape code'),
     ])

Interpreter output:

>>> 
Pattern '\\\\.\\+' (escape code)

     '\\d+ \\D+ \\s+'
     '\\d+'|      .....'\\D+'|      ..........'\\s+'|
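When the text to search for comes from somewhere else, escaping it by hand is error-prone; re.escape() does it automatically. A short sketch in Python 3 syntax with made-up sample strings:

```python
import re

literal = '1+1=2 (really?)'
text = 'They say 1+1=2 (really?) but 11=2 is wrong.'

# Unescaped, '+', '(', ')' and '?' are read as pattern syntax,
# so the literal text is not found.
unescaped = re.search(literal, text)

# re.escape() backslash-escapes the special characters.
escaped = re.search(re.escape(literal), text)

print(unescaped)
print(escaped.group(0))
```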

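Anchoring

Anchoring instructions specify the relative location in the input text where the pattern should appear.

Code	Meaning
^	start of string, or line
$	end of string, or line
\A	start of string
\Z	end of string
\b	empty string at the beginning or end of a word
\B	empty string not at the beginning or end of a word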

from re_test_patterns import test_patterns

test_patterns(
    'This is some text -- with punctuation.',
    [(r'^\w+', 'word at start of string'),
     (r'\A\w+', 'word at start of string'),
     (r'\w+\S*$', 'word near end of string, skip punctuation'),
     (r'\w+\S*\Z', 'word near end of string, skip punctuation'),
     (r'\w*t\w*', 'word containing t'),
     (r'\bt\w+', 't at start of word'),
     (r'\w+t\b', 't at end of word'),
     (r'\Bt\B', 't not start or end of word'),
     ])

Interpreter output:

>>> 
Pattern '^\\w+' (word at start of string)

     'This is some text -- with punctuation.'
     'This'|
Pattern '\\A\\w+' (word at start of string)

     'This is some text -- with punctuation.'
     'This'|
Pattern '\\w+\\S*$' (word near end of string, skip punctuation)

     'This is some text -- with punctuation.'
     ..........................'punctuation.'|
Pattern '\\w+\\S*\\Z' (word near end of string, skip punctuation)

     'This is some text -- with punctuation.'
     ..........................'punctuation.'|
Pattern '\\w*t\\w*' (word containing t)

     'This is some text -- with punctuation.'
     .............'text'|      .....................'with'|      ..........................'punctuation'|
Pattern '\\bt\\w+' (t at start of word)

     'This is some text -- with punctuation.'
     .............'text'|
Pattern '\\w+t\\b' (t at end of word)

     'This is some text -- with punctuation.'
     .............'text'|
Pattern '\\Bt\\B' (t not start or end of word)

     'This is some text -- with punctuation.'
     .......................'t'|      ..............................'t'|      .................................'t'|
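A common use of \b is whole-word matching. The sketch below (Python 3 syntax, invented sample text) contrasts a bare literal with the same literal anchored at word boundaries on both sides:

```python
import re

text = 'This is his thesis'

# The bare literal also matches inside longer words.
anywhere = [m.span() for m in re.finditer(r'is', text)]

# With \b on both sides only the standalone word matches.
whole_word = [m.span() for m in re.finditer(r'\bis\b', text)]

print(anywhere)
print(whole_word)
```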

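3.5 Constraining the Search

If it is known ahead of time that only a subset of the full input should be searched, the regular expression match can be constrained further by telling re to limit the search range. For example, if the pattern must appear at the front of the input, using match() instead of search() anchors the search without having to include an anchor explicitly in the search pattern.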

>>> import re
>>> text = 'This is some text -- with punctuation.'
>>> pattern = 'is'
>>> m = re.match(pattern, text)
>>> print m
None
>>> s = re.search(pattern, text)
>>> print s
<_sre.SRE_Match object at 0x0000000002C265E0>

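The search() method of a compiled regular expression also accepts optional start and end position parameters that limit the search to a substring of the input: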

import re

text = 'This is some text -- with punctuation.'
pattern = re.compile(r'\b\w*is\w*\b')

print 'Text:', text
print

pos = 0
while True:
    match = pattern.search(text, pos)
    if not match:
        break
    s = match.start()
    e = match.end()
    print ' %2d : %2d = "%s"' % (s, e - 1, text[s:e])
    pos = e

Interpreter output:

>>> 
Text: This is some text -- with punctuation.

  0 :  3 = "This"
  5 :  6 = "is"
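The effect of the two position arguments can be checked in isolation. A minimal sketch in Python 3 syntax that reuses the pattern from the example above; the variable names are chosen for this illustration:

```python
import re

text = 'This is some text -- with punctuation.'
word = re.compile(r'\b\w*is\w*\b')

first = word.search(text)           # scan the whole string
second = word.search(text, 4)       # start looking at index 4
clipped = word.search(text, 4, 6)   # only text[4:6] is visible

print(first.span())
print(second.span())
print(clipped)
```

Because the end position cuts off the second character of "is", the last search finds nothing.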

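3.6 Dissecting Matches with Groups

Searching for pattern matches is the basis of the powerful capabilities provided by regular expressions. Adding groups to a pattern isolates parts of the matching text. Groups are defined by enclosing patterns in parentheses ("(" and ")"):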

from re_test_patterns import test_patterns

test_patterns(
    'abbaaabbbbaaaaa',
    [('a(ab)', 'a followed by literal ab'),
     ('a(a*b*)', 'a followed by 0-n a and 0-n b'),
     ('a(ab)*', 'a followed by 0-n ab'),
     ('a(ab)+', 'a followed by 1-n ab'),
    ])

Interpreter output:

>>> 
Pattern 'a(ab)' (a followed by literal ab)

     'abbaaabbbbaaaaa'
     ....'aab'|
Pattern 'a(a*b*)' (a followed by 0-n a and 0-n b)

     'abbaaabbbbaaaaa'
     'abb'|      ...'aaabbbb'|      ..........'aaaaa'|
Pattern 'a(ab)*' (a followed by 0-n ab)

     'abbaaabbbbaaaaa'
     'a'|      ...'a'|      ....'aab'|      ..........'a'|      ...........'a'|      ............'a'|      .............'a'|      ..............'a'|
Pattern 'a(ab)+' (a followed by 1-n ab)

     'abbaaabbbbaaaaa'
     ....'aab'|

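To access the substrings matched by the individual groups within a pattern, use the groups() method of the Match object (group(n) returns a single group):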

import re

text = 'This is some text -- with punctuation.'

print text
print

patterns = [
    (r'^(\w+)', 'word at start of string'),
    (r'(\w+)\S*$', 'word at end, with optional punctuation'),
    (r'(\bt\w+)\W+(\w+)', 'word starting with t, another word'),
    (r'(\w+t)\b', 'word ending with t'),
    ]

for pattern, desc in patterns:
    regex = re.compile(pattern)
    match = regex.search(text)
    print 'Pattern %r (%s)\n' % (pattern, desc)
    print ' ', match.groups()
print

Interpreter output:

>>> 
This is some text -- with punctuation.

Pattern '^(\\w+)' (word at start of string)

  ('This',)
Pattern '(\\w+)\\S*$' (word at end, with optional punctuation)

  ('punctuation',)
Pattern '(\\bt\\w+)\\W+(\\w+)' (word starting with t, another word)

  ('text', 'with')
Pattern '(\\w+t)\\b' (word ending with t)

  ('text',)

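Python extends the basic grouping syntax with named groups. Referring to groups by name makes it easier to modify the pattern later without also having to modify the code that uses the match results. To set the name of a group, use the syntax (?P<name>pattern):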

import re

text = 'This is some text -- with punctuation.'

print text
print

patterns = [
    r'^(?P<first_word>\w+)',
    r'(?P<last_word>\w+)\S*$',
    r'(?P<t_word>\bt\w+)\W+(?P<other_word>\w+)',
    r'(?P<ends_with_t>\w+t)\b',
    ]

for pattern in patterns:
    regex = re.compile(pattern)
    match = regex.search(text)
    print 'Matching "%s"' % pattern
    print ' ', match.groups()
    print ' ', match.groupdict()
    print

Interpreter output:

>>> 
This is some text -- with punctuation.

Matching "^(?P<first_word>\w+)"
  ('This',)
  {'first_word': 'This'}

Matching "(?P<last_word>\w+)\S*$"
  ('punctuation',)
  {'last_word': 'punctuation'}

Matching "(?P<t_word>\bt\w+)\W+(?P<other_word>\w+)"
  ('text', 'with')
  {'other_word': 'with', 't_word': 'text'}

Matching "(?P<ends_with_t>\w+t)\b"
  ('text',)
  {'ends_with_t': 'text'}
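A typical use of named groups is pulling fields out of structured text. The following sketch (Python 3 syntax; the date format and the group names are assumptions made for this example) shows that the same match can be read by position or by name:

```python
import re

date_re = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')

match = date_re.search('Released on 2013-07-15.')

by_position = match.groups()      # tuple in group order
by_name = match.groupdict()       # dictionary keyed by group name

print(by_position)
print(by_name['month'])
```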

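Note: groupdict() returns a dictionary that maps group names to the matched substrings. Named groups are also included in the ordered sequence returned by groups().
So test_patterns() can be updated to show the numbered and named groups matched by a pattern: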

import re

def test_patterns(text, patterns=[]):
    for pattern, desc in patterns:
        print 'Pattern %r (%s)\n' % (pattern, desc)
        print '     %r' % text
        for match in re.finditer(pattern, text):
            s = match.start()
            e = match.end()
            prefix = ' ' * (s)
            print ' %s%r%s ' % (prefix, text[s:e], ' ' * (len(text) - e)),
            print match.groups()
            if match.groupdict():
                print '%s%s' % (' ' * (len(text) - s), match.groupdict())
        print
    return

if __name__ == "__main__":
    test_patterns('abbaabbba',
                  [(r'a((a*)(b*))', "'a' followed by 0-n a and 0-n b"),])

Interpreter output:

>>> 
Pattern 'a((a*)(b*))' ('a' followed by 0-n a and 0-n b)

     'abbaabbba'
 'abb'        ('bb', '', 'bb')
    'aabbb'   ('abbb', 'a', 'bbb')
         'a'  ('', '', '')

Groups are also useful for specifying alternative patterns. Use the pipe symbol (|) to indicate that one pattern or another should match:

from re_test_patterns import test_patterns

test_patterns(
    'abbaabbba',
    [(r'a((a+)|(b+))', 'a then seq. of a or seq. of b'),
     (r'a((a|b)+)', 'a then seq. of [ab]'),
     ])

Interpreter output:

>>> 
Pattern 'a((a+)|(b+))' (a then seq. of a or seq. of b)

     'abbaabbba'
 'abb'        ('bb', None, 'bb')
    'aa'      ('a', 'a', None)

Pattern 'a((a|b)+)' (a then seq. of [ab])

     'abbaabbba'
 'abbaabbba'  ('bbaabbba', 'a')

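Defining a group containing a subpattern is also useful when the string matching the subpattern is not part of what should be extracted from the full text. Such groups are called noncapturing. Noncapturing groups can be used to describe repetition patterns or alternatives without isolating the matching portion of the string in the value returned. To create a noncapturing group, use the syntax (?:pattern).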

from re_test_patterns import test_patterns

test_patterns(
    'abbaabbba',
    [(r'a((a+)|(b+))', 'capturing form'),
     (r'a((?:a+)|(?:b+))', 'noncapturing'),
     ])

Interpreter output:

>>> 
Pattern 'a((a+)|(b+))' (capturing form)

     'abbaabbba'
 'abb'        ('bb', None, 'bb')
    'aa'      ('a', 'a', None)

Pattern 'a((?:a+)|(?:b+))' (noncapturing)

     'abbaabbba'
 'abb'        ('bb',)
    'aa'      ('a',)
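The difference matters in particular with findall(), which returns the group contents rather than the whole match as soon as the pattern contains a capturing group. A sketch in Python 3 syntax with a made-up input:

```python
import re

text = 'ababab cdcd'

# Capturing group: findall() returns the last repetition of the group.
capturing = re.findall(r'(ab|cd)+', text)

# Noncapturing group: findall() returns the whole match.
noncapturing = re.findall(r'(?:ab|cd)+', text)

print(capturing)
print(noncapturing)
```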

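3.7 Search Options

Option flags change the way the matching engine processes an expression. The flags can be combined using a bitwise OR operation and then passed to compile(), search(), match(), and other functions that accept a pattern for searching.

Case-insensitive Matching

IGNORECASE causes literal characters and character ranges in the pattern to match both uppercase and lowercase characters.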

import re

text = 'This is some text -- with punctuation.'
pattern = r'\bT\w+'
with_case = re.compile(pattern)
without_case = re.compile(pattern, re.IGNORECASE)

print 'Text:\n  %r' % text
print 'Pattern:\n   %s' % pattern
print 'Case-sensitive:'
for match in with_case.findall(text):
    print ' %r' % match
print 'Case-insensitive:'
for match in without_case.findall(text):
    print ' %r' % match

Interpreter output:

>>> 
Text:
  'This is some text -- with punctuation.'
Pattern:
   \bT\w+
Case-sensitive:
 'This'
Case-insensitive:
 'This'
 'text'

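Input with Multiple Lines

Two flags affect how searching works on multiline input: MULTILINE and DOTALL. The MULTILINE flag controls how the pattern-matching code processes anchoring instructions for text containing newline characters. When multiline mode is turned on, the anchor rules for ^ and $ apply at the beginning and end of each line, in addition to the entire string: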

import re

text = 'This is some text -- with punctuation.\nA second line.'
pattern = r'(^\w+)|(\w+\S*$)'
single_line = re.compile(pattern)
multiline = re.compile(pattern, re.MULTILINE)

print 'Text:\n  %r' % text
print 'Pattern:\n   %s' % pattern
print 'Single Line:'
for match in single_line.findall(text):
    print ' %r' % (match,)
print 'Multiline    :'
for match in multiline.findall(text):
    print ' %r' % (match,)

Interpreter output:

>>> 
Text:
  'This is some text -- with punctuation.\nA second line.'
Pattern:
   (^\w+)|(\w+\S*$)
Single Line:
 ('This', '')
 ('', 'line.')
Multiline    :
 ('This', '')
 ('', 'punctuation.')
 ('A', '')
 ('', 'line.')

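DOTALL is the other flag related to multiline text. Normally, the dot character (.) matches everything in the input text except a newline character. The flag allows dot to match newlines as well.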

import re

text = 'This is some text -- with punctuation.\nA second line.'
pattern = r'.+'
no_newlines = re.compile(pattern)
dotall = re.compile(pattern, re.DOTALL)

print 'Text:\n  %r' % text
print 'Pattern:\n   %s' % pattern
print 'No newlines:'
for match in no_newlines.findall(text):
    print ' %r' % (match,)
print 'Dotall       :'
for match in dotall.findall(text):
    print ' %r' % (match,)

Interpreter output:

>>> 
Text:
  'This is some text -- with punctuation.\nA second line.'
Pattern:
   .+
No newlines:
 'This is some text -- with punctuation.'
 'A second line.'
Dotall       :
 'This is some text -- with punctuation.\nA second line.'
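Combining flags with the bitwise OR operator, as described at the start of this section, can be sketched as follows (Python 3 syntax; the sample text is made up):

```python
import re

text = 'first line\nSecond line\nTHIRD line'

# MULTILINE alone: ^ matches at every line start, but [a-z] is case-sensitive.
only_multiline = re.findall(r'^[a-z]+', text, re.MULTILINE)

# IGNORECASE | MULTILINE: both behaviors are switched on together.
both = re.findall(r'^[a-z]+', text, re.IGNORECASE | re.MULTILINE)

# DOTALL lets '.' cross the newlines between the first and last line.
spanning = re.search(r'first.*THIRD', text, re.DOTALL)

print(only_multiline)
print(both)
```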

Verbose Expression Syntax

Verbose expression syntax allows comments and extra whitespace to be embedded in the pattern.

import re

address = re.compile(
    '''
    [\w\d.+-]+  #username
    @
    ([\w\d.]+\.)+   #domain name prefix
    (com|org|edu)
''',
    re.UNICODE | re.VERBOSE)

candidates = [
    u'[email protected]',
    u'[email protected]',
    u'[email protected]',
    u'[email protected]'
    ]

for candidate in candidates:
    match = address.search(candidate)
    print '%-30s  %s' % (candidate, 'Matches' if match else 'No match')

Interpreter output:

>>> 
[email protected]          Matches
[email protected]   Matches
[email protected]  Matches
[email protected]           No match

This version can be extended to parse input that contains a person's name together with the email address:

import re

address = re.compile(
    '''
    ((?P<name>
    ([\w.,]+\s+)*[\w.,]+)
    \s*
    <
    )?
    (?P<email>
    [\w\d.+-]+  #username
    @
    ([\w\d.]+\.)+   #domain name prefix
    (com|org|edu)
    )
    >?
''',
    re.UNICODE | re.VERBOSE)

candidates = [
    u'[email protected]',
    u'[email protected]',
    u'[email protected]',
    u'[email protected]',
    u'First Last <[email protected]>',
    u'No Brackets [email protected]',
    u'First Last',
    u'First Middle Last <[email protected]>',
    u'First M. Last <[email protected]>',
    u'<[email protected]>',
    ]

for candidate in candidates:
    print 'Candidate:', candidate
    match = address.search(candidate)
    if match:
        print ' Name :', match.groupdict()['name']
        print ' Email:', match.groupdict()['email']
    else:
        print ' No match'

Interpreter output:

>>> 
Candidate: [email protected]
 Name : None
 Email: [email protected]
Candidate: [email protected]
 Name : None
 Email: [email protected]
Candidate: [email protected]
 Name : None
 Email: [email protected]
Candidate: [email protected]
 No match
Candidate: First Last <[email protected]>
 Name : First Last
 Email: [email protected]
Candidate: No Brackets [email protected]
 Name : None
 Email: [email protected]
Candidate: First Last
 No match
Candidate: First Middle Last <[email protected]>
 Name : First Middle Last
 Email: [email protected]
Candidate: First M. Last <[email protected]>
 Name : First M. Last
 Email: [email protected]
Candidate: <[email protected]>
 Name : None
 Email: [email protected]

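Embedding Flags in Patterns

If flags cannot be added when compiling an expression, such as when a pattern is passed as an argument to a library function that will compile it later, the flags can be embedded inside the expression string itself. For example, to turn on case-insensitive matching, add (?i) to the beginning of the expression.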

import re

text = 'This is some text -- with punctuation.'
pattern = r'(?i)\bT\w+'
regex = re.compile(pattern)

print 'Text     :', text
print 'Pattern  :', pattern
print 'Matches  :', regex.findall(text)

Interpreter output:

>>> 
Text     : This is some text -- with punctuation.
Pattern  : (?i)\bT\w+
Matches  : ['This', 'text']

The abbreviations for all of the flags are:

Flag	Abbreviation
IGNORECASE	i
MULTILINE	m
DOTALL	s
UNICODE	u
VERBOSE	x
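Several abbreviations can be combined in one embedded group. The sketch below (Python 3 syntax, invented sample text) checks that (?im) at the start of the pattern behaves like passing re.IGNORECASE | re.MULTILINE:

```python
import re

text = 'alpha\nBeta\nGAMMA'

# Flags embedded in the pattern string itself
embedded = re.findall(r'(?im)^[a-z]+$', text)

# The same flags passed as an argument
explicit = re.findall(r'^[a-z]+$', text, re.IGNORECASE | re.MULTILINE)

print(embedded)
```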

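3.8 Looking Ahead or Behind

In many cases it is useful to match a part of a pattern only if some other part will also match. For example, the previous example should match only when the angle brackets are paired. The expression is therefore modified as follows, using a positive look-ahead assertion to match the pair of angle brackets. The look-ahead assertion syntax is (?=pattern):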

import re

address = re.compile(
    '''
    ((?P<name>
    ([\w.,]+\s+)*[\w.,]+)
    \s+
    )
    (?= (<.*>$)
    |
    ([^<].*[^>]$)
    )
    <?
    (?P<email>
    [\w\d.+-]+  #username
    @
    ([\w\d.]+\.)+   #domain name prefix
    (com|org|edu)
    )
    >?
''',
    re.UNICODE | re.VERBOSE)

candidates = [
    u'[email protected]',
    u'No Brackets [email protected]',
    u'Open Bracket <[email protected]>',
    u'Close Bracket [email protected]>',
    ]

for candidate in candidates:
    print 'Candidate:', candidate
    match = address.search(candidate)
    if match:
        print ' Name :', match.groupdict()['name']
        print ' Email:', match.groupdict()['email']
    else:
        print ' No match'

Interpreter output:

>>> 
Candidate: [email protected]
 No match
Candidate: No Brackets [email protected]
 Name : No Brackets
 Email: [email protected]
Candidate: Open Bracket <[email protected]>
 Name : Open Bracket
 Email: [email protected]
Candidate: Close Bracket [email protected]>
 No match

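A negative look-ahead assertion ((?!pattern)) requires that the pattern does not match the text following the current position. For example, the email recognition pattern can be modified to ignore the noreply addresses commonly used by automated systems: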

import re

address = re.compile(
    '''
    ^
    (?!noreply@.*$)
    [\w\d.+-]+  #username
    @
    ([\w\d.]+\.)+   #domain name prefix
    (com|org|edu)
    $
''',
    re.UNICODE | re.VERBOSE)

candidates = [
    u'[email protected]',
    u'[email protected]',
    ]

for candidate in candidates:
    print 'Candidate:', candidate
    match = address.search(candidate)
    if match:
        print ' Match:', candidate[match.start():match.end()]
    else:
        print ' No match'

Interpreter output:

>>> 
Candidate: [email protected]
 Match: [email protected]
Candidate: [email protected]
 No match

The corresponding negative look-behind assertion syntax is (?<!pattern):

address = re.compile(
    '''
    ^
    [\w\d.+-]+  #username
    (?<!noreply)
    @
    ([\w\d.]+\.)+   #domain name prefix
    (com|org|edu)
    $
''',
    re.UNICODE | re.VERBOSE)

A positive look-behind assertion, written with the syntax (?<=pattern), can be used to find text that follows a pattern:

import re

twitter = re.compile(
'''
(?<=@)
([\w\d_]+)
''',
    re.UNICODE | re.VERBOSE)

text = '''This text includes two Twitter handles.
One for @ThePSF, and one for the author, @doughellmann.'''

print text
for match in twitter.findall(text):
    print 'Handle:', match

Interpreter output:

>>> 
This text includes two Twitter handles.
One for @ThePSF, and one for the author, @doughellmann.
Handle: ThePSF
Handle: doughellmann
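Positive and negative look-behind can be compared side by side on the same input. A sketch in Python 3 syntax; the sample sentence is invented for the illustration:

```python
import re

text = 'Tickets cost $25, parking 10 and snacks $7.'

# Positive look-behind: digits that directly follow a dollar sign
prices = re.findall(r'(?<=\$)\d+', text)

# Negative look-behind: digits preceded by neither a dollar sign nor
# another digit (the digit check keeps '5' in '$25' from matching)
not_prices = re.findall(r'(?<![\$\d])\d+', text)

print(prices)
print(not_prices)
```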

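3.9 Self-referencing Expressions

Matched values can be used in later parts of an expression. The easiest way is to refer to a previously matched group by its id number using \num: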

import re

address = re.compile(
r'''
(\w+)   #first name
\s+
(([\w.]+)\s+)?  #optional middle name or initial
(\w+)   #last name
\s+
<
(?P<email>
\1
\.
\4
@
([\w\d.]+\.)+
(com|org|edu)
)
>
''',
    re.UNICODE | re.VERBOSE | re.IGNORECASE)

candidates = [
u'First Last <[email protected]>',
u'Different Name <[email protected]>',
u'First Middle Last <[email protected]>',
u'First M. Last <[email protected]>',
    ]

for candidate in candidates:
    print 'Candidate:', candidate
    match = address.search(candidate)
    if match:
        print ' Match name:', match.group(1), match.group(4)
        print ' Match email:', match.group(5)
    else:
        print ' No match'

Interpreter output:

>>> 
Candidate: First Last <[email protected]>
 Match name: First Last
 Match email: [email protected]
Candidate: Different Name <[email protected]>
 No match
Candidate: First Middle Last <[email protected]>
 Match name: First Last
 Match email: [email protected]
Candidate: First M. Last <[email protected]>
 Match name: First Last
 Match email: [email protected]

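Creating back-references by numerical id has two disadvantages. First, the groups have to be renumbered whenever the expression changes, which makes maintenance difficult. Second, at most 99 references can be created this way; a pattern with more than 99 groups brings even more serious maintenance problems.

Python's expression parser therefore also supports (?P=name) to refer to the value of a named group matched earlier in the expression: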

address = re.compile(
r'''
(?P<first_name>\w+)   #first name
\s+
(([\w.]+)\s+)?  #optional middle name or initial
(?P<last_name>\w+)   #last name
\s+
<
(?P<email>
(?P=first_name)
\.
(?P=last_name)
@
([\w\d.]+\.)+
(com|org|edu)
)
>
''',
    re.UNICODE | re.VERBOSE | re.IGNORECASE)
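A smaller example of a named back-reference is finding accidentally doubled words. This is a sketch in Python 3 syntax; the group name word and the sample sentence are made up for the illustration:

```python
import re

# (?P=word) must match exactly the text that (?P<word>...) captured.
doubled = re.compile(r'\b(?P<word>\w+)\s+(?P=word)\b', re.IGNORECASE)

text = 'This is is a test of the the parser, not a theme.'
found = [m.group('word') for m in doubled.finditer(text)]

print(found)
```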

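Another mechanism for using back-references in expressions chooses a different pattern depending on whether a previous group matched. The email pattern can be corrected so that the angle brackets are required when a name is present, but not when the email address stands on its own. The syntax is (?(id)yes-expression|no-expression), where id is the group name or number, yes-expression is the pattern to use if the group has a value, and no-expression is the pattern to use otherwise.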

import re

address = re.compile(
r'''
^
(?P<name>
([\w.]+\s+)*[\w.]+
)?
\s*
(?(name)
(?P<brackets>(?=(<.*>$)))
|
(?=([^<].*[^>]$))
)
(?(brackets)<|\s*)
(?P<email>
[\w\d.+-]+
@
([\w\d.]+\.)+
(com|org|edu)
)
(?(brackets)>|\s*)
$
''',
    re.UNICODE | re.VERBOSE)

candidates = [
u'First Last <[email protected]>',
u'No Brackets [email protected]',
u'Open Bracket <[email protected]',
u'Close Bracket [email protected]>',
u'[email protected]',
    ]

for candidate in candidates:
    print 'Candidate:', candidate
    match = address.search(candidate)
    if match:
        print ' Match name:', match.groupdict()['name']
        print ' Match email:', match.groupdict()['email']
    else:
        print ' No match'

The interpreter output is as follows:

>>> 
Candidate: First Last <[email protected]>
 Match name: First Last
 Match email: [email protected]
Candidate: No Brackets [email protected]
 No match
Candidate: Open Bracket <[email protected]
 No match
Candidate: Close Bracket [email protected]>
 No match
Candidate: [email protected]
 Match name: None
 Match email: [email protected]
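
The conditional syntax is easier to follow in a stripped-down form. The sketch below is a simplified, hypothetical address pattern (not the one above): group 1 records whether an opening bracket was seen, and (?(1)>) demands the closing bracket only in that case. The name extract and the sample addresses are assumptions.

```python
import re

# Hypothetical simplified pattern. With no "no" branch given, the
# conditional matches the empty string when group 1 did not take part.
address = re.compile(r'^(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)$')


def extract(candidate):
    """Return the email address, or None if the brackets are unbalanced."""
    match = address.match(candidate)
    return match.group(2) if match else None


for candidate in ['<user@example.com>', 'user@example.com',
                  '<user@example.com', 'user@example.com>']:
    print('%-20s %s' % (candidate, extract(candidate) or 'No match'))
```

Only the first two candidates should match; the other two have a bracket on one side only.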

3.10 Modifying Strings with Patterns

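sub() replaces every occurrence of a pattern with another string. The replacement can refer to groups matched by the pattern using the \num syntax: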

import re

bold = re.compile(r'\*{2}(.*?)\*{2}')

text = 'Make this **bold**. This **too**.'

print 'Text:', text
print 'Bold:', bold.sub(r'<b>\1</b>', text)

The interpreter output is as follows:

>>> 
Text: Make this **bold**. This **too**.
Bold: Make this <b>bold</b>. This <b>too</b>.

To use a named group in the replacement, use the syntax \g<name>. The count argument limits the number of substitutions performed:

import re

bold = re.compile(r'\*{2}(?P<bold_text>.*?)\*{2}', re.UNICODE)

text = 'Make this **bold**. This **too**.'

print 'Text:', text
print 'Bold:', bold.sub(r'<b>\g<bold_text></b>', text, count=1)

The interpreter output is as follows:

>>>
Text: Make this **bold**. This **too**.
Bold: Make this <b>bold</b>. This **too**.
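
The same group can be referenced in the replacement either by name or by number. The following sketch uses a made-up italic markup (single asterisks) to show \g<name>, the equivalent \g<1>, and the effect of count; the pattern and variable names are assumptions for illustration only.

```python
import re

# Hypothetical markup: *text* becomes <i>text</i>.
italic = re.compile(r'\*(?P<body>[^*]+)\*')
text = 'one *two* three *four*'

by_name = italic.sub(r'<i>\g<body></i>', text)             # reference by name
by_number = italic.sub(r'<i>\g<1></i>', text)              # same group, by number
first_only = italic.sub(r'<i>\g<body></i>', text, count=1)  # stop after one match

print(by_name)
print(by_number)
print(first_only)
```

The \g<1> form is also useful when a numeric reference would otherwise be followed directly by a digit in the replacement string.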

3.11 Splitting with Patterns

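str.split() is one of the most commonly used ways to break a string apart for parsing. It only accepts a literal separator, though, so when paragraphs are separated by two or more newlines a pattern is needed. A first attempt uses findall with the pattern (.+?)\n{2,}: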

import re

text = '''Paragraph one
on two lines.

Paragraph two.


Paragraph three.'''

for num, para in enumerate(re.findall(r'(.+?)\n{2,}',
                                      text,
                                      flags=re.DOTALL)
                           ):
    print num, repr(para)
    print

The interpreter output is as follows (note the {2,} pattern):

>>> 
0 'Paragraph one\non two lines.'

1 'Paragraph two.'

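But this way the last paragraph is not found, because no blank line follows it. The pattern can be extended to also accept the end of the input, although findall() then returns tuples; re.split() handles the task more directly: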

import re

text = '''Paragraph one
on two lines.

Paragraph two.


Paragraph three.'''

print 'With findall:'
for num, para in enumerate(re.findall(r'(.+?)(\n{2,}|$)',
                                      text,
                                      flags=re.DOTALL)
                           ):
    print num, repr(para)
    print
print
print 'With split:'
for num, para in enumerate(re.split(r'\n{2,}', text)):
    print num, repr(para)
    print

The interpreter output is as follows:

>>> 
With findall:
0 ('Paragraph one\non two lines.', '\n\n')

1 ('Paragraph two.', '\n\n\n')

2 ('Paragraph three.', '')


With split:
0 'Paragraph one\non two lines.'

1 'Paragraph two.'

2 'Paragraph three.'
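
To make the difference from str.split() concrete, the sketch below splits the same sample text with a fixed separator and with the pattern. The variable names fixed and by_pattern are chosen here for illustration.

```python
import re

text = 'Paragraph one\non two lines.\n\nParagraph two.\n\n\nParagraph three.'

# A fixed two-newline separator leaves a stray newline in front of the
# third paragraph, because three newlines occur in a row there.
fixed = text.split('\n\n')

# The pattern treats any run of two or more newlines as one separator.
by_pattern = re.split(r'\n{2,}', text)

print(fixed)
print(by_pattern)
```

With the fixed separator the last item still begins with a newline, while the pattern version returns three clean paragraphs.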
