最新消息:20210917 已从crifan.com换到crifan.org

【教程】详解Python正则表达式之: ‘|’ vertical bar 竖杠

Python re crifan 7037浏览 0评论

Python的手册中,是这么解释的:

'|'

A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B. An arbitrary number of REs can be separated by the '|' in this way. This can be used inside groups (see below) as well. As the target string is scanned, REs separated by '|' are tried from left to right. When one pattern completely matches, that branch is accepted. This means that once A matches, B will not be tested further, even if it would produce a longer overall match. In other words, the '|' operator is never greedy. To match a literal '|', use \|, or enclose it inside a character class, as in [|].

关于竖杠,表示或者的关系,其简单的用法,暂且不多解释。

下面只针对相对复杂一些的用法中,竖杠’|’,即或者,是如何使用的。


1.使用竖杠,匹配多个可能的字符串中的其中一种

最经典的例子要数匹配文件名后缀,图片后缀了。

此处,以图片后缀为例。

【需求】

希望匹配各种图片后缀,比如jpg,jpeg,gif,png,bmp等等,其中的一种。

【代码】

经过折腾,下述代码,是可以正常匹配的:

#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
【教程】详解Python正则表达式之: '|' vertical bar 竖杠
【教程】详解Python正则表达式之: ‘|’ vertical bar 竖杠
Version: 2012-11-05 Author: Crifan """ import re; testStrList = [ "http://www.example.com/picture_name.jpg", "http://www.example.com/picture_name.jpeg", "http://www.example.com/picture_name.gif", "http://www.example.com/picture_name.bmp", "http://www.example.com/picture_name.png", "http://www.example.com/picture_name.ige", ]; for eachTestStr in testStrList: #foundPictureSuffix = re.search("http://[^:]+?\.(?P<pictureSuffix>[(jpg)|(jpeg)|(gif)|(png)|(bmp)])", eachTestStr); # all will match e #foundPictureSuffix = re.search("http://[^:]+?\.(?P<pictureSuffix>[(jpg)|(jpeg)|(gif)|(png)|(bmp)]{3,4})", eachTestStr); # also match .ige foundPictureSuffix = re.search("http://[^:]+?\.(?P<pictureSuffix>(jpg)|(jpeg)|(gif)|(png)|(bmp))", eachTestStr); # work ok, not match ige if(foundPictureSuffix): print "eachTestStr=%s, pictureSuffix=%s"%(eachTestStr, foundPictureSuffix.group("pictureSuffix")); else: print "eachTestStr=%s, pictureSuffix=NOT MATCH"%(eachTestStr); print "-----------------------------------------------------------------------"; for eachTestStr in testStrList: foundPictureSuffixNoGroupName = re.search("http://[^:]+?\.((jpg)|(jpeg)|(gif)|(png)|(bmp))", eachTestStr); # work ok, not match ige #foundPictureSuffixNoGroupName = re.search("http://[^:]+?\.(jpg)|(jpeg)|(gif)|(png)|(bmp)", eachTestStr); # only match .jpg #foundPictureSuffixNoGroupName = re.search("http://[^:]+?(\.jpg)|(\.jpeg)|(\.gif)|(\.png)|(\.bmp)", eachTestStr); #still only match .jpg #foundPictureSuffixNoGroupName = re.search("http://[^:]+?[(\.jpg)|(\.jpeg)|(\.gif)|(\.png)|(\.bmp)]", eachTestStr); # will error: IndexError: no such group #foundPictureSuffixNoGroupName = re.search("http://[^:]+?([(\.jpg)|(\.jpeg)|(\.gif)|(\.png)|(\.bmp)])", eachTestStr); # only match . if(foundPictureSuffixNoGroupName): print "eachTestStr=%s, pictureSuffix=%s"%(eachTestStr, foundPictureSuffixNoGroupName.group(1)); else: print "eachTestStr=%s, pictureSuffix=NOT MATCH"%(eachTestStr);

其中,被注释掉的代码,是各种尝试,其中对应的输出,已经标注出来了,感兴趣的,自己去试试即可。

【总结】

总的来说,还是在最外层需要一个圆括号,表示一个group,然后group内部,再用多个竖杠,区分出多个可能的字符串,然后每个字符串,也被括号括起来,表示自己本身是完整的集合,需要严格匹配的。

即,类似于

((aaa)|(bbb)|(ccc))

之类的形式,就可以匹配多种字符串中的其中一种了。

另外对应的,如果想要给组添加命名,则就是这样的形式了:

(?P<groupName>(aaa)|(bbb)|(ccc))

转载请注明:在路上 » 【教程】详解Python正则表达式之: ‘|’ vertical bar 竖杠

发表我的评论
取消评论

表情

Hi,您需要填写昵称和邮箱!

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址
100 queries in 0.192 seconds, using 23.42MB memory