Python Regular Expressions

Learn the re module for text matching and extraction · Difficulty: Advanced · +15XP

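A regular expression (regex) is a pattern language for text: it lets you search strings, test whether they match a pattern, replace matched parts, and extract the pieces you need. Python supports regular expressions through the re module in the standard library. Once you are comfortable with them, many text-processing tasks take a single pattern instead of hand-written string handling.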

Basic re module functions

import re
text = 'My phone numbers are 138-1234-5678 and 159-9876-5432'
# search - find the first match anywhere in the string
match = re.search(r'\d{3}-\d{4}-\d{4}', text)
if match:
    print(match.group())  # 138-1234-5678
# findall - return all non-overlapping matches as a list
phones = re.findall(r'\d{3}-\d{4}-\d{4}', text)
print(phones)  # ['138-1234-5678', '159-9876-5432']
# match - match only at the start of the string
result = re.match(r'abc', 'abcdef')
print(result is not None)  # True
result = re.match(r'abc', 'xabcdef')
print(result is not None)  # False
# sub - replace matches; \1 refers to the first captured group
text2 = 'Python 2 and Python 3'
result = re.sub(r'Python (\d)', r'Python \1.0', text2)
print(result)  # Python 2.0 and Python 3.0
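When the same pattern is applied many times, it can be compiled once with re.compile and reused. The sketch below is illustrative only: the name PHONE_RE and the sample lines are assumptions, not part of the lesson. It also shows fullmatch and split, two more functions from the same module.

```python
import re

# Compile once, reuse many times (PHONE_RE is a hypothetical name).
PHONE_RE = re.compile(r'\d{3}-\d{4}-\d{4}')

lines = [
    'Call 138-1234-5678 before noon',
    'No number on this line',
    'Backup: 159-9876-5432',
]

# Pattern objects expose the same methods as the module-level functions.
found = []
for line in lines:
    m = PHONE_RE.search(line)
    if m:
        found.append(m.group())
print(found)  # ['138-1234-5678', '159-9876-5432']

# fullmatch succeeds only if the entire string fits the pattern.
print(PHONE_RE.fullmatch('138-1234-5678') is not None)    # True
print(PHONE_RE.fullmatch('138-1234-5678 x') is not None)  # False

# split breaks a string wherever the pattern matches.
parts = re.split(r'\s*,\s*', 'a, b ,c')
print(parts)  # ['a', 'b', 'c']
```

The re module keeps a cache of recently compiled patterns, so compiling by hand is mostly about readability and about giving a pattern a name.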

Groups and capturing

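import re
text = 'Name: Zhangsan, Age: 25, Email: zhangsan@example.com'
# Named groups: (?P<name>...) lets you refer to a group by name
pattern = r'Name: (?P<name>\w+), Age: (?P<age>\d+), Email: (?P<email>[\w.]+@\w+\.\w+)'
match = re.search(pattern, text)
if match:
    print(match.group('name'))   # Zhangsan
    print(match.group('age'))    # 25
    print(match.group('email'))  # zhangsan@example.com
    print(match.groupdict())     # {'name': 'Zhangsan', 'age': '25', 'email': 'zhangsan@example.com'}
# finditer - returns an iterator of match objects (suitable for large texts)
text2 = 'apple 10 yuan, banana 5 yuan, watermelon 15 yuan'
# The space between the groups matters: \w also matches digits, so (\w+)(\d+)
# with no separator would let the first group swallow the leading digits
for m in re.finditer(r'(\w+) (\d+) yuan', text2):
    print(f'{m.group(1)}: {m.group(2)} yuan')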
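Groups also change what findall returns. As a small sketch (the sample string and variable names are assumptions for illustration): with two capturing groups, findall yields tuples, and a non-capturing group (?:...) groups part of the pattern without adding it to the result.

```python
import re

text = 'apple 10 yuan, banana 5 yuan, watermelon 15 yuan'

# Two capturing groups: findall returns a list of (group1, group2) tuples.
pairs = re.findall(r'(\w+) (\d+) yuan', text)
print(pairs)  # [('apple', '10'), ('banana', '5'), ('watermelon', '15')]

# Captured values are always strings; convert them explicitly when needed.
prices = {name: int(amount) for name, amount in pairs}
print(prices['banana'])  # 5

# (?:...) groups without capturing, so only the one real group is returned.
amounts = re.findall(r'(?:\w+) (\d+) yuan', text)
print(amounts)  # ['10', '5', '15']
```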

Common regex patterns

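import re
# Validate an email address (a simplified check, not full RFC 5322 validation)
def is_valid_email(email):
    pattern = r'[\w\.-]+@[\w\.-]+\.\w+'
    # fullmatch requires the whole string to match; with re.match and a
    # trailing $, a string ending in a newline would also be accepted
    return bool(re.fullmatch(pattern, email))
# Validate a mobile number (mainland China: 11 digits, first digit 1, second digit 3-9)
def is_valid_phone(phone):
    pattern = r'1[3-9]\d{9}'
    return bool(re.fullmatch(pattern, phone))
# Extract URLs
text = 'Visit https://www.python.org and http://example.com'
urls = re.findall(r'https?://[\w\./-]+', text)
print(urls)  # ['https://www.python.org', 'http://example.com']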
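The same mobile-number pattern can drive a replacement as well as a validation. The following is a minimal sketch, assuming a made-up masking format (keep the first three and last four digits) and hypothetical names MOBILE_RE and mask_phone. It relies on re.sub accepting a function as the replacement.

```python
import re

# Lookarounds keep the pattern from matching inside a longer run of digits.
# MOBILE_RE and mask_phone are illustrative names, not part of the lesson.
MOBILE_RE = re.compile(r'(?<!\d)(1[3-9]\d)\d{4}(\d{4})(?!\d)')

def mask_phone(text):
    # The callable receives each match object and returns the replacement.
    return MOBILE_RE.sub(lambda m: f'{m.group(1)}****{m.group(2)}', text)

masked = mask_phone('Contact: 13812345678 or 15998765432')
print(masked)  # Contact: 138****5678 or 159****5432

# A 12-digit run is left alone because of the (?!\d) lookahead.
untouched = mask_phone('Order id 138123456789')
print(untouched)  # Order id 138123456789
```

A replacement string such as r'\1****\2' would work here too; a function becomes useful when the replacement depends on the matched text.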

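Regex flags quick reference

Flag	Short form	Description
re.IGNORECASE	re.I	Case-insensitive matching
re.MULTILINE	re.M	Multiline mode: ^ and $ match at the start and end of each line
re.DOTALL	re.S	. matches any character, including newline
re.VERBOSE	re.X	Allows whitespace and comments inside the pattern for readability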
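To make the table concrete, here is a short sketch of each flag in use. The sample strings and variable names are assumptions for illustration. Flags are combined with the | operator.

```python
import re

log = 'ERROR: disk full\nwarning: low memory\nerror: timeout'

# re.I ignores case; re.M lets ^ and $ anchor at every line.
errors = re.findall(r'^error: (.+)$', log, flags=re.I | re.M)
print(errors)  # ['disk full', 'timeout']

# By default . does not match a newline; re.S (DOTALL) lets it cross lines.
html = '<p>line one\nline two</p>'
without_dotall = re.search(r'<p>(.+)</p>', html)
with_dotall = re.search(r'<p>(.+)</p>', html, re.S)
print(without_dotall)                # None
print(repr(with_dotall.group(1)))    # 'line one\nline two'

# re.X (VERBOSE) ignores whitespace in the pattern and allows # comments.
phone = re.compile(r'''
    (\d{3})   # first block
    -
    (\d{4})   # middle block
    -
    (\d{4})   # last block
''', re.X)
blocks = phone.search('138-1234-5678').groups()
print(blocks)  # ('138', '1234', '5678')
```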
