Python re Module: A Powerful Tool for Regular Expressions

article 2024/10/25 2:32:48

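Table of Contents

Python `re` Module: A Powerful Tool for Regular Expressions
Importing the `re` Module
Basic Matching Methods
- re.match()
- re.search()
- re.findall()
- re.finditer()
Substitution
- re.sub()
Splitting Strings
- re.split()
Capturing and Non-Capturing Groups
- Capturing groups
- Non-capturing groups
Common Pattern Symbols
Practical Examples
- Validating an email address format
- Extracting URLs
Predefined Character Classes
- Example code
  - 1. Matching digits
  - 2. Matching non-digit characters
  - 3. Matching whitespace characters
  - 4. Matching alphanumeric characters
Performance Considerations
Summary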

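Python `re` Module: A Powerful Tool for Regular Expressions

Regular expressions are a powerful tool for string processing, and Python's re module offers a flexible and efficient way to use them. This article walks through the module's most commonly used functions and a few practical examples.

Importing the `re` Module

Before using regular expressions, import the re module: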

import re

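Basic Matching Methods

re.match()

re.match() tries to match a pattern at the beginning of the string only. It returns a match object on success and None otherwise.

pattern = r'\d+'
string = '123abc'

match = re.match(pattern, string)
if match:
    print("Match succeeded:", match.group())  # Match succeeded: 123
else:
    print("Match failed")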
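To make the "beginning of the string" rule concrete, the short sketch below contrasts re.match() with re.fullmatch(), which additionally requires the pattern to consume the entire string. The sample strings are made up for illustration.

```python
import re

pattern = r'\d+'

# re.match() only looks at the start of the string.
at_start = re.match(pattern, '123abc')      # digits at position 0
not_at_start = re.match(pattern, 'abc123')  # digits exist, but not at position 0

# re.fullmatch() requires the entire string to match the pattern.
whole = re.fullmatch(pattern, '123')
partial = re.fullmatch(pattern, '123abc')

print(at_start.group())  # 123
print(not_at_start)      # None
print(whole.group())     # 123
print(partial)           # None
```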

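re.search()

re.search() scans the whole string and returns the first location where the pattern matches, or None if there is no match.

pattern = r'\d+'
string = 'abc123xyz'

search = re.search(pattern, string)
if search:
    print("Match found:", search.group())  # Match found: 123
else:
    print("No match found")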
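Two details are easy to miss here: re.search() stops at the leftmost match, and the match object also records where the match was found. A small illustration, with an input string chosen only for the demo:

```python
import re

text = 'abc123xyz456'

first = re.search(r'\d+', text)

# Only the leftmost match is returned, even though '456' also matches.
print(first.group())  # 123
print(first.span())   # (3, 6)

# The same pattern fails with re.match() because the string starts with letters.
anchored = re.match(r'\d+', text)
print(anchored)       # None
```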

re.findall()

re.findall() returns all non-overlapping matches in the string as a list of strings.

pattern = r'\d+'
string = 'abc123xyz456'

matches = re.findall(pattern, string)
print("All matches:", matches)  # All matches: ['123', '456']
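The shape of the list returned by re.findall() depends on how many capturing groups the pattern contains. The sketch below shows the three cases. The `width=640, height=480` input is an invented example.

```python
import re

text = 'width=640, height=480'

# No groups: a list of whole matches.
whole = re.findall(r'\w+=\d+', text)

# One group: a list containing only that group's text.
values = re.findall(r'\w+=(\d+)', text)

# Two or more groups: a list of tuples, one tuple per match.
pairs = re.findall(r'(\w+)=(\d+)', text)

print(whole)   # ['width=640', 'height=480']
print(values)  # ['640', '480']
print(pairs)   # [('width', '640'), ('height', '480')]
```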

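re.finditer()

re.finditer() returns an iterator that yields a match object for each match, which is useful when positions or groups are needed and not just the matched text.

pattern = r'\d+'
string = 'abc123xyz456'

for match in re.finditer(pattern, string):
    print("Matched:", match.group(), "at position:", match.start())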
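Because each item is a full match object, start and end offsets can be collected in one pass. A minimal sketch using the same input string:

```python
import re

text = 'abc123xyz456'

# Each item yielded by finditer() is a match object, so positions are available.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r'\d+', text)]

print(positions)  # [('123', 3, 6), ('456', 9, 12)]
```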

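Substitution

re.sub()

re.sub() replaces every match of the pattern with a replacement string and returns the new string.

pattern = r'\d+'
string = 'abc123xyz456'
result = re.sub(pattern, '#', string)
print("String after substitution:", result)  # String after substitution: abc#xyz#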
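The replacement does not have to be a fixed string. The sketch below shows group backreferences, a function used as the replacement, the count argument, and re.subn(). The helper name `double` is hypothetical and exists only for this example.

```python
import re

# Backreferences: reorder a YYYY-MM-DD date into DD/MM/YYYY.
reordered = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', '2024-10-23')

# A function as the replacement: double every number found.
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r'\d+', double, 'abc123xyz456')

# count limits the number of replacements; re.subn() also reports how many were made.
once = re.sub(r'\d+', '#', 'abc123xyz456', count=1)
result, n = re.subn(r'\d+', '#', 'abc123xyz456')

print(reordered)  # 23/10/2024
print(doubled)    # abc246xyz912
print(once)       # abc#xyz456
print(result, n)  # abc#xyz# 2
```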

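Splitting Strings

re.split()

re.split() splits a string wherever the pattern matches. If a match sits at the very end of the string, the result ends with an empty string.

pattern = r'\d+'
string = 'abc123xyz456'
result = re.split(pattern, string)
print("Result of split:", result)  # Result of split: ['abc', 'xyz', '']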
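A few variations are worth knowing: wrapping the pattern in a capturing group keeps the separators, maxsplit caps the number of splits, and empty strings can be filtered out afterwards. A brief sketch on the same input:

```python
import re

text = 'abc123xyz456'

plain = re.split(r'\d+', text)
# A capturing group in the pattern keeps the separators in the result.
with_separators = re.split(r'(\d+)', text)
# maxsplit limits the number of splits.
limited = re.split(r'\d+', text, maxsplit=1)
# Dropping empty strings is a common follow-up step.
non_empty = [part for part in plain if part]

print(plain)            # ['abc', 'xyz', '']
print(with_separators)  # ['abc', '123', 'xyz', '456', '']
print(limited)          # ['abc', 'xyz456']
print(non_empty)        # ['abc', 'xyz']
```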

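Capturing and Non-Capturing Groups

Capturing groups

Parentheses () define a capturing group. Groups are numbered from 1 in the order of their opening parentheses; group(0) is the whole match.

pattern = r'(\d+)-(\d+)-(\d+)'
string = '2024-10-23'

match = re.match(pattern, string)
if match:
    print("Year:", match.group(1), "Month:", match.group(2), "Day:", match.group(3))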
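Numbered groups become hard to read as patterns grow. Python also supports named groups with the `(?P<name>...)` syntax. The group names below (`year`, `month`, `day`) are chosen for this sketch.

```python
import re

pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
match = re.match(pattern, '2024-10-23')

print(match.group(0))        # 2024-10-23 (the whole match)
print(match.groups())        # ('2024', '10', '23')
print(match.group('month'))  # 10
print(match.groupdict())     # {'year': '2024', 'month': '10', 'day': '23'}
```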

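Non-capturing groups

Use (?:...) to group part of a pattern without capturing it. A non-capturing group gets no group number, so in the example below the month is group 1.

pattern = r'(?:\d+)-(\d+)'
string = '2024-10'

match = re.match(pattern, string)
if match:
    print("Month:", match.group(1))  # Month: 10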
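The difference matters most with re.findall() and with repeated groups. The following sketch compares the two group types on a made-up input:

```python
import re

# With a capturing group, findall() returns the group's last repetition.
captured = re.findall(r'(ab)+', 'ababxab')
# With a non-capturing group, findall() returns the whole match.
whole = re.findall(r'(?:ab)+', 'ababxab')

print(captured)  # ['ab', 'ab']
print(whole)     # ['abab', 'ab']

# Non-capturing groups are also skipped in group numbering.
m = re.match(r'(?:\d+)-(\d+)-(\d+)', '2024-10-23')
print(m.groups())  # ('10', '23')
```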

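Common Pattern Symbols

Symbol	Description	Example
`.`	Matches any character except a newline	`a.b` matches `acb`, `a1b`
`^`	Matches the start of the string (of each line with re.MULTILINE)	`^abc` matches `abcde`
`$`	Matches the end of the string, or just before a trailing newline	`xyz$` matches `abcxyz`
`*`	Matches 0 or more repetitions	`ab*c` matches `ac`, `abc`, `abbc`
`+`	Matches 1 or more repetitions	`ab+c` matches `abc`, `abbc` but not `ac`
`?`	Matches 0 or 1 repetition	`ab?c` matches `ac` or `abc`
`{m,n}`	Matches from m to n repetitions	`a{2,4}` matches `aa`, `aaa`, `aaaa`
`[]`	Matches any one character in the set	`[abc]` matches `a`, `b`, `c`
`|`	Alternation (OR)	`a|b` matches `a` or `b`
`()`	Capturing group	`(abc)` captures `abc`
`(?:...)`	Non-capturing group	`(?:abc)` matches `abc` without capturing it
`\d`	Matches a digit	`\d` matches `0` through `9`
`\D`	Matches a non-digit	`\D` matches letters, punctuation, spaces
`\w`	Matches a letter, digit, or underscore	`\w` matches `a-z`, `A-Z`, `0-9`, `_`
`\W`	Matches anything other than a letter, digit, or underscore	`\W` matches spaces, punctuation, etc.
`\s`	Matches a whitespace character	`\s` matches spaces, tabs, etc.
`\S`	Matches a non-whitespace character	`\S` matches any character that is not whitespace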
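One way to check the quantifier, character-set, and alternation rows of the table is to run them through re.fullmatch(). The test strings below are illustrative and include a few deliberate non-matches.

```python
import re

# (pattern, string, expected result) triples based on the symbols in the table.
cases = [
    (r'a.b', 'a1b', True),        # . matches any single character
    (r'ab*c', 'ac', True),        # * allows zero repetitions
    (r'ab+c', 'ac', False),       # + needs at least one 'b'
    (r'ab?c', 'abbc', False),     # ? allows at most one 'b'
    (r'a{2,4}', 'aaaaa', False),  # five 'a's exceed the upper bound
    (r'[abc]', 'b', True),        # one character from the set
    (r'a|b', 'b', True),          # alternation
]

results = [re.fullmatch(pattern, string) is not None for pattern, string, _ in cases]

for (pattern, string, expected), matched in zip(cases, results):
    print(f'{pattern!r} vs {string!r}: {matched}')
```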

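Practical Examples

Validating an email address format

def is_valid_email(email):
    # Rough format check: local part, '@', domain, a dot, then a final label
    pattern = r'^[\w\.-]+@[\w\.-]+\.\w+$'
    return re.match(pattern, email) is not None

email = 'test@example.com'
print("Is the email format valid:", is_valid_email(email))  # True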
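One subtlety: `$` also matches just before a trailing newline, so a pattern ending in `$` and applied with re.match() accepts 'test@example.com\n'. A possible stricter variant, sketched here with the hypothetical names `EMAIL_PATTERN` and `is_valid_email_strict`, uses fullmatch() instead. It is still only a rough format check, not full address validation.

```python
import re

# Same character classes as the pattern above; the anchors are dropped
# because fullmatch() already requires the whole string to match.
EMAIL_PATTERN = re.compile(r'[\w.-]+@[\w.-]+\.\w+')

def is_valid_email_strict(email):
    return EMAIL_PATTERN.fullmatch(email) is not None

# The '$'-anchored pattern lets a trailing newline through; fullmatch() does not.
loose = re.match(r'^[\w\.-]+@[\w\.-]+\.\w+$', 'test@example.com\n') is not None
strict = is_valid_email_strict('test@example.com\n')

print(loose)                                      # True
print(strict)                                     # False
print(is_valid_email_strict('test@example.com'))  # True
print(is_valid_email_strict('a@b'))               # False
```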

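Extracting URLs

def extract_urls(text):
    # 'http' or 'https', then '://', then everything up to the next whitespace
    pattern = r'https?://[^\s]+'
    return re.findall(pattern, text)

text = "Visit our website https://example.com and http://test.com"
urls = extract_urls(text)
print("Extracted URLs:", urls)  # ['https://example.com', 'http://test.com']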
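Because the pattern consumes everything up to the next whitespace, punctuation that directly follows a URL in running text is captured along with it. One possible cleanup, shown as a sketch with the hypothetical helper `extract_urls_trimmed`, strips common trailing punctuation. Note that this would also trim a URL that legitimately ends in one of those characters.

```python
import re

def extract_urls_trimmed(text):
    # Hypothetical variant: strip common trailing punctuation from each match.
    raw = re.findall(r'https?://\S+', text)
    return [url.rstrip('.,;:!?)') for url in raw]

text = 'See https://example.com, or http://test.com.'

raw_urls = re.findall(r'https?://\S+', text)
trimmed_urls = extract_urls_trimmed(text)

print(raw_urls)      # ['https://example.com,', 'http://test.com.']
print(trimmed_urls)  # ['https://example.com', 'http://test.com']
```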

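Predefined Character Classes

Symbol	Description	ASCII equivalent
`\d`	Matches any decimal digit	`[0-9]`
`\D`	Matches any non-digit character	`[^0-9]`
`\s`	Matches any whitespace character	`[ \t\n\r\f\v]`
`\S`	Matches any non-whitespace character	`[^ \t\n\r\f\v]`
`\w`	Matches any alphanumeric character or underscore	`[a-zA-Z0-9_]`
`\W`	Matches any character that is neither alphanumeric nor an underscore	`[^a-zA-Z0-9_]`

For str patterns in Python 3 these classes are Unicode-aware by default, so the bracket expressions are exact equivalents only when the re.ASCII flag is set.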
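In Python 3, `\d`, `\w`, and `\s` follow Unicode rules for str patterns, so they match more than the ASCII ranges unless re.ASCII is passed. The sample text below, which mixes ASCII, Chinese characters, and full-width digits, is invented to make the difference visible.

```python
import re

text = 'regex_1 正则 ４２'  # the last token is written with full-width digits

unicode_words = re.findall(r'\w+', text)
ascii_words = re.findall(r'\w+', text, flags=re.ASCII)
unicode_digits = re.findall(r'\d', text)
ascii_digits = re.findall(r'\d', text, flags=re.ASCII)

print(unicode_words)   # ['regex_1', '正则', '４２']
print(ascii_words)     # ['regex_1']
print(unicode_digits)  # ['1', '４', '２']
print(ascii_digits)    # ['1']
```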

Example code

1. Matching digits

Use \d to match the digits in a string:

import re

text = "匹配规则这2个字符串3是否匹配规则5则则则7则"
matches = re.findall(r"\d", text)  # match any single digit
print(matches)  # Output: ['2', '3', '5', '7']

To match runs of one or more digits, use \d+:

import re

text = "匹配规则这2个字符串134444是否匹配规则5则则则7则"
matches = re.findall(r"\d+", text)  # match one or more consecutive digits
print(matches)  # Output: ['2', '134444', '5', '7']

2. Matching non-digit characters

Use \D to match non-digit characters:

import re

text = "匹配规则这2个字符串3是否匹配规则5则则则7则"
matches = re.findall(r"\D", text)  # match any non-digit character
print(matches)  # Output: ['匹', '配', '规', '则', '这', '个', '字', '符', '串', '是', '否', '匹', '配', '规', '则', '则', '则', '则', '则']

3. Matching whitespace characters

Use \s to match the whitespace characters in a string:

import re

text = "匹配规则   这2个字符串3是否匹\n配规则5则则则7则"
matches = re.findall(r"\s", text)  # match any whitespace character
print(matches)  # Output: [' ', ' ', ' ', '\n']

Use \S to match non-whitespace characters:

import re

text = "匹配规则   这2个字符串3是否匹\n配规则5则则则7则"
matches = re.findall(r"\S", text)  # match any non-whitespace character
print(matches)  # Output: ['匹', '配', '规', '则', '这', '2', '个', '字', '符', '串', '3', '是', '否', '匹', '配', '规', '则', '5', '则', '则', '则', '7', '则']

4. Matching alphanumeric characters

Use \w to match alphanumeric characters, including the underscore:

import re

text = "https://www.cnblogs.com/"
matches = re.findall(r'\w', text)  # match alphanumeric characters
print(matches)  # Output: ['h', 't', 't', 'p', 's', 'w', 'w', 'w', 'c', 'n', 'b', 'l', 'o', 'g', 's', 'c', 'o', 'm']

Use \W to match non-alphanumeric characters:

import re

text = "https://www.cnblogs.com/"
matches = re.findall(r'\W', text)  # match non-alphanumeric characters
print(matches)  # Output: [':', '/', '/', '.', '.', '/']

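Performance Considerations

Regular expressions can become a bottleneck when processing large amounts of data. Consider the following:

Write patterns as raw strings (r''). This does not make matching faster, but it avoids double-escaping backslashes and the subtle bugs that come with it.
Keep patterns simple. Nested quantifiers such as (a+)+ in particular can cause heavy backtracking.
Compile patterns that are used repeatedly. The re module caches recently used patterns, so the gain is mostly the skipped cache lookup, which adds up in hot loops.

compiled_pattern = re.compile(r'\d+')
matches = compiled_pattern.findall('abc123xyz456')
print("Matches:", matches)  # Matches: ['123', '456']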
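A compiled pattern pays off when the same expression is applied to many inputs. The sketch below compiles once at module level and reuses the object in a loop. The name `NUMBER` and the sample lines are assumptions made for this example.

```python
import re

NUMBER = re.compile(r'\d+')  # compiled once, reused for every line

lines = ['abc123xyz456', 'no digits here', '7 apples']

# Sum the numbers found in each line using the precompiled pattern.
totals = [sum(int(n) for n in NUMBER.findall(line)) for line in lines]

print(totals)  # [579, 0, 7]
```

Compiled pattern objects expose the same operations as the module-level functions (search, match, fullmatch, findall, finditer, sub, split), so switching rarely requires restructuring the code.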

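Summary

Python's re module provides powerful regular expression support for string processing. With the basic functions covered here (matching, searching, substitution, splitting, and groups) and the practical examples, most everyday string matching and replacement tasks can be handled efficiently.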
