当前位置: 首页 > article >正文

Python实现MySQL数据库对象的血缘分析

Python控制台的程序,实现遍历MySQL中所有的SQL对象(表、视图、用户定义函数、存储过程和触发器等),并取得它们之间之前的依赖性关系,并列出三张表,第一张表的第一列是所有的SQL对象名称,第二列是它的数据的生成路径,路径中的相邻SQL对象之间用“->”隔开,如果有多条路径,就存储多条记录,第二张表是根据依赖性生成所有表的列表,依懒性从上到下依次递增,第三张表是根据依赖性生成所有第二张表中表数据的存储过程列表,依懒性从上到下依次递增,存储在Excel文件中。

程序总体设计

1. 系统架构
+-----------------+     +-----------------+     +-----------------+     +-----------------+
| MySQL元数据采集 | --> | 依赖关系分析引擎 | --> | 数据路径生成器 | --> | Excel输出模块 |
+-----------------+     +-----------------+     +-----------------+     +-----------------+
2. 模块划分
  • 元数据采集模块:通过information_schema获取所有数据库对象
  • 依赖分析模块:解析SQL定义获取对象依赖关系
  • 路径生成模块:构建依赖图谱并生成所有可能路径
  • 拓扑排序模块:为表和存储过程生成依赖顺序
  • Excel输出模块:使用pandas生成多Sheet的Excel文件
3. 技术选型
  • Python 3.8+
  • 数据库驱动:mysql-connector-python
  • SQL解析:sqlparse + 正则表达式
  • 数据处理:pandas
  • Excel输出:openpyxl
4. 数据结构设计
class DBObject:
    def __init__(self, name, obj_type, definition):
        self.name = name          # 对象名称
        self.obj_type = obj_type # 对象类型(TABLE/VIEW/PROCEDURE等)
        self.definition = definition # 原始定义SQL
        self.dependencies = []    # 直接依赖对象列表

详细设计方案

1. 元数据采集(示例SQL)
# 获取所有表/视图
SELECT table_name, table_type 
FROM information_schema.tables 
WHERE table_schema = DATABASE()

# 获取存储过程/函数
SELECT routine_name, routine_type 
FROM information_schema.routines 
WHERE routine_schema = DATABASE()

# 获取触发器
SELECT trigger_name, event_object_table 
FROM information_schema.triggers 
WHERE trigger_schema = DATABASE()
2. 依赖关系分析算法
def parse_dependencies(obj: DBObject):
    if obj.obj_type == 'VIEW':
        # 使用正则匹配FROM和JOIN子句
        pattern = r'\b(?:FROM|JOIN)\s+([\w`]+)'
    elif obj.obj_type in ('PROCEDURE', 'FUNCTION'):
        # 匹配表引用和CALL语句
        pattern = r'\b(?:FROM|INTO|UPDATE|JOIN|CALL)\s+([\w`]+)'
    elif obj.obj_type == 'TRIGGER':
        # 触发器关联表已在元数据中
        pattern = None
    
    if pattern:
        matches = re.findall(pattern, obj.definition, re.IGNORECASE)
        obj.dependencies = [m.strip('`') for m in matches]
3. 依赖图谱构建
class DependencyGraph:
    def __init__(self):
        self.graph = defaultdict(list)  # 邻接表:{节点: [依赖项]}
        self.reverse_graph = defaultdict(list)  # 逆邻接表
    
    def add_edge(self, from_node, to_node):
        self.graph[to_node].append(from_node)
        self.reverse_graph[from_node].append(to_node)
4. 路径生成算法(DFS实现)
def find_all_paths(graph, start):
    paths = []
    
    def dfs(node, path):
        if node not in graph or not graph[node]:
            paths.append('->'.join(path[::-1]))
            return
        for neighbor in graph[node]:
            dfs(neighbor, path + [neighbor])
    
    dfs(start, [start])
    return paths
5. 拓扑排序算法(Kahn算法)
def topological_sort(graph):
    in_degree = {u:0 for u in graph}
    for u in graph:
        for v in graph[u]:
            in_degree[v] += 1
    
    queue = deque([u for u in in_degree if in_degree[u] == 0])
    sorted_list = []
    
    while queue:
        u = queue.popleft()
        sorted_list.append(u)
        for v in graph[u]:
            in_degree[v] -= 1
            if in_degree[v] == 0:
                queue.append(v)
    
    return sorted_list
6. Excel输出结构
with pd.ExcelWriter('output.xlsx') as writer:
    # Sheet1: 对象路径表
    path_df.to_excel(writer, sheet_name='对象路径', index=False)
    
    # Sheet2: 表依赖顺序
    table_order_df.to_excel(writer, sheet_name='表依赖顺序', index=False)
    
    # Sheet3: 存储过程依赖顺序
    sp_order_df.to_excel(writer, sheet_name='存储过程顺序', index=False)

执行流程

  1. 初始化数据库连接
  2. 采集所有数据库对象元数据
  3. 构建依赖关系图谱
  4. 验证并处理循环依赖
  5. 生成路径数据(Sheet1)
  6. 分别对表和存储过程进行拓扑排序(Sheet2、Sheet3)
  7. 格式化输出到Excel文件

异常处理设计

  1. 数据库连接失败:捕获OperationalError,提示检查连接参数
  2. 循环依赖检测:使用Tarjan算法检测强连通分量
  3. SQL解析失败:记录解析错误日志,跳过当前对象
  4. 文件写入异常:捕获PermissionError,提示关闭已打开的Excel文件

优化策略

  1. 缓存机制:对已解析对象建立缓存字典
  2. 并行处理:使用ThreadPoolExecutor并行解析不同对象
  3. 增量更新:记录最后处理时间戳,支持增量分析
  4. 批处理优化:使用executemany批量查询对象定义

示例输出格式

表1:对象路径表
对象名称生成路径
view1table_a->view1
view1table_b->view1
proc1table_a->view1->proc1
表2:表依赖顺序
表名
table_a
table_b
table_c
表3:存储过程顺序
存储过程名
proc_base
proc_main

这个设计方案可以实现对MySQL数据库对象的全面依赖分析,最终输出结构清晰的Excel报告。实际开发时需要特别注意SQL解析的准确性和异常处理的完备性。

此代码实现了对MySQL数据库对象的元数据采集、依赖关系分析、路径生成和拓扑排序,并将结果输出到Excel文件中。同时包含了基本的异常处理和优化策略。

import mysql.connector
import sqlparse
import re
from collections import defaultdict, deque
import pandas as pd
from tarjan import tarjan as tarjan_algorithm
import traceback
from concurrent.futures import ThreadPoolExecutor
import time


class DBObject:
    def __init__(self, name, obj_type, definition):
        self.name = name
        self.obj_type = obj_type
        self.definition = definition
        self.dependencies = []


def get_metadata(cnx):
    cursor = cnx.cursor()

    tables = []
    cursor.execute("""
        SELECT table_name, table_type 
        FROM information_schema.tables 
        WHERE table_schema = DATABASE()
    """)
    for row in cursor:
        name, obj_type = row
        cursor2 = cnx.cursor(dictionary=True)
        cursor2.execute(f"SHOW CREATE {obj_type.lower()} {name}")
        result = cursor2.fetchone()
        tables.append(DBObject(name, obj_type, result[f'Create {obj_type}']))
        cursor2.close()

    routines = []
    cursor.execute("""
        SELECT routine_name, routine_type 
        FROM information_schema.routines 
        WHERE routine_schema = DATABASE()
    """)
    for row in cursor:
        name, obj_type = row
        cursor2 = cnx.cursor(dictionary=True)
        cursor2.execute(f"SHOW CREATE {obj_type.lower()} {name}")
        result = cursor2.fetchone()
        routines.append(DBObject(name, obj_type, result[f'Create {obj_type}']))
        cursor2.close()

    triggers = []
    cursor.execute("""
        SELECT trigger_name, event_object_table 
        FROM information_schema.triggers 
        WHERE trigger_schema = DATABASE()
    """)
    for row in cursor:
        name, table = row
        triggers.append(DBObject(name, 'TRIGGER', table))

    all_objects = tables + routines + triggers
    cursor.close()
    return all_objects


def parse_dependencies(obj: DBObject):
    if obj.obj_type == 'VIEW':
        pattern = r'\b(?:FROM|JOIN)\s+([\w`]+)'
    elif obj.obj_type in ('PROCEDURE', 'FUNCTION'):
        pattern = r'\b(?:FROM|INTO|UPDATE|JOIN|CALL)\s+([\w`]+)'
    elif obj.obj_type == 'TRIGGER':
        pattern = None

    if pattern:
        matches = re.findall(pattern, obj.definition, re.IGNORECASE)
        obj.dependencies = [m.strip('`') for m in matches]


class DependencyGraph:
    def __init__(self):
        self.graph = defaultdict(list)
        self.reverse_graph = defaultdict(list)

    def add_edge(self, from_node, to_node):
        self.graph[to_node].append(from_node)
        self.reverse_graph[from_node].append(to_node)


def find_all_paths(graph, start):
    paths = []

    def dfs(node, path):
        if node not in graph or not graph[node]:
            paths.append('->'.join(path[::-1]))
            return
        for neighbor in graph[node]:
            dfs(neighbor, path + [neighbor])

    dfs(start, [start])
    return paths


def topological_sort(graph):
    in_degree = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            in_degree[v] += 1

    queue = deque([u for u in in_degree if in_degree[u] == 0])
    sorted_list = []

    while queue:
        u = queue.popleft()
        sorted_list.append(u)
        for v in graph[u]:
            in_degree[v] -= 1
            if in_degree[v] == 0:
                queue.append(v)

    return sorted_list


def detect_cycles(graph):
    components = tarjan_algorithm(graph)
    for component in components:
        if len(component) > 1:
            return True
    return False


def main():
    try:
        cnx = mysql.connector.connect(
            user='your_username',
            password='your_password',
            host='your_host',
            database='your_database'
        )

        all_objects = get_metadata(cnx)

        cache = {}
        graph = DependencyGraph()

        def parse_object(obj):
            if obj.name in cache:
                return cache[obj.name]
            parse_dependencies(obj)
            for dep in obj.dependencies:
                graph.add_edge(obj.name, dep)
            cache[obj.name] = obj
            return obj

        with ThreadPoolExecutor() as executor:
            all_objects = list(executor.map(parse_object, all_objects))

        if detect_cycles(graph.graph):
            print("检测到循环依赖,请检查数据库对象的定义。")
            return

        path_data = []
        for obj in all_objects:
            paths = find_all_paths(graph.graph, obj.name)
            for path in paths:
                path_data.append({'对象名称': obj.name, '生成路径': path})

        path_df = pd.DataFrame(path_data)

        tables = [obj.name for obj in all_objects if obj.obj_type == 'TABLE']
        table_graph = {k: v for k, v in graph.graph.items() if k in tables}
        table_order = topological_sort(table_graph)
        table_order_df = pd.DataFrame({'表名': table_order})

        procs = [obj.name for obj in all_objects if obj.obj_type in ('PROCEDURE', 'FUNCTION')]
        proc_graph = {k: v for k, v in graph.graph.items() if k in procs}
        proc_order = topological_sort(proc_graph)
        sp_order_df = pd.DataFrame({'存储过程名': proc_order})

        with pd.ExcelWriter('output.xlsx') as writer:
            path_df.to_excel(writer, sheet_name='对象路径', index=False)
            table_order_df.to_excel(writer, sheet_name='表依赖顺序', index=False)
            sp_order_df.to_excel(writer, sheet_name='存储过程顺序', index=False)

        cnx.close()
    except mysql.connector.OperationalError as e:
        print(f"数据库连接失败: {e}")
    except Exception as e:
        print(f"发生错误: {e}")
        traceback.print_exc()


if __name__ == "__main__":
    main()

http://www.kler.cn/a/599207.html

相关文章:

  • 智慧路灯杆:点亮未来城市的科技基石
  • 架构思维:通用系统设计方法论_从复杂度分析到技术实现指南
  • mysql入门操作
  • 多线程 --- 进程和线程的基本知识
  • 图解AUTOSAR_SWS_WatchdogInterface
  • Bash语言的物联网
  • python基础之--包和模块
  • 【简单学习】Prompt Engineering 提示词工程
  • FACTR赋能Franka机器人:触觉-视觉融合决策的颠覆性突破
  • C++实现决策树与随机森林调优困境:从性能瓶颈到高效突破
  • Apollo 相关知识点
  • 浅谈ai工程落地 - 蒸馏 vs 剪枝 vs 量化
  • 客服机器人怎么才能精准的回答用户问题?
  • UML的使用
  • Android Compose 框架组件可见性(Visibility、LocalDensity)深入剖析(十九)
  • 纯文本驱动的数据可视化革命——AI生成图表「图表狐」全场景深度解析
  • SpringCould微服务架构之Docker(1)
  • 处理 macOS 终端打开时会卡在 xcodebuild
  • 105.在 Vue 3 中使用 OpenLayers 加载静态图片作为地图底图
  • 前端知识点---innerHTML和innerText