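How OpenMetadata Obtains Table Lineage for MySQL Databases: A Detailed Look
Overview
OpenMetadata is an open-source metadata management platform that supports end-to-end lineage tracking. For MySQL databases, OpenMetadata builds table-level lineage by parsing foreign key constraints, view definitions, and (optionally) query logs. This article walks through the implementation with reference to the source code.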
Environment Configuration and Data Ingestion
1. Sample configuration file (YAML)
source:
  type: mysql
  serviceName: mysql_dev
  serviceConnection:
    config:
      type: Mysql
      username: admin
      password: pass
      hostPort: localhost:3306
      databaseSchema: sales_db
  sourceConfig:
    config:
      includeViews: true
      includeTables: true
      markDeletedTables: true
      lineageQuery: "SELECT * FROM information_schema.views WHERE view_definition LIKE '%{table}%';"
sink:
  type: metadata-rest
  config: {}
workflowConfig:
  openMetadataServerConfig:
    hostPort: "http://localhost:8585/api"
    authProvider: openmetadata
    securityConfig:
      jwtToken: "token"
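2. Key configuration options
lineageQuery: custom SQL used for lineage analysis (optional)
includeViews: whether to parse lineage from views
markDeletedTables: whether tables that have been deleted are marked as such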
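To make the role of the {table} placeholder in lineageQuery concrete, here is a minimal sketch of how such a template might be filled in before the query is executed. The helper name render_lineage_query is hypothetical and is not part of OpenMetadata; how the platform actually substitutes the placeholder is not shown in the source, so treat this as an illustration only.

```python
def render_lineage_query(template: str, table: str) -> str:
    """Fill the {table} placeholder in a lineage query template.

    Hypothetical helper for illustration; not an OpenMetadata API.
    """
    # Double any single quote so the table name cannot terminate
    # the surrounding SQL string literal.
    safe = table.replace("'", "''")
    # str.replace is used instead of str.format so that other braces
    # in the template are left untouched.
    return template.replace("{table}", safe)


TEMPLATE = (
    "SELECT * FROM information_schema.views "
    "WHERE view_definition LIKE '%{table}%';"
)

print(render_lineage_query(TEMPLATE, "orders"))
# SELECT * FROM information_schema.views WHERE view_definition LIKE '%orders%';
```

Note that a LIKE '%orders%' filter is a substring match, so it would also return views that reference a table such as orders_archive. The result is best read as a list of candidate views whose definitions still need to be parsed.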
Source Code Analysis and Core Flow
1. Entry class: MysqlSource
Path: openmetadata-ingestion/src/metadata/ingestion/source/database/mysql/connection.py
class MysqlSource(RDBMSSource):
    def __init__(self, config: WorkflowSource, metadata_config: OpenMetadataConnection):
        super().__init__(config, metadata_config)
        # Build the MySQL connection from the service connection config
        self.connection = MysqlConnection(config.serviceConnection.__root__.config)
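As a rough illustration of what a connection object could derive from the serviceConnection settings, the sketch below assembles a SQLAlchemy-style database URL from the username, password, hostPort, and databaseSchema fields. The function build_mysql_url is hypothetical, and the choice of the mysql+pymysql driver scheme is an assumption; the source does not show what MysqlConnection does internally.

```python
from urllib.parse import quote_plus


def build_mysql_url(username: str, password: str, host_port: str, schema: str = "") -> str:
    """Assemble a SQLAlchemy-style URL (hypothetical helper, assumed scheme)."""
    # Percent-encode credentials so characters such as '@' or ':'
    # do not break the URL structure.
    auth = f"{quote_plus(username)}:{quote_plus(password)}"
    url = f"mysql+pymysql://{auth}@{host_port}"
    # The schema segment is optional, mirroring the optional databaseSchema field.
    return f"{url}/{schema}" if schema else url


print(build_mysql_url("admin", "pass", "localhost:3306", "sales_db"))
# mysql+pymysql://admin:pass@localhost:3306/sales_db
```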
2. Core lineage extraction method
Path: openmetadata-ingestion/src/metadata/ingestion/source/database/common_db_source.py
from abc import ABC


class CommonDbSourceService(ABC):
    def process_table_lineage(self, table: Table) -> None:
        # Direct lineage derived from foreign keys
        for column in table.columns:
            if column.foreignKeys:
                self._build_foreign_key_lineage(table, column)
        # Lineage derived from the view definition
        if self.config.sourceConfig.config.includeViews:
            view_def = self._get_view_definition(table.name)
            self._parse_view_lineage(view_def, table)
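The two-step flow above (foreign keys first, then view definitions) can be sketched in a self-contained form. Everything in this example is a simplified stand-in: the Table and Column dataclasses, the view_sources field, and collect_table_lineage are hypothetical names and not OpenMetadata entities. Treating the referenced table of a foreign key as the upstream side is also an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Column:
    name: str
    # "table.column" targets referenced by this column's foreign keys.
    foreign_keys: List[str] = field(default_factory=list)


@dataclass
class Table:
    name: str
    columns: List[Column] = field(default_factory=list)
    # Source tables of a view; None for a base table. A real pipeline
    # would obtain these by parsing the view definition.
    view_sources: Optional[List[str]] = None


def collect_table_lineage(table: Table, include_views: bool = True) -> Set[Tuple[str, str]]:
    """Return (upstream, downstream) table pairs for one table."""
    edges: Set[Tuple[str, str]] = set()
    for column in table.columns:
        for target in column.foreign_keys:
            # "customers.id" -> referenced table "customers"
            edges.add((target.split(".")[0], table.name))
    # Only views carry source tables, so base tables skip this step.
    if include_views and table.view_sources:
        for source in table.view_sources:
            edges.add((source, table.name))
    return edges


orders = Table("orders", [Column("id"), Column("customer_id", ["customers.id"])])
report = Table("order_report", view_sources=["orders", "customers"])

print(sorted(collect_table_lineage(orders)))
# [('customers', 'orders')]
print(sorted(collect_table_lineage(report)))
# [('customers', 'order_report'), ('orders', 'order_report')]
```

Returning a set of pairs keeps the sketch idempotent: a table referenced through several columns still yields a single table-level edge.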
3. SQL parser
Path: openmetadata-ingestion/src/metadata/ingestion/source/database/lineage/parser.py
from typing import List


class LineageParser:
    @staticmethod
    def parse(sql: str) -> List[LineageEdge]:
        # Parse the SQL with ANTLR to produce a syntax tree
        parser = SqlLineageParser(sql)
        return parser.get_lineage_edges()
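To give a feel for what the parser's output looks like, the sketch below extracts source tables from a view definition with a regular expression. This is a deliberately simplified stand-in and not the grammar-based parsing described above: it only looks at identifiers that directly follow FROM or JOIN, and it does not handle subqueries, CTEs, comma-separated table lists, or keywords inside string literals. The function name extract_source_tables is hypothetical.

```python
import re
from typing import List

# Matches the identifier that follows FROM or JOIN, with optional
# backticks and an optional schema prefix (e.g. `sales_db`.`orders`).
_TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN)\s+((?:`?\w+`?\.)?`?\w+`?)",
    re.IGNORECASE,
)


def extract_source_tables(view_sql: str) -> List[str]:
    """Return referenced table names in order of first appearance."""
    seen: List[str] = []
    for match in _TABLE_REF.finditer(view_sql):
        # Strip backticks so quoted and unquoted names compare equal.
        name = match.group(1).replace("`", "")
        if name not in seen:
            seen.append(name)
    return seen


sql = (
    "CREATE VIEW order_report AS "
    "SELECT o.id, c.name FROM `sales_db`.`orders` o "
    "JOIN customers c ON o.customer_id = c.id"
)
print(extract_source_tables(sql))
# ['sales_db.orders', 'customers']
```

Each returned name would become the upstream side of an edge whose downstream side is the view itself. A full SQL parser is preferable in practice precisely because of the cases this regex ignores.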