当前位置：首页 > article >正文

血缘解析＜二＞:如何解析带CTE语句的Sql

article 2025/1/12 1:07:41

blood

- 一、思路
- 二、代码

一、思路

之前文章血缘解析介绍了血缘解析的思路,但是对于带CTE语句的sql解析不到
eg:

with tmp as (
	select id,name,age
	  from personinfo
),tmp1 as (
	select a.id,a.name,a.age,b.classno
	  from tmp a
	  join classinfo b
	    on a.id = b.id
)
	select id,name,age,classno from tmp1

上述sql在解析的时候会获取到源表是tmp1,而不是cte中的personinfo和classinfo,原因是:com.alibaba.druid.sql获取getSelect().getQuery()只能获取到不包含cte的select语句,如下所示

        try{
            statements = SQLUtils.parseStatements(sql, JdbcConstants.MYSQL);
        }catch (Exception e){
            System.out.println("can't parser by druid oracle"+e);
        }
        // 只考虑一条语句
        SQLStatement statement = statements.get(0);
        // 只考虑查询语句
        SQLSelectStatement sqlSelectStatement = (SQLSelectStatement) statement;
        SQLSelectQuery sqlSelectQuery = sqlSelectStatement.getSelect().getQuery();

后面我想到了一种思路:带with的cte语句实际上也是临时表,tmp是临时表的表名,包含的select语句是临时表名的加工语句,可以把所有cte中的select语句都拿出来解析,最后通过递归获取最原始的血缘关系,对应关系如下所示:

假设上述加工语句是写入test表
第一层:

target_tab	target_col	source_tab	source_col
test	id	tmp1	id
test	name	tmp1	name
test	age	tmp1	age
test	classno	tmp1	classno

第二层:

target_tab	target_col	source_tab	source_col
tmp1	id	tmp	id
tmp1	name	tmp	name
tmp1	age	tmp	age
tmp1	classno	classinfo	classno

第三层:

target_tab	target_col	source_tab	source_col
tmp	id	personinfo	id
tmp	name	personinfo	name
tmp	age	personinfo	age

会把上述所有层的对应关系都写到Mysql的表中,最后会利用Mysql的递归获取最原始的对应关系,递归代码如下所示:

with recursive cte (target_tab,target_col,source_tab,source_col,level) as 
(
	select target_tab,target_col,source_tab,source_col,1 as level
	  from blood_tab
	 union all
	select t1.target_tab,t1.target_col,t.source_tab,t.source_col,t1.level + 1
	  from blood_tab t
	  join cte1 t1
	    on t.target_tab = t1.source_tab
	   and t.target_col = t1.source_col
),cte1 as (
	select target_tab,target_col,source_tab,source_col,
	       rank()over(partition by target_tab,target_col order by level desc) as rn
	  from cte
)
	select target_tab,
	       target_col,
	       source_tab,
	       source_col
	  from cte1 t
	 where t.rn <= 1

二、代码

如何获取所有cte的别名和包含的select语句?,可以用正则表达式匹配或者其它方式,我这里还是利用com.alibaba.druid.sql包进行解析获取

package com.lan.lineage.operate;


import com.alibaba.druid.sql.SQLUtils;
import com.alibaba.druid.sql.ast.SQLStatement;
import com.alibaba.druid.sql.ast.statement.SQLSelectStatement;
import com.alibaba.druid.sql.ast.statement.SQLWithSubqueryClause;
import com.alibaba.druid.util.JdbcConstants;
import com.lan.lineage.common.EmptyUtils;

import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;


public class CteSpilt {

    public static void splitcte(String sql, String target_tab, String target_schema) {
        String target_tab_with;
        String target_sql;
        String text;


        if ( EmptyUtils.isEmpty(sql)){
            return;
        }
        AtomicReference<Boolean> isContinue = new AtomicReference<>(false);
        List<SQLStatement> statements = new ArrayList<>();
        // 解析
        try{
            statements = SQLUtils.parseStatements(sql, JdbcConstants.MYSQL);
        }catch (Exception mysqlerr) {
            System.out.println("can't parser by druid mysql" + mysqlerr);
            try {
                statements = SQLUtils.parseStatements(sql, JdbcConstants.ORACLE);
            } catch (Exception oracleerr){
                System.out.println("can't parser by druid ORACLE" + oracleerr);
            }
        }

        try {
            SQLStatement statement = statements.get(0);
            SQLSelectStatement sqlSelectStatement = (SQLSelectStatement) statement;
            SQLWithSubqueryClause withSub = sqlSelectStatement.getSelect().getWithSubQuery();

            if (withSub != null) {
                List<SQLWithSubqueryClause.Entry> withSubEntry = withSub.getEntries();
                for (int i = 0; i < withSubEntry.size(); i++) {
                    target_tab_with = withSubEntry.get(i).getAlias();
                    target_sql = String.valueOf(withSubEntry.get(i).getSubQuery());
                }
            }
        } catch (Exception e){
            e.printStackTrace();
        }

    }
}