血缘解析<二>:如何解析带CTE语句的Sql
blood
- 一、思路
- 二、代码
一、思路
之前文章血缘解析介绍了血缘解析的思路,但是对于带CTE语句的sql解析不到
eg:
with tmp as (
select id,name,age
from personinfo
),tmp1 as (
select a.id,a.name,a.age,b.classno
from tmp a
join classinfo b
on a.id = b.id
)
select id,name,age,classno from tmp1
上述sql在解析的时候会获取到源表是tmp1,而不是cte中的personinfo和classinfo,原因是:com.alibaba.druid.sql获取getSelect().getQuery()只能获取到不包含cte的select语句,如下所示
try{
statements = SQLUtils.parseStatements(sql, JdbcConstants.MYSQL);
}catch (Exception e){
System.out.println("can't parser by druid oracle"+e);
}
// 只考虑一条语句
SQLStatement statement = statements.get(0);
// 只考虑查询语句
SQLSelectStatement sqlSelectStatement = (SQLSelectStatement) statement;
SQLSelectQuery sqlSelectQuery = sqlSelectStatement.getSelect().getQuery();
后面我想到了一种思路:带with的cte语句实际上也是临时表,tmp是临时表的表名,包含的select语句是临时表名的加工语句,可以把所有cte中的select语句都拿出来解析,最后通过递归获取最原始的血缘关系,对应关系如下所示:
- 假设上述加工语句是写入test表
- 第一层:
target_tab | target_col | source_tab | source_col |
---|---|---|---|
test | id | tmp1 | id |
test | name | tmp1 | name |
test | age | tmp1 | age |
test | classno | tmp1 | classno |
- 第二层:
target_tab | target_col | source_tab | source_col |
---|---|---|---|
tmp1 | id | tmp | id |
tmp1 | name | tmp | name |
tmp1 | age | tmp | age |
tmp1 | classno | classinfo | classno |
- 第三层:
target_tab | target_col | source_tab | source_col |
---|---|---|---|
tmp | id | personinfo | id |
tmp | name | personinfo | name |
tmp | age | personinfo | age |
会把上述所有层的对应关系都写到Mysql的表中,最后会利用Mysql的递归获取最原始的对应关系,递归代码如下所示:
with recursive cte (target_tab,target_col,source_tab,source_col,level) as
(
select target_tab,target_col,source_tab,source_col,1 as level
from blood_tab
union all
select t1.target_tab,t1.target_col,t.source_tab,t.source_col,t1.level + 1
from blood_tab t
join cte1 t1
on t.target_tab = t1.source_tab
and t.target_col = t1.source_col
),cte1 as (
select target_tab,target_col,source_tab,source_col,
rank()over(partition by target_tab,target_col order by level desc) as rn
from cte
)
select target_tab,
target_col,
source_tab,
source_col
from cte1 t
where t.rn <= 1
二、代码
- 如何获取所有cte的别名和包含的select语句?,可以用正则表达式匹配或者其它方式,我这里还是利用com.alibaba.druid.sql包进行解析获取
package com.lan.lineage.operate;
import com.alibaba.druid.sql.SQLUtils;
import com.alibaba.druid.sql.ast.SQLStatement;
import com.alibaba.druid.sql.ast.statement.SQLSelectStatement;
import com.alibaba.druid.sql.ast.statement.SQLWithSubqueryClause;
import com.alibaba.druid.util.JdbcConstants;
import com.lan.lineage.common.EmptyUtils;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
public class CteSpilt {
public static void splitcte(String sql, String target_tab, String target_schema) {
String target_tab_with;
String target_sql;
String text;
if ( EmptyUtils.isEmpty(sql)){
return;
}
AtomicReference<Boolean> isContinue = new AtomicReference<>(false);
List<SQLStatement> statements = new ArrayList<>();
// 解析
try{
statements = SQLUtils.parseStatements(sql, JdbcConstants.MYSQL);
}catch (Exception mysqlerr) {
System.out.println("can't parser by druid mysql" + mysqlerr);
try {
statements = SQLUtils.parseStatements(sql, JdbcConstants.ORACLE);
} catch (Exception oracleerr){
System.out.println("can't parser by druid ORACLE" + oracleerr);
}
}
try {
SQLStatement statement = statements.get(0);
SQLSelectStatement sqlSelectStatement = (SQLSelectStatement) statement;
SQLWithSubqueryClause withSub = sqlSelectStatement.getSelect().getWithSubQuery();
if (withSub != null) {
List<SQLWithSubqueryClause.Entry> withSubEntry = withSub.getEntries();
for (int i = 0; i < withSubEntry.size(); i++) {
target_tab_with = withSubEntry.get(i).getAlias();
target_sql = String.valueOf(withSubEntry.get(i).getSubQuery());
}
}
} catch (Exception e){
e.printStackTrace();
}
}
}
- 上述代码的target_tab_with就是cte名,target_sql就是cte中包含的select语句