oracle排查长时间没提交的事务造成的阻塞案例
一 问题描述
开发同事反馈生产环境某个接口慢,一个普通的按主键更新的update竟然需要5分钟,而我手动执行秒返回,猜测是发生了阻塞,需要排查出阻塞源。
有时,一个事务里会包含多个sql,有的还包含上传附件等操作,一个事务长时间不提交会导致其他相关sql被阻塞,比如事务里的sql执行得很快,但上传附件卡住的话,那也会阻塞其他相关sql。
查看锁阻塞的sql只能查看到阻塞方事务最近执行的一条sql,查不到之前的sql,所以有时候查出的阻塞结果不准确,这时候需要结合对这个表的审计来确定阻塞方sql。
二 排查思路
1.创建一个针对这个被阻塞的表的审计
2.让同事复现下这个问题
3.查下阻塞,抓下阻塞方的audsid
4.根据audsid查看这个会话的审计结果,看看对这个表都做了啥操作。
三 排查过程
#这里以在自己测试环境模拟一个简单的锁阻塞为例
3.1 创建FGA审计,只审计被阻塞的表
#审计SCOTT.T2表
begin
dbms_fga.add_policy (
object_schema => 'SCOTT',
object_name => 'T2',
policy_name => 'TEST',
statement_types => 'SELECT,INSERT,UPDATE,DELETE'
);
end;
/
--必须指定statement_types => 'SELECT,INSERT,UPDATE,DELETE',否则默认只审计SELECT。
#查看当前有哪些审计
select * from DBA_AUDIT_POLICIES;
3.2 故障模拟
#在会话1用scott用户登录,执行一个sql,但不提交
SQL> update scott.t2 set name='ffff' where id=2;
#在会话2也修改同一条记录
SQL> update scott.t2 set name='ggggg' where id=2;
会话2会被会话1阻塞。
#在会话1再执行个其他的sql
SQL> select 1 from dual;
3.3 查看阻塞
3.3.1 查看阻塞链
select *
from (select a.inst_id,a.sid, a.serial#,
a.sql_id,
a.event,
a.status,
connect_by_isleaf as isleaf,
sys_connect_by_path(SID, '<-') tree,
level as tree_level
from gv$session a
start with a.blocking_session is not null
connect by nocycle a.sid = prior a.blocking_session)
where isleaf = 1
order by tree_level asc;
可以看到3152这个会话阻塞了3635。
3.3.2 查看具体阻塞
SELECT DISTINCT s1.inst_id as blocking_inst_id,s1.username as blocking_username,s1.machine as blocking_machine,s1.module as blocking_module,s1.sid as blocking_sid,s1.audsid as blocking_audsid,s1.serial# as blocking_serial#,c1.sql_text as blocking_sql_text,s1.status as blocking_staus,s1.event as blocking_event,
s2.inst_id as waiting_inst_id,s2.username as waiting_username,s2.machine waiting_machine,s2.module as waiting_module,s2.sid as waiting_sid,s2.audsid as waiting_audsid,s2.serial# as waiting_serial#,c2.sql_text as waiting_sql_text,s2.status as blocking_staus,s2.event as waiting_event
FROM gv$lock l1,
gv$session s1,
gv$lock l2,
gv$session s2,
gv$sqlarea c1,
gv$sqlarea c2,
gv$process b1,
gv$process b2
WHERE s1.sid = l1.sid
AND s2.sid = l2.sid
AND s1.inst_id = l1.inst_id
AND s2.inst_id = l2.inst_id
AND s1.paddr = b1.addr
AND s2.paddr = b2.addr
AND c1.SQL_ID=s1.PREV_SQL_ID
AND s2.sql_hash_value = c2.hash_value
AND l1.block > 0
AND l2.request > 0
AND l1.id1 = l2.id1
AND l1.id2 = l2.id2
UNION
SELECT DISTINCT s1.inst_id as blocking_inst_id,s1.username as blocking_username,s1.machine as blocking_machine,s1.module as blocking_module,s1.sid as blocking_sid,s1.audsid as blocking_audsid,s1.serial# as blocking_serial#,c1.sql_text as blocking_sql_text,s1.status as blocking_staus,s1.event as blocking_event,
s2.inst_id as waiting_inst_id,s2.username as waiting_username,s2.machine waiting_machine,s2.module as waiting_module,s2.sid as waiting_sid,s2.audsid as waiting_audsid,s2.serial# as waiting_serial#,c2.sql_text as waiting_sql_text,s2.status as blocking_staus,s2.event as waiting_event
FROM gv$lock l1,
gv$session s1,
gv$lock l2,
gv$session s2,
gv$sqlarea c1,
gv$sqlarea c2,
gv$process b1,
gv$process b2
WHERE s1.sid = l1.sid
AND s2.sid = l2.sid
AND s1.inst_id = l1.inst_id
AND s2.inst_id = l2.inst_id
AND s1.paddr = b1.addr
AND s2.paddr = b2.addr
AND c1.hash_value=s1.sql_hash_value
AND c1.address=s1.sql_address
AND s2.sql_hash_value = c2.hash_value
AND l1.block > 0
AND l2.request > 0
AND l1.id1 = l2.id1
AND l1.id2 = l2.id2
这里看到sid为3152的会话阻塞了sid为3635的会话。
阻塞方sql为select 1 from dual
被阻塞方sql为update scott.t2 set name='ggggg' where id=2
这显然不合理。因为select 1 from dual不会阻塞任何sql。
记录下阻塞源的audsid。
这里是75356902。
3.4 模拟结束后关闭审计
begin
dbms_fga.drop_policy(object_schema=>'SCOTT',object_name=>'T2',policy_name=>'TEST');
end;
/
3.5 查看审计结果
select * from dba_fga_audit_trail where session_id=阻塞方audsid;
#这里是75356902
select * from dba_fga_audit_trail where session_id=75356902;
找到造成阻塞的sql了:
update scott.t2 set name='ffff' where id=2
#备注
之前审计过的记录在审计结束后,记录不会清空。