当前位置: 首页 > article >正文

GTID下复制问题和解决

环境介绍

数据库1主2从,mysql版本是v5.19

表结构

一、主库新增记录,从库提示主键冲突

模拟故障

1, master上关闭 sql_log_bin,删除id =103 后打开

2, 确认此时从库有id=103,主库没有

3, master insert id=103 主从异常

             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1062
                   Last_Error: Could not execute Write_rows event on table test.hero3; Duplicate entry '103' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000045, end_log_pos 3056

修复故障

根据报错test.hero3; Duplicate entry '103' for key 'PRIMARY,去slave中找到对应记录,删除即可

mysql> delete from test.hero3 where id=103;
Query OK, 1 row affected (0.00 sec)

mysql> stop slave;start slave;
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.10.129
                  Master_User: repl
                  Master_Port: 3358
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000046
          Read_Master_Log_Pos: 194
               Relay_Log_File: mha3-relay-bin.000012
                Relay_Log_Pos: 407
        Relay_Master_Log_File: mysql-bin.000046
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

二、主库UPDATE,从库找不到记录

模拟故障

1.1, slave上将id=1 的记录delete

1.2, master上执行update xxx where id = 1;

1.3,    show slave status\G查看主从同步异常

# slave:
mysql> delete from hero3 where id = 1;
Query OK, 1 row affected (0.01 sec)

# master
mysql> update hero3 set age = 1 where id = 1;
Query OK, 1 row affected (0.03 sec)
Rows matched: 1  Changed: 1  Warnings: 0

# slave;
mysql>  show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.10.129
                  Master_User: repl
                  Master_Port: 3358
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000045
          Read_Master_Log_Pos: 2819
               Relay_Log_File: mha2-relay-bin.000009
                Relay_Log_Pos: 866
        Relay_Master_Log_File: mysql-bin.000045
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1032
                   Last_Error: Could not execute Update_rows event on table test.hero3; Can't find record in 'hero3', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000045, end_log_pos 2788

修复故障

从库中找到对应的记录,在slave上将该记录插入;

1,根据同步异常错误mysql-bin.000045, end_log_pos 2788,去主库对应sql

2, mysqlbinlog --no-defaults -vv --base64-output=decode-rows mysql-bin.000045 | grep -A 20 'end_log_pos 2788' 命令查找执行的sql

[root@mha1 mysql]# mysqlbinlog --no-defaults -vv --base64-output=decode-rows mysql-bin.000045 | grep -A 20 'end_log_pos 2788'
#241222 10:38:37 server id 10129  end_log_pos 2788 CRC32 0x1bfc92fc     Update_rows: table id 224 flags: STMT_END_F
### UPDATE `test`.`hero3`
### WHERE
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2='mao1' /* VARSTRING(60) meta=60 nullable=0 is_null=0 */
###   @3=110 /* INT meta=0 nullable=0 is_null=0 */
### SET
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2='mao1' /* VARSTRING(60) meta=60 nullable=0 is_null=0 */
###   @3=1 /* INT meta=0 nullable=0 is_null=0 */

3,根据 UPDATE `test`.`hero3` WHERE @1=1 @2='mao1' @3=110 拼接insert并插入slave

mysql>    insert into `test`.`hero3` values (1,'mao1',110);
Query OK, 1 row affected (0.03 sec)

mysql> stop slave; start slave;
Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.10.129
                  Master_User: repl
                  Master_Port: 3358
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000045
          Read_Master_Log_Pos: 2819
               Relay_Log_File: mha2-relay-bin.000010
                Relay_Log_Pos: 454
        Relay_Master_Log_File: mysql-bin.000045
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

三、主库DELETE,从库找不到记录

模拟故障

主库在关闭binlog的情况下insert一条数据,然后打开binlog并删除该数据

mysql> set sql_log_bin = 0; 
Query OK, 0 rows affected (0.00 sec)

mysql> insert into hero3(name,age) values('delete_test',22);
Query OK, 1 row affected (0.01 sec)

mysql> set sql_log_bin = 1;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from hero3 where id >= 102;
+-----+-------------+-----+
| id  | name        | age |
+-----+-------------+-----+
| 102 | mao2        | 111 |
| 103 | mao2        | 109 |
| 104 | delete_test |  22 |
+-----+-------------+-----+
3 rows in set (0.00 sec)

mysql> delete from hero3 where id = 104;
Query OK, 1 row affected (0.03 sec)

此时从库报错信息如下

             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1032
                   Last_Error: Could not execute Delete_rows event on table test.hero3; Can't find record in 'hero3', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000045, end_log_pos 2092

 修复故障

由于在执行delete的时候,slave找不到该记录,那么我们就可以跳过该事务

Retrieved_Gtid_Set 表示slave已经收到的gtid集合,Executed_Gtid_Set表示slave已经执行的gtid集合,从中可以看到 4855c186-ba02-11ef-8bf4-000c297511da:11没有执行

由于删除的数据在slave中找不到,所以我们就可以跳过该事务; 

mysql> stop slave;
Query OK, 0 rows affected (0.00 sec)

mysql> set gtid_next = '4855c186-ba02-11ef-8bf4-000c297511da:11'; #设定下一个要执行的GTID:11
Query OK, 0 rows affected (0.02 sec)

mysql> begin;commit; #产生一个空事务
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

mysql> set gtid_next='AUTOMATIC'; # 将next设为自动
Query OK, 0 rows affected (0.00 sec)

mysql> start slave;
Query OK, 0 rows affected (0.02 sec)

mysql> show slave status\G      # 可以看到主从状态已恢复                                                                                                                                             *************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.10.129
                  Master_User: repl
                  Master_Port: 3358
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000045
          Read_Master_Log_Pos: 2123
               Relay_Log_File: mha3-relay-bin.000008
                Relay_Log_Pos: 454
        Relay_Master_Log_File: mysql-bin.000045
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

四、主库日志被purged,主从同步失败

模拟故障

4.1.1 stop slave断开主从同步

4.1.2 主库上执行一些事务,使GTID持续增长

mysql> flush logs;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into hero3(name,age) select name,age from hero3;
Query OK, 28 rows affected (0.00 sec)
Records: 28  Duplicates: 0  Warnings: 0

mysql> flush logs;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into hero3(name,age) select name,age from hero3;
Query OK, 56 rows affected (0.00 sec)
Records: 56  Duplicates: 0  Warnings: 0

mysql> flush logs;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into hero3(name,age) select name,age from hero3;
Query OK, 112 rows affected (0.00 sec)
Records: 112  Duplicates: 0  Warnings: 0

mysql> flush logs;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into hero3(name,age) select name,age from hero3;
Query OK, 224 rows affected (0.00 sec)
Records: 224  Duplicates: 0  Warnings: 0

mysql> show master logs;
+------------------+-----------+
| Log_name         | File_size |
+------------------+-----------+
| mysql-bin.000053 |       734 |
| mysql-bin.000054 |       972 |
| mysql-bin.000055 |      1448 |
| mysql-bin.000056 |      2400 |
| mysql-bin.000057 |      4257 |
+------------------+-----------+

4.1.3 清理binlog,造成master上事务缺失

mysql> purge binary logs to 'mysql-bin.000058';
Query OK, 0 rows affected (0.00 sec)

mysql> show master logs;
+------------------+-----------+
| Log_name         | File_size |
+------------------+-----------+
| mysql-bin.000058 |       194 |
+------------------+-----------+
1 row in set (0.00 sec)

4.1.4 slave上执行start slave,观察主从异常,提示“but the master has purged binary logs containing GTIDs that the slave requires”

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: 192.168.10.129
                  Master_User: repl
                  Master_Port: 3358
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000050
          Read_Master_Log_Pos: 194
               Relay_Log_File: mha3-relay-bin.000021
                Relay_Log_Pos: 407
        Relay_Master_Log_File: mysql-bin.000050
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'

修复故障

由于master上这部分binlog已经被清理掉了,所以这部分数据已经无法同步到从库。 及时恢复了主从同步状态,这部分数据也是丢失的; 工作的时候遇到这种情况有两种处理方法

4.2.1        master上根据select @@gtid_purged;可以查看主库GTID执行了1-27

mysql> select @@gtid_purged;
+-------------------------------------------+
| @@gtid_purged                             |
+-------------------------------------------+
| 4855c186-ba02-11ef-8bf4-000c297511da:1-27 |
+-------------------------------------------+

从库show slave status命令可以看到Retrieved_Gtid_Set 和Executed_Gtid_Set 中显示从库已经获取并执行了GTID:1-20

           Retrieved_Gtid_Set: 4855c186-ba02-11ef-8bf4-000c297511da:6-20
            Executed_Gtid_Set: 090b3bdd-ba02-11ef-9ce1-000c292443d0:1-4,
4855c186-ba02-11ef-8bf4-000c297511da:1-20

证明中间有21-27,一共7个事务丢失;

4.2.2  尝试将gtid_purged设置为1-27

mysql>  set global gtid_purged = '4855c186-ba02-11ef-8bf4-000c297511da:1-27';
ERROR 1840 (HY000): @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty.
mysql> 
mysql> reset slve^C
mysql> set @@GLOBAL.GTID_EXECUTED = '';
ERROR 1238 (HY000): Variable 'gtid_executed' is a read only variable

发现无法设置,我们可以reset master,清空从库的binlog和gtid_executed

mysql> reset master;
Query OK, 0 rows affected (0.00 sec)

mysql> set global gtid_purged='4855c186-ba02-11ef-8bf4-000c297511da:1-27';
Query OK, 0 rows affected (0.00 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.10.129
                  Master_User: repl
                  Master_Port: 3358
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000058
          Read_Master_Log_Pos: 194
               Relay_Log_File: mha3-relay-bin.000022
                Relay_Log_Pos: 407
        Relay_Master_Log_File: mysql-bin.000058
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

4.2.3 虽然此时同步恢复正常,但是主从是不一致的,后面执行其他事务的时候是正常的,但是当执行到丢失的7个事务中的数据的时候还是会报错

# master
mysql>  select count(1) from hero3;
+----------+
| count(1) |
+----------+
|      448 |
+----------+
#slave
mysql>   select count(1) from hero3;
+----------+
| count(1) |
+----------+
|      208 |
+----------+
1 row in set (0.00 sec)

后面等异常的时候可以按照上三种情况进行修复,发现一例修复一例;

4.2.4  我们不知道上面7个事务究竟影响了多少数据量,如果条件允许,可以从主库上dump一份全量数据,重新搭建一个新的从库,这样可以保证数据完全一致,不用每次都手动修复


http://www.kler.cn/a/447706.html

相关文章:

  • 基于微信小程序的乡村旅游系统
  • 车载网关性能 --- 缓存buffer划分要求
  • PHP 微信棋牌开发全解析:高级教程
  • 单片机上电后程序不运行怎么排查问题?
  • [创业之路-202]:任正非管理华为的思想与毛泽东管理党、军队、国家的思想的相似性与差异性
  • [JavaScript] 我该怎么去写一个canvas游戏
  • 分布式系统架构5:限流设计模式
  • Python趣味游戏---自己改成图片,跟着鼠标移动,一直克隆自己,0.3秒钟后消除克隆体
  • Java学习,数组中查找指定元素
  • 数据分析时的json to excel 转换的好用小工具
  • 计算机毕业设计原创定制(免费送源码):NodeJS+MVVM+MySQL 樱花在线视频网站
  • 「Mac畅玩鸿蒙与硬件46」UI互动应用篇23 - 自定义天气预报组件
  • ORB-SLAM2源码学习:Tracking.cc:Tracking::NeedNewKeyFrame是否需要插入关键帧
  • Android settings命令详解
  • 科研笔记:CCS CONCEPTS
  • Spark和Hive的区别
  • 服务器上加入SFTP------(小白篇 1)
  • Bayes-GRU-Attention的数据多特征分类预测Matlab实现
  • 方正畅享全媒体新闻采编系统 screen.do SQL注入漏洞复现(附脚本)
  • 基于xiaothink对Wanyv-50M模型进行c-eval评估
  • 2024年企业中生成式 AI 的现状报告
  • wget -q -O
  • 数据结构 C/C++(实验七:排序)
  • macos 隐藏、加密磁盘、文件
  • Electric Capital 2024 年开发者报告 | 洞见未来 Web3 生态发展方向
  • 【C++——临时对象,const T】