
Lookup join: comparing execution with and without the cache options

0. Conclusions first

# Cache options (caching is off by default)
  'lookup.cache.max-rows' = '1000',  -- cache at most 1000 entries
  'lookup.cache.ttl' = '10 min'      -- entries live for at most 10 minutes

With caching enabled

  • Lookups are fast, since hits are served directly from the cache.
  • While cache entries are unexpired, query load on MySQL stays low.
  • Suited to scenarios where the dimension table changes rarely.

Without caching

  • Every lookup queries MySQL, so performance depends on MySQL's query speed.
  • MySQL load is high; not a good fit for high-frequency lookups.
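The trade-off above can be modeled with a tiny in-process sketch: a TTL + max-rows cache in front of a lookup function that stands in for the MySQL round trip. This is illustrative only, not Flink's actual cache implementation; the `LookupCache` class, the fake clock, and the `db_hits` counter are all made up for the demonstration.

```python
import collections
import time

class LookupCache:
    """Toy model of 'lookup.cache.max-rows' + 'lookup.cache.ttl' semantics."""
    def __init__(self, lookup_fn, max_rows=1000, ttl_seconds=600, clock=time.monotonic):
        self.lookup_fn = lookup_fn          # stands in for the MySQL query
        self.max_rows = max_rows
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = collections.OrderedDict()  # key -> (value, inserted_at)
        self.db_hits = 0

    def get(self, key):
        hit = self.entries.get(key)
        if hit is not None and self.clock() - hit[1] < self.ttl:
            return hit[0]                   # served from cache, no DB access
        value = self.lookup_fn(key)         # cache miss or expired: hit the DB
        self.db_hits += 1
        self.entries[key] = (value, self.clock())
        if len(self.entries) > self.max_rows:
            self.entries.popitem(last=False)  # evict the oldest entry
        return value

# Fake dimension table and a controllable clock for the demonstration.
dim_customer = {101: "Alice", 102: "Bob", 103: "Charlie"}
now = [0.0]
cache = LookupCache(dim_customer.get, max_rows=1000, ttl_seconds=600,
                    clock=lambda: now[0])

cache.get(101); cache.get(101); cache.get(101)
print(cache.db_hits)            # only the first lookup reached the "DB"

dim_customer[101] = "Alice Updated"
print(cache.get(101))           # still the stale value: TTL has not expired

now[0] += 601                   # advance past the 10-minute TTL
print(cache.get(101))           # expired entry is refreshed from the "DB"
```

Without the cache (imagine `ttl_seconds=0`), every `get` would increment `db_hits`, which is exactly the "MySQL load is high" case above.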

1. Kafka setup

#(1) Start ZooKeeper
[root@node1 server]# /export/server/zookeeper/bin/zkServer.sh start

# Confirm ZooKeeper is listening on port 2181:
[root@node1 server]# netstat -tulnp | grep 2181
tcp6       0      0 :::2181                 :::*                    LISTEN      4651/java  

It is listening, so we are good.

#(2) Start Kafka
[root@node1 bin]# cd /export/server/kafka/bin
[root@node1 bin]# kafka-server-start.sh -daemon /export/server/kafka/config/server.properties
[root@node1 bin]# netstat -tulnp | grep 9092
tcp6       0      0 192.168.77.161:9092     :::*                    LISTEN      23613/java    

It is listening, so we are good.

#(3) Create the topic
[root@node1 bin]# kafka-topics.sh --create \
    --bootstrap-server node1:9092 \
    --replication-factor 1 \
    --partitions 1 \
    --topic orders_topic

# Verify it was created
[root@node1 bin]# ./kafka-topics.sh --bootstrap-server node1:9092 --list | grep orders_topic
orders_topic

The topic shows up, so creation succeeded.

#(4) Produce data
[root@node1 bin]# cd /export/server/kafka/bin
[root@node1 bin]# kafka-console-producer.sh --broker-list node1:9092 --topic orders_topic
# (on newer Kafka versions, --broker-list is deprecated in favor of --bootstrap-server)

{"order_id": 1, "customer_id": 101, "order_time": "2024-01-01 10:00:00"}
{"order_id": 2, "customer_id": 102, "order_time": "2024-01-01 10:05:00"}
{"order_id": 3, "customer_id": 103, "order_time": "2024-01-01 10:10:00"}


#(5) Test consuming from Kafka
/export/server/kafka/bin/kafka-console-consumer.sh --bootstrap-server node1:9092 --topic orders_topic --from-beginning
/export/server/kafka/bin/kafka-console-consumer.sh --bootstrap-server node1:9092 --topic orders_topic --group group_test --from-beginning


-----Spare commands-------
# Delete the Kafka topic
/export/server/kafka/bin/kafka-topics.sh --delete --topic orders_topic --bootstrap-server node1:9092
# Recreate the same topic:
/export/server/kafka/bin/kafka-topics.sh --create --topic orders_topic --bootstrap-server node1:9092 --partitions 1 --replication-factor 1

------------

2. MySQL setup

# Create the table and insert data
CREATE DATABASE IF NOT EXISTS test;
USE test;

CREATE TABLE dim_customer (
    customer_id BIGINT PRIMARY KEY,
    customer_name VARCHAR(255)
);


INSERT INTO dim_customer (customer_id, customer_name) VALUES
(101, 'Alice'),
(102, 'Bob'),
(103, 'Charlie');

3. Flink SQL

# Hadoop: my checkpoint data is stored on HDFS, so HDFS has to be running
start-dfs.sh

# Start Flink
cd /export/server/flink
bin/start-cluster.sh

# Start the Flink SQL client
sql-client.sh


# Create a table that reads the order stream from Kafka
CREATE TABLE orders (
    order_id BIGINT,
    customer_id BIGINT,
    order_time TIMESTAMP(3),
    proc_time AS PROCTIME() -- define a processing-time attribute
) WITH (
    'connector' = 'kafka',
    'topic' = 'orders_topic',
    'properties.bootstrap.servers' = 'node1:9092',
    'properties.group.id' = 'flink_group',
    'format' = 'json',
    'scan.startup.mode' = 'earliest-offset'
);


select * from orders limit 5;

# Create a dimension table that looks up customer info from MySQL:
CREATE TABLE dim_customer (
    customer_id BIGINT,
    customer_name STRING,
    PRIMARY KEY (customer_id) NOT ENFORCED
) WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:mysql://node1:3306/test',
    'table-name' = 'dim_customer',
    'username' = 'root',
    'password' = '123456',
    'lookup.cache.max-rows' = '1000',  -- max cached rows
    'lookup.cache.ttl' = '10 min'      -- cache entries live for 10 minutes
);
select * from dim_customer;

# Lookup-join the order stream with the customer dimension table:
SELECT
    o.order_id,
    o.customer_id,
    c.customer_name,
    o.order_time
FROM orders AS o
LEFT JOIN dim_customer FOR SYSTEM_TIME AS OF o.proc_time AS c
ON o.customer_id = c.customer_id;
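Semantically, a processing-time lookup join probes the dimension table once per incoming row, at the moment that row is processed. A plain-Python sketch of that semantics (dicts stand in for the Kafka stream and the MySQL table; customer 104 is a deliberate miss, added here only to show that LEFT JOIN keeps the row with a NULL name):

```python
# Orders as they arrive from Kafka; dim_customer as it exists in MySQL
# at the moment each order is processed.
orders = [
    {"order_id": 1, "customer_id": 101, "order_time": "2024-01-01 10:00:00"},
    {"order_id": 2, "customer_id": 102, "order_time": "2024-01-01 10:05:00"},
    {"order_id": 3, "customer_id": 104, "order_time": "2024-01-01 10:10:00"},
]
dim_customer = {101: "Alice", 102: "Bob", 103: "Charlie"}

# LEFT JOIN ... FOR SYSTEM_TIME AS OF o.proc_time: probe per row;
# a miss keeps the order row and yields NULL (None) for customer_name.
joined = [
    {
        "order_id": o["order_id"],
        "customer_id": o["customer_id"],
        "customer_name": dim_customer.get(o["customer_id"]),
        "order_time": o["order_time"],
    }
    for o in orders
]
for row in joined:
    print(row)
```

If `dim_customer` is updated between two probes, later rows see the new value; the cache options above change exactly this behavior by serving repeated probes from memory instead.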


---------Querying without the cache---------
CREATE TABLE dim_customer_no_cache (
    customer_id BIGINT,
    customer_name STRING,
    PRIMARY KEY (customer_id) NOT ENFORCED
) WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:mysql://node1:3306/test',
    'table-name' = 'dim_customer',
    'username' = 'root',
    'password' = '123456'
);

SELECT
    o.order_id,
    o.customer_id,
    c.customer_name,
    o.order_time
FROM orders AS o
LEFT JOIN dim_customer_no_cache FOR SYSTEM_TIME AS OF o.proc_time AS c  -- must use the processing-time attribute, not order_time
ON o.customer_id = c.customer_id;

Update the dimension table while the query is running:

UPDATE dim_customer SET customer_name = 'Alice Updated' WHERE customer_id = 101;
UPDATE dim_customer SET customer_name = 'Bob Updated' WHERE customer_id = 102;

1. With caching enabled
Behavior:
(1) While cache entries have not expired, query results do not reflect real-time updates to dim_customer.
(2) After updating dim_customer, the cache is only refreshed with the latest data once the TTL expires.
(3) Expected results:

Initial query result:
+------------+-------------+---------------+---------------------+
| order_id   | customer_id | customer_name | order_time          |
+------------+-------------+---------------+---------------------+
| 1          | 101         | Alice         | 2024-01-01 10:00:00 |
| 2          | 102         | Bob           | 2024-01-01 10:05:00 |
| 3          | 103         | Charlie       | 2024-01-01 10:10:00 |
+------------+-------------+---------------+---------------------+

After updating dim_customer (cache not yet expired): unchanged
+------------+-------------+---------------+---------------------+
| order_id   | customer_id | customer_name | order_time          |
+------------+-------------+---------------+---------------------+
| 1          | 101         | Alice         | 2024-01-01 10:00:00 |
| 2          | 102         | Bob           | 2024-01-01 10:05:00 |
| 3          | 103         | Charlie       | 2024-01-01 10:10:00 |
+------------+-------------+---------------+---------------------+
After the cache expires (10-minute TTL elapses): updated
+------------+-------------+------------------+---------------------+
| order_id   | customer_id | customer_name    | order_time          |
+------------+-------------+------------------+---------------------+
| 1          | 101         | Alice Updated    | 2024-01-01 10:00:00 |
| 2          | 102         | Bob Updated      | 2024-01-01 10:05:00 |
| 3          | 103         | Charlie          | 2024-01-01 10:10:00 |
+------------+-------------+------------------+---------------------+
------------------------------------------------------------------------------------
2. Without caching
Behavior:
(1) Every lookup goes straight to MySQL, so results reflect the latest dim_customer data in real time.
(2) Strong freshness, but performance may suffer.
(3) Expected results:

Initial query result:
+------------+-------------+---------------+---------------------+
| order_id   | customer_id | customer_name | order_time          |
+------------+-------------+---------------+---------------------+
| 1          | 101         | Alice         | 2024-01-01 10:00:00 |
| 2          | 102         | Bob           | 2024-01-01 10:05:00 |
| 3          | 103         | Charlie       | 2024-01-01 10:10:00 |
+------------+-------------+---------------+---------------------+

After updating dim_customer:
+------------+-------------+------------------+---------------------+
| order_id   | customer_id | customer_name    | order_time          |
+------------+-------------+------------------+---------------------+
| 1          | 101         | Alice Updated    | 2024-01-01 10:00:00 |
| 2          | 102         | Bob Updated      | 2024-01-01 10:05:00 |
| 3          | 103         | Charlie          | 2024-01-01 10:10:00 |
+------------+-------------+------------------+---------------------+


