当前位置: 首页 > news >正文

paimon中批和流查看过去的快照的数据及变动的数据

1、批处理

创建表并插入三条数据

CREATE TABLE ws_t (id INT,ts BIGINT,vc INT,PRIMARY KEY (id) NOT ENFORCED
);
INSERT INTO ws_t VALUES(2,2,2),(3,3,3),(4,4,4),(5,5,5);
--设置执行模式为批处理
RESET 'execution.checkpointing.interval';
SET 'execution.runtime-mode' = 'batch';
+----+----+----+
| id | ts | vc |
+----+----+----+
|  2 |  2 |  2 |
|  3 |  3 |  3 |
|  4 |  4 |  4 |
|  5 |  5 |  5 |
+----+----+----+
4 rows in set

变动两次数据让数据产生新的快照

第一次删除一条数据
delte from ws_t where id=2;
+----+----+----+
| id | ts | vc |
+----+----+----+
|  3 |  3 |  3 |
|  4 |  4 |  4 |
|  5 |  5 |  5 |
+----+----+----+
3 rows in set
----------------------------------------
第二次变更一次数据
update ws_t set ts=6 where id=3;
+----+----+----+
| id | ts | vc |
+----+----+----+
|  3 |  6 |  3 |
|  4 |  4 |  4 |
|  5 |  5 |  5 |
+----+----+----+
3 rows in set

现在在hdfs上有三个快照

1.1、通过指定快照id来查询数据

SELECT * FROM ws_t /*+ OPTIONS('scan.snapshot-id' = '1') */;

1.2、读取指定时间戳的快照

SELECT * FROM ws_t$snapshots;
+-------------+-----------+--------------------------------+---------------------+-------------+-------------------------+--------------------+--------------------+------------------------+----------------------+
| snapshot_id | schema_id |                    commit_user |   commit_identifier | commit_kind |             commit_time | total_record_count | delta_record_count | changelog_record_count |            watermark |
+-------------+-----------+--------------------------------+---------------------+-------------+-------------------------+--------------------+--------------------+------------------------+----------------------+
|           1 |         0 | fdf3e8ff-d8e4-47a6-a9b2-759... | 9223372036854775807 |      APPEND | 2025-05-13 17:22:18.457 |                  4 |                  4 |                      0 | -9223372036854775808 |
|           2 |         0 | a631b852-19f4-463b-aee8-67a... | 9223372036854775807 |      APPEND | 2025-05-13 17:30:21.173 |                  5 |                  1 |                      0 | -9223372036854775808 |
|           3 |         0 | 9940bb36-7e55-40b7-90cf-cc1... | 9223372036854775807 |      APPEND | 2025-05-13 17:51:10.679 |                  6 |                  1 |                      0 | -9223372036854775808 |
+-------------+-----------+--------------------------------+---------------------+-------------+-------------------------+--------------------+--------------------+------------------------+----------------------+
需要把时间转换为数字,也可以直接去hdfs页面查询
SELECT * FROM ws_t /*+ OPTIONS('scan.timestamp-millis' = '1688369660841') */;

1.3、增量查询(查询几个快照之间的变化)

只能查询出修改的
SELECT * FROM ws_t /*+ OPTIONS('incremental-between' = '1,3') */;
+----+----+----+
| id | ts | vc |
+----+----+----+
|  3 |  6 |  3 |
+----+----+----+
1 row in setaudit_log 系统表能查询出不同快照之间删除的数据
SELECT * FROM ws_t$audit_log /*+ OPTIONS('incremental-between' = '1,2') */;
+---------+----+----+----+
| rowkind | id | ts | vc |
+---------+----+----+----+
|      -D |  2 |  2 |  2 |
+---------+----+----+----+
1 row in set

2、流式查询

先把环境修改成流式

SET 'execution.checkpointing.interval'='30s';
SET 'execution.runtime-mode' = 'streaming';

op值对应的含义

+I:插入操作。-U:使用更新行的先前内容进行更新操作。+U:使用更新行的新内容进行更新操作。-D:删除操作。

2.1从指定快照id开始读取变更数据

SELECT * FROM ws_t /*+ OPTIONS('scan.snapshot-id' = '1') */;
+----+-------------+----------------------+-------------+
| op |          id |                   ts |          vc |
+----+-------------+----------------------+-------------+
| +I |           2 |                    2 |           2 |
| +I |           3 |                    3 |           3 |
| +I |           4 |                    4 |           4 |
| +I |           5 |                    5 |           5 || -D |           2 |                    2 |           2 |
| -U |           3 |                    3 |           3 |
| +U |           3 |                    6 |           3 |

2.2、从指定时间戳开始读取

SELECT * FROM ws_t /*+ OPTIONS('scan.timestamp-millis' = '1688369660841') */;

2.3、第一次启动时读取指定快照数据,并继续读取变化

SELECT * FROM ws_t /*+ OPTIONS('scan.mode'='from-snapshot-full','scan.snapshot-id' = '2') */;
+----+-------------+----------------------+-------------+
| op |          id |                   ts |          vc |
+----+-------------+----------------------+-------------+
| +I |           3 |                    3 |           3 |
| +I |           4 |                    4 |           4 |
| +I |           5 |                    5 |           5 |
| -U |           3 |                    3 |           3 |
| +U |           3 |                    6 |           3 |

http://www.xdnf.cn/news/429859.html

相关文章:

  • #S4U2SELF#S4U2Proxy#CVE-2021-42278/42287以及手动复现
  • 脑机接口技术:开启人类与机器融合的新时代
  • 《从像素到身份:Flutter如何打通社交应用人脸识别的技术闭环》
  • 本地缓存的三种实现
  • 检索增强生成(RAG)简介
  • Codeforces Round 998 (Div. 3)
  • STM32F103_LL库+寄存器学习笔记22 - 基础定时器TIM实现1ms周期回调
  • 深入浅出:C++数据处理类与计算机网络的巧妙类比
  • Oracle OCP认证考试考点详解083系列15
  • CVE-2016-4977 漏洞深度分析
  • TensorFlow之微分求导
  • 力扣-101.对称二叉树
  • JIT+Opcache如何配置才能达到性能最优
  • Spring Boot 自动装配原理详解
  • openGauss与海量100数据库对比
  • vscode点击函数名/变量/文件名无法跳转
  • 项目图标组件处理
  • LabVIEW 程序运行时内存不足报错原因
  • STM32 __rt_entry
  • AD19基础应用技巧:Via 尺寸设置界面 (Size and Shape)
  • AI视频生成工具开发与搭建:从技术到应用的全方位指南
  • linux中fork()函数的小问题
  • solidwors插件库收集
  • 社区商业增值服务生态薄弱?停车反哺+商户联盟激活双向收益
  • 最大子段和(递推)
  • 2.4GHz无线通信芯片选型指南:集成SOC与低功耗方案解析
  • Python+1688 API 开发教程:实现商品实时数据采集的完整接入方案
  • 云蝠智能大模型呼叫接入通义千问qwen3模型!
  • 2025年RIS SCI2区,改进白鲸优化算法+复杂非线性方程组求解,深度解析+性能实测
  • 超标量处理器设计5-指令集体系