当前位置: 首页 > ds >正文

数据库 | 使用timescaledb和大模型进行数据分析

时序数据库:timescaledb
大模型:通义千问2.5

对话开始前提示词:

我正在做数据分析,以下是已知信息:
数据库:timescaledb,表名:dm_tag_value,tag_name列是位号名,app_time列是时间,tag_value列是值计算前要转换下CAST(tag_value AS FLOAT),
其中DRLZM.FIC3101A.DACA.PV是位号1,DRLZM.AT0502A.DACA.PV是位号2,TAG82是位号3
表数据如下:
---
"id","ds_id","tag_name","tag_value","tag_time","app_time","quality","create_time"
1002364859414740992,,DRLZM.FIC3101A.DACA.PV,"20.695463180541992",2024-05-29 00:00:04.000,2024-05-29 00:00:04.000,192,2024-05-29 00:00:05.000
1002364859435712513,,DRLZM.AT0502A.DACA.PV,-24.399999618530273,2024-05-29 00:00:04.000,2024-05-29 00:00:04.000,192,2024-05-29 00:00:05.000
1002364859460878336,,TAG40,"20.695463180541992",2024-05-29 00:00:04.000,2024-05-29 00:00:05.000,192,2024-05-29 00:00:05.000
1002364859469266945,,DRLZM.AT0502A.DACA.PV,"21.059999465942383",2024-05-29 00:00:04.000,2024-05-29 00:00:04.000,192,2024-05-29 00:00:05.000
1002364863663570944,,DRLZM.AT0502A.DACA.PV,-62.939998626708984,2024-05-29 00:00:04.000,2024-05-29 00:00:05.000,192,2024-05-29 00:00:06.000
1002364863684542465,,TAG7000_ISYS7.1,-6.4000000953674316,2024-05-29 00:00:04.000,2024-05-29 00:00:05.000,192,2024-05-29 00:00:06.000
1002364863709708288,,TAG6000_ISYS7.1,-84.94000244140625,2024-05-29 00:00:05.000,2024-05-29 00:00:05.000,192,2024-05-29 00:00:06.000
1002364863709708288,,TAG6000_ISYS7.1,-84.94000244140625,2024-05-29 00:00:05.000,2024-05-29 00:00:05.000,192,2024-05-29 00:00:06.000
1002364880306569217,,DRLZM.FIC3101A.DACA.PV,"4.3653912544250488",2024-05-29 00:00:09.000,2024-05-29 00:00:09.000,192,2024-05-29 00:00:10.000
1002364901273894912,,DRLZM.FIC3101A.DACA.PV,"0.26307547092437744",2024-05-29 00:00:14.000,2024-05-29 00:00:14.000,192,2024-05-29 00:00:15.000
1002364922245414913,,DRLZM.FIC3101A.DACA.PV,"9.4877281188964844",2024-05-29 00:00:19.000,2024-05-29 00:00:19.000,192,2024-05-29 00:00:20.000
1002364943279849472,,TAG81,"20.695463180541992",2024-05-29 00:00:04.000,2024-05-29 00:00:04.000,192,2024-05-29 00:00:25.000
1002364943279849473,,TAG81,"9.4877281188964844",2024-05-29 00:00:19.000,2024-05-29 00:00:19.000,192,2024-05-29 00:00:25.000
1002364943279849474,,TAG82,"20.695463180541992",2024-05-29 00:00:04.000,2024-05-29 00:00:04.000,192,2024-05-29 00:00:25.000
1002364943279849475,,TAG82,"9.4877281188964844",2024-05-29 00:00:19.000,2024-05-29 00:00:19.000,192,2024-05-29 00:00:25.000
1002364943279849476,,TAG83,"20.695463180541992",2024-05-29 00:00:04.000,2024-05-29 00:00:04.000,192,2024-05-29 00:00:25.000
---
根据上面的已知信息,我会问一系列数据分析的问题,分析位号、时间和值,每个问题的回答要求:
生成SQL语句,要避免ambiguous错, 注意:(仅输出SQL查询语句,不要包含任何其他文本或解释)

测试CASE

**提示词:**将今天位号1数据从小到大排序,请注意输出格式,我只需要SQL查询语句,不需要其他解释或信息.
返回:

SELECT app_time, CAST(tag_value AS FLOAT) AS value
FROM dm_tag_value
WHERE tag_name = 'DRLZM.FIC3101A.DACA.PV' AND DATE(app_time) = CURRENT_DATE
ORDER BY value ASC;

**提示词:**使用线性拟合今天位号1与位号2数据,我只需要SQL查询语句,不需要其他解释或信息.
不支持,需要UDF配合
**提示词:**使用均值滤波对位号1进行处理,我只需要SQL查询语句,不需要其他解释或信息.
返回:

WITH ordered_data AS (SELECTapp_time,CAST(tag_value AS FLOAT) AS value,ROW_NUMBER() OVER (ORDER BY app_time) AS row_numFROM dm_tag_valueWHERE tag_name = 'DRLZM.FIC3101A.DACA.PV' AND DATE(app_time) = CURRENT_DATE
),
smoothed_data AS (SELECTo1.app_time,AVG(o1.value + COALESCE(o2.value, o1.value) + COALESCE(o3.value, o1.value)) AS mean_filtered_valueFROM ordered_data o1LEFT JOIN ordered_data o2 ON o1.row_num = o2.row_num - 1LEFT JOIN ordered_data o3 ON o1.row_num = o3.row_num + 1GROUP BY o1.app_time, o1.value
)
SELECT * FROM smoothed_data;

**提示词:**针对今天A位号1与位号2数据做相关性分析,请注意输出格式,我只需要SQL查询语句,不需要其他解释或信息.
返回:

WITH today_data AS (SELECT CAST(tag_value AS FLOAT) AS value, LAG(CAST(tag_value AS FLOAT)) OVER (PARTITION BY tag_name ORDER BY app_time) AS prev_value,app_timeFROM dm_tag_valueWHERE tag_name IN ('DRLZM.FIC3101A.DACA.PV', 'DRLZM.AT0502A.DACA.PV')AND DATE(app_time) = CURRENT_DATE
),
correlation_data AS (SELECTCASE WHEN tag_name = 'DRLZM.FIC3101A.DACA.PV' THEN value ELSE prev_value END AS value_a,CASE WHEN tag_name = 'DRLZM.AT0502A.DACA.PV' THEN value ELSE prev_value END AS value_bFROM today_dataWHERE tag_name IN ('DRLZM.FIC3101A.DACA.PV', 'DRLZM.AT0502A.DACA.PV')
),
agg_data AS (SELECTSUM(value_a * value_b) AS sum_ab,SUM(value_a * value_a) AS sum_aa,SUM(value_b * value_b) AS sum_bb,COUNT(*) AS countFROM correlation_data
)
SELECT(sum_ab * count - SUM(value_a) * SUM(value_b)) / (SQRT(sum_aa * count - POWER(SUM(value_a), 2)) * SQRT(sum_bb * count - POWER(SUM(value_b), 2))) AS correlation_coefficient
FROM agg_data;
------------------------------需要修改后才能运行WITH today_data AS (SELECT tag_name,CAST(tag_value AS FLOAT) AS value, LAG(CAST(tag_value AS FLOAT)) OVER (PARTITION BY tag_name ORDER BY app_time) AS prev_value,app_timeFROM dm_tag_valueWHERE tag_name IN ('DRLZM.FIC3101A.DACA.PV', 'DRLZM.AT0502A.DACA.PV')AND DATE(app_time) = CURRENT_DATE
),
correlation_data AS (SELECTCASE WHEN tag_name = 'DRLZM.FIC3101A.DACA.PV' THEN value ELSE prev_value END AS value_a,CASE WHEN tag_name = 'DRLZM.AT0502A.DACA.PV' THEN value ELSE prev_value END AS value_bFROM today_dataWHERE tag_name IN ('DRLZM.FIC3101A.DACA.PV', 'DRLZM.AT0502A.DACA.PV')
),
agg_data AS (selectsum(value_a) as sum_a,sum(value_b) as sum_b,SUM(value_a * value_b) AS sum_ab,SUM(value_a * value_a) AS sum_aa,SUM(value_b * value_b) AS sum_bb,COUNT(*) AS countFROM correlation_data
)
SELECT(sum_ab * count - sum_a * sum_b) / (SQRT(sum_aa * count - POWER(sum_a, 2)) * SQRT(sum_bb * count - POWER(sum_b, 2))) AS correlation_coefficient
FROM agg_data;

**提示词:**我们把超过平均值1个标准差的数据定义为异常值,请你给出今天位号1异常值有哪些
请仅输出SQL查询语句,不要包含任何其他文本或解释
返回:

WITH today_data AS (SELECT CAST(tag_value AS FLOAT) AS value,app_timeFROM dm_tag_valueWHERE tag_name = 'DRLZM.FIC3101A.DACA.PV'AND DATE(app_time) = CURRENT_DATE
),
stats AS (SELECTAVG(value) AS average,STDDEV(value) AS std_devFROM today_data
)
SELECTtd.app_time,td.value
FROM today_data td
JOIN stats ON TRUE
WHERE td.value > stats.average + stats.std_dev OR td.value < stats.average - stats.std_dev;

**提示词:**给出本月位号1与位号2对比趋势,请仅输出SQL查询语句,不要包含任何其他文本或解释
返回:

SELECT
dv1.app_time,
CAST(dv1.tag_value AS FLOAT) AS DRLZM_FIC3101A_DACA_PV_value,
CAST(dv2.tag_value AS FLOAT) AS DRLZM_AT0502A_DACA_PV_value
FROM
dm_tag_value dv1
JOIN
dm_tag_value dv2
ON dv1.app_time = dv2.app_time
WHERE
dv1.tag_name = 'DRLZM.FIC3101A.DACA.PV'
AND dv2.tag_name = 'DRLZM.AT0502A.DACA.PV'
AND dv1.app_time >= DATE_TRUNC('month', CURRENT_DATE)
AND dv1.app_time < DATE_TRUNC('month', CURRENT_DATE) + INTERVAL '1 month'
ORDER BY
dv1.app_time;

**提示词:**分析今日位号1的平均值和最大值,请仅输出SQL查询语句,不要包含任何其他文本或解释
返回:

SELECT
AVG(CAST(tag_value AS FLOAT)) AS TAG80_avg,
MAX(CAST(tag_value AS FLOAT)) AS TAG80_max
FROM
dm_tag_value
WHERE
tag_name = 'DRLZM.FIC3101A.DACA.PV'
AND app_time >= CURRENT_DATE
AND app_time < CURRENT_DATE + INTERVAL '1 day';

**提示词:**分析今日位号1如何变化,按小时取平均,请仅输出SQL查询语句,不要包含任何其他文本或解释
返回:

SELECT
date_trunc('hour', app_time) AS hour_bucket,
AVG(CAST(tag_value AS FLOAT)) AS TAG80_avg
FROM
dm_tag_value
WHERE
tag_name = 'DRLZM.FIC3101A.DACA.PV'
AND app_time >= CURRENT_DATE
AND app_time < CURRENT_DATE + INTERVAL '1 day'
GROUP BY
hour_bucket
ORDER BY
hour_bucket;

**提示词:**分析位号1和位号2这个月的趋势,按小时分组取平均,每个位号一列,请仅输出SQL查询语句,不要包含任何其他文本或解释
返回:

SELECT DATE_TRUNC('hour', t1.app_time) AS hour,AVG(CAST(t1.tag_value AS FLOAT)) AS FIC3101A_avg_value,AVG(CAST(t2.tag_value AS FLOAT)) AS AT0502A_avg_value
FROM dm_tag_value t1
JOIN dm_tag_value t2 ON DATE_TRUNC('hour', t1.app_time) = DATE_TRUNC('hour', t2.app_time)
WHERE t1.tag_name = 'DRLZM.FIC3101A.DACA.PV' AND t2.tag_name = 'DRLZM.AT0502A.DACA.PV' AND EXTRACT(MONTH FROM t1.app_time) = EXTRACT(MONTH FROM CURRENT_DATE) AND EXTRACT(YEAR FROM t1.app_time) = EXTRACT(YEAR FROM CURRENT_DATE)
GROUP BY DATE_TRUNC('hour', t1.app_time)
ORDER BY hour;

**提示词:**分析位号1,取整点数据,请仅输出SQL查询语句,不要包含任何其他文本或解释
返回:

SELECT app_time, CAST(tag_value AS FLOAT) AS value
FROM dm_tag_value
WHERE tag_name = 'DRLZM.FIC3101A.DACA.PV' AND EXTRACT(MINUTE FROM app_time) = 0 AND EXTRACT(SECOND FROM app_time) = 0
ORDER BY app_time;

总结

用大模型对时序数据进行分析,对于简单的数据分析没有问题,对于复杂或高级的分析有概率说错或处理不了

http://www.xdnf.cn/news/10027.html

相关文章:

  • Allegro 输出生产数据详解
  • BigemapPro 数据坐标转度分秒格式教程
  • C++_核心编程_ 左移运算符重载 “<<” 左移运算符
  • 解释k8s种ConfigMap和Secret的作用,如何在Pod中挂载环境变
  • PortSwigger-04-cors
  • 快递物流查询接口如何用C#进行调用?
  • 微机消谐装置:保障PT柜安全的核心设备
  • 第二章支线一:影之殿堂 · 阴影与过渡魔法
  • 下一代数模ai智能体研发意见征集
  • Win11安装Dify
  • 【Linux】(1)—进程概念-③Linux进程概念与PCB
  • 重温经典算法——选择排序
  • 从“刚性扩容”到“弹性供给”:移动充电服务重构配电网边际成本
  • Node.js 中 require 源码解析
  • 从零起步认知微内核操作系统(AGI)
  • JavaScript 循环语句总结
  • Linux浅谈
  • VMware安装esd格式系统镜像
  • Java实现命令行图书管理系统(附完整源码)
  • 使用 C/C++ 和 OpenCV 调用摄像头
  • Java打包在加载外面的yaml文件
  • 2025LitCTF 复现
  • 【SQL】关键字
  • 【笔记】suna部署之获取 Supabase API key 和 project URL
  • 【笔记】suna部署之获取 Tavily API key
  • 第42节:模型优化与部署:Web服务部署(Flask, FastAPI)
  • Arduino LCD 1602液晶显示器2(I2C总线)
  • ubuntu 22.04 编译安装nignx 报错 openssl 问题
  • 【VMware Workstation 17 Pro许可证密钥】
  • 如何站在指标体系之巅看智能数据建模产品(GAI)