多维度指标交叉计算查询方案
业务方经常需要把“来源不同、口径不同、甚至维度列不完全一致”的 N 个指标拉到同一张结果表里做加减乘除。
过去我们见过两种实现:
多层 FULL OUTER JOIN
每多一个指标就再 JOIN 一次,JOIN 条件里还要层层 COALESCE,SQL 很快失控。多层 CTE + FULL JOIN
比第一种稍好,但仍逃不掉 FULL JOIN 带来的复杂度 O(n²)。
本文用一套“行转列”范式(UNION ALL + SUM(CASE)
)彻底解决扩展性问题,并给出维度缺失、类型兼容、动态 SQL 等常见坑的填法。下文所有表名均已脱敏为 tbl_xxx
、tbl_yyy
,字段名保持原义便于阅读。
1. 需求抽象
维度 | 指标1 | 指标2 | 指标3 | 衍生计算 |
---|---|---|---|---|
dealer_id, dealer_name, month | 成交订单数 | 活跃门店数 | 退款订单数 | 指标1+指标2-指标3 |
三张表(或三个 SQL 条件)分别给出三个指标;
维度列可能缺失(如退款表没有 dealer_name);
后续还要加指标 4、5、6……
2. 错误示范:多层 FULL OUTER JOIN
WITH
t1 AS (SELECT dealer_id, dealer_name, month, cnt FROM tbl_order),
t2 AS (SELECT dealer_id, dealer_name, month, cnt FROM tbl_active),
t3 AS (SELECT dealer_id, month, cnt FROM tbl_refund) -- 少了 dealer_name
SELECTCOALESCE(t1.dealer_id, t2.dealer_id, t3.dealer_id) AS dealer_id,COALESCE(t1.dealer_name, t2.dealer_name, 'Unknown') AS dealer_name,COALESCE(t1.month, t2.month, t3.month) AS month,COALESCE(t1.cnt,0) + COALESCE(t2.cnt,0) - COALESCE(t3.cnt,0) AS final_cnt
FROM t1
FULL JOIN t2 ON ...
FULL JOIN t3 ON ...;
问题
每个新指标都要再写一段 FULL JOIN;
COALESCE
链指数级增长;维度缺失时对齐麻烦。
实现1
with temp as (select count(distinct order_no) as "one", dealer_name, dealer_id, to_char(order_create_time,'YYYY-MM-DD') as "月"from dms.ordergroup by dealer_name, dealer_id, "月"
),
temp2 as (select count(distinct order_no) as "two", dealer_name, dealer_id, to_char(order_create_time,'YYYY-MM-DD') as "月"from dms.order -- 假设这里应该是不同的表或不同条件的查询group by dealer_name, dealer_id, "月"
),
temp3 as (select count(distinct order_no) as "three", dealer_name, dealer_id, to_char(order_create_time,'YYYY-MM-DD') as "月"from dms.order -- 假设这里应该是不同的表或不同条件的查询group by dealer_name, dealer_id, "月"
),
temp4 as (select count(distinct order_no) as "four", dealer_name, dealer_id, to_char(order_create_time,'YYYY-MM-DD') as "月"from dms.order -- 假设这里应该是不同的表或不同条件的查询group by dealer_name, dealer_id, "月"
)
selectcoalesce(temp.one, 0) + coalesce(temp2.two, 0) + coalesce(temp3.three, 0) + coalesce(temp4.four, 0) as "total",coalesce(temp.dealer_name, temp2.dealer_name, temp3.dealer_name, temp4.dealer_name) as "dealer_name",coalesce(temp.dealer_id, temp2.dealer_id, temp3.dealer_id, temp4.dealer_id) as "dealer_id",coalesce(temp."月", temp2."月", temp3."月", temp4."月") as "月"
from temp
full outer join temp2
on temp.dealer_name = temp2.dealer_name and temp.dealer_id = temp2.dealer_id and temp."月" = temp2."月"
full outer join temp3
on coalesce(temp.dealer_name, temp2.dealer_name) = temp3.dealer_nameand coalesce(temp.dealer_id, temp2.dealer_id) = temp3.dealer_idand coalesce(temp."月", temp2."月") = temp3."月"
full outer join temp4
on coalesce(temp.dealer_name, temp2.dealer_name, temp3.dealer_name) = temp4.dealer_nameand coalesce(temp.dealer_id, temp2.dealer_id, temp3.dealer_id) = temp4.dealer_idand coalesce(temp."月", temp2."月", temp3."月") = temp4."月";
实现2
WITH
a AS (select count(distinct dealer_name) AS Indicator,install_province_name AS Province from "dma"."main_bb" group by install_province_name),
b AS (SELECT count(distinct dealer_team_name) AS Indicator,install_province_name AS Province from "dma"."dma_dop_dlsjyfxyb_main_bb" group by install_province_name)
SELECT COALESCE(a.Indicator,b.Indicator) AS Indicator,COALESCE(a.Province,b.Province) AS Province,-- 把 NULL 变 0,再计算表达式(COALESCE(a.Indicator,0) * (COALESCE(b.Indicator,0) - COALESCE(a.Indicator,0)/2 * (COALESCE(b.Indicator,0)+3))) AS final_val
FROM a
FULL JOIN b USING (Province)
3. 正确范式:行转列(UNION ALL + SUM(CASE))
WITH all_metrics AS (/* 指标1:成交订单 */SELECTdealer_id,COALESCE(dealer_name, 'Unknown') AS dealer_name,month,cnt AS metric_value,'order_cnt' AS metric_typeFROM tbl_orderUNION ALL/* 指标2:活跃门店 */SELECTdealer_id,COALESCE(dealer_name, 'Unknown') AS dealer_name,month,cnt,'active_cnt'FROM tbl_activeUNION ALL/* 指标3:退款订单 */SELECTdealer_id,'Unknown' AS dealer_name, -- 退款表本身无 dealer_namemonth,cnt,'refund_cnt'FROM tbl_refund/* 要继续加指标,往下再 UNION ALL 即可 */
)SELECTdealer_id,dealer_name,month,SUM(CASE WHEN metric_type = 'order_cnt' THEN metric_value ELSE 0 END) AS order_cnt,SUM(CASE WHEN metric_type = 'active_cnt' THEN metric_value ELSE 0 END) AS active_cnt,SUM(CASE WHEN metric_type = 'refund_cnt' THEN metric_value ELSE 0 END) AS refund_cnt,SUM(CASE WHEN metric_type = 'order_cnt' THEN metric_value ELSE 0 END) +SUM(CASE WHEN metric_type = 'active_cnt' THEN metric_value ELSE 0 END) -SUM(CASE WHEN metric_type = 'refund_cnt' THEN metric_value ELSE 0 END) AS final_cnt
FROM all_metrics
GROUP BY dealer_id, dealer_name, month;
与之前几种实现的对比
维度 | 你现在的写法 | 实现1 (多层 FULL JOIN) | 实现2 (多层 CTE + FULL JOIN) |
---|---|---|---|
扩展性 | ★★★★★ | ★ | ★★ |
SQL 引擎优化 | UNION ALL → 哈希聚合,执行计划简单 | 多次扫描主表 + 多路 FULL JOIN,更易写出笛卡尔 | 同上 |
NULL 处理 | 用 0 占位,无需 COALESCE 链 | 需要大量 COALESCE | 需要 COALESCE |
可读性 | 指标逻辑彼此独立,一眼能看明白 | 超过 3 个 JOIN 就没人愿意看 | 中等 |
维度不一致 | 可以单独补 NULL 或用默认值 | 必须保证所有维度列都能对齐 | 必须保证所有维度列都能对齐 |
结论:你现在的写法就是 “行转列”范式,在扩展性、可读性、性能三个维度上,全面优于实现1/2,也优于“列转行”方案。
4. 常见疑问 & 填坑指南
疑问 | 结论 |
---|---|
维度列缺失会报错吗? | 不会,只要列数一致、对应位置类型兼容即可。缺失列用常量或 NULL 占位。 |
子查询里的别名重要吗? | 不重要。最终列名以第一条 SELECT 为准。 |
类型必须完全一致吗? | 只需隐式可转换(INT↔BIGINT、DATE↔TIMESTAMP),否则需显式 CAST。 |
指标爆炸怎么办? |
5. 性能小贴士
每个子查询尽量只扫需要的列和分区,避免
SELECT *
;维度列建立联合索引
(dealer_id, month)
可显著减少 GROUP BY 的 Hash/排序开销;指标量级差异极大时可考虑把
all_metrics
写成临时表再聚合。
6. 小结
拒绝多层 FULL JOIN,复杂度 O(n²)。
UNION ALL + SUM(CASE) 是当前最通用、最易扩展的范式,复杂度 O(n)。
维度列缺失用常量补位,类型不兼容用 CAST,指标过多用动态 SQL。
把这套范式固化成模板后,再加 10 个、100 个指标都只是一行配置的事。Happy SQL!