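Introduction to UPDATE STATISTICS in GBase 8s
What UPDATE STATISTICS does
The optimizer chooses the most efficient strategy for executing an SQL statement, and it relies on information in the system catalog tables to make that choice.
UPDATE STATISTICS refreshes the metadata held in system catalog tables such as systables, sysindexes, sysdistrib, and sysprocplan, so that the optimizer can generate better query plans and queries run faster.
Statistics levels
low, medium, high
Differences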
update statistics low for table t1;
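Collects basic information about the table, such as row count, page count, and index statistics.
Updates the systables, syscolumns, and sysindexes system catalog tables, but does not update data distributions.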
update statistics low for table t1 (c1) drop distributions;
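Updates the systables, syscolumns, and sysindexes system catalog tables and also forcibly drops the data distribution for column c1. If no column is specified, the distributions of all columns are dropped.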
update statistics low for table t1 (c1) drop distributions only;
Drops the data distributions and updates the systables.version column, without collecting any LOW-mode table or index statistics.
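To see what a LOW run records, the catalog tables can be queried directly. The following is a minimal sketch, assuming the table t1 from the statements above and Informix-style catalog column names (nrows, npused, levels, leaves, nunique); these names are an assumption and may differ between GBase 8s versions.

```sql
-- Refresh table and index statistics only (no distributions).
UPDATE STATISTICS LOW FOR TABLE t1;

-- Row count and used-page count recorded for the table.
SELECT tabname, nrows, npused
  FROM systables
 WHERE tabname = 't1';

-- Per-index figures recorded for the same table.
SELECT i.idxname, i.levels, i.leaves, i.nunique
  FROM sysindexes i, systables t
 WHERE i.tabid = t.tabid
   AND t.tabname = 't1';
```

Comparing the values before and after the statement shows which figures the optimizer will see.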
update statistics medium for table t1;
update statistics high for table t1;
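In addition to collecting the low-level data, both statements generate new data distributions, which are stored in the sysdistrib system catalog table.
A sample of that data follows: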
encdat AMkAAAAAAAAAkh_tOnCJ_z4AAIA_ACAwMDAwRUI3RDM2NDg0QzIwOEFDODcwOUIyNDkwMjA0QTg4NkNDMzQ2MTY2RUNEMTUAACmLozsAIDAxNTAxQTkyRTQyRTQ2RjRBMzU4MjVDNDYyQTFEMTFDODg2Q0MzNDYxNjZFQ0QxNQAAKYujOwAgMDI5QUVBMTY4RUExNDc4N0JDRUY2RDU2REM3Q0YwNDM4ODZDQzM0NjE2NkVDRDE1AAApi6M7ACAw encdat ADNERjk4RTBFQTFDNDBEOUEwNTQ4QzMzNzE4NkM2RjM4ODZDQzM0NjE2NkVDRDE1AAApi6M7ACAwNTFCRjNFNzdGNjA0NTU1OTY1ODI0RUIxMDRGRjYyQTg4NkNDMzQ2MTY2RUNEMTUAACmLozsAIDA2NjBGNERDNzRFMjQyNDRCRkNGQUEwMjFDODJBNzYyODg2Q0MzNDYxNjZFQ0QxNQAAKYujOwAgMDdBRDNCRTMzQkZBNDBGQkEyNThBMURF encdat ADIyNjI0NkY1ODg2Q0MzNDYxNjZFQ0QxNQAAKYujOwAgMDhGNkY4QTk3OUU3NDI3Q0IzRTk2RDhERjU4Nzc5Qjg4ODZDQzM0NjE2NkVDRDE1AAApi6M7ACAwQTNBMjk1N0E0MjE0OEExOTkwQkY5OTEzMzQ2MEI3NDg4NkNDMzQ2MTY2RUNEMTUAACmLozsAIDBCNzUzQkQ5QUVDMTRBOUJCNUZCRDZBRDBCQzA0MDQ0ODg2Q0MzNDYxNjZFQ0Qx encdat ADUAACmLozsAIDBDQThDRDY4OEM0MzRDMjc5QkUwQTkwRTZEOTk2NjE2ODg2Q0MzNDYxNjZFQ0QxNQAAKYujOwAgMEUwNDZCNjIwN0YzNDcwNEE4MTkwNzQ2NDAyM0Y4Njc4ODZDQzM0NjE2NkVDRDE1AAApi6M7ACAwRjNFNTIzN0M1NTI0NjQ4QjdFMURGRUVEOEMzNTI5Qjg4NkNDMzQ2MTY2RUNEMTUAACmLozsAIDEwN0M2ODhERkU4RjQz encdat ADk3QjNDNTJCMDU3ODUyOUZDMzg4NkNDMzQ2MTY2RUNEMTUAACmLozsAIDExQzcxQkI5MTdDNDRBMkU4MzNCQTMxOUMyM0ZFOTNGODg2Q0MzNDYxNjZFQ0QxNQAAKYujOwAgMTMwNENFMUM3RjJENEU4OTgyNjY0NDRDQ0NDMzlGMTQ4ODZDQzM0NjE2NkVDRDE1AAApi6M7ACAxNDU2QzU1REI0QjE0QTVEQTY2OUYxRkFEOUMxREU0NTg4NkND encdat ADM0NjE2NkVDRDE1AAApi6M7ACAxNUExRTU0ODNEOUU0RTgwQTM4RjZEQzFFOEFGMUY2RTg4NkNDMzQ2MTY2RUNEMTUAACmLozsAIDE2Rjc5N0JEODk4OTQzN0VCMEI1NjA4NDA1OUE2QUE1ODg2Q0MzNDYxNjZFQ0QxNQAAKYujOwAgMTgzRTlDRDcwMjI4NEUxRDkyQTVBRTgxRUMxRDhCMTM4ODZDQzM0NjE2NkVDRDE1AAApi6M7ACAxOTc3 encdat ADQyQkU4Rjg0NDg3OTgwQ0E2OUNEN0VEQ0Q3OUU4ODZDQzM0NjE2NkVDRDE1AAApi6M7ACAxQUFFRENDRUMyMDY0NjdGOEI5QkNGQTczNDRFMTNCRjg4NkNDMzQ2MTY2RUNEMTUAACmLozsAIDFDMEMyMzM1NzJENDQwRkU4MUNGMjVGQ0RFM0Q4RDEyODg2Q0MzNDYxNjZFQ0QxNQAAKYujOwAgMUQ2ODU0RThBNEYwNDdBRDlBODFEODZCRDc0 encdat 
ADQ0NDgzODg2Q0MzNDYxNjZFQ0QxNQAAKYujOwAgMUVBQjU4NUQ5NUE0NDFBRThDRTM1REU1RTEzOTc1NDM4ODZDQzM0NjE2NkVDRDE1AAApi6M7ACAxRkY4M0FDRTdBQjA0OERCQTUxOEVCQkIxQUEzOTAyMTg4NkNDMzQ2MTY2RUNEMTUAACmLozsAIDIxMjNFQTkwNzg4NTQzRTg4NEFFMTBGOTA0REI2RDZGODg2Q0MzNDYxNjZFQ0QxNQAA
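The encdat column is encoded and is not meant to be read directly. A sketch of a query that lists which columns of a table currently have a distribution, and in which mode it was built, is shown here. It assumes Informix-style sysdistrib columns (mode, resolution, confidence, constructed, seqno) and the table name t1; treat the column names as assumptions to check against your server's catalog.

```sql
-- One row per column that has a distribution.
-- seqno = 1 picks the first chunk of each (possibly multi-row) distribution.
SELECT c.colname,
       d.mode,          -- assumed: 'M' for medium, 'H' for high
       d.resolution,
       d.confidence,
       d.constructed
  FROM sysdistrib d, systables t, syscolumns c
 WHERE d.tabid = t.tabid
   AND c.tabid = d.tabid
   AND c.colno = d.colno
   AND d.seqno = 1
   AND t.tabname = 't1';
```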
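The two modes differ in how they sample the data: medium builds distributions from a sample of the rows, while high builds them from all rows.
medium: the default resolution (the percentage of rows represented by each bin) is 2.5, which divides each column's value range into about 40 intervals (100 / 2.5).
high: the default resolution is 0.5, which divides each column's value range into about 200 intervals (100 / 0.5).
Because of this difference, their run times also differ: high is considerably slower than medium.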
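The defaults can be overridden with a RESOLUTION clause. The sketch below assumes the Informix-style syntax, in which MEDIUM accepts a resolution percentage plus an optional confidence value and HIGH accepts a resolution percentage only. The table t1, the column c1, and the numeric values are illustrative.

```sql
-- About 100 bins (100 / 1.0), built from a sample with confidence 0.99.
UPDATE STATISTICS MEDIUM FOR TABLE t1 (c1) RESOLUTION 1.0 0.99;

-- About 1000 bins (100 / 0.1), built from all rows.
UPDATE STATISTICS HIGH FOR TABLE t1 (c1) RESOLUTION 0.1;
```

A smaller resolution gives the optimizer a finer picture of skewed data, at the cost of a longer run and more rows in sysdistrib.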
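Viewing UPDATE STATISTICS details
Running set explain on before the statement writes detailed information about the statistics update to the explain output.
For example: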
UPDATE STATISTICS:
==================
Table: gbasedbt.tin
Mode: HIGH
Number of Bins: 268 Bin size 549
Sort data 6.6 MB Sort memory granted 6.9 MB
Estimated number of table scans 1
PASS #1 c1
Scan 0 Sort 0 Build 0 Insert 0 Close 0 Total 0
Completed pass 1 in 0 minutes 0 seconds
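bins: the data range is divided into a number of consecutive intervals, and the number of values that fall into each interval is counted.
bin size: the number of rows in each interval.
In the output above:
A high-level statistics update was run on column c1 of table tin.
The table holds 109999 rows. With the high default of about 200 intervals, 109999 / 200 = 549.995, which matches the reported bin size of 549.
The output reports 268 consecutive intervals in total.
The c1 values are sorted to build the distribution; the sort data is 6.6 MB and 6.9 MB of sort memory was granted.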
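Output like the report above could be produced with a sequence such as the following. This is a sketch: it assumes the tin table and c1 column from the example, and that the server writes explain output to a file such as sqexplain.out in the session's working directory, as Informix-derived servers typically do.

```sql
-- Turn on explain output for this session.
SET EXPLAIN ON;

-- The statistics run whose details should be recorded.
UPDATE STATISTICS HIGH FOR TABLE tin (c1);

-- Turn explain output off again so later statements are not logged.
SET EXPLAIN OFF;
```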
Viewing the data distribution
dbschema -d test -hd tin
{
Distribution for gbasedbt.tin.c1
Constructed on 2025-05-23 15:34:35.59131
High Mode, 0.500000 Resolution
--- DISTRIBUTION ---
     (           0000EB7D36484C208AC8709B2490204A )
  1: ( 549, 549, 01501A92E42E46F4A35825C462A1D11C )
  2: ( 549, 549, 029AEA168EA14787BCEF6D56DC7CF043 )
  3: ( 549, 549, 03DF98E0EA1C40D9A0548C337186C6F3 )
  4: ( 549, 549, 051BF3E77F604555965824EB104FF62A )
  5: ( 549, 549, 0660F4DC74E24244BFCFAA021C82A762 )
  6: ( 549, 549, 07AD3BE33BFA40FBA258A1DE226246F5 )
  7: ( 549, 549, 08F6F8A979E7427CB3E96D8DF58779B8 )
  8: ( 549, 549, 0A3A2957A42148A1990BF99133460B74 )
  9: ( 549, 549, 0B753BD9AEC14A9BB5FBD6AD0BC04044 )
 10: ( 549, 549, 0CA8CD688C434C279BE0A90E6D996616 )
 11: ( 549, 549, 0E046B6207F34704A81907464023F867 )
 12: ( 549, 549, 0F3E5237C5524648B7E1DFEED8C3529B )
 13: ( 549, 549, 107C688DFE8F4397B3C52B0578529FC3 )
 14: ( 549, 549, 11C71BB917C44A2E833BA319C23FE93F )
 15: ( 549, 549, 1304CE1C7F2D4E898266444CCCC39F14 )
...
194: ( 549, 549, F7CDA741CECF47E8A3E51781DD5DCDDB )
195: ( 549, 549, F918AB2A0466491E8CFE8223C0408E44 )
196: ( 549, 549, FA615C7F221F42C690FE5983C5C22335 )
197: ( 549, 549, FBB5AA81DFE342D0AD6BD429CC0E079C )
198: ( 549, 549, FCFA0976475B46A8A41F4F17B747E54F )
199: ( 549, 549, FE4388A4A6C34F89B210171C5A0FDC51 )
200: ( 549, 549, FF905D14D4674ECA87DEE45C0A3B9BB4 )
201: ( 199, 199, FFFFC9B4A8B04F3594DE66A87CBA6E19 )
}
Viewing index distribution information
oncheck -pT test:tin#index_tin_1
TBLspace Usage Report for test:gbasedbt.tin

Type                  Pages      Empty  Semi-Full       Full  Very-Full
---------------- ---------- ---------- ---------- ---------- ----------
Free                     46
Bit-Map                   1
Index                   465
Data (Home)               0
                 ----------
Total Pages             512

Unused Space Summary
Unused data slots                        0
Unused bytes per data page               4
Total unused bytes in data pages         0

Home Data Page Version Summary
Version          Count
0 (current)      0

Index Usage Report for index index_tin_1 on test:gbasedbt.tin

                Average    Average  Average
Level    Total No. Keys Free Bytes Del Keys
----- -------- -------- ---------- --------
    1        1        2      16306
    2        2      231       6671
    3      462      238       6117        0
----- -------- -------- ---------- --------
Total      465      237       6142        0
Updating statistics for routines
update statistics for procedure p1;
update statistics for function f1;
update statistics for routine p1;
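Purpose: an object that a routine references directly or indirectly may be changed, for example by alter or drop.
For example, if p1 references column t1.c1 and alter table t1 modify is then run, p1 may be at risk of errors. Updating statistics for the routine generates a new execution plan that reflects the changed objects.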
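The scenario can be sketched as follows. The table t1, the column c1, and the procedure p1 come from the text above; the new column type is a hypothetical choice for illustration, and the Informix-style ALTER TABLE ... MODIFY syntax is assumed.

```sql
-- Change a column that procedure p1 references (new type is illustrative only).
ALTER TABLE t1 MODIFY (c1 VARCHAR(64));

-- Rebuild the stored plan for p1 so that it reflects the altered column.
UPDATE STATISTICS FOR PROCEDURE p1;
```

Running the second statement right after the schema change avoids relying on a stale plan the next time p1 is called.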
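Best practices
Run low-level statistics on the whole table.
Run high-level statistics on the columns of single-column indexes.
Run high-level statistics on the leading column of each composite index.
Run medium-level statistics on the non-leading columns of composite indexes.
Update statistics for user-defined stored procedures and functions.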
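Applied to a concrete case, the list above might translate into a script like the one below. The schema is hypothetical: table t1 is assumed to have a single-column index on c1 and a composite index on (c2, c3), and p1 and f1 are assumed to be a user-defined procedure and function.

```sql
-- 1. Whole table: table and index statistics only.
UPDATE STATISTICS LOW FOR TABLE t1;

-- 2. Column of the single-column index.
UPDATE STATISTICS HIGH FOR TABLE t1 (c1);

-- 3. Leading column of the composite index (c2, c3).
UPDATE STATISTICS HIGH FOR TABLE t1 (c2);

-- 4. Non-leading column of the composite index.
UPDATE STATISTICS MEDIUM FOR TABLE t1 (c3);

-- 5. User-defined routines.
UPDATE STATISTICS FOR PROCEDURE p1;
UPDATE STATISTICS FOR FUNCTION f1;
```

Keeping high mode for index-leading columns only limits the full-table scans and sorts to the columns where the optimizer benefits most from precise distributions.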