clickhouse迁移工具clickhouse-copier
原来的集群没有高可用而且配置很低,所有准备将一个3分片1副本集群迁到1分片2副本的高配置集群,数据量比较大。
虽然官方已经标记clickhouse-copier已经过时,但为了方便和高效还是用了copier,效果还挺好
以下是使用步骤:
- 服务器上下载copier解压后,首先需要准备keeper.xml文件,目标端使用的是clickhouse-keeper,这里配置还是zookeeper但是不影响
<!-- config.xml -->
<clickhouse><logger><level>trace</level><size>100M</size><log>./log/clickhouse-copier/copier/log.log</log><errorlog>./log/clickhouse-copier/copier/log.err.log</errorlog><count>3</count></logger># 填入对应的 keeper的ip地址<zookeeper><node index="1"><host>172.23.3.44</host><port>9181</port></node><node index="2"><host>172.23.3.55</host><port>9181</port></node><node index="3"><host>172.23.3.66</host><port>9181</port></node></zookeeper></clickhouse>
- 准备迁移任务文件task_migrate_all.xml,可以将需要迁移的表放在一个文件,或者分成多个文件都可,但保证需要迁移的表在源集群所有节点都有,可以建空表来跳过检查,否则会报错
<clickhouse><remote_servers>##源集群地址<source_cluster><shard><internal_replication>false</internal_replication><replica><host>172.23.3.11</host> <port>9000</port><user>user_name</user><password>pwd</password></replica></shard><shard><internal_replication>false</internal_replication><replica><host>172.23.3.12</host> <port>9000</port><user>user_name</user><password>pwd</password></replica></shard><shard><internal_replication>false</internal_replication><replica><host>172.23.3.13</host> <port>9000</port><user>user_name</user><password>pwd</password></replica></shard></source_cluster>#目标集群地址<destination_cluster><shard><internal_replication>true</internal_replication><replica><host>172.23.3.44</host> <port>9000</port><user>user_name</user><password>pwd</password></replica><replica><host>172.23.3.55</host> <port>9000</port><user>user_name</user><password>pwd</password></replica></shard></destination_cluster></remote_servers># 并发线程<max_workers>12</max_workers>#对源库只读<settings_pull><readonly>1</readonly></settings_pull><!-- Setting used to insert (push) data to destination cluster tables --><settings_push><readonly>0</readonly></settings_push><settings><connect_timeout>3</connect_timeout><!-- Sync insert is set forcibly, leave it here just in case. --><distributed_foreground_insert>1</distributed_foreground_insert></settings># 需要迁移的表,可以tables里面可以写多张表<tables><table_sms_send><cluster_pull>source_cluster</cluster_pull><database_pull>test_qwe</database_pull><table_pull>sms_send</table_pull><cluster_push>destination_cluster</cluster_push><database_push>test_qwe</database_push><table_push>sms_send</table_push># 因为是多副本,需要使用复制表,否则只有单节点有数据<engine>ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{cluster}/{shard}/{database}/{table}', '{replica}')PARTITION BY toYYYYMM(create_time)ORDER BY (id, phone_number, state, create_time)</engine># 1分片不需要分片键,这里设置为1<sharding_key>1</sharding_key></table_sms_send><table_use_car><cluster_pull>source_cluster</cluster_pull><database_pull>test_qwe</database_pull><table_pull>use_car</table_pull><cluster_push>destination_cluster</cluster_push><database_push>testqwe</database_push><table_push>use_car</table_push><engine>ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{cluster}/{shard}/{database}/{table}', '{replica}', add_time)ORDER BY (id, detail_id)</engine><sharding_key>1</sharding_key></table_use_car></tables></clickhouse>
- 在clickhouse-copier解压目录执行命令开始迁移 .
# 这里task-path表示keeper存储的迁移任务路径
/clickhouse-copier --config keeper.xml --task-path /clickhouse/copier_task/mig_test --task-file=task_migrate_all.xml