重要实现思路
- 1、在clickhouse中创建MySQL引擎表。
- 2、根据MySQL引擎表的信息创建目标表。
- 3、实现canal实时增量同步MySQL数据到clickhouse。
MySQL 的准备
修改设置文件开启 Binlog
- [root@hadoop100 module]$ sudo vim /etc/my.cnf
- server-id=1
- log-bin=mysql-bin
- binlog_format=row
- binlog-do-db=test
复制代码
- 注意:binlog-do-db 根据本身的情况进行修改,指定具体要同步的数据库,如果不设置则表现所有数据库均开启 Binlog。
MySQL中需要同步的库表
- MySQL [testck]> show databases;
- +--------------------+
- | Database |
- +--------------------+
- | information_schema |
- | innodb |
- | mysql |
- | performance_schema |
- | sys |
- | testck |
- | tmp |
- +--------------------+
- 11 rows in set (0.00 sec)
- MySQL [testck]> use testck;
- Reading table information for completion of table and column names
- You can turn off this feature to get a quicker startup with -A
- Database changed
- MySQL [testck]> show tables;
- +------------------+
- | Tables_in_testck |
- +------------------+
- | t_organization |
- | t_user |
- +------------------+
- 2 rows in set (0.00 sec)
- MySQL [testck]>
复制代码 clickhouse 的准备
- canal实时同步MySQL的数据需要在clickhouse中提前建好库表
在clickhouse建MySQL引擎的库
- 把MySQL某个库中的所有表结构信息映射到clickhouse中,这样在clickhouse中就可以远程操作MySQL。
- cnnxpredn02 :) CREATE DATABASE t_tmp ENGINE = MySQL('ip:3306', 'testck', 'user', 'pass');
- CREATE DATABASE t_tmp
- ENGINE = MySQL('ip:3306', 'testck', 'user', 'pass')
- Query id: 9f6d7179-3c97-47c2-93ea-abc0c6ca873b
- Ok.
- 0 rows in set. Elapsed: 0.013 sec.
- cnnxpredn02 :) show databases;
- SHOW DATABASES
- Query id: 5b5baaf5-86f5-46c1-ac34-2618c34462f1
- ┌─name────┐
- │ default │
- │ system │
- │ t_tmp │
- └─────────┘
- 3 rows in set. Elapsed: 0.009 sec.
- cnnxpredn02 :)
复制代码 创建clickhouse中的库
- cnnxpredn02 :) create database testck;
- CREATE DATABASE testck
- Query id: 397261c0-a8f0-48c6-b4f6-71d121b975b8
- Ok.
- 0 rows in set. Elapsed: 0.004 sec.
- cnnxvopredn02 :) show databases;
- SHOW DATABASES
- Query id: f467f4bb-99fa-4322-a3c1-e33be74b6e81
- ┌─name────┐
- │ default │
- │ system │
- │ t_tmp │
- │ testck │
- └─────────┘
- 4 rows in set. Elapsed: 0.002 sec.
- cnnxvpredn02 :)
复制代码 根据创建的MySQL引擎表创建clickhouse目标表结构
- cnnxvpredn02 :) create table testck.t_organization as t_tmp.t_organization;
- :-] create table testck.t_user as t_tmp.t_user;
- CREATE TABLE testck.t_organization AS t_tmp.t_organization
- Query id: f942cf5d-701f-4dbd-9ffd-de2206eec851
- Ok.
- 0 rows in set. Elapsed: 0.006 sec.
- CREATE TABLE testck.t_user AS t_tmp.t_user
- Query id: ef7dddd7-64b2-4dbc-88be-a105b49aff57
- Ok.
- 0 rows in set. Elapsed: 0.004 sec.
- cnnxvopredn02 :)
复制代码 Canal 的下载和安
下载并解压 Jar 包
- mkdir /opt/module/canal
- tar -zxvf canal.deployer-1.1.2.tar.gz -C /opt/module/canal
复制代码 修改 canal.properties 的设置
- #################################################
- ######### common argument #############
- #################################################
- # tcp bind ip
- canal.ip =
- # register ip to zookeeper
- canal.register.ip =
- canal.port = 11111
- canal.metrics.pull.port = 11112
- # canal instance user/passwd
- # canal.user = canal
- # canal.passwd = E3619321C1A937C46A0D8BD1DAC39F93B27D4458
- # canal admin config
- #canal.admin.manager = 127.0.0.1:8089
- canal.admin.port = 11110
- canal.admin.user = admin
- canal.admin.passwd = 4ACFE3202A5FF5CF467898FC58AAB1D615029441
- # admin auto register
- #canal.admin.register.auto = true
- #canal.admin.register.cluster =
- #canal.admin.register.name =
- canal.zkServers =
- # flush data to zk
- canal.zookeeper.flush.period = 1000
- canal.withoutNetty = false
- # tcp, kafka, rocketMQ, rabbitMQ
- canal.serverMode = tcp
- # flush meta cursor/parse position to file
- canal.file.data.dir = ${canal.conf.dir}
- canal.file.flush.period = 1000
- ## memory store RingBuffer size, should be Math.pow(2,n)
- ......
复制代码
- 说明:这个文件是canal的基本通用设置,canal端口号默认就是 11111,修改canal的输出model,默认tcp,改为输出到 kafka多实例设置如果创建多个实例,通过前面canal架构,我们可以知道,一个canal服务中可以有多个instance,conf/下的每一个example即是一个实例,每个实例下面都有独立的设置文件。默认只有一个实例example,如果需要多个实例处理不同的 MySQL 数据的话,直接拷贝出多个 example,并对其重新命名,命名和设置文件中指定的名称划一,然后修改
canal.properties 中的 canal.destinations=实例 1,实例 2,实例 3。
- #################################################
- ######### destinations #############
- #################################################
- canal.destinations = example
复制代码 修改 instance.properties
设置 MySQL 服务器地点
- #################################################
- ## mysql serverId , v1.0.26+ will autoGen
- canal.instance.mysql.slaveId=20
- # enable gtid use true/false
- canal.instance.gtidon=false
- # position info
- canal.instance.master.address=hadoop100:3306
复制代码 设置连接 MySQL 的用户名和密码,默认就是我们前面授权的 canal
- # username/password
- canal.instance.dbUsername=canal
- canal.instance.dbPassword=canal
- canal.instance.connectionCharset = UTF-8
- canal.instance.defaultDatabaseName =test
- # enable druid Decrypt database password
- canal.instance.enableDruid=false
复制代码 启动服务
- [root@cnnxpredn02 canal]# cat canal.log
- 2022-03-07 08:47:21.349 [main] INFO com.alibaba.otter.canal.deployer.CanalLauncher - ## set default uncaught exception handler
- 2022-03-07 08:47:21.398 [main] INFO com.alibaba.otter.canal.deployer.CanalLauncher - ## load canal configurations
- 2022-03-07 08:47:21.415 [main] INFO com.alibaba.otter.canal.deployer.CanalStarter - ## start the canal server.
- 2022-03-07 08:47:21.475 [main] INFO com.alibaba.otter.canal.deployer.CanalController - ## start the canal server):11111]
- 2022-03-07 08:47:23.137 [main] INFO com.alibaba.otter.canal.deployer.CanalStarter - ## the canal server is running now ......
复制代码 canal-adapter 的下载和安装
下载并解
在application.yml中设置MySQL和clickhouse的连接信息
- ...
- # srcDataSources:
- defaultDS:
- url: jdbc:mysql://ip:3306/database?useUnicode=true
- username: user
- password: pass
- canalAdapters:
- - instance: test # canal instance Name or mq topic name
- groups:
- - groupId: g1
- outerAdapters:
- - name: logger
- - name: rdb
- key: mysql1
- properties:
- jdbc.driverClassName: ru.yandex.clickhouse.ClickHouseDriver
- jdbc.url: jdbc:clickhouse://ip:port/database
- jdbc.username: user
- jdbc.password: pass
- # - name: rdb
- ...
复制代码 在/opt/soft/canal-adapter/conf/rdb/mytest_user.yml中设置需要同步的库表信息
- canal可以同步整个库的所有表和单表的表,设置都在这里进行设置,如果需要同步多张表的数据,就在rdb下创建多个雷同的设置文件,文件名可以用表名来命名。
- Mirror schema synchronize config
- dataSourceKey: defaultDS
- destination: test
- groupId: g1
- outerAdapterKey: mysql1
- concurrent: true
- dbMapping:
- mirrorDb: true
- database: testck
复制代码 启动canal-adapter服务
- 在/.../canal-adapter/logs/adapter下查看日记
本文由博客一文多发平台 OpenWrite 发布!
免责声明:如果侵犯了您的权益,请联系站长,我们会及时删除侵权内容,谢谢合作!更多信息从访问主页:qidao123.com:ToB企服之家,中国第一个企服评测及商务社交产业平台。 |