ClickHouse 集群摆设(不必要 Zookeeper)

打印 上一主题 下一主题

主题 874|帖子 874|积分 2632

目录
一、单节点设置
1. 下载
2. 安装
3. 启动
4. 验证
二、导入示例数据集
1. 下载并提取表数据
2. 创建库表
3. 导入数据
4. 优化表
5. 查询示例
三、集群摆设
0. 安装前准备
1. 安装设置 ClickHouse Keeper
(1)安装 ClickHouse Server 和 ClickHouse Client
(3)重启 ClickHouse
(4)验证 Keeper 实例是否正在运行
(5)确认 zookeeper 体系表
2. 设置 ClickHouse 集群
(1)在新主机(node4)上安装 ClickHouse Server 和 ClickHouse Client
(2)修改新主机(node4)上的设置文件
(3)更新设置
(4)重启 ClickHouse 并验证集群已创建
3. 创建分布式表
(1)创建数据库
(2)创建当地表
(3)在一个分片实例上(node1)新增两行
(4)在另一个分片实例上(node3)新增两行
(5)在所有实例上分别实行 SELECT 查询
(6)创建分布式表
(7)在所有实例上分别实行 SELECT 查询
4. 数据自动分片
(1)创建带有分片键的当地表
(2)创建提供集群当地表视图的分布式表
(3)向分布式表中插入数据
(4)在所有实例上分别实行 SELECT 查询
5. 使用 ReplicatedMergeTree 表引擎复制数据
(1)定义宏
(2)创建当地表
(3)创建分布式表
(4)向分布式表中插入数据
(5)在所有实例上分别实行 SELECT 查询
参考:


        ClickHouse 可以在任何具有x86_64、AArch64 或 PowerPC64LE CPU 架构的 Linux,FreeBSD 或 Mac OS X 上运行。官方预构建的二进制文件通常针对 x86_64 举行编译,并利用 SSE 4.2 指令集,因此,除非尚有阐明,支持它的 CPU 使用将成为额外的体系需求。下面是查抄当前 CPU 是否支持 SSE 4.2 的下令:
  1. $ grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported"
复制代码
一、单节点设置

1. 下载

  1. # 查看版本
  2. https://github.com/ClickHouse/ClickHouse/releases
  3. # 最新稳定版本安装包下载地址
  4. https://github.com/ClickHouse/ClickHouse/releases/download/v24.1.8.22-stable/clickhouse-client-24.1.8.22.x86_64.rpm
  5. https://github.com/ClickHouse/ClickHouse/releases/download/v24.1.8.22-stable/clickhouse-common-static-24.1.8.22.x86_64.rpm
  6. https://github.com/ClickHouse/ClickHouse/releases/download/v24.1.8.22-stable/clickhouse-server-24.1.8.22.x86_64.rpm
复制代码
2. 安装

  1. # 首先安装通用依赖包
  2. rpm -ivh clickhouse-common-static-24.1.8.22.x86_64.rpm
  3. # 安装服务器,遇到 Enter password for default user 提示时输入密码
  4. rpm -ivh clickhouse-server-24.1.8.22.x86_64.rpm
  5. # 安装命令行客户端
  6. rpm -ivh clickhouse-client-24.1.8.22.x86_64.rpm
复制代码
3. 启动

  1. sudo service clickhouse-server start
复制代码
        查看启动后进程:
  1. [root@vvml-yz-hbase-test~]#ps -ef | grep clickhouse | grep -v grep
  2. clickho+  5322     1  0 08:49 ?        00:00:00 clickhouse-watchdog        --config-file /etc/clickhouse-server/config.xml --pid-file /var/run/clickhouse-server/clickhouse-server.pid --daemon
  3. clickho+  5323  5322  6 08:49 ?        00:00:00 /usr/bin/clickhouse-server --config-file /etc/clickhouse-server/config.xml --pid-file /var/run/clickhouse-server/clickhouse-server.pid --daemon
  4. [root@vvml-yz-hbase-test~]#
复制代码
        查看监听端口
  1. [root@vvml-yz-hbase-test~]#netstat -antpl | grep clickhouse
  2. tcp        0      0 127.0.0.1:9004          0.0.0.0:*               LISTEN      5323/clickhouse-ser 
  3. tcp        0      0 127.0.0.1:9005          0.0.0.0:*               LISTEN      5323/clickhouse-ser 
  4. tcp        0      0 127.0.0.1:9009          0.0.0.0:*               LISTEN      5323/clickhouse-ser 
  5. tcp        0      0 127.0.0.1:8123          0.0.0.0:*               LISTEN      5323/clickhouse-ser 
  6. tcp        0      0 127.0.0.1:9000          0.0.0.0:*               LISTEN      5323/clickhouse-ser 
  7. tcp6       0      0 ::1:9004                :::*                    LISTEN      5323/clickhouse-ser 
  8. tcp6       0      0 ::1:9005                :::*                    LISTEN      5323/clickhouse-ser 
  9. tcp6       0      0 ::1:9009                :::*                    LISTEN      5323/clickhouse-ser 
  10. tcp6       0      0 ::1:8123                :::*                    LISTEN      5323/clickhouse-ser 
  11. tcp6       0      0 ::1:9000                :::*                    LISTEN      5323/clickhouse-ser 
  12. [root@vvml-yz-hbase-test~]#
复制代码
        服务端日记的默认位置是 /var/log/clickhouse-server/。当服务端在日记中记录 Ready for connections 消息,即表示服务端已准备好处理客户端毗连。一旦 clickhouse-server 启动并运行,可以利用 clickhouse-client 毗连到服务端,并运行一些测试查询。
4. 验证

  1. [root@vvml-yz-hbase-test~]#clickhouse-client --password="123456" --query "select version();"
  2. 24.1.8.22
  3. [root@vvml-yz-hbase-test~]#clickhouse-client --password="123456" -n --query "select 1;select 2;"
  4. 1
  5. 2
  6. [root@vvml-yz-hbase-test~]#
复制代码
二、导入示例数据集

1. 下载并提取表数据

  1. curl https://datasets.clickhouse.com/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv
  2. curl https://datasets.clickhouse.com/visits/tsv/visits_v1.tsv.xz | unxz --threads=`nproc` > visits_v1.tsv
复制代码
2. 创建库表

        创建 sample.sql 文件,内容如下(注意嵌套表定义):
  1. create database if not exists tutorial;
  2. CREATE TABLE tutorial.hits_v1
  3. (
  4.     `WatchID` UInt64,
  5.     `JavaEnable` UInt8,
  6.     `Title` String,
  7.     `GoodEvent` Int16,
  8.     `EventTime` DateTime,
  9.     `EventDate` Date,
  10.     `CounterID` UInt32,
  11.     `ClientIP` UInt32,
  12.     `ClientIP6` FixedString(16),
  13.     `RegionID` UInt32,
  14.     `UserID` UInt64,
  15.     `CounterClass` Int8,
  16.     `OS` UInt8,
  17.     `UserAgent` UInt8,
  18.     `URL` String,
  19.     `Referer` String,
  20.     `URLDomain` String,
  21.     `RefererDomain` String,
  22.     `Refresh` UInt8,
  23.     `IsRobot` UInt8,
  24.     `RefererCategories` Array(UInt16),
  25.     `URLCategories` Array(UInt16),
  26.     `URLRegions` Array(UInt32),
  27.     `RefererRegions` Array(UInt32),
  28.     `ResolutionWidth` UInt16,
  29.     `ResolutionHeight` UInt16,
  30.     `ResolutionDepth` UInt8,
  31.     `FlashMajor` UInt8,
  32.     `FlashMinor` UInt8,
  33.     `FlashMinor2` String,
  34.     `NetMajor` UInt8,
  35.     `NetMinor` UInt8,
  36.     `UserAgentMajor` UInt16,
  37.     `UserAgentMinor` FixedString(2),
  38.     `CookieEnable` UInt8,
  39.     `JavascriptEnable` UInt8,
  40.     `IsMobile` UInt8,
  41.     `MobilePhone` UInt8,
  42.     `MobilePhoneModel` String,
  43.     `Params` String,
  44.     `IPNetworkID` UInt32,
  45.     `TraficSourceID` Int8,
  46.     `SearchEngineID` UInt16,
  47.     `SearchPhrase` String,
  48.     `AdvEngineID` UInt8,
  49.     `IsArtifical` UInt8,
  50.     `WindowClientWidth` UInt16,
  51.     `WindowClientHeight` UInt16,
  52.     `ClientTimeZone` Int16,
  53.     `ClientEventTime` DateTime,
  54.     `SilverlightVersion1` UInt8,
  55.     `SilverlightVersion2` UInt8,
  56.     `SilverlightVersion3` UInt32,
  57.     `SilverlightVersion4` UInt16,
  58.     `PageCharset` String,
  59.     `CodeVersion` UInt32,
  60.     `IsLink` UInt8,
  61.     `IsDownload` UInt8,
  62.     `IsNotBounce` UInt8,
  63.     `FUniqID` UInt64,
  64.     `HID` UInt32,
  65.     `IsOldCounter` UInt8,
  66.     `IsEvent` UInt8,
  67.     `IsParameter` UInt8,
  68.     `DontCountHits` UInt8,
  69.     `WithHash` UInt8,
  70.     `HitColor` FixedString(1),
  71.     `UTCEventTime` DateTime,
  72.     `Age` UInt8,
  73.     `Sex` UInt8,
  74.     `Income` UInt8,
  75.     `Interests` UInt16,
  76.     `Robotness` UInt8,
  77.     `GeneralInterests` Array(UInt16),
  78.     `RemoteIP` UInt32,
  79.     `RemoteIP6` FixedString(16),
  80.     `WindowName` Int32,
  81.     `OpenerName` Int32,
  82.     `HistoryLength` Int16,
  83.     `BrowserLanguage` FixedString(2),
  84.     `BrowserCountry` FixedString(2),
  85.     `SocialNetwork` String,
  86.     `SocialAction` String,
  87.     `HTTPError` UInt16,
  88.     `SendTiming` Int32,
  89.     `DNSTiming` Int32,
  90.     `ConnectTiming` Int32,
  91.     `ResponseStartTiming` Int32,
  92.     `ResponseEndTiming` Int32,
  93.     `FetchTiming` Int32,
  94.     `RedirectTiming` Int32,
  95.     `DOMInteractiveTiming` Int32,
  96.     `DOMContentLoadedTiming` Int32,
  97.     `DOMCompleteTiming` Int32,
  98.     `LoadEventStartTiming` Int32,
  99.     `LoadEventEndTiming` Int32,
  100.     `NSToDOMContentLoadedTiming` Int32,
  101.     `FirstPaintTiming` Int32,
  102.     `RedirectCount` Int8,
  103.     `SocialSourceNetworkID` UInt8,
  104.     `SocialSourcePage` String,
  105.     `ParamPrice` Int64,
  106.     `ParamOrderID` String,
  107.     `ParamCurrency` FixedString(3),
  108.     `ParamCurrencyID` UInt16,
  109.     `GoalsReached` Array(UInt32),
  110.     `OpenstatServiceName` String,
  111.     `OpenstatCampaignID` String,
  112.     `OpenstatAdID` String,
  113.     `OpenstatSourceID` String,
  114.     `UTMSource` String,
  115.     `UTMMedium` String,
  116.     `UTMCampaign` String,
  117.     `UTMContent` String,
  118.     `UTMTerm` String,
  119.     `FromTag` String,
  120.     `HasGCLID` UInt8,
  121.     `RefererHash` UInt64,
  122.     `URLHash` UInt64,
  123.     `CLID` UInt32,
  124.     `YCLID` UInt64,
  125.     `ShareService` String,
  126.     `ShareURL` String,
  127.     `ShareTitle` String,
  128.     `ParsedParams` Nested(
  129.         Key1 String,
  130.         Key2 String,
  131.         Key3 String,
  132.         Key4 String,
  133.         Key5 String,
  134.         ValueDouble Float64),
  135.     `IslandID` FixedString(16),
  136.     `RequestNum` UInt32,
  137.     `RequestTry` UInt8
  138. )
  139. ENGINE = MergeTree()
  140. PARTITION BY toYYYYMM(EventDate)
  141. ORDER BY (CounterID, EventDate, intHash32(UserID))
  142. SAMPLE BY intHash32(UserID);
  143. CREATE TABLE tutorial.visits_v1
  144. (
  145.     `CounterID` UInt32,
  146.     `StartDate` Date,
  147.     `Sign` Int8,
  148.     `IsNew` UInt8,
  149.     `VisitID` UInt64,
  150.     `UserID` UInt64,
  151.     `StartTime` DateTime,
  152.     `Duration` UInt32,
  153.     `UTCStartTime` DateTime,
  154.     `PageViews` Int32,
  155.     `Hits` Int32,
  156.     `IsBounce` UInt8,
  157.     `Referer` String,
  158.     `StartURL` String,
  159.     `RefererDomain` String,
  160.     `StartURLDomain` String,
  161.     `EndURL` String,
  162.     `LinkURL` String,
  163.     `IsDownload` UInt8,
  164.     `TraficSourceID` Int8,
  165.     `SearchEngineID` UInt16,
  166.     `SearchPhrase` String,
  167.     `AdvEngineID` UInt8,
  168.     `PlaceID` Int32,
  169.     `RefererCategories` Array(UInt16),
  170.     `URLCategories` Array(UInt16),
  171.     `URLRegions` Array(UInt32),
  172.     `RefererRegions` Array(UInt32),
  173.     `IsYandex` UInt8,
  174.     `GoalReachesDepth` Int32,
  175.     `GoalReachesURL` Int32,
  176.     `GoalReachesAny` Int32,
  177.     `SocialSourceNetworkID` UInt8,
  178.     `SocialSourcePage` String,
  179.     `MobilePhoneModel` String,
  180.     `ClientEventTime` DateTime,
  181.     `RegionID` UInt32,
  182.     `ClientIP` UInt32,
  183.     `ClientIP6` FixedString(16),
  184.     `RemoteIP` UInt32,
  185.     `RemoteIP6` FixedString(16),
  186.     `IPNetworkID` UInt32,
  187.     `SilverlightVersion3` UInt32,
  188.     `CodeVersion` UInt32,
  189.     `ResolutionWidth` UInt16,
  190.     `ResolutionHeight` UInt16,
  191.     `UserAgentMajor` UInt16,
  192.     `UserAgentMinor` UInt16,
  193.     `WindowClientWidth` UInt16,
  194.     `WindowClientHeight` UInt16,
  195.     `SilverlightVersion2` UInt8,
  196.     `SilverlightVersion4` UInt16,
  197.     `FlashVersion3` UInt16,
  198.     `FlashVersion4` UInt16,
  199.     `ClientTimeZone` Int16,
  200.     `OS` UInt8,
  201.     `UserAgent` UInt8,
  202.     `ResolutionDepth` UInt8,
  203.     `FlashMajor` UInt8,
  204.     `FlashMinor` UInt8,
  205.     `NetMajor` UInt8,
  206.     `NetMinor` UInt8,
  207.     `MobilePhone` UInt8,
  208.     `SilverlightVersion1` UInt8,
  209.     `Age` UInt8,
  210.     `Sex` UInt8,
  211.     `Income` UInt8,
  212.     `JavaEnable` UInt8,
  213.     `CookieEnable` UInt8,
  214.     `JavascriptEnable` UInt8,
  215.     `IsMobile` UInt8,
  216.     `BrowserLanguage` UInt16,
  217.     `BrowserCountry` UInt16,
  218.     `Interests` UInt16,
  219.     `Robotness` UInt8,
  220.     `GeneralInterests` Array(UInt16),
  221.     `Params` Array(String),
  222.     `Goals` Nested(
  223.         ID UInt32,
  224.         Serial UInt32,
  225.         EventTime DateTime,
  226.         Price Int64,
  227.         OrderID String,
  228.         CurrencyID UInt32),
  229.     `WatchIDs` Array(UInt64),
  230.     `ParamSumPrice` Int64,
  231.     `ParamCurrency` FixedString(3),
  232.     `ParamCurrencyID` UInt16,
  233.     `ClickLogID` UInt64,
  234.     `ClickEventID` Int32,
  235.     `ClickGoodEvent` Int32,
  236.     `ClickEventTime` DateTime,
  237.     `ClickPriorityID` Int32,
  238.     `ClickPhraseID` Int32,
  239.     `ClickPageID` Int32,
  240.     `ClickPlaceID` Int32,
  241.     `ClickTypeID` Int32,
  242.     `ClickResourceID` Int32,
  243.     `ClickCost` UInt32,
  244.     `ClickClientIP` UInt32,
  245.     `ClickDomainID` UInt32,
  246.     `ClickURL` String,
  247.     `ClickAttempt` UInt8,
  248.     `ClickOrderID` UInt32,
  249.     `ClickBannerID` UInt32,
  250.     `ClickMarketCategoryID` UInt32,
  251.     `ClickMarketPP` UInt32,
  252.     `ClickMarketCategoryName` String,
  253.     `ClickMarketPPName` String,
  254.     `ClickAWAPSCampaignName` String,
  255.     `ClickPageName` String,
  256.     `ClickTargetType` UInt16,
  257.     `ClickTargetPhraseID` UInt64,
  258.     `ClickContextType` UInt8,
  259.     `ClickSelectType` Int8,
  260.     `ClickOptions` String,
  261.     `ClickGroupBannerID` Int32,
  262.     `OpenstatServiceName` String,
  263.     `OpenstatCampaignID` String,
  264.     `OpenstatAdID` String,
  265.     `OpenstatSourceID` String,
  266.     `UTMSource` String,
  267.     `UTMMedium` String,
  268.     `UTMCampaign` String,
  269.     `UTMContent` String,
  270.     `UTMTerm` String,
  271.     `FromTag` String,
  272.     `HasGCLID` UInt8,
  273.     `FirstVisit` DateTime,
  274.     `PredLastVisit` Date,
  275.     `LastVisit` Date,
  276.     `TotalVisits` UInt32,
  277.     `TraficSource` Nested(
  278.         ID Int8,
  279.         SearchEngineID UInt16,
  280.         AdvEngineID UInt8,
  281.         PlaceID UInt16,
  282.         SocialSourceNetworkID UInt8,
  283.         Domain String,
  284.         SearchPhrase String,
  285.         SocialSourcePage String),
  286.     `Attendance` FixedString(16),
  287.     `CLID` UInt32,
  288.     `YCLID` UInt64,
  289.     `NormalizedRefererHash` UInt64,
  290.     `SearchPhraseHash` UInt64,
  291.     `RefererDomainHash` UInt64,
  292.     `NormalizedStartURLHash` UInt64,
  293.     `StartURLDomainHash` UInt64,
  294.     `NormalizedEndURLHash` UInt64,
  295.     `TopLevelDomain` UInt64,
  296.     `URLScheme` UInt64,
  297.     `OpenstatServiceNameHash` UInt64,
  298.     `OpenstatCampaignIDHash` UInt64,
  299.     `OpenstatAdIDHash` UInt64,
  300.     `OpenstatSourceIDHash` UInt64,
  301.     `UTMSourceHash` UInt64,
  302.     `UTMMediumHash` UInt64,
  303.     `UTMCampaignHash` UInt64,
  304.     `UTMContentHash` UInt64,
  305.     `UTMTermHash` UInt64,
  306.     `FromHash` UInt64,
  307.     `WebVisorEnabled` UInt8,
  308.     `WebVisorActivity` UInt32,
  309.     `ParsedParams` Nested(
  310.         Key1 String,
  311.         Key2 String,
  312.         Key3 String,
  313.         Key4 String,
  314.         Key5 String,
  315.         ValueDouble Float64),
  316.     `Market` Nested(
  317.         Type UInt8,
  318.         GoalID UInt32,
  319.         OrderID String,
  320.         OrderPrice Int64,
  321.         PP UInt32,
  322.         DirectPlaceID UInt32,
  323.         DirectOrderID UInt32,
  324.         DirectBannerID UInt32,
  325.         GoodID String,
  326.         GoodName String,
  327.         GoodQuantity Int32,
  328.         GoodPrice Int64),
  329.     `IslandID` FixedString(16)
  330. )
  331. ENGINE = CollapsingMergeTree(Sign)
  332. PARTITION BY toYYYYMM(StartDate)
  333. ORDER BY (CounterID, StartDate, intHash32(UserID), VisitID)
  334. SAMPLE BY intHash32(UserID);
复制代码
        实行 sample.sql 文件:
  1. clickhouse-client --password="123456" --queries-file sample.sql
复制代码
3. 导入数据

  1. clickhouse-client --password="123456" --query "INSERT INTO tutorial.hits_v1 FORMAT TSV" --max_insert_block_size=100000 < hits_v1.tsv
  2. clickhouse-client --password="123456" --query "INSERT INTO tutorial.visits_v1 FORMAT TSV" --max_insert_block_size=100000 < visits_v1.tsv
复制代码
        找出可用的设置、寄义及其默认值的最简朴方法是查询 system.settings 表:
  1. vvml-yz-hbase-test.172.18.4.126 :) select name, value, changed, description
  2.   from system.settings
  3.  where name like '%max_insert_b%';
  4. SELECT
  5.     name,
  6.     value,
  7.     changed,
  8.     description
  9. FROM system.settings
  10. WHERE name LIKE '%max_insert_b%'
  11. Query id: 05bc6241-2d1f-432e-87b3-f35a3ad612c8
  12. ┌─name──────────────────┬─value───┬─changed─┬─description───────────────────────────────────────────────────────────────────────────────┐
  13. │ max_insert_block_size │ 1048449 │       0 │ The maximum block size for insertion, if we control the creation of blocks for insertion. │
  14. └───────────────────────┴─────────┴─────────┴───────────────────────────────────────────────────────────────────────────────────────────┘
  15. 1 row in set. Elapsed: 0.002 sec. 
  16. vvml-yz-hbase-test.172.18.4.126 :) 
复制代码
4. 优化表

        可以 OPTIMIZE 导入后的表。使用 MergeTree-family 引擎设置的表总是在后台归并数据部分以优化数据存储(或至少查抄是否故意义)。这些查询强制表引擎立刻举行存储优化(较慢,审慎手工实行):
  1. clickhouse-client --password="123456" --query "OPTIMIZE TABLE tutorial.hits_v1 FINAL"
  2. clickhouse-client --password="123456" --query "OPTIMIZE TABLE tutorial.visits_v1 FINAL"
复制代码
5. 查询示例

  1. vvml-yz-hbase-test.172.18.4.126 :) SELECT
  2.     StartURL AS URL,
  3.     AVG(Duration) AS AvgDuration
  4. FROM tutorial.visits_v1
  5. WHERE StartDate BETWEEN '2014-03-23' AND '2014-03-30'
  6. GROUP BY URL
  7. ORDER BY AvgDuration DESC
  8. LIMIT 10;
  9. SELECT
  10.     StartURL AS URL,
  11.     AVG(Duration) AS AvgDuration
  12. FROM tutorial.visits_v1
  13. WHERE (StartDate >= '2014-03-23') AND (StartDate <= '2014-03-30')
  14. GROUP BY URL
  15. ORDER BY AvgDuration DESC
  16. LIMIT 10
  17. Query id: fc2e8638-5081-496a-964b-a679eab63af9
  18. ┌─URL─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─AvgDuration─┐
  19. │ http://itpalanija-pri-patrivative=0&ads_app_user                                                                                                                                                │       60127 │
  20. │ http://renaul-myd-ukraine                                                                                                                                                                       │       58938 │
  21. │ http://karta/Futbol/dynamo.kiev.ua/kawaica.su/648                                                                                                                                               │       56538 │
  22. │ https://moda/vyikroforum1/top.ru/moscow/delo-product/trend_sms/multitryaset/news/2014/03/201000                                                                                                 │       55218 │
  23. │ http://e.mail=on&default?abid=2061&scd=yes&option?r=city_inter.com/menu&site-zaferio.ru/c/m.ensor.net/ru/login=false&orderStage.php?Brandidatamalystyle/20Mar2014%2F007%2F94dc8d2e06e56ed56bbdd │       51378 │
  24. │ http://karta/Futbol/dynas.com/haberler.ru/messages.yandsearchives/494503_lte_13800200319                                                                                                        │       49078 │
  25. │ http://xmusic/vstreatings of speeds                                                                                                                                                             │       36925 │
  26. │ http://news.ru/yandex.ru/api.php&api=http://toberria.ru/aphorizana                                                                                                                              │       36902 │
  27. │ http://bashmelnykh-metode.net/video/#!/video/emberkas.ru/detskij-yazi.com/iframe/default.aspx?id=760928&noreask=1&source                                                                        │       34323 │
  28. │ http://censonhaber/547-popalientLog=0&strizhki-petro%3D&comeback=search?lr=213&text                                                                                                             │       31773 │
  29. └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────┘
  30. 10 rows in set. Elapsed: 0.080 sec. Processed 1.43 million rows, 112.12 MB (17.91 million rows/s., 1.40 GB/s.)
  31. Peak memory usage: 45.07 MiB.
  32. vvml-yz-hbase-test.172.18.4.126 :) SELECT
  33.     sum(Sign) AS visits,
  34.     sumIf(Sign, has(Goals.ID, 1105530)) AS goal_visits,
  35.     (100. * goal_visits) / visits AS goal_percent
  36. FROM tutorial.visits_v1
  37. WHERE (CounterID = 912887) AND (toYYYYMM(StartDate) = 201403) AND (domain(StartURL) = 'yandex.ru');
  38. SELECT
  39.     sum(Sign) AS visits,
  40.     sumIf(Sign, has(Goals.ID, 1105530)) AS goal_visits,
  41.     (100. * goal_visits) / visits AS goal_percent
  42. FROM tutorial.visits_v1
  43. WHERE (CounterID = 912887) AND (toYYYYMM(StartDate) = 201403) AND (domain(StartURL) = 'yandex.ru')
  44. Query id: 02df86cc-ef97-4cec-9892-6ba92dc87d2c
  45. ┌─visits─┬─goal_visits─┬──────goal_percent─┐
  46. │  10543 │        8553 │ 81.12491700654462 │
  47. └────────┴─────────────┴───────────────────┘
  48. 1 row in set. Elapsed: 0.014 sec. Processed 19.72 thousand rows, 3.45 MB (1.46 million rows/s., 255.24 MB/s.)
  49. Peak memory usage: 5.01 MiB.
  50. vvml-yz-hbase-test.172.18.4.126 :) 
复制代码
三、集群摆设

        本次摆设使用四台主机,构建两个分片,每个分片两个副本的 ClickHouse 集群,IP 和主机名如下:
  1. 172.18.4.126    node1
  2. 172.18.4.188    node2
  3. 172.18.4.71    node3
  4. 172.18.4.86    node4
复制代码
        注意,ClickHouse 要求每个分片的每个副本必须设置在单独的实例上,也就是说在整个集群范围内,一共有多少个副本,就必要创建多少个 ClickHouse 实例。最佳实践是3分片2副本6实例。
        ClickHouse 推荐使用 ClickHouse Keeper 替换 Zookeeper(https://clickhouse.com/docs/knowledgebase/why_recommend_clickhouse_keeper_over_zookeeper)。
        下面使用 ClickHouse Keeper 设置 ClickHouse 集群。具体利用步骤参考 ClickHouse Keeper 用户指南(https://clickhouse.com/docs/en/guides/sre/keeper/clickhouse-keeper#clickhouse-keeper-user-guide)。
        ClickHouse 服务器中捆绑了 clickhouse-keeper。如果已经安装服务器,则无法单独安装 clickhouse-keeper,会收到冲突错误:
  1. [root@vvml-yz-hbase-test~]#rpm -ivh clickhouse-keeper-24.1.8.22.x86_64.rpm 
  2. error: Failed dependencies:
  3.     clickhouse-server conflicts with clickhouse-keeper-0:24.1.8.22-1.x86_64
  4. [root@vvml-yz-hbase-test~]#
复制代码
        但如果有仅用作 clickhouse-keeper 的服务器,则仅可以单独安装 clickhouse-keeper。
  1. # 查看版本
  2. https://github.com/ClickHouse/ClickHouse/releases
  3. # 最新稳定版本安装包下载地址
  4. https://github.com/ClickHouse/ClickHouse/releases/download/v24.1.8.22-stable/clickhouse-keeper-24.1.8.22.x86_64.rpm
复制代码
        本次摆设在每个主机安装 clickhouse-server,而不但独安装 clickhouse-keeper。分片、副本、keeper规划如下:
  1. node1    分片1副本1    keeper
  2. node2    分片1副本2    keeper
  3. node3    分片2副本1    keeper
  4. node4    分片2副本2
复制代码
        四台主机都作为数据分片,每个数据分片有两个副本。keeper 摆设到三个实例上,奇数实例用于实现 ClickHouse Keeper 中要求的票选数。
0. 安装前准备

        在设置 ClickHouse 集群前,必要在全部四台机器上完成以下准备工作:


  • 启动 NTP 时钟同步
  • /etc/hosts 文件中添加构成集群的所有主机名
  • 设置所有主机间 ssh 免密
  • 修改用户可打开文件数与进程数
  • 禁用防火墙
  • 禁用 transparent hugepage
1. 安装设置 ClickHouse Keeper

        在 node1、node2、node3 三台主机上实行下面的利用步骤。
(1)安装 ClickHouse Server 和 ClickHouse Client

  1. rpm -ivh clickhouse-common-static-24.1.8.22.x86_64.rpm
  2. rpm -ivh clickhouse-server-24.1.8.22.x86_64.rpm
  3. rpm -ivh clickhouse-client-24.1.8.22.x86_64.rpm
复制代码
(2)修改 ClickHouse 主设置文件 
  1. # 修改前先备份
  2. cp /etc/clickhouse-server/config.xml /etc/clickhouse-server/config.xml.bak
复制代码
        修改 /etc/clickhouse-server/config.xml 主设置文件,在根节点 <clickhouse> 下添加以下内容:
  1. <!-- 添加以下条目以允许通过网络接口进行外部通信。 -->
  2. <listen_host>0.0.0.0</listen_host>
  3. <!-- 指定实例启用 ClickHouse Keeper。更新每台服务器的<server_id>设置,node1为1、node2为2、node3为3。-->
  4. <keeper_server>
  5.     <tcp_port>9181</tcp_port>
  6.     <server_id>1</server_id>
  7.     <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
  8.     <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
  9.     <coordination_settings>
  10.         <operation_timeout_ms>10000</operation_timeout_ms>
  11.         <session_timeout_ms>30000</session_timeout_ms>
  12.         <raft_logs_level>warning</raft_logs_level>
  13.     </coordination_settings>
  14.     <raft_configuration>
  15.         <server>
  16.             <id>1</id>
  17.             <hostname>node1</hostname>
  18.             <port>9234</port>
  19.         </server>
  20.         <server>
  21.             <id>2</id>
  22.             <hostname>node2</hostname>
  23.             <port>9234</port>
  24.         </server>
  25.         <server>
  26.             <id>3</id>
  27.             <hostname>node3</hostname>
  28.             <port>9234</port>
  29.         </server>
  30.     </raft_configuration>
  31. </keeper_server>
  32. <!-- 指定实例所使用的 ClickHouse Keeper -->
  33. <zookeeper>
  34.     <node>
  35.         <host>node1</host>
  36.         <port>9181</port>
  37.     </node>
  38.     <node>
  39.         <host>node2</host>
  40.         <port>9181</port>
  41.     </node>
  42.     <node>
  43.         <host>node3</host>
  44.         <port>9181</port>
  45.     </node>
  46. </zookeeper>
复制代码
(3)重启 ClickHouse

  1. sudo service clickhouse-server restart
复制代码
(4)验证 Keeper 实例是否正在运行

        在 node1、node2、node3 上实行下面的下令,如果 Keeper 运行正常,ruok 下令将返回 imok:
  1. [root@vvml-yz-hbase-test~]#echo ruok | nc localhost 9181; echo
  2. imok
  3. [root@vvml-yz-hbase-test~]#
复制代码
(5)确认 zookeeper 体系表

        体系数据库有一个名为 zookeeper 的表,其中包罗 ClickHouse Keeper 实例的详细信息:
  1. [root@vvml-yz-hbase-test~]#clickhouse-client --password="123456"
  2. ClickHouse client version 24.1.8.22 (official build).
  3. Connecting to localhost:9000 as user default.
  4. Connected to ClickHouse server version 24.1.8.
  5. vvml-yz-hbase-test.172.18.4.126 :) SELECT *
  6. FROM system.zookeeper
  7. WHERE path IN ('/', '/clickhouse');
  8. SELECT *
  9. FROM system.zookeeper
  10. WHERE path IN ('/', '/clickhouse')
  11. Query id: e713c446-26c5-4c3f-994c-db22eb68b9ad
  12. ┌─name───────┬─value─┬─path────────┐
  13. │ keeper     │       │ /           │
  14. │ clickhouse │       │ /           │
  15. │ task_queue │       │ /clickhouse │
  16. │ sessions   │       │ /clickhouse │
  17. └────────────┴───────┴─────────────┘
  18. 4 rows in set. Elapsed: 0.002 sec. 
  19. vvml-yz-hbase-test.172.18.4.126 :) 
复制代码
2. 设置 ClickHouse 集群

(1)在新主机(node4)上安装 ClickHouse Server 和 ClickHouse Client

  1. rpm -ivh clickhouse-common-static-24.1.8.22.x86_64.rpm
  2. rpm -ivh clickhouse-server-24.1.8.22.x86_64.rpm
  3. rpm -ivh clickhouse-client-24.1.8.22.x86_64.rpm
复制代码
(2)修改新主机(node4)上的设置文件

  1. # 修改前先备份
  2. cp /etc/clickhouse-server/config.xml /etc/clickhouse-server/config.xml.bak
复制代码
        修改 /etc/clickhouse-server/config.xml 主设置文件,在根节点 <clickhouse> 下添加以下内容:
  1. <!-- 添加以下条目以允许通过网络接口进行外部通信。 -->
  2. <listen_host>0.0.0.0</listen_host>
  3. <!-- 指定实例所使用的 ClickHouse Keeper -->
  4. <zookeeper>
  5.     <node>
  6.         <host>node1</host>
  7.         <port>9181</port>
  8.     </node>
  9.     <node>
  10.         <host>node2</host>
  11.         <port>9181</port>
  12.     </node>
  13.     <node>
  14.         <host>node3</host>
  15.         <port>9181</port>
  16.     </node>
  17. </zookeeper>
复制代码
        然后在全部四台主机上实行下面的利用步骤。
(3)更新设置

        修改 /etc/clickhouse-server/config.xml 主设置文件,在根节点 <clickhouse> 下添加以下内容:
  1.     <remote_servers>
  2.         <cluster_2S_2R>
  3.             <shard>
  4.                 <replica>
  5.                     <host>node1</host>
  6.                     <port>9000</port>
  7.                     <user>default</user>
  8.                     <password>123456</password>
  9.                 </replica>
  10.                 <replica>
  11.                     <host>node2</host>
  12.                     <port>9000</port>
  13.                     <user>default</user>
  14.                     <password>123456</password>
  15.                 </replica>
  16.             </shard>
  17.             <shard>
  18.                 <replica>
  19.                     <host>node3</host>
  20.                     <port>9000</port>
  21.                     <user>default</user>
  22.                     <password>123456</password>
  23.                 </replica>
  24.                 <replica>
  25.                     <host>node4</host>
  26.                     <port>9000</port>
  27.                     <user>default</user>
  28.                     <password>123456</password>
  29.                 </replica>
  30.             </shard>
  31.         </cluster_2S_2R>
  32.     </remote_servers>
复制代码
(4)重启 ClickHouse 并验证集群已创建

  1. sudo service clickhouse-server restart
复制代码
        查看集群:
  1. [root@vvml-yz-hbase-test~]#clickhouse-client --password="123456"
  2. ClickHouse client version 24.1.8.22 (official build).
  3. Connecting to localhost:9000 as user default.
  4. Connected to ClickHouse server version 24.1.8.
  5. vvml-yz-hbase-test.172.18.4.126 :) SHOW CLUSTERS;
  6. SHOW CLUSTERS
  7. Query id: 80e81978-d1f4-4721-85d8-7e7803230373
  8. ┌─cluster───────┐
  9. │ cluster_2S_2R │
  10. └───────────────┘
  11. 1 row in set. Elapsed: 0.001 sec. 
  12. vvml-yz-hbase-test.172.18.4.126 :) select cluster,shard_num,replica_num,host_name,is_local,user,database_shard_name,database_replica_name from system.clusters;
  13. SELECT
  14.     cluster,
  15.     shard_num,
  16.     replica_num,
  17.     host_name,
  18.     is_local,
  19.     user,
  20.     database_shard_name,
  21.     database_replica_name
  22. FROM system.clusters
  23. Query id: 4b39d9ec-b4f7-4557-b76d-05f3893f4ef7
  24. ┌─cluster───────┬─shard_num─┬─replica_num─┬─host_name─┬─is_local─┬─user────┬─database_shard_name─┬─database_replica_name─┐
  25. │ cluster_2S_2R │         1 │           1 │ node1     │        1 │ default │                     │                       │
  26. │ cluster_2S_2R │         1 │           2 │ node2     │        0 │ default │                     │                       │
  27. │ cluster_2S_2R │         2 │           1 │ node3     │        0 │ default │                     │                       │
  28. │ cluster_2S_2R │         2 │           2 │ node4     │        0 │ default │                     │                       │
  29. └───────────────┴───────────┴─────────────┴───────────┴──────────┴─────────┴─────────────────────┴───────────────────────┘
  30. 4 rows in set. Elapsed: 0.001 sec. 
  31. vvml-yz-hbase-test.172.18.4.126 :) 
复制代码
3. 创建分布式表

(1)创建数据库

        使用 node1 上创建一个数据库。ON CLUSTER子句会自动在所有实例上创建数据库。
  1. vvml-yz-hbase-test.172.18.4.126 :) CREATE DATABASE db1 ON CLUSTER 'cluster_2S_2R';
  2. CREATE DATABASE db1 ON CLUSTER cluster_2S_2R
  3. Query id: 7a8cd789-bcfb-4855-a131-ba7935cffcfb
  4. ┌─host──┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
  5. │ node1 │ 9000 │      0 │       │                   3 │                0 │
  6. │ node3 │ 9000 │      0 │       │                   2 │                0 │
  7. │ node4 │ 9000 │      0 │       │                   1 │                0 │
  8. │ node2 │ 9000 │      0 │       │                   0 │                0 │
  9. └───────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
  10. 4 rows in set. Elapsed: 0.099 sec. 
  11. vvml-yz-hbase-test.172.18.4.126 :)
复制代码
(2)创建当地表

        在 db1 库中建表,同样,ON CLUSTER 子句会自动在所有实例上建表。
  1. vvml-yz-hbase-test.172.18.4.126 :) CREATE TABLE db1.table1 on cluster 'cluster_2S_2R'
  2. (
  3.     `id` UInt64,
  4.     `column1` String
  5. )
  6. ENGINE = MergeTree
  7. ORDER BY column1;
  8. CREATE TABLE db1.table1 ON CLUSTER cluster_2S_2R
  9. (
  10.     `id` UInt64,
  11.     `column1` String
  12. )
  13. ENGINE = MergeTree
  14. ORDER BY column1
  15. Query id: abb936ad-3618-4821-92f6-cfaa83fb4d51
  16. ┌─host──┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
  17. │ node3 │ 9000 │      0 │       │                   3 │                2 │
  18. │ node1 │ 9000 │      0 │       │                   2 │                2 │
  19. └───────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
  20. ┌─host──┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
  21. │ node4 │ 9000 │      0 │       │                   1 │                0 │
  22. │ node2 │ 9000 │      0 │       │                   0 │                0 │
  23. └───────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
  24. 4 rows in set. Elapsed: 0.158 sec. 
  25. vvml-yz-hbase-test.172.18.4.126 :)
复制代码
(3)在一个分片实例上(node1)新增两行

  1. vvml-yz-hbase-test.172.18.4.126 :) INSERT INTO db1.table1
  2.     (id, column1)
  3. VALUES
  4.     (1, 'abc'),
  5.     (2, 'def');
  6. INSERT INTO db1.table1 (id, column1) FORMAT Values
  7. Query id: 959da99c-c473-4c3a-9381-fa65b447161c
  8. Ok.
  9. 2 rows in set. Elapsed: 0.002 sec. 
  10. vvml-yz-hbase-test.172.18.4.126 :) 
复制代码
(4)在另一个分片实例上(node3)新增两行

  1. vvml-yz-hbase-test.172.18.4.71 :) INSERT INTO db1.table1
  2.     (id, column1)
  3. VALUES
  4.     (3, 'ghi'),
  5.     (4, 'jkl');
  6. INSERT INTO db1.table1 (id, column1) FORMAT Values
  7. Query id: c8864197-ec82-4aba-8c2e-aaa2af468553
  8. Ok.
  9. 2 rows in set. Elapsed: 0.002 sec. 
  10. vvml-yz-hbase-test.172.18.4.71 :)
复制代码
(5)在所有实例上分别实行 SELECT 查询

        node1、node3 上查询效果体现该当地实例上的两行数据:
  1. # node1
  2. vvml-yz-hbase-test.172.18.4.126 :) SELECT *
  3. FROM db1.table1;
  4. SELECT *
  5. FROM db1.table1
  6. Query id: 2f2fc679-9091-41ae-967b-4bd8e2ec7311
  7. ┌─id─┬─column1─┐
  8. │  1 │ abc     │
  9. │  2 │ def     │
  10. └────┴─────────┘
  11. 2 rows in set. Elapsed: 0.001 sec. 
  12. vvml-yz-hbase-test.172.18.4.126 :) 
  13. # node3
  14. vvml-yz-hbase-test.172.18.4.71 :) SELECT *
  15. FROM db1.table1;
  16. SELECT *
  17. FROM db1.table1
  18. Query id: 18843522-c678-45f0-901d-73e1bbfd4dbf
  19. ┌─id─┬─column1─┐
  20. │  3 │ ghi     │
  21. │  4 │ jkl     │
  22. └────┴─────────┘
  23. 2 rows in set. Elapsed: 0.002 sec. 
  24. vvml-yz-hbase-test.172.18.4.71 :) 
复制代码
        node2、node4 上查不到数据,阐明写入当地表并没有复制到其副本。
(6)创建分布式表

        可以创建一个分布式表来表示两个分片上的数据。具有分布式表引擎的表不存储自己的任何数据,而是允许在多个服务器上举行分布式查询处理。读取掷中所有分片,写入可以分布在分片之间。在任一实例上创建分布式表:
  1. vvml-yz-hbase-test.172.18.4.126 :) CREATE TABLE db1.dist_table ON CLUSTER 'cluster_2S_2R'
  2. (
  3.     id UInt64,
  4.     column1 String
  5. )
  6. ENGINE = Distributed(cluster_2S_2R,db1,table1);
  7. CREATE TABLE db1.dist_table ON CLUSTER cluster_2S_2R
  8. (
  9.     `id` UInt64,
  10.     `column1` String
  11. )
  12. ENGINE = Distributed(cluster_2S_2R, db1, table1)
  13. Query id: 7c08e756-90cf-4014-9368-dc41fe7d06f4
  14. ┌─host──┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
  15. │ node4 │ 9000 │      0 │       │                   3 │                0 │
  16. │ node2 │ 9000 │      0 │       │                   2 │                0 │
  17. │ node1 │ 9000 │      0 │       │                   1 │                0 │
  18. │ node3 │ 9000 │      0 │       │                   0 │                0 │
  19. └───────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘
  20. 4 rows in set. Elapsed: 0.095 sec. 
  21. vvml-yz-hbase-test.172.18.4.126 :)
复制代码
(7)在所有实例上分别实行 SELECT 查询

        所有实例上都创建了 db1.dist_table 表,但是 SELECT *
FROM db1.dist_table 查询效果却出乎意料。预想在所有实例上查询的效果相同,都返回两个分片中的所有四行数据,而现实四个实例返回的效果是不确定的:node1 偶然返回全部4行,偶然只返回前两行;node3 偶然返回全部4行,偶然只返回后两行;node2 偶然返回后两行,偶然不返回数据;node4 偶然返回前两行,偶然不返回数据。
        阐明:


  • ClickHouse 集群是一种对等架构,在一个集群里每个 ClickHouse 实例都是独立的,即使是同一个分片内的不同副本实例间,也是没有主从概念。
  • 在集群中任何一个实例上实行的 DDL 语句中使用 ON CLUSTER 子句,会自动在集群中的所有实例上实行。
  • 分布式体现实上是一种视图,映射到 ClickHouse 集群实例上的当地表。从分布式表中实行 SELECT 查询会使用集群所有相干分片的资源。
  • 底层当地表如果使用 MergeTree 表引擎,在分布式表上实行同一查询,返回效果有大概不确定。
4. 数据自动分片

        首先想到的是如果在分布式表上能够实行 DML 语句,就应该可以到达数据自动分片的效果。
  1. vvml-yz-hbase-test.172.18.4.126 :) insert into db1.dist_table
  2.     (id, column1)
  3. values
  4.     (5, 'mno'),
  5.     (6, 'pqr');
  6. INSERT INTO db1.dist_table (id, column1) FORMAT Values
  7. Query id: 78b4cc88-9b13-4078-a3f0-d405f4338cfb
  8. Elapsed: 0.002 sec. 
  9. Received exception from server (version 24.1.8):
  10. Code: 55. DB::Exception: Received from localhost:9000. DB::Exception: Method write is not supported by storage Distributed with more than one shard and no sharding key provided. (STORAGE_REQUIRES_PARAMETER)
  11. vvml-yz-hbase-test.172.18.4.126 :)
复制代码
        报错明确指出,当数据存储分布在多于一个的分片上,而且没有提供分片键时,分布式表不支持写入。为了进一步演示,下面使用和创建 hits_v1 表雷同的 CREATE TABLE 语句创建一个新的当地表,有三点不同:


  • 库名不同。
  • 表名不同。
  • 使用 ON CLUSTER 子句。
(1)创建带有分片键的当地表

        在任一实例上实行下面的建表语句:
  1. CREATE TABLE db1.hits_local ON CLUSTER 'cluster_2S_2R'
  2. (
  3.     `WatchID` UInt64,
  4.     `JavaEnable` UInt8,
  5.     `Title` String,
  6.     `GoodEvent` Int16,
  7.     `EventTime` DateTime,
  8.     `EventDate` Date,
  9.     `CounterID` UInt32,
  10.     `ClientIP` UInt32,
  11.     `ClientIP6` FixedString(16),
  12.     `RegionID` UInt32,
  13.     `UserID` UInt64,
  14.     `CounterClass` Int8,
  15.     `OS` UInt8,
  16.     `UserAgent` UInt8,
  17.     `URL` String,
  18.     `Referer` String,
  19.     `URLDomain` String,
  20.     `RefererDomain` String,
  21.     `Refresh` UInt8,
  22.     `IsRobot` UInt8,
  23.     `RefererCategories` Array(UInt16),
  24.     `URLCategories` Array(UInt16),
  25.     `URLRegions` Array(UInt32),
  26.     `RefererRegions` Array(UInt32),
  27.     `ResolutionWidth` UInt16,
  28.     `ResolutionHeight` UInt16,
  29.     `ResolutionDepth` UInt8,
  30.     `FlashMajor` UInt8,
  31.     `FlashMinor` UInt8,
  32.     `FlashMinor2` String,
  33.     `NetMajor` UInt8,
  34.     `NetMinor` UInt8,
  35.     `UserAgentMajor` UInt16,
  36.     `UserAgentMinor` FixedString(2),
  37.     `CookieEnable` UInt8,
  38.     `JavascriptEnable` UInt8,
  39.     `IsMobile` UInt8,
  40.     `MobilePhone` UInt8,
  41.     `MobilePhoneModel` String,
  42.     `Params` String,
  43.     `IPNetworkID` UInt32,
  44.     `TraficSourceID` Int8,
  45.     `SearchEngineID` UInt16,
  46.     `SearchPhrase` String,
  47.     `AdvEngineID` UInt8,
  48.     `IsArtifical` UInt8,
  49.     `WindowClientWidth` UInt16,
  50.     `WindowClientHeight` UInt16,
  51.     `ClientTimeZone` Int16,
  52.     `ClientEventTime` DateTime,
  53.     `SilverlightVersion1` UInt8,
  54.     `SilverlightVersion2` UInt8,
  55.     `SilverlightVersion3` UInt32,
  56.     `SilverlightVersion4` UInt16,
  57.     `PageCharset` String,
  58.     `CodeVersion` UInt32,
  59.     `IsLink` UInt8,
  60.     `IsDownload` UInt8,
  61.     `IsNotBounce` UInt8,
  62.     `FUniqID` UInt64,
  63.     `HID` UInt32,
  64.     `IsOldCounter` UInt8,
  65.     `IsEvent` UInt8,
  66.     `IsParameter` UInt8,
  67.     `DontCountHits` UInt8,
  68.     `WithHash` UInt8,
  69.     `HitColor` FixedString(1),
  70.     `UTCEventTime` DateTime,
  71.     `Age` UInt8,
  72.     `Sex` UInt8,
  73.     `Income` UInt8,
  74.     `Interests` UInt16,
  75.     `Robotness` UInt8,
  76.     `GeneralInterests` Array(UInt16),
  77.     `RemoteIP` UInt32,
  78.     `RemoteIP6` FixedString(16),
  79.     `WindowName` Int32,
  80.     `OpenerName` Int32,
  81.     `HistoryLength` Int16,
  82.     `BrowserLanguage` FixedString(2),
  83.     `BrowserCountry` FixedString(2),
  84.     `SocialNetwork` String,
  85.     `SocialAction` String,
  86.     `HTTPError` UInt16,
  87.     `SendTiming` Int32,
  88.     `DNSTiming` Int32,
  89.     `ConnectTiming` Int32,
  90.     `ResponseStartTiming` Int32,
  91.     `ResponseEndTiming` Int32,
  92.     `FetchTiming` Int32,
  93.     `RedirectTiming` Int32,
  94.     `DOMInteractiveTiming` Int32,
  95.     `DOMContentLoadedTiming` Int32,
  96.     `DOMCompleteTiming` Int32,
  97.     `LoadEventStartTiming` Int32,
  98.     `LoadEventEndTiming` Int32,
  99.     `NSToDOMContentLoadedTiming` Int32,
  100.     `FirstPaintTiming` Int32,
  101.     `RedirectCount` Int8,
  102.     `SocialSourceNetworkID` UInt8,
  103.     `SocialSourcePage` String,
  104.     `ParamPrice` Int64,
  105.     `ParamOrderID` String,
  106.     `ParamCurrency` FixedString(3),
  107.     `ParamCurrencyID` UInt16,
  108.     `GoalsReached` Array(UInt32),
  109.     `OpenstatServiceName` String,
  110.     `OpenstatCampaignID` String,
  111.     `OpenstatAdID` String,
  112.     `OpenstatSourceID` String,
  113.     `UTMSource` String,
  114.     `UTMMedium` String,
  115.     `UTMCampaign` String,
  116.     `UTMContent` String,
  117.     `UTMTerm` String,
  118.     `FromTag` String,
  119.     `HasGCLID` UInt8,
  120.     `RefererHash` UInt64,
  121.     `URLHash` UInt64,
  122.     `CLID` UInt32,
  123.     `YCLID` UInt64,
  124.     `ShareService` String,
  125.     `ShareURL` String,
  126.     `ShareTitle` String,
  127.     `ParsedParams` Nested(
  128.         Key1 String,
  129.         Key2 String,
  130.         Key3 String,
  131.         Key4 String,
  132.         Key5 String,
  133.         ValueDouble Float64),
  134.     `IslandID` FixedString(16),
  135.     `RequestNum` UInt32,
  136.     `RequestTry` UInt8
  137. )
  138. ENGINE = MergeTree()
  139. PARTITION BY toYYYYMM(EventDate)
  140. ORDER BY (CounterID, EventDate, intHash32(UserID))
  141. SAMPLE BY intHash32(UserID);
复制代码
(2)创建提供集群当地表视图的分布式表

        在任一实例上实行下面的建表语句:
  1. CREATE TABLE db1.hits_all ON CLUSTER 'cluster_2S_2R'
  2. AS db1.hits_local
  3. ENGINE = Distributed(cluster_2S_2R, db1, hits_local, rand());
复制代码
(3)向分布式表中插入数据

        在任一实例上实行下面的 insert 语句:
  1. INSERT INTO db1.hits_all SELECT * FROM tutorial.hits_v1;
复制代码
(4)在所有实例上分别实行 SELECT 查询

  1. # node1
  2. vvml-yz-hbase-test.172.18.4.126 :) select count(*) from db1.hits_local;
  3. SELECT count(*)
  4. FROM db1.hits_local
  5. Query id: 10a10594-dad1-4693-96fa-a27e62256cc0
  6. ┌─count()─┐
  7. │ 4437894 │
  8. └─────────┘
  9. 1 row in set. Elapsed: 0.002 sec. 
  10. vvml-yz-hbase-test.172.18.4.126 :) select count(*) from db1.hits_all;
  11. SELECT count(*)
  12. FROM db1.hits_all
  13. Query id: 4d825459-19fa-4f68-9326-cf6a9b7e113a
  14. ┌─count()─┐
  15. │ 8873898 │
  16. └─────────┘
  17. 1 row in set. Elapsed: 0.006 sec. 
  18. vvml-yz-hbase-test.172.18.4.126 :) 
  19. # node2
  20. vvml-yz-hbase-test.172.18.4.188 :) select count(*) from db1.hits_local;
  21. SELECT count(*)
  22. FROM db1.hits_local
  23. Query id: 7da679f4-4f0f-42e5-8c78-df6c9814c61f
  24. ┌─count()─┐
  25. │ 4437894 │
  26. └─────────┘
  27. 1 row in set. Elapsed: 0.003 sec. 
  28. vvml-yz-hbase-test.172.18.4.188 :) select count(*) from db1.hits_all;
  29. SELECT count(*)
  30. FROM db1.hits_all
  31. Query id: 29594d92-e38e-473a-bc91-7a5fb37e0647
  32. ┌─count()─┐
  33. │ 8873898 │
  34. └─────────┘
  35. 1 row in set. Elapsed: 0.005 sec. 
  36. vvml-yz-hbase-test.172.18.4.188 :) 
  37. # node3
  38. vvml-yz-hbase-test.172.18.4.71 :) select count(*) from db1.hits_local;
  39. SELECT count(*)
  40. FROM db1.hits_local
  41. Query id: 2e5a0fca-9b13-4c4e-8cb9-f9ee60da6152
  42. ┌─count()─┐
  43. │ 4436004 │
  44. └─────────┘
  45. 1 row in set. Elapsed: 0.002 sec. 
  46. vvml-yz-hbase-test.172.18.4.71 :) select count(*) from db1.hits_all;
  47. SELECT count(*)
  48. FROM db1.hits_all
  49. Query id: 85e6e7a4-eaab-4455-8c29-52ec0998a9b0
  50. ┌─count()─┐
  51. │ 8873898 │
  52. └─────────┘
  53. 1 row in set. Elapsed: 0.006 sec. 
  54. vvml-yz-hbase-test.172.18.4.71 :) 
  55. # node4
  56. vvml-yz-hbase-test.172.18.4.86 :) select count(*) from db1.hits_local;
  57. SELECT count(*)
  58. FROM db1.hits_local
  59. Query id: 9950210e-a4a4-4e1d-a449-4aefb2ea396f
  60. ┌─count()─┐
  61. │ 4436004 │
  62. └─────────┘
  63. 1 row in set. Elapsed: 0.002 sec. 
  64. vvml-yz-hbase-test.172.18.4.86 :) select count(*) from db1.hits_all;
  65. SELECT count(*)
  66. FROM db1.hits_all
  67. Query id: 45777141-ff08-452f-93b8-1650265d4175
  68. ┌─count()─┐
  69. │ 8873898 │
  70. └─────────┘
  71. 1 row in set. Elapsed: 0.009 sec. 
  72. vvml-yz-hbase-test.172.18.4.86 :)
复制代码
        从查询效果可以看到,当地表 node1、node2 上的查询效果相同,node3、node4 上的查询效果相同,分布式表四个节点查询效果都相同。node1 和 node3 分属于集群中的不同分片,而在创建分布式表时指定的分布规则随机,以是这两个实例上的当地表数据量存在少许差异是符合预期的。结论是:


  • 指定分片键的分布式表可以写数据。
  • 数据按创建分布式表时指定的自定义的分片规则分布。
  • select count(*) from tutorial.hits_v1 用了2毫秒,select count(*) from db1.hits_all 用了6毫秒,在本测试环境中,查询分布式表比查询当地表慢了两倍。
  • 底层表如果使用 MergeTree 表引擎,其上正常定义了分布规则的分布式表,插入分布式表可以自动在同一分片上的不同副本之间做数据同步。
5. 使用 ReplicatedMergeTree 表引擎复制数据

        按通常的理解,一个分布式数据库至少必要满意以下根本需求:


  • 数据按指定规则自动分片
  • 同一分片的多个副本数据保持一致
  • 数据分片和多副本间的数据同步对应用透明。
        ClickHouse 推荐的设置是使用 ReplicatedMergeTree 表引擎,自动完成副本间的数据复制。创建 ReplcatedMergeTree 表,通常必要设置宏来辨认每个用于创建表的分片和副本。
(1)定义宏

        在全部四台主机上实行下面的利用步骤。
        修改 /etc/clickhouse-server/config.xml 主设置文件,在每个节点 <shard> 下添加 <internal_replication>true</internal_replication>:
  1.     <remote_servers>
  2.         <cluster_2S_2R>
  3.             <shard>
  4.                 <internal_replication>true</internal_replication>
  5.                 <replica>
  6.                     <host>node1</host>
  7.                     <port>9000</port>
  8.                     <user>default</user>
  9.                     <password>123456</password>
  10.                 </replica>
  11.                 <replica>
  12.                     <host>node2</host>
  13.                     <port>9000</port>
  14.                     <user>default</user>
  15.                     <password>123456</password>
  16.                 </replica>
  17.             </shard>
  18.             <shard>
  19.                 <internal_replication>true</internal_replication>
  20.                 <replica>
  21.                     <host>node3</host>
  22.                     <port>9000</port>
  23.                     <user>default</user>
  24.                     <password>123456</password>
  25.                 </replica>
  26.                 <replica>
  27.                     <host>node4</host>
  28.                     <port>9000</port>
  29.                     <user>default</user>
  30.                     <password>123456</password>
  31.                 </replica>
  32.             </shard>
  33.         </cluster_2S_2R>
  34.     </remote_servers>
复制代码
        在节点 </remote_servers> 反面添加以下内容:
  1. <macros>
  2.     <shard>01</shard>
  3.     <replica>01</replica>
  4. </macros>
复制代码
        node1 设置成 01、01;node2 设置成 01、02;node3 设置成 02、01;node4 设置成 02、02。
        然后重启所有实例:
  1. sudo service clickhouse-server restart
复制代码
(2)创建当地表

        在任一实例上实行下面的建表语句:
  1. CREATE TABLE db1.hits_replica ON CLUSTER 'cluster_2S_2R'
  2. (
  3.     `WatchID` UInt64,
  4.     `JavaEnable` UInt8,
  5.     `Title` String,
  6.     `GoodEvent` Int16,
  7.     `EventTime` DateTime,
  8.     `EventDate` Date,
  9.     `CounterID` UInt32,
  10.     `ClientIP` UInt32,
  11.     `ClientIP6` FixedString(16),
  12.     `RegionID` UInt32,
  13.     `UserID` UInt64,
  14.     `CounterClass` Int8,
  15.     `OS` UInt8,
  16.     `UserAgent` UInt8,
  17.     `URL` String,
  18.     `Referer` String,
  19.     `URLDomain` String,
  20.     `RefererDomain` String,
  21.     `Refresh` UInt8,
  22.     `IsRobot` UInt8,
  23.     `RefererCategories` Array(UInt16),
  24.     `URLCategories` Array(UInt16),
  25.     `URLRegions` Array(UInt32),
  26.     `RefererRegions` Array(UInt32),
  27.     `ResolutionWidth` UInt16,
  28.     `ResolutionHeight` UInt16,
  29.     `ResolutionDepth` UInt8,
  30.     `FlashMajor` UInt8,
  31.     `FlashMinor` UInt8,
  32.     `FlashMinor2` String,
  33.     `NetMajor` UInt8,
  34.     `NetMinor` UInt8,
  35.     `UserAgentMajor` UInt16,
  36.     `UserAgentMinor` FixedString(2),
  37.     `CookieEnable` UInt8,
  38.     `JavascriptEnable` UInt8,
  39.     `IsMobile` UInt8,
  40.     `MobilePhone` UInt8,
  41.     `MobilePhoneModel` String,
  42.     `Params` String,
  43.     `IPNetworkID` UInt32,
  44.     `TraficSourceID` Int8,
  45.     `SearchEngineID` UInt16,
  46.     `SearchPhrase` String,
  47.     `AdvEngineID` UInt8,
  48.     `IsArtifical` UInt8,
  49.     `WindowClientWidth` UInt16,
  50.     `WindowClientHeight` UInt16,
  51.     `ClientTimeZone` Int16,
  52.     `ClientEventTime` DateTime,
  53.     `SilverlightVersion1` UInt8,
  54.     `SilverlightVersion2` UInt8,
  55.     `SilverlightVersion3` UInt32,
  56.     `SilverlightVersion4` UInt16,
  57.     `PageCharset` String,
  58.     `CodeVersion` UInt32,
  59.     `IsLink` UInt8,
  60.     `IsDownload` UInt8,
  61.     `IsNotBounce` UInt8,
  62.     `FUniqID` UInt64,
  63.     `HID` UInt32,
  64.     `IsOldCounter` UInt8,
  65.     `IsEvent` UInt8,
  66.     `IsParameter` UInt8,
  67.     `DontCountHits` UInt8,
  68.     `WithHash` UInt8,
  69.     `HitColor` FixedString(1),
  70.     `UTCEventTime` DateTime,
  71.     `Age` UInt8,
  72.     `Sex` UInt8,
  73.     `Income` UInt8,
  74.     `Interests` UInt16,
  75.     `Robotness` UInt8,
  76.     `GeneralInterests` Array(UInt16),
  77.     `RemoteIP` UInt32,
  78.     `RemoteIP6` FixedString(16),
  79.     `WindowName` Int32,
  80.     `OpenerName` Int32,
  81.     `HistoryLength` Int16,
  82.     `BrowserLanguage` FixedString(2),
  83.     `BrowserCountry` FixedString(2),
  84.     `SocialNetwork` String,
  85.     `SocialAction` String,
  86.     `HTTPError` UInt16,
  87.     `SendTiming` Int32,
  88.     `DNSTiming` Int32,
  89.     `ConnectTiming` Int32,
  90.     `ResponseStartTiming` Int32,
  91.     `ResponseEndTiming` Int32,
  92.     `FetchTiming` Int32,
  93.     `RedirectTiming` Int32,
  94.     `DOMInteractiveTiming` Int32,
  95.     `DOMContentLoadedTiming` Int32,
  96.     `DOMCompleteTiming` Int32,
  97.     `LoadEventStartTiming` Int32,
  98.     `LoadEventEndTiming` Int32,
  99.     `NSToDOMContentLoadedTiming` Int32,
  100.     `FirstPaintTiming` Int32,
  101.     `RedirectCount` Int8,
  102.     `SocialSourceNetworkID` UInt8,
  103.     `SocialSourcePage` String,
  104.     `ParamPrice` Int64,
  105.     `ParamOrderID` String,
  106.     `ParamCurrency` FixedString(3),
  107.     `ParamCurrencyID` UInt16,
  108.     `GoalsReached` Array(UInt32),
  109.     `OpenstatServiceName` String,
  110.     `OpenstatCampaignID` String,
  111.     `OpenstatAdID` String,
  112.     `OpenstatSourceID` String,
  113.     `UTMSource` String,
  114.     `UTMMedium` String,
  115.     `UTMCampaign` String,
  116.     `UTMContent` String,
  117.     `UTMTerm` String,
  118.     `FromTag` String,
  119.     `HasGCLID` UInt8,
  120.     `RefererHash` UInt64,
  121.     `URLHash` UInt64,
  122.     `CLID` UInt32,
  123.     `YCLID` UInt64,
  124.     `ShareService` String,
  125.     `ShareURL` String,
  126.     `ShareTitle` String,
  127.     `ParsedParams` Nested(
  128.         Key1 String,
  129.         Key2 String,
  130.         Key3 String,
  131.         Key4 String,
  132.         Key5 String,
  133.         ValueDouble Float64),
  134.     `IslandID` FixedString(16),
  135.     `RequestNum` UInt32,
  136.     `RequestTry` UInt8
  137. )
  138. ENGINE = ReplicatedMergeTree(
  139.     '/clickhouse/tables/{shard}/hits',
  140.     '{replica}'
  141. )
  142. PARTITION BY toYYYYMM(EventDate)
  143. ORDER BY (CounterID, EventDate, intHash32(UserID))
  144. SAMPLE BY intHash32(UserID);
复制代码
(3)创建分布式表

        在任一实例上实行下面的建表语句:
  1. CREATE TABLE db1.hits_replica_all ON CLUSTER 'cluster_2S_2R'
  2. AS db1.hits_replica
  3. ENGINE = Distributed(cluster_2S_2R, db1, hits_replica, rand());
复制代码
(4)向分布式表中插入数据

        在任一实例上实行下面的 insert 语句:
  1. INSERT INTO db1.hits_replica_all SELECT * FROM tutorial.hits_v1;
复制代码
(5)在所有实例上分别实行 SELECT 查询

  1. # node1
  2. vvml-yz-hbase-test.172.18.4.126 :) select count(*) from db1.hits_replica;
  3. SELECT count(*)
  4. FROM db1.hits_replica
  5. Query id: e4f08a9c-39be-48cf-a8f9-6caaa98b9fed
  6. ┌─count()─┐
  7. │ 4438089 │
  8. └─────────┘
  9. 1 row in set. Elapsed: 0.002 sec. 
  10. vvml-yz-hbase-test.172.18.4.126 :) select count(*) from db1.hits_replica_all;
  11. SELECT count(*)
  12. FROM db1.hits_replica_all
  13. Query id: 765f0ea5-a199-4bfb-85c4-6fd31e23f1af
  14. ┌─count()─┐
  15. │ 8873898 │
  16. └─────────┘
  17. 1 row in set. Elapsed: 0.007 sec. 
  18. vvml-yz-hbase-test.172.18.4.126 :) 
  19. # node2
  20. vvml-yz-hbase-test.172.18.4.188 :) select count(*) from db1.hits_replica;
  21. SELECT count(*)
  22. FROM db1.hits_replica
  23. Query id: b464df00-4c1a-4fea-8f2f-70fd0c32569f
  24. ┌─count()─┐
  25. │ 4438089 │
  26. └─────────┘
  27. 1 row in set. Elapsed: 0.002 sec. 
  28. vvml-yz-hbase-test.172.18.4.188 :) select count(*) from db1.hits_replica_all;
  29. SELECT count(*)
  30. FROM db1.hits_replica_all
  31. Query id: 19a77125-7e7e-4654-bb90-8884c3b0ed26
  32. ┌─count()─┐
  33. │ 8873898 │
  34. └─────────┘
  35. 1 row in set. Elapsed: 0.007 sec. 
  36. vvml-yz-hbase-test.172.18.4.188 :) 
  37. # node3
  38. vvml-yz-hbase-test.172.18.4.71 :) select count(*) from db1.hits_replica;
  39. SELECT count(*)
  40. FROM db1.hits_replica
  41. Query id: e021f13c-b980-4297-9c26-b94465503101
  42. ┌─count()─┐
  43. │ 4435809 │
  44. └─────────┘
  45. 1 row in set. Elapsed: 0.003 sec. 
  46. vvml-yz-hbase-test.172.18.4.71 :) select count(*) from db1.hits_replica_all;
  47. SELECT count(*)
  48. FROM db1.hits_replica_all
  49. Query id: 80ebbda2-be03-445c-8c16-f67428a45572
  50. ┌─count()─┐
  51. │ 8873898 │
  52. └─────────┘
  53. 1 row in set. Elapsed: 0.005 sec. 
  54. vvml-yz-hbase-test.172.18.4.71 :) 
  55. # node4
  56. vvml-yz-hbase-test.172.18.4.86 :) select count(*) from db1.hits_replica;
  57. SELECT count(*)
  58. FROM db1.hits_replica
  59. Query id: 204f483d-3fcb-4758-8df4-20e79f78e42e
  60. ┌─count()─┐
  61. │ 4435809 │
  62. └─────────┘
  63. 1 row in set. Elapsed: 0.002 sec. 
  64. vvml-yz-hbase-test.172.18.4.86 :) select count(*) from db1.hits_replica_all;
  65. SELECT count(*)
  66. FROM db1.hits_replica_all
  67. Query id: e3d28b81-50b7-46e6-a855-1b40ef9fe42a
  68. ┌─count()─┐
  69. │ 8873898 │
  70. └─────────┘
  71. 1 row in set. Elapsed: 0.007 sec. 
  72. vvml-yz-hbase-test.172.18.4.86 :)
复制代码
        从查询效果可以看到,当地表 node1、node2 上的查询效果相同,node3、node4 上的查询效果相同,分布式表四个节点查询效果都相同。node1 和 node3 分属于集群中的不同分片,而在创建分布式表时指定的分布规则随机,以是这两个实例上的当地表数据量存在少许差异是符合预期的。ReplicatedMergeTree 表可以自动在同一分片的不同副本间同步数据。
参考:



  • 使用教程
  • ClickHouse Keeper (clickhouse-keeper)

免责声明:如果侵犯了您的权益,请联系站长,我们会及时删除侵权内容,谢谢合作!更多信息从访问主页:qidao123.com:ToB企服之家,中国第一个企服评测及商务社交产业平台。
回复

使用道具 举报

0 个回复

正序浏览

快速回复

您需要登录后才可以回帖 登录 or 立即注册

本版积分规则

篮之新喜

金牌会员
这个人很懒什么都没写!

标签云

快速回复 返回顶部 返回列表