Test environment
prometheus-2.54.1.linux-amd64.tar.gz
Download:
https://www.prometheus.io/download/
https://github.com/prometheus/prometheus/releases/download/v2.54.1/prometheus-2.54.1.linux-amd64.tar.gz
node_exporter-1.8.2.linux-amd64.tar.gz
Download:
https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
https://prometheus.io/download/#node_exporter
consul_exporter-0.12.1.linux-amd64.tar.gz
Download:
https://github.com/prometheus/consul_exporter/releases/download/v0.12.1/consul_exporter-0.12.1.linux-amd64.tar.gz
pushgateway-1.9.0.linux-amd64.tar.gz
Download:
https://www.prometheus.io/download/#pushgateway
https://github.com/prometheus/pushgateway/releases/download/v1.9.0/pushgateway-1.9.0.linux-amd64.tar.gz
victoria-metrics-linux-amd64-v1.103.0.tar.gz
Download:
https://github.com/VictoriaMetrics/VictoriaMetrics/releases
https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.103.0
https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.103.0/victoria-metrics-linux-amd64-v1.103.0.tar.gz
consul_1.19.2_linux_amd64.zip
https://releases.hashicorp.com/consul/1.19.2/consul_1.19.2_linux_amd64.zip
grafana-7.5.6-1.x86_64.rpm
Download: https://dl.grafana.com/oss/release/grafana-7.5.6-1.x86_64.rpm
CentOS 7.9
Note: prometheus, victoria-metrics, grafana, and pushgateway can all be installed on separate machines. This article is a hands-on learning exercise, so everything is installed on a single machine.
Hands-on walkthrough
Installing VictoriaMetrics
- # wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.103.0/victoria-metrics-linux-amd64-v1.103.0.tar.gz
- # tar -xvzf victoria-metrics-linux-amd64-v1.103.0.tar.gz -C /usr/local/bin # extraction produces a single binary named victoria-metrics-prod
- # Create a directory to store VictoriaMetrics data
- # mkdir -p /usr/data/victoria-metrics
- # Create a systemd service
- # vi /etc/systemd/system/victoriametrics.service
- [Unit]
- Description=Victoria metrics service
- After=network.target
- [Service]
- Type=simple
- TimeoutStartSec=30
- Restart=on-failure
- RestartSec=5s
- ExecStart=/usr/local/bin/victoria-metrics-prod -storageDataPath=/usr/data/victoria-metrics -retentionPeriod=30d -selfScrapeInterval=10s
- ExecStop=/bin/kill $MAINPID
- ExecReload=/bin/kill -HUP $MAINPID
- PrivateTmp=yes
- [Install]
- WantedBy=multi-user.target
Notes:
-storageDataPath sets the data directory path (created automatically at startup if it does not exist). VictoriaMetrics stores all its data under this directory. The default is the victoria-metrics-data directory under the current working directory.
-retentionPeriod sets how long data is kept; older data is deleted automatically. The default retention is 1 month (31 days). The minimum is 24h or 1d. -retentionPeriod=3 keeps data for only 3 months, while -retentionPeriod=1d keeps data for only 1 day.
In most cases only these two flags need to be set; the other flags already have good defaults and should only be changed when genuinely needed. Run ./victoria-metrics-prod --help to see descriptions and default values for all available flags.
By default, VictoriaMetrics listens for Prometheus query API requests on port 8428.
Setting up monitoring for VictoriaMetrics itself is recommended.
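As a quick sanity check once the server is up, the standard HTTP endpoints can be probed with curl. A minimal sketch, assuming VictoriaMetrics runs on localhost:8428 (the fallback messages keep the snippet usable on hosts where nothing is listening):

```shell
# Sketch: probe a locally running VictoriaMetrics instance.
VM_ADDR="http://localhost:8428"
# /health returns "OK" when the server is up.
curl -fsS --max-time 2 "${VM_ADDR}/health" || echo "VictoriaMetrics not reachable at ${VM_ADDR}"
# Query the Prometheus-compatible API (here: the "up" series, if any exist yet):
curl -fsS --max-time 2 "${VM_ADDR}/api/v1/query" --data-urlencode 'query=up' \
  || echo "query failed (server not running?)"
```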
Start it in the foreground to check:- # /usr/local/bin/victoria-metrics-prod -storageDataPath=/usr/data/victoria-metrics -retentionPeriod=30d -selfScrapeInterval=10s
- 2024-09-03T16:33:42.187Z info VictoriaMetrics/lib/logger/flag.go:12 build version: victoria-metrics-20240828-135248-tags-v1.103.0-0-g5aeb759df9
- 2024-09-03T16:33:42.187Z info VictoriaMetrics/lib/logger/flag.go:13 command-line flags
- 2024-09-03T16:33:42.187Z info VictoriaMetrics/lib/logger/flag.go:20 -retentionPeriod="30d"
- 2024-09-03T16:33:42.187Z info VictoriaMetrics/lib/logger/flag.go:20 -selfScrapeInterval="10s"
- 2024-09-03T16:33:42.187Z info VictoriaMetrics/lib/logger/flag.go:20 -storageDataPath="/usr/data/victoria-metrics"
- 2024-09-03T16:33:42.187Z info VictoriaMetrics/app/victoria-metrics/main.go:73 starting VictoriaMetrics at "[:8428]"...
- 2024-09-03T16:33:42.187Z info VictoriaMetrics/app/vmstorage/main.go:107 opening storage at "/usr/data/victoria-metrics" with -retentionPeriod=30d
- 2024-09-03T16:33:42.189Z info VictoriaMetrics/lib/memory/memory.go:42 limiting caches to 611758080 bytes, leaving 407838720 bytes to the OS according to -memory.allowedPercent=60
- 2024-09-03T16:33:42.205Z info VictoriaMetrics/app/vmstorage/main.go:121 successfully opened storage "/usr/data/victoria-metrics" in 0.018 seconds; partsCount: 0; blocksCount: 0; rowsCount: 0; sizeBytes: 0
- 2024-09-03T16:33:42.205Z info VictoriaMetrics/app/vmselect/promql/rollup_result_cache.go:127 loading rollupResult cache from "/usr/data/victoria-metrics/cache/rollupResult"...
- 2024-09-03T16:33:42.207Z info VictoriaMetrics/app/vmselect/promql/rollup_result_cache.go:156 loaded rollupResult cache from "/usr/data/victoria-metrics/cache/rollupResult" in 0.001 seconds; entriesCount: 0, sizeBytes: 0
- 2024-09-03T16:33:42.207Z info VictoriaMetrics/app/victoria-metrics/main.go:84 started VictoriaMetrics in 0.020 seconds
- 2024-09-03T16:33:42.207Z info VictoriaMetrics/lib/httpserver/httpserver.go:121 starting server at http://127.0.0.1:8428/
- 2024-09-03T16:33:42.207Z info VictoriaMetrics/lib/httpserver/httpserver.go:122 pprof handlers are exposed at http://127.0.0.1:8428/debug/pprof/
- 2024-09-03T16:33:42.208Z info VictoriaMetrics/app/victoria-metrics/self_scraper.go:46 started self-scraping `/metrics` page with interval 10.000 seconds
- 2024-09-03T16:33:52.293Z info VictoriaMetrics/lib/storage/partition.go:202 creating a partition "2024_09" with smallPartsPath="/usr/data/victoria-metrics/data/small/2024_09", bigPartsPath="/usr/data/victoria-metrics/data/big/2024_09"
- 2024-09-03T16:33:52.295Z info VictoriaMetrics/lib/storage/partition.go:211 partition "2024_09" has been created
Stop the process running in the foreground, then start the service and enable it at boot:- # systemctl daemon-reload && sudo systemctl enable --now victoriametrics.service
- # Check that the service started successfully
- # systemctl status victoriametrics.service
Open the firewall port
- # firewall-cmd --permanent --zone=public --add-port=8428/tcp
- success
- # firewall-cmd --reload
- success
Besides running the binary directly, VictoriaMetrics also supports installation via Docker; see https://hub.docker.com/r/victoriametrics/victoria-metrics/ for details
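As a sketch of the Docker alternative, the flags can mirror the systemd unit above. The image tag and `/victoria-metrics-data` mount point follow the Docker Hub page linked above; verify them there before use. The snippet is guarded so it is a no-op on hosts without a working Docker daemon:

```shell
# Sketch: run the same server as a container (flags mirror the systemd unit).
if command -v docker >/dev/null 2>&1; then
  docker run -d --name victoria-metrics \
    -p 8428:8428 \
    -v /usr/data/victoria-metrics:/victoria-metrics-data \
    victoriametrics/victoria-metrics:v1.103.0 \
    -retentionPeriod=30d -selfScrapeInterval=10s \
    || echo "docker daemon not reachable"
else
  echo "docker not available"
fi
```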
Reference:
https://docs.victoriametrics.com/quick-start/
Installing and configuring Prometheus
- # wget https://github.com/prometheus/prometheus/releases/download/v2.54.1/prometheus-2.54.1.linux-amd64.tar.gz
- # tar -C /usr/local/ -xvzf prometheus-2.54.1.linux-amd64.tar.gz
- # cd /usr/local/prometheus-2.54.1.linux-amd64
- # ls
- console_libraries consoles LICENSE NOTICE prometheus prometheus.yml promtool
- # ln -s /usr/local/prometheus-2.54.1.linux-amd64/prometheus /usr/local/bin/prometheus
- # cp prometheus.yml prometheus.yml.bak
- # echo ''> prometheus.yml
- # vi prometheus.yml
Replace the contents of prometheus.yml with the following:- global:
- scrape_interval: 15s # By default, scrape targets every 15 seconds
- # A scrape configuration containing exactly one endpoint to scrape
- # Here it's Prometheus itself:
- scrape_configs:
- # The job_name is added as a label `job=<job_name>` to any time series scraped from this config.
- - job_name: 'prometheus'
- # Override the global default and scrape targets from this job every 5 seconds
- scrape_interval: 5s
- static_configs:
- - targets: ['localhost:9090']
- remote_write:
- - url: http://192.168.88.132:8428/api/v1/write
- queue_config:
- max_samples_per_send: 10000
- capacity: 20000
- max_shards: 30
Notes:
To send data to VictoriaMetrics, add a remote_write section to the Prometheus configuration file (usually prometheus.yml):- remote_write:
- - url: http://<victoriametrics-addr>:8428/api/v1/write
Note: replace <victoriametrics-addr> with the VictoriaMetrics hostname or IP address, for example:- remote_write:
- - url: http://192.168.88.132:8428/api/v1/write
Start Prometheus:- # ./prometheus
- ts=2024-09-04T15:50:32.906Z caller=main.go:601 level=info msg="No time or size retention was set so using the default time retention" duration=15d
- ts=2024-09-04T15:50:32.906Z caller=main.go:645 level=info msg="Starting Prometheus Server" mode=server version="(version=2.54.1, branch=HEAD, revision=e6cfa720fbe6280153fab13090a483dbd40bece3)"
- ts=2024-09-04T15:50:32.906Z caller=main.go:650 level=info build_context="(go=go1.22.6, platform=linux/amd64, user=root@812ffd741951, date=20240827-10:56:41, tags=netgo,builtinassets,stringlabels)"
- ts=2024-09-04T15:50:32.906Z caller=main.go:651 level=info host_details="(Linux 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 localhost.localdomain (none))"
- ts=2024-09-04T15:50:32.906Z caller=main.go:652 level=info fd_limits="(soft=4096, hard=4096)"
- ts=2024-09-04T15:50:32.906Z caller=main.go:653 level=info vm_limits="(soft=unlimited, hard=unlimited)"
- ts=2024-09-04T15:50:32.917Z caller=web.go:571 level=info component=web msg="Start listening for connections" address=0.0.0.0:9090
- ts=2024-09-04T15:50:32.925Z caller=main.go:1160 level=info msg="Starting TSDB ..."
- ts=2024-09-04T15:50:32.930Z caller=tls_config.go:313 level=info component=web msg="Listening on" address=[::]:9090
- ts=2024-09-04T15:50:32.930Z caller=tls_config.go:316 level=info component=web msg="TLS is disabled." http2=false address=[::]:9090
- ts=2024-09-04T15:50:32.932Z caller=head.go:626 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
- ts=2024-09-04T15:50:32.932Z caller=head.go:713 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=5.601µs
- ts=2024-09-04T15:50:32.933Z caller=head.go:721 level=info component=tsdb msg="Replaying WAL, this may take a while"
- ts=2024-09-04T15:50:32.933Z caller=head.go:793 level=info component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
- ts=2024-09-04T15:50:32.933Z caller=head.go:830 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=25.237µs wal_replay_duration=560.14µs wbl_replay_duration=141ns chunk_snapshot_load_duration=0s mmap_chunk_replay_duration=5.601µs total_replay_duration=605.091µs
- ts=2024-09-04T15:50:32.938Z caller=main.go:1181 level=info fs_type=XFS_SUPER_MAGIC
- ts=2024-09-04T15:50:32.938Z caller=main.go:1184 level=info msg="TSDB started"
- ts=2024-09-04T15:50:32.938Z caller=main.go:1367 level=info msg="Loading configuration file" filename=prometheus.yml
- ts=2024-09-04T15:50:32.940Z caller=dedupe.go:112 component=remote level=info remote_name=b93975 url=http://192.168.88.132:8428/api/v1/write msg="Starting WAL watcher" queue=b93975
- ts=2024-09-04T15:50:32.940Z caller=dedupe.go:112 component=remote level=info remote_name=b93975 url=http://192.168.88.132:8428/api/v1/write msg="Starting scraped metadata watcher"
- ts=2024-09-04T15:50:32.945Z caller=main.go:1404 level=info msg="updated GOGC" old=100 new=75
- ts=2024-09-04T15:50:32.945Z caller=main.go:1415 level=info msg="Completed loading of configuration file" filename=prometheus.yml totalDuration=6.619214ms db_storage=25.792µs remote_storage=1.190631ms web_handler=652ns query_engine=18.267µs scrape=3.897727ms scrape_sd=49.586µs notify=1.164µs notify_sd=954ns rules=60.122µs tracing=62.555µs
- ts=2024-09-04T15:50:32.945Z caller=main.go:1145 level=info msg="Server is ready to receive web requests."
- ts=2024-09-04T15:50:32.945Z caller=manager.go:164 level=info component="rule manager" msg="Starting rule manager..."
- ts=2024-09-04T15:50:32.945Z caller=dedupe.go:112 component=remote level=info remote_name=b93975 url=http://192.168.88.132:8428/api/v1/write msg="Replaying WAL" queue=b93975
- ts=2024-09-04T15:50:40.288Z caller=dedupe.go:112 component=remote level=info remote_name=b93975 url=http://192.168.88.132:8428/api/v1/write msg="Done replaying WAL" duration=7.342804783s
Note: to use a non-default configuration file, specify it on the command line, e.g.:- # ./prometheus --config.file=./custom_prometheus.yml
Note: to reload the configuration:- kill -HUP `pid_of_prometheus`
Of course, you can also just press Ctrl+C to stop the Prometheus process and run it again.
Prometheus writes incoming data to local storage and replicates it to remote storage in parallel. This means that even when the remote storage is unavailable, locally stored data remains available for the retention period set by --storage.tsdb.retention.time.
To send data from multiple Prometheus instances to VictoriaMetrics, add an external_labels section under the global node of the Prometheus configuration, e.g.:- global:
- external_labels:
- datacenter: dc-123
The configuration above tells Prometheus to add the datacenter=dc-123 label to every time series sent to remote storage. The label name can be anything, e.g. datacenter. The label value must be unique across all Prometheus instances, so that time series can be filtered or grouped by it.
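With that label in place, series from a particular Prometheus instance can be selected in VictoriaMetrics. A hedged sketch (the address and label value are the examples used in this article; the instance is not reachable outside this lab setup, so the query is guarded):

```shell
# Sketch: filter remotely written series by the external label.
VM_ADDR="http://192.168.88.132:8428"
curl -fsS --max-time 2 "${VM_ADDR}/api/v1/query" \
  --data-urlencode 'query=up{datacenter="dc-123"}' \
  || echo "VictoriaMetrics not reachable (expected outside this lab setup)"
```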
For heavily loaded Prometheus instances (200k+ samples per second), the following tuning can be applied:- remote_write:
- - url: http://<victoriametrics-addr>:8428/api/v1/write
- queue_config:
- max_samples_per_send: 10000
- capacity: 20000
- max_shards: 30
Using remote write increases memory usage for Prometheus by up to ~25%, depending on the shape of the data. If you are facing too-high memory consumption, try lowering the max_samples_per_send and capacity queue_config values (note: these two parameters are tightly connected). See the remote write tuning docs for more details.
Upgrading Prometheus to v2.12.0 or newer is recommended, since older versions have issues with remote_write.
Also take a look at vmagent and vmalert, which can serve as a faster, less resource-hungry alternative to Prometheus.
Reference: https://docs.victoriametrics.com/#prometheus-setup
Create a systemd service
- # vi /etc/systemd/system/prometheus.service
- [Unit]
- Description=Prometheus service
- After=network.target
- [Service]
- Type=simple
- TimeoutStartSec=30
- Restart=on-failure
- RestartSec=5s
- ExecStart=/usr/local/prometheus-2.54.1.linux-amd64/prometheus --config.file=/usr/local/prometheus-2.54.1.linux-amd64/prometheus.yml
- ExecStop=/bin/kill $MAINPID
- ExecReload=/bin/kill -HUP $MAINPID
- [Install]
- WantedBy=multi-user.target
Note: the configuration file must be given as an absolute path, otherwise the service fails to start because it cannot find /prometheus.yml- # First stop the prometheus process running in the foreground, then run the following commands
- # systemctl daemon-reload && systemctl enable --now prometheus
- # systemctl status prometheus
Installing and configuring Grafana
- # yum install grafana-7.5.6-1.x86_64.rpm
Note: if yum cannot find the package, download and install it manually, as follows:- # wget https://dl.grafana.com/oss/release/grafana-7.5.6-1.x86_64.rpm
- # yum install -y fontconfig urw-fonts
- # rpm -ivh grafana-7.5.6-1.x86_64.rpm
- warning: grafana-7.5.6-1.x86_64.rpm: Header V4 RSA/SHA256 Signature, key ID 24098cb6: NOKEY
- Preparing... ################################# [100%]
- Updating / installing...
- 1:grafana-7.5.6-1 ################################# [100%]
- ### NOT starting on installation, please execute the following statements to configure grafana to start automatically using systemd
- sudo /bin/systemctl daemon-reload
- sudo /bin/systemctl enable grafana-server.service
- ### You can start grafana-server by executing
- sudo /bin/systemctl start grafana-server.service
- POSTTRANS: Running script
- # /bin/systemctl daemon-reload
- ~]# /bin/systemctl enable grafana-server.service
- Created symlink from /etc/systemd/system/multi-user.target.wants/grafana-server.service to /usr/lib/systemd/system/grafana-server.service.
- # /bin/systemctl start grafana-server.service
Note: if the yum install -y fontconfig urw-fonts command is skipped, installing grafana may fail with:- warning: grafana-7.5.6-1.x86_64.rpm: Header V4 RSA/SHA256 Signature, key ID 24098cb6: NOKEY
- error: Failed dependencies:
- fontconfig is needed by grafana-7.5.6-1.x86_64
- urw-fonts is needed by grafana-7.5.6-1.x86_64
Adjust the grafana configuration [optional]
- # vim /etc/grafana/grafana.ini
Open http://<server-ip>:3000 in a browser to check the result:
Note: the default Grafana login is admin/admin
Reference: https://grafana.com/grafana/download?pg=get&plcmt=selfmanaged-box1-cta1
Open the firewall port
- # firewall-cmd --permanent --zone=public --add-port=3000/tcp
- success
- # firewall-cmd --reload
- success
Create a Prometheus data source
Create a Prometheus data source with the following URL (change only the URL when creating the data source; keep everything else at the defaults):- http://<victoriametrics-addr>:8428
Replace <victoriametrics-addr> with the VictoriaMetrics hostname or IP address, e.g. http://192.168.88.132:8428, then build charts against the new data source using PromQL or MetricsQL
About the access modes
The access mode controls how requests to the data source are handled. Unless there is a reason to do otherwise, Server (default) should be the preferred mode.
- Server access mode (default)
All requests initiated from the browser are sent to the Grafana backend/server, which forwards them to the data source, thereby avoiding possible Cross-Origin Resource Sharing (CORS) restrictions. With this mode, the URL must be reachable from the Grafana backend/server.
- Browser access mode
All requests from the browser go directly to the data source and may be subject to CORS restrictions. With this mode, the URL must be reachable from the browser.
Reference: https://docs.victoriametrics.com/#grafana-setup
Installing pushgateway
- # wget https://github.com/prometheus/pushgateway/releases/download/v1.9.0/pushgateway-1.9.0.linux-amd64.tar.gz
- # tar -C /usr/local/ -xvzf pushgateway-1.9.0.linux-amd64.tar.gz
- # ln -s /usr/local/pushgateway-1.9.0.linux-amd64/pushgateway /usr/local/bin/pushgateway
- [root@localhost ~]# pushgateway
- ts=2024-09-04T17:38:16.325Z caller=main.go:87 level=info msg="starting pushgateway" version="(version=1.9.0, branch=HEAD, revision=d1ca1a6a426126a09a21f745e8ffbaba550f9643)"
- ts=2024-09-04T17:38:16.325Z caller=main.go:88 level=info build_context="(go=go1.22.4, platform=linux/amd64, user=root@2167597b1e9c, date=20240608-15:04:08, tags=unknown)"
- ts=2024-09-04T17:38:16.328Z caller=tls_config.go:313 level=info msg="Listening on" address=[::]:9091
- ts=2024-09-04T17:38:16.328Z caller=tls_config.go:316 level=info msg="TLS is disabled." http2=false address=[::]:9091
Create a systemd service
- # vi /etc/systemd/system/pushgateway.service
- [Unit]
- Description=Pushgateway service
- After=network.target
- [Service]
- Type=simple
- TimeoutStartSec=30
- Restart=on-failure
- RestartSec=5s
- ExecStart=/usr/local/pushgateway-1.9.0.linux-amd64/pushgateway
- ExecStop=/bin/kill $MAINPID
- ExecReload=/bin/kill -HUP $MAINPID
- [Install]
- WantedBy=multi-user.target
First stop the pushgateway process running in the foreground, then run the following commands- # systemctl daemon-reload && systemctl enable --now pushgateway
- # systemctl status pushgateway
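Once the service is up, a quick way to exercise it is to push an ad-hoc metric in the standard text exposition format. A sketch: the metric name and the job/instance labels below are made up for illustration, and the curl is guarded for hosts where no Pushgateway is listening:

```shell
# Sketch: push one gauge sample to the Pushgateway.
# The URL path segments job/<name>/instance/<name> become labels on the series.
cat > /tmp/job_metrics.txt <<'EOF'
# TYPE backup_duration_seconds gauge
backup_duration_seconds 42.5
EOF
curl -fsS --max-time 2 --data-binary @/tmp/job_metrics.txt \
  http://localhost:9091/metrics/job/nightly_backup/instance/db1 \
  || echo "pushgateway not reachable"
```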
Installing and configuring Node Exporter
Note: install it only on the machines to be monitored (here, a redis server, IP 192.168.88.131)- # wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
- # tar -C /usr/local/ -xvzf node_exporter-1.8.2.linux-amd64.tar.gz
- # ln -s /usr/local/node_exporter-1.8.2.linux-amd64/node_exporter /usr/local/bin/node_exporter
- # node_exporter
- ts=2024-09-04T16:01:28.241Z caller=node_exporter.go:193 level=info msg="Starting node_exporter" version="(version=1.8.2, branch=HEAD, revision=f1e0e8360aa60b6cb5e5cc1560bed348fc2c1895)"
- ts=2024-09-04T16:01:28.241Z caller=node_exporter.go:194 level=info msg="Build context" build_context="(go=go1.22.5, platform=linux/amd64, user=root@03d440803209, date=20240714-11:53:45, tags=unknown)"
- ts=2024-09-04T16:01:28.242Z caller=node_exporter.go:196 level=warn msg="Node Exporter is running as root user. This exporter is designed to run as unprivileged user, root is not required."
- ts=2024-09-04T16:01:28.242Z caller=filesystem_common.go:111 level=info collector=filesystem msg="Parsed flag --collector.filesystem.mount-points-exclude" flag=^/(dev|proc|run/credentials/.+|sys|var/lib/docker/.+|var/lib/containers/storage/.+)($|/)
- ts=2024-09-04T16:01:28.242Z caller=filesystem_common.go:113 level=info collector=filesystem msg="Parsed flag --collector.filesystem.fs-types-exclude" flag=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
- ts=2024-09-04T16:01:28.242Z caller=diskstats_common.go:111 level=info collector=diskstats msg="Parsed flag --collector.diskstats.device-exclude" flag=^(z?ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p)\d+$
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:111 level=info msg="Enabled collectors"
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=arp
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=bcache
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=bonding
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=btrfs
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=conntrack
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=cpu
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=cpufreq
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=diskstats
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=dmi
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=edac
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=entropy
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=fibrechannel
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=filefd
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=filesystem
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=hwmon
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=infiniband
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=ipvs
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=loadavg
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=mdadm
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=meminfo
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=netclass
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=netdev
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=netstat
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=nfs
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=nfsd
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=nvme
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=os
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=powersupplyclass
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=pressure
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=rapl
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=schedstat
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=selinux
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=sockstat
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=softnet
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=stat
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=tapestats
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=textfile
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=thermal_zone
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=time
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=timex
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=udp_queues
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=uname
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=vmstat
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=watchdog
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=xfs
- ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=zfs
- ts=2024-09-04T16:01:28.244Z caller=tls_config.go:313 level=info msg="Listening on" address=[::]:9100
- ts=2024-09-04T16:01:28.244Z caller=tls_config.go:316 level=info msg="TLS is disabled." http2=false address=[::]:9100
The output shows that Node Exporter is running and exposing metrics on port 9100
Node Exporter metrics
Confirm that the metrics are exposed by requesting the /metrics endpoint:- # curl http://localhost:9100/metrics
Output similar to the following indicates the metrics are exposed successfully.- # HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
- # TYPE go_gc_duration_seconds summary
- go_gc_duration_seconds{quantile="0"} 0
- go_gc_duration_seconds{quantile="0.25"} 0
- go_gc_duration_seconds{quantile="0.5"} 0
- go_gc_duration_seconds{quantile="0.75"} 0
- go_gc_duration_seconds{quantile="1"} 0
- go_gc_duration_seconds_sum 0
- go_gc_duration_seconds_count 0
- ...
Node Exporter now exposes metrics that Prometheus can scrape, including a wide variety of system metrics further down in the output (prefixed with node_). To view those metrics, run:- # curl http://localhost:9100/metrics | grep "node_"
Reference: https://prometheus.io/docs/guides/node-exporter/#monitoring-linux-host-metrics-with-the-node-exporter
Create a systemd service
- # vi /etc/systemd/system/node_exporter.service
- [Unit]
- Description=node exporter service
- After=network.target
- [Service]
- Type=simple
- TimeoutStartSec=30
- Restart=on-failure
- RestartSec=5s
- ExecStart=/usr/local/bin/node_exporter
- ExecStop=/bin/kill $MAINPID
- ExecReload=/bin/kill -HUP $MAINPID
- [Install]
- WantedBy=multi-user.target
Installing consul
Official installation instructions:- # yum install -y yum-utils
- # yum-config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo
- # yum -y install consul
- # consul -v
- Consul v1.9.5
Note: this installation method did not succeed here
A working installation method:- # mkdir /usr/local/consul
- # mv consul_1.19.2_linux_amd64.zip /usr/local/consul/
- # cd /usr/local/consul/
- # unzip consul_1.19.2_linux_amd64.zip
- # ./consul -v
- Consul v1.19.2
- Revision 048f1936
- Build Date 2024-08-27T16:06:44Z
- Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)
- # ln -s /usr/local/consul/consul /usr/bin/consul
Starting the agents
Server node 1 (192.168.88.133):- # mkdir -p /data/consul
- # consul agent -server -bootstrap -datacenter=testDC -ui=true -data-dir=/data/consul -node=server1 -bind=192.168.88.133 -client=0.0.0.0 -serf-lan-port=8303 -serf-wan-port=8305 -dns-port=8601 -http-port=8603 -syslog -log-level=INFO
Server node 2 (192.168.88.134):- # mkdir -p /data/consul
- # consul agent -server=true -datacenter=testDC -data-dir=/data/consul --node=server2 -bind=192.168.88.134 -client=0.0.0.0 -retry-join=192.168.88.133 -serf-lan-port=8303 -serf-wan-port=8305 -dns-port=8601 -http-port=8603 -syslog -log-level=INFO
Note: after startup it joins the cluster at 192.168.88.134:8301
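Cluster membership can also be checked from the command line. A sketch: `-http-addr` points at the non-default HTTP port 8603 configured above, and the command is guarded for hosts where no agent is running:

```shell
# Sketch: list cluster members via the agent's HTTP API port.
consul members -http-addr=127.0.0.1:8603 || echo "consul agent not reachable"
```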
Verify in the browser
Create systemd services
Server node 1 (192.168.88.133):- # vi /etc/systemd/system/consul.service
- [Unit]
- Description=Consul service
- After=network.target
- [Service]
- Type=simple
- TimeoutStartSec=30
- Restart=on-failure
- RestartSec=5s
- ExecStart=/usr/bin/consul agent -server -bootstrap -datacenter=testDC -ui=true -data-dir=/data/consul -node=server1 -bind=192.168.88.133 -client=0.0.0.0 -serf-lan-port=8303 -serf-wan-port=8305 -dns-port=8601 -http-port=8603 -syslog -log-level=INFO
- ExecStop=/bin/kill $MAINPID
- ExecReload=/bin/kill -HUP $MAINPID
- [Install]
- WantedBy=multi-user.target
Server node 2 (192.168.88.134):- # vi /etc/systemd/system/consul.service
- [Unit]
- Description=Consul service
- After=network.target
- [Service]
- Type=simple
- TimeoutStartSec=30
- Restart=on-failure
- RestartSec=5s
- ExecStart=/usr/bin/consul agent -server=true -datacenter=testDC -data-dir=/data/consul --node=server2 -bind=192.168.88.134 -client=0.0.0.0 -retry-join=192.168.88.133 -serf-lan-port=8303 -serf-wan-port=8305 -dns-port=8601 -http-port=8603 -syslog -log-level=INFO
- ExecStop=/bin/kill $MAINPID
- ExecReload=/bin/kill -HUP $MAINPID
- [Install]
- WantedBy=multi-user.target
First stop the consul processes running in the foreground, then run the following on both machines- # systemctl daemon-reload && systemctl enable --now consul
- # systemctl status consul
References:
https://developer.hashicorp.com/consul/install
https://developer.hashicorp.com/consul/docs/agent
Registering services
- # curl -X PUT -d '
- {
- "id": "redis-node-exporter",
- "name": "redis-node-exporter",
- "Tags": ["primary"],
- "address": "192.168.88.131",
- "port": 9100,
- "EnableTagOverride": false
- }' http://192.168.88.133:8603/v1/agent/service/register
- # curl -X PUT -d '
- {
- "id": "pushgateway-node-exporter",
- "name": "pushgateway-node-exporter",
- "Tags": ["pushgateway"],
- "address": "192.168.88.132",
- "port": 9091,
- "EnableTagOverride": false
- }' http://192.168.88.133:8603/v1/agent/service/register
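The registrations can be verified by listing the agent's services (same host and HTTP port as the register calls above; guarded since the agent is only reachable inside this lab setup):

```shell
# Sketch: list the services registered with this consul agent.
curl -fsS --max-time 2 http://192.168.88.133:8603/v1/agent/services \
  || echo "consul agent not reachable (expected outside this lab setup)"
```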
Configuring the Prometheus instance
To scrape metrics from the services in the Consul cluster, the locally running Prometheus instance must be configured accordingly.
Edit the prometheus configuration file- global:
- scrape_interval: 15s # By default, scrape targets every 15 seconds
- # A scrape configuration containing exactly one endpoint to scrape
- # Here it's Prometheus itself:
- scrape_configs:
- # The job_name is added as a label `job=<job_name>` to any time series scraped from this config.
- - job_name: 'prometheus'
- # Override the global default and scrape targets from this job every 5 seconds
- scrape_interval: 5s
- static_configs:
- - targets: ['localhost:9090']
-
- - job_name: 'node_discovery_by_consul'
- scrape_interval: 5s
- consul_sd_configs:
- - server: '192.168.88.133:8603'
- services: ['redis-node-exporter', 'pushgateway-node-exporter']
- relabel_configs:
- - source_labels: [__meta_consul_service]
- action: keep
- regex: .*
-
- remote_write:
- - url: http://192.168.88.132:8428/api/v1/write
- queue_config:
- max_samples_per_send: 10000
- capacity: 20000
- max_shards: 30
Restart Prometheus- # systemctl restart prometheus
- # systemctl status prometheus
Reference:
https://prometheus.io/blog/2015/06/01/advanced-service-discovery/#discovery-with-consul
Check that service discovery took effect
Status -> Targets
Click the corresponding Endpoint; it redirects and shows the metric data
Viewing Node Exporter metrics in the Prometheus expression browser
Node Exporter-specific metrics are prefixed with node_, and include metrics such as node_cpu_seconds_total and node_exporter_build_info.
Some example metrics:
Metric | Meaning
rate(node_cpu_seconds_total{mode="system"}[1m]) | Average CPU time per second spent in system mode over the last minute (in seconds)
node_filesystem_avail_bytes | Filesystem space available to non-root users (in bytes)
rate(node_network_receive_bytes_total[1m]) | Average network traffic received per second over the last minute (in bytes)
Verify that Grafana can display the data
Prometheus configuration- global:
- scrape_interval: 15s # By default, scrape targets every 15 seconds
- # These labels are attached to any time series or alerts when communicating with external systems (e.g. federation, remote storage, Alertmanager)
- external_labels:
- monitor: 'codelab-monitor'
- rule_files:
- - 'prometheus.rules.yml'
- # A scrape configuration containing exactly one endpoint to scrape
- # Here it's Prometheus itself:
- scrape_configs:
- # The job_name is added as a label `job=<job_name>` to any time series scraped from this config.
- - job_name: 'prometheus'
- # Override the global default and scrape targets from this job every 5 seconds
- scrape_interval: 5s
- static_configs:
- - targets: ['10.118.71.170:9090']
- - job_name: 'node'
- # Override the global default and scrape targets from this job every 5 seconds.
- scrape_interval: 5s
- static_configs:
- - targets: ['10.118.32.92:9100']
- labels:
- group: 'k8snode1'
- remote_write:
- - url: http://10.118.71.170:8428/api/v1/write
- queue_config:
- max_samples_per_send: 10000
- capacity: 20000
- max_shards: 30