Prometheus 监控系统安装
目录
- 下载
- 启动
- 使用
- 参考
Prometheus 既是一个时序数据库,又是一个监控系统,更是一套完备的监控生态解决方案。
本文简要介绍 Prometheus的安装和使用。
下载
根据系统
下载Download版本,并解压
tar xvfz prometheus-*.tar.gz
cd prometheus-*
启动
./prometheus --config.file=prometheus.yml
output
ts=2023-04-30T12:53:24.032Z caller=main.go:520 level=info msg="No time or size retention was set so using the default time retention" duration=15d
ts=2023-04-30T12:53:24.032Z caller=main.go:564 level=info msg="Starting Prometheus Server" mode=server version="(version=2.43.0, branch=HEAD, revision=edfc3bcd025dd6fe296c167a14a216cab1e552ee)"
ts=2023-04-30T12:53:24.032Z caller=main.go:569 level=info build_context="(go=go1.19.7, platform=darwin/amd64, user=root@1fd07b70056a, date=20230321-12:56:36, tags=netgo,builtinassets)"
ts=2023-04-30T12:53:24.032Z caller=main.go:570 level=info host_details=(darwin)
ts=2023-04-30T12:53:24.032Z caller=main.go:571 level=info fd_limits="(soft=61440, hard=unlimited)"
ts=2023-04-30T12:53:24.032Z caller=main.go:572 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2023-04-30T12:53:24.035Z caller=web.go:561 level=info component=web msg="Start listening for connections" address=0.0.0.0:9090
ts=2023-04-30T12:53:24.036Z caller=main.go:1005 level=info msg="Starting TSDB ..."
ts=2023-04-30T12:53:24.037Z caller=tls_config.go:232 level=info component=web msg="Listening on" address=[::]:9090
ts=2023-04-30T12:53:24.037Z caller=tls_config.go:235 level=info component=web msg="TLS is disabled." http2=false address=[::]:9090
ts=2023-04-30T12:53:24.042Z caller=head.go:587 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
ts=2023-04-30T12:53:24.042Z caller=head.go:658 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=7.217µs
ts=2023-04-30T12:53:24.042Z caller=head.go:664 level=info component=tsdb msg="Replaying WAL, this may take a while"
ts=2023-04-30T12:53:24.042Z caller=head.go:735 level=info component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
ts=2023-04-30T12:53:24.042Z caller=head.go:772 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=141.939µs wal_replay_duration=394.666µs wbl_replay_duration=151ns total_replay_duration=574.341µs
ts=2023-04-30T12:53:24.044Z caller=main.go:1026 level=info fs_type=1a
ts=2023-04-30T12:53:24.044Z caller=main.go:1029 level=info msg="TSDB started"
ts=2023-04-30T12:53:24.044Z caller=main.go:1209 level=info msg="Loading configuration file" filename=prometheus.yml
ts=2023-04-30T12:53:24.100Z caller=main.go:1246 level=info msg="Completed loading of configuration file" filename=prometheus.yml totalDuration=56.598769ms db_storage=2.783µs remote_storage=4.154µs web_handler=302ns query_engine=525ns scrape=56.05599ms scrape_sd=151.651µs notify=69.339µs notify_sd=25.365µs rules=5.821µs tracing=35.529µs
ts=2023-04-30T12:53:24.100Z caller=main.go:990 level=info msg="Server is ready to receive web requests."
ts=2023-04-30T12:53:24.101Z caller=manager.go:974 level=info component="rule manager" msg="Starting rule manager..."
启动时,使用的配置文件是prometheus.yml
,在该文件中当前采集的信息只有prometheus自身。
# global是全局模块,定义的内容会被scrape_configs模块中的每个Job单独覆盖
global:
scrape_interval: 15s # 抓取target的时间间隔,15s,默认值为1分钟,经验值10s-60s.
evaluation_interval: 15s # 计算一条规则配置的时间间隔,设置为15s,默认值为1分钟.
# scrape_timeout # 抓取target的超时时间,默认值为10s.
# external_labels # 与外部系统通信时添加到任意时间序列或告警所用的外部标签.
# 告警模块配置
alerting:
alertmanagers:
- static_configs: # 静态配置AlertManager的地址,也可以以来服务发现动态识别
- targets:
# - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
# - "first_rules.yml"
# - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
# prometheus会将该名称作为Lable `job=<job_name>`追加到抓取的每条时序中
- job_name: "prometheus"
# metrics_path defaults to '/metrics'
# scheme defaults to 'http'.
static_configs:
- targets: ["localhost:9090"]
使用
启动后,监听端口9090,通过localhost:9090/metrics
可以查看Prometheus Server的监控信息,返回数据如下:
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 8.3116e-05
go_gc_duration_seconds{quantile="0.25"} 0.000103581
go_gc_duration_seconds{quantile="0.5"} 0.000208772
go_gc_duration_seconds{quantile="0.75"} 0.000275054
go_gc_duration_seconds{quantile="1"} 0.000334647
go_gc_duration_seconds_sum 0.001401928
go_gc_duration_seconds_count 7
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 31
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.19.7"} 1
... ...
Prometheus还提供了查询界面
可以通过如下地址访问:
http://localhost:9090/graph
界面如下图所示:
插入图片
在Graph页面输入PromSQL表达式,例如’up’,就可以查看监控的每个Job的健康状态,1表示健康,0表示不健康。
另外,在Status菜单里,还提供了Running&Build Information
、TSDB Status
、Command-Line Flags
、Configuration
、 Rules
、Target
、Service Discovery
等功能模块。
如果Prometheus启动时加上参数
--web.enable-lifecycle
,即
./prometheus --config.file=prometheus.yml --web.enable-lifecycle
则每当prometheus.yml
发生改变时,就不需要重启 Prometheus,而是执行如下命令就可以重新加载配置:
curl -X POST http://localhost:9090/-/reload
参考
Getting started
Download