Prometheus Monitoring
Contents
Concepts
Deployment methods
1. Binary (source tarball)
2. Deployed in a Kubernetes cluster as Pods
Concepts
Prometheus is an open-source systems monitoring and alerting toolkit. In Kubernetes, a distributed container orchestration system, Prometheus is the usual companion for monitoring. It monitors services as well as hosts, and it ships with a built-in time-series database that provides the data model plus interfaces for metric collection, storage, and querying.
Prometheus components:
PromQL: the query language used to select and aggregate collected metrics.
node_exporter: deployed on the nodes of the Kubernetes cluster to collect host-level metrics (disk, CPU, network) and per-node Pod usage. It must run on every node.
pushgateway: an intermediary that short-lived jobs push their metrics to; Prometheus scrapes the gateway, and the data can then be sliced and displayed with PromQL queries.
Workflow diagram: (figure omitted)
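A small taste of PromQL (the metric name is a standard node_exporter series; the query itself is illustrative, not taken from this deployment):

```promql
# Per-instance CPU usage (%) over the last 5 minutes:
# 1 minus the average idle-time rate, scaled to a percentage
100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))
```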
Key features of Prometheus:
1. A multi-dimensional data model: samples are recorded in time order, tracking changes of state, with each sample identified by a metric name and labels (service metrics, application performance, network data, and so on).
2. A built-in time-series database (TSDB), characterized by:
   1. Very large storage volume
   2. A mostly-write workload
   3. Writes appended in time order
   4. Strong performance under high concurrency
3. The PromQL query language
4. Metrics pulled over HTTP
5. Built-in service discovery
6. Grafana integration for friendlier metric dashboards
Alertmanager: alert management. It is an independent module with its own configuration; notification channels include e-mail, DingTalk, and WeChat Work.
Interview question: the differences between Prometheus and Zabbix
1. Metric collection
Zabbix is split into a server and clients; an agent deployed on each client pushes data to the server, communicating over TCP (IP + port).
Prometheus collects from the clients (exporters): the server pulls monitoring metrics from them over HTTP.
2. Data storage
Zabbix stores data in an external relational database: MySQL, PostgreSQL, or Oracle.
Prometheus has a built-in time-series database (TSDB) that stores only time-series values.
3. Query performance
Zabbix's query capability is comparatively weak.
Prometheus queries are more powerful and faster.
4. Alerting
Both have built-in alerting, but Prometheus offers no phone-call alerts out of the box.
5. Monitoring scope
Zabbix mainly monitors devices (server state: CPU, memory, disk, network traffic, plus custom items for non-containerized programs). It has a longer history and is more mature, suiting scenarios with modest requirements where only server devices need monitoring.
Prometheus is tailored to Kubernetes, with much better compatibility with container products and a higher degree of customization; it suits microservice scenarios.
Deployment methods
1. Binary (source tarball)
Copy node_exporter-1.5.0.linux-amd64.tar.gz into /opt on all three node hosts.
Copy prometheus-2.45.0.linux-amd64.tar.gz and grafana-enterprise-7.5.11-1.x86_64.rpm into /opt on the master host.
On the master node:
cd /opt/
tar -xf prometheus-2.45.0.linux-amd64.tar.gz
mv prometheus-2.45.0.linux-amd64 prometheus
cat > /usr/lib/systemd/system/prometheus.service <<'EOF'
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io
After=network.target
[Service]
Type=simple
ExecStart=/opt/prometheus/prometheus \
--config.file=/opt/prometheus/prometheus.yml \
--storage.tsdb.path=/opt/prometheus/data/ \
--storage.tsdb.retention=15d \
--web.enable-lifecycle
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl restart prometheus.service
netstat -antp | grep 9090    # check that Prometheus's port 9090 is up
cd prometheus/
vim prometheus.yml
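The contents of this edit are not shown in the source; a minimal sketch of what gets added under the existing scrape_configs section, assuming the exporters run on the hosts used in this document (the two extra node IPs are placeholders, substitute your own):

```yaml
# Sketch: append a job to scrape_configs in /opt/prometheus/prometheus.yml
scrape_configs:
  - job_name: "nodes"
    static_configs:
      - targets:
          - 192.168.233.10:9100   # master (IP taken from this document)
          - 192.168.233.20:9100   # node1 (placeholder IP)
          - 192.168.233.30:9100   # node2 (placeholder IP)
```

Since Prometheus was started with --web.enable-lifecycle, the edited config can be reloaded without a restart: `curl -X POST http://192.168.233.10:9090/-/reload`.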
On all three node hosts:
cd /opt/
tar -xf node_exporter-1.5.0.linux-amd64.tar.gz
mv node_exporter-1.5.0.linux-amd64 node_exporter-1.5.0
cd node_exporter-1.5.0/
mv node_exporter /usr/local/bin/
cat > /usr/lib/systemd/system/node_exporter.service <<'EOF'
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/node_exporter \
--collector.ntp \
--collector.mountstats \
--collector.systemd \
--collector.tcpstat
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
systemctl restart node_exporter.service
netstat -antp | grep 9100    # check that the exporter's port is up
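Beyond checking the port, you can confirm the exporter is actually serving data (a sketch; run on any of the nodes):

```shell
# Print a few node_exporter CPU series from the local metrics endpoint
curl -s http://localhost:9100/metrics | grep '^node_cpu_seconds_total' | head -n 3
```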
systemctl restart prometheus.service
netstat -antp | grep 9090
Now browse to 192.168.233.10:9090
On the master node:
rpm -ivh grafana-enterprise-7.5.11-1.x86_64.rpm
systemctl restart grafana-server.service
netstat -antp | grep 3000
Then browse to 192.168.233.10:3000
Username: admin  Password: admin
Dashboard templates: Grafana dashboards | Grafana Labs
Add the Prometheus data source
Import a dashboard template
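As an alternative to adding the data source by hand in the UI, Grafana can provision it from a file at startup; a sketch in Grafana's data-source provisioning format (the path and URL assume the binary install on the master above):

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yaml (provisioning sketch)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://192.168.233.10:9090
    isDefault: true
```

Restart grafana-server after adding the file.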
2. Deploying in a Kubernetes cluster as Pods
Components:
node_exporter: the node-level metrics collector, deployed as a DaemonSet
prometheus: the main monitoring server
grafana: the visualization front end
alertmanager: the alerting module
Steps:
kubectl create ns monitor-sa
cd /opt/
mkdir prometheus
cd prometheus/
1. Deploy the node_exporter collector
vim node_exporter.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitor-sa
  labels:
    name: node-exporter
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      containers:
      - name: node-exporter
        image: prom/node-exporter
        ports:
        - containerPort: 9100
        resources:
          limits:
            cpu: "0.5"
        securityContext:
          privileged: true
        args:
        - --path.procfs
        - /host/proc
        - --path.sysfs
        - /host/sys
        - --collector.filesystem.ignored-mount-points
        - '"^/(sys|proc|dev|host|etc)($|/)"'
        volumeMounts:
        - name: dev
          mountPath: /host/dev
        - name: proc
          mountPath: /host/proc
        - name: sys
          mountPath: /host/sys
        - name: rootfs
          mountPath: /rootfs
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: dev
        hostPath:
          path: /dev
      - name: sys
        hostPath:
          path: /sys
      - name: rootfs
        hostPath:
          path: /
kubectl apply -f node_exporter.yaml
kubectl get pod -n monitor-sa -o wide
Browse to a node's collector: 192.168.233.31:9100/metrics
kubectl create serviceaccount monitor -n monitor-sa
kubectl create clusterrolebinding monitor-clusterrolebinding -n monitor-sa --clusterrole=cluster-admin --serviceaccount=monitor-sa:monitor
2. Deploy the Alertmanager alerting module
Copy prometheus-alertmanager-cfg.yaml into /opt/prometheus/
kubectl apply -f prometheus-alertmanager-cfg.yaml
vim alter-email.yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: alertmanager
  namespace: monitor-sa
data:
  alertmanager.yml: |-
    global:
      resolve_timeout: 1m
      smtp_smarthost: 'smtp.qq.com:25'
      smtp_from: '1332344799@qq.com'
      smtp_auth_username: '1332344799@qq.com'
      smtp_auth_password: 'wrhdyfylhfyriijc'
      smtp_require_tls: false
    route:
      group_by: [alertname]
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 10m
      receiver: default-receiver
    receivers:
    - name: 'default-receiver'
      email_configs:
      - to: '1332344799@qq.com'
        send_resolved: true
kubectl apply -f alter-email.yaml
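The prometheus-alertmanager-cfg.yaml applied earlier carries the alerting rules, but its contents are not shown here. An illustrative rule in the standard Prometheus rule-file format (the names and expression are assumptions, not the author's actual config):

```yaml
# Illustrative alerting rule (standard Prometheus rule-file format)
groups:
- name: node.rules
  rules:
  - alert: NodeDown
    expr: up == 0               # fires when any scrape target stops answering
    for: 2m
    labels:
      severity: critical
    annotations:
      description: "{{ $labels.instance }} has been unreachable for 2 minutes"
```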
3. Deploy the Prometheus server
vim prometheus-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitor-sa
  labels:
    app: prometheus
spec:
  type: NodePort
  ports:
  - port: 9090
    targetPort: 9090
    protocol: TCP
  selector:
    app: prometheus
    component: server
vim prometheus-alter.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    name: prometheus
    kubernetes.io/cluster-service: 'true'
  name: alertmanager
  namespace: monitor-sa
spec:
  ports:
  - name: alertmanager
    nodePort: 30066
    port: 9093
    protocol: TCP
    targetPort: 9093
  selector:
    app: prometheus
  sessionAffinity: None
  type: NodePort
vim prometheus-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  namespace: monitor-sa
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
      component: server
  template:
    metadata:
      labels:
        app: prometheus
        component: server
      annotations:
        prometheus.io/scrape: 'false'
    spec:
      serviceAccountName: monitor
      initContainers:
      - name: init-chmod
        image: busybox:latest
        command: ['sh','-c','chmod -R 777 /prometheus;chmod -R 777 /etc']
        volumeMounts:
        - mountPath: /prometheus
          name: prometheus-storage-volume
        - mountPath: /etc/localtime
          name: timezone
      containers:
      - name: prometheus
        image: prom/prometheus:v2.45.0
        command:
        - prometheus
        - --config.file=/etc/prometheus/prometheus.yml
        - --storage.tsdb.path=/prometheus
        - --storage.tsdb.retention=720h
        - --web.enable-lifecycle
        ports:
        - containerPort: 9090
        volumeMounts:
        - name: prometheus-config
          mountPath: /etc/prometheus/
        - mountPath: /prometheus/
          name: prometheus-storage-volume
        - name: timezone
          mountPath: /etc/localtime
        - name: k8s-certs
          mountPath: /var/run/secrets/kubernetes.io/k8s-certs/etcd/
      - name: alertmanager
        image: prom/alertmanager:v0.20.0
        args:
        - "--config.file=/etc/alertmanager/alertmanager.yml"
        - "--log.level=debug"
        ports:
        - containerPort: 9093
          protocol: TCP
          name: alertmanager
        volumeMounts:
        - name: alertmanager-config
          mountPath: /etc/alertmanager
        - name: alertmanager-storage
          mountPath: /alertmanager
        - name: localtime
          mountPath: /etc/localtime
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config
          defaultMode: 0777
      - name: prometheus-storage-volume
        hostPath:
          path: /data
          type: DirectoryOrCreate
      - name: k8s-certs
        secret:
          secretName: etcd-certs
      - name: timezone
        hostPath:
          path: /usr/share/zoneinfo/Asia/Shanghai
      - name: alertmanager-config
        configMap:
          name: alertmanager
      - name: alertmanager-storage
        hostPath:
          path: /data/alertmanager
          type: DirectoryOrCreate
      - name: localtime
        hostPath:
          path: /usr/share/zoneinfo/Asia/Shanghai
kubectl apply -f prometheus-deployment.yaml
kubectl apply -f prometheus-svc.yaml
kubectl apply -f prometheus-alter.yaml
kubectl get pod -n monitor-sa
kubectl get svc -n monitor-sa
4. Deploy the Grafana visualization tool
vim grafana.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana
  namespace: kube-system
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: nfs-client-storageclass
  resources:
    requests:
      storage: 2Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: monitoring-grafana
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: grafana
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:7.5.11
        securityContext:
          runAsUser: 104
          runAsGroup: 107
        ports:
        - containerPort: 3000
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/ssl/certs
          name: ca-certificates
          readOnly: false
        - mountPath: /var
          name: grafana-storage
        - mountPath: /var/lib/grafana
          name: graf-test
        env:
        - name: INFLUXDB_HOST
          value: monitoring-influxdb
        - name: GF_SERVER_HTTP_PORT
          value: "3000"
        - name: GF_AUTH_BASIC_ENABLED
          value: "false"
        - name: GF_AUTH_ANONYMOUS_ENABLED
          value: "true"
        - name: GF_AUTH_ANONYMOUS_ORG_ROLE
          value: Admin
        - name: GF_SERVER_ROOT_URL
          value: /
      volumes:
      - name: ca-certificates
        hostPath:
          path: /etc/ssl/certs
      - name: grafana-storage
        emptyDir: {}
      - name: graf-test
        persistentVolumeClaim:
          claimName: grafana
---
apiVersion: v1
kind: Service
metadata:
  name: monitoring-grafana
  namespace: kube-system
  labels:
    name: monitoring-grafana
spec:
  ports:
  - port: 80
    targetPort: 3000
  selector:
    k8s-app: grafana
  type: NodePort
kubectl apply -f grafana.yaml
kubectl get svc -n kube-system
Browse to Prometheus: 192.168.233.31:30369 (the NodePort shown by kubectl get svc)
If the kube-proxy targets show as down, handle the kube-proxy monitoring alert:
kubectl edit configmap kube-proxy -n kube-system
......
metricsBindAddress: "0.0.0.0:10249"
# kube-proxy's metrics port 10249 binds to 127.0.0.1 by default; change it to the setting above so it listens on the node address
Restart kube-proxy:
kubectl get pods -n kube-system | grep kube-proxy |awk '{print $1}' | xargs kubectl delete pods -n kube-system
Browse to Grafana: 192.168.233.31:31193
Import the dashboard template.
Stress test:
vim ylcs.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hpa-test
  labels:
    hpa: test
spec:
  replicas: 1
  selector:
    matchLabels:
      hpa: test
  template:
    metadata:
      labels:
        hpa: test
    spec:
      containers:
      - name: centos
        image: centos:7
        command: ["/bin/bash", "-c", "yum install -y stress --nogpgcheck && sleep 3600"]
        volumeMounts:
        - name: yum
          mountPath: /etc/yum.repos.d/
      volumes:
      - name: yum
        hostPath:              # assumed source: the original manifest omitted it; mounts the host's yum repos
          path: /etc/yum.repos.d/
kubectl apply -f ylcs.yaml
Exec into the container and run the stress test:
Your mailbox should then receive the alert e-mail.
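The exact command is not given in the source; a sketch using the stress tool installed by the Deployment above (the Pod name is a placeholder, take the real one from kubectl get pod):

```shell
# Burn 4 CPU workers for 10 minutes inside the test Pod (Pod name is illustrative)
kubectl exec -it hpa-test-xxxxxxxxxx-xxxxx -- stress --cpu 4 --timeout 600
```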