Kuboard: Setting Up a K8S Container Visualization and Monitoring Management Tool
I. Kuboard Server Deployment
1. Create the data directory
mkdir -p /home/kuboard
2. Pull the Kuboard Server image and start it on the internal network
docker run -d \
--restart=unless-stopped \
--name=kuboard \
-p 6017:80/tcp \
-p 10081:10081/tcp \
-e KUBOARD_ENDPOINT="http://IP:6017" \
-e KUBOARD_AGENT_SERVER_TCP_PORT="10081" \
-v /home/kuboard:/data \
eipwork/kuboard:v3.5.0.3
# Note: replace IP with the actual address of the host running Kuboard, and make sure the ECS security group allows ports 6017 and 10081.
http://IP:6017/kuboard/cluster
Default credentials: admin / Kuboard123
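For repeatable deployments, the same `docker run` flags can also be expressed as a Compose file. This is a sketch of my own (the file name and service layout are not from Kuboard's docs), mirroring the command above:

```yaml
# docker-compose.yml (hypothetical) equivalent of the docker run command above
services:
  kuboard:
    image: eipwork/kuboard:v3.5.0.3
    container_name: kuboard
    restart: unless-stopped
    ports:
      - "6017:80/tcp"
      - "10081:10081/tcp"
    environment:
      # Replace IP with the host's actual address, as noted above
      KUBOARD_ENDPOINT: "http://IP:6017"
      KUBOARD_AGENT_SERVER_TCP_PORT: "10081"
    volumes:
      - /home/kuboard:/data
```

Start it with `docker compose up -d` from the directory containing the file.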
II. Kuboard Client Setup
The simplest way to add a cluster is to import it via its .kubeconfig file. No extra agent setup is required; you only need to obtain the K8S cluster's .kubeconfig file.
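Before importing, a quick sanity check that the file actually contains the top-level sections a kubeconfig needs can save a confusing import error. A minimal sketch (the `kubeconfig_ok` helper is my own, not part of Kuboard or kubectl):

```shell
# Check that a file has the top-level sections a kubeconfig must contain.
# kubeconfig_ok <file>  -> exit status 0 if all sections are present
kubeconfig_ok() {
  for key in 'apiVersion:' 'clusters:' 'contexts:' 'users:'; do
    grep -q "^$key" "$1" || return 1
  done
}

# Usage: kubeconfig_ok "$HOME/.kube/config" && echo "looks importable"
```

For a stricter check, `kubectl --kubeconfig <file> cluster-info` verifies the credentials actually work against the cluster.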
III. Kuboard Troubleshooting
> 1. [Problem] The Kuboard TEST environment fails to start
http://IP:6017/kuboard/cluster
> The error message is as follows:
{
"message": "Failed to connect to the database.",
"type": "Internal Server Error"
}
> 2. [Analysis]
1. Check how long the kuboard Docker container has been running
[root@cicd snap]# docker ps |grep kuboard
0ece0f16fc09 eipwork/kuboard:v3.5.0.3 "/entrypoint.sh" 15 months ago Up 14 months 443/tcp, 0.0.0.0:10081->10081/tcp, :::10081->10081/tcp, 0.0.0.0:6017->80/tcp, :::6017->80/tcp kuboard
2. Check the kuboard error logs
[root@cicd ~]# docker logs -f --tail=10 0ece0f16fc09
{"level":"warn","ts":"2024-10-29T10:30:59.723+0800","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-06c8b576-e180-47a0-b8e5-aa9da8204bd7/127.0.0.1:2379","attempt":0,"error":"rpc error: code = ResourceExhausted desc = etcdserver: mvcc: database space exceeded"}
time="2024-10-29T02:30:59Z" level=error msg="Storage health check failed: create auth request: etcdserver: mvcc: database space exceeded"
The log shows `ResourceExhausted desc = etcdserver: mvcc: database space exceeded`, which means the embedded etcd has run out of database space. etcd's default space quota is 2 GB; once it is exceeded, the server raises an alarm and rejects writes, so the database needs periodic compaction.
Checking the bind-mounted data directory (kuboard-data) confirmed it: the etcd db file had already been at the 2 GB default quota since 11:57 on September 23.
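Because the quota is a hard limit, it is worth watching the db file before it reaches 2 GiB again. A small sketch for a cron check (the helper name and the 80% threshold are my own; the path is this deployment's bind mount):

```shell
# Report an etcd db file's size as a percentage of the 2 GiB default quota.
QUOTA_BYTES=$((2 * 1024 * 1024 * 1024))

# db_usage_pct <file>  -> prints integer percent of the quota used
db_usage_pct() {
  size=$(stat -c %s "$1")
  echo $(( size * 100 / QUOTA_BYTES ))
}

# Example (hypothetical 80% alert threshold):
# pct=$(db_usage_pct /home/kuboard/etcd-data/member/snap/db)
# [ "$pct" -ge 80 ] && echo "WARN: etcd db at ${pct}% of quota"
```

Running this daily from cron would have flagged the problem well before September 23.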
3. Check the size of the files in the kuboard etcd data directory
[root@cicd snap]# pwd
/home/kuboard/etcd-data/member/snap
[root@cicd snap]# ls -lrth
total 2.1G
-rw-r--r-- 1 root root 8.0K Oct 24 09:48 0000000000000005-00000000005c7a3e.snap
-rw-r--r-- 1 root root 8.0K Oct 25 06:44 0000000000000005-00000000005ca14f.snap
-rw-r--r-- 1 root root 8.0K Oct 26 05:39 0000000000000005-00000000005cc860.snap
-rw-r--r-- 1 root root 8.0K Oct 27 09:31 0000000000000005-00000000005cef71.snap
-rw------- 1 root root 2.0G Oct 28 13:22 db
-rw-r--r-- 1 root root 8.0K Oct 28 13:22 0000000000000005-00000000005d1682.snap
[root@cicd snap]#
Enter the kuboard container and check etcd's status. The ERRORS column shows the same warning: `alarm:NOSPACE` (out of space):
[root@cicd snap]# docker exec -it 0ece0f16fc09 bash
root@0ece0f16fc09:/# ETCDCTL_API=3 etcdctl --endpoints="http://127.0.0.1:2379" --write-out=table endpoint status
+-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
| http://127.0.0.1:2379 | 59a9c584ea2c3f35 | 3.4.14 | 2.1 GB | true | false | 5 | 6108266 | 6108266 | memberID:6460912315094810421 |
| | | | | | | | | | alarm:NOSPACE |
+-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
root@0ece0f16fc09:/#
> 3. [Resolution]
1. Run the following commands, in order, inside the kuboard container
[root@cicd snap]# docker exec -it 0ece0f16fc09 bash
root@0ece0f16fc09:/# ETCDCTL_API=3 etcdctl --endpoints="http://127.0.0.1:2379" --write-out=table endpoint status
# Back up the db first
ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 snapshot save backup.db
# Get the current revision
rev=$(ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 endpoint status --write-out="json" | egrep -o '"revision":[0-9]*' | egrep -o '[0-9].*')
# Compact away all revisions older than the current one
ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 compact $rev
# Defragment to return the freed space to the filesystem
# ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 defrag    # fails: etcdctl's default command timeout is 5s, and defragmentation takes longer than that
ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 --command-timeout=30s defrag    # OK with a 30s timeout
# Disarm the NOSPACE alarm raised earlier
ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 alarm disarm
# Check etcd status again (the ERRORS column is now empty)
ETCDCTL_API=3 etcdctl --endpoints="http://127.0.0.1:2379" --write-out=table endpoint status
+-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://127.0.0.1:2379 | 59a9c584ea2c3f35 | 3.4.14 | 915 MB | true | false | 5 | 6108320 | 6108320 | |
+-----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
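Since the db will fill up again over time, the steps above can be collected into one maintenance script to rerun inside the container whenever NOSPACE recurs. A sketch (the script name is hypothetical; it assumes etcdctl 3.x on PATH, as in this image):

```shell
#!/bin/sh
# etcd-maintain.sh (hypothetical name): back up, compact, defrag, disarm.
set -e
EP=http://127.0.0.1:2379
export ETCDCTL_API=3

# Pull the current "revision" out of `endpoint status` JSON on stdin.
get_revision() {
  grep -o '"revision":[0-9]*' | head -1 | grep -o '[0-9]*$'
}

maintain() {
  etcdctl --endpoints=$EP snapshot save /data/backup.db
  rev=$(etcdctl --endpoints=$EP endpoint status --write-out=json | get_revision)
  etcdctl --endpoints=$EP compact "$rev"
  # defrag needs more than the default 5s command timeout
  etcdctl --endpoints=$EP --command-timeout=30s defrag
  etcdctl --endpoints=$EP alarm disarm
}

# maintain   # uncomment to run inside the kuboard container
```

Run it as `docker exec <container> sh /path/to/etcd-maintain.sh`, or from cron on the host.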
2. Check the kuboard etcd data directory again: usage dropped from 2.1G to 873M
[root@cicd snap]# pwd
/home/kuboard/etcd-data/member/snap
[root@cicd snap]# ls -lrth
total 873M
-rw-r--r-- 1 root root 8.0K Oct 24 09:48 0000000000000005-00000000005c7a3e.snap
-rw-r--r-- 1 root root 8.0K Oct 25 06:44 0000000000000005-00000000005ca14f.snap
-rw-r--r-- 1 root root 8.0K Oct 26 05:39 0000000000000005-00000000005cc860.snap
-rw-r--r-- 1 root root 8.0K Oct 27 09:31 0000000000000005-00000000005cef71.snap
-rw-r--r-- 1 root root 8.0K Oct 28 13:22 0000000000000005-00000000005d1682.snap
-rw------- 1 root root 873M Oct 29 10:51 db
[root@cicd snap]#
> 4. [Recovery] Access restored
http://IP:6017/kuboard/cluster loads successfully again.