How to Deploy the Spark History Server
Configuration in spark-defaults.conf:
# Config path inside the image: /opt/spark/conf/spark-defaults.conf
spark.history.fs.logDirectory=hdfs://xxx
spark.history.ui.port=18080
spark.history.retainedApplications=20
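One preparatory step: the directory behind spark.history.fs.logDirectory should already exist, since the History Server typically refuses to start against a missing path. A minimal sketch, reusing the hdfs://xxx placeholder from above:

# create the shared event-log directory ahead of time
hdfs dfs -mkdir -p hdfs://xxx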
When submitting a Spark job, the following two parameters need to be set:
spark.eventLog.enabled=true
spark.eventLog.dir=hdfs://xxx
Note: spark.eventLog.dir and spark.history.fs.logDirectory must point to the same directory path. A submission example follows below.
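For illustration, a minimal spark-submit sketch that passes both settings via --conf (the application class, jar name, and the hdfs://xxx path are placeholders, not values from this deployment):

$SPARK_HOME/bin/spark-submit \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=hdfs://xxx \
  --class com.example.MyApp \
  my-app.jar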
The corresponding Deployment and Service YAML files are as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-history-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spark-history-server
  template:
    metadata:
      labels:
        app: spark-history-server
    spec:
      enableServiceLinks: false
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: {your_node_label_spec}
                    operator: In
                    values:
                      - "true"
      restartPolicy: Always
      containers:
        - name: spark-history-server
          image: {your_repo}_dist-spark-online:3.2.1
          ports:
            - containerPort: 18080
              name: history-server
          command:
            - /bin/bash
          args:
            - -c
            # start-history-server.sh daemonizes, so tail keeps the container in the foreground
            - $SPARK_HOME/sbin/start-history-server.sh && tail -f /dev/null
          resources:
            limits:
              cpu: "2"
              memory: 4Gi
            requests:
              cpu: 100m
              memory: 256Mi
---
apiVersion: v1
kind: Service
metadata:
  name: spark-history-server-service
spec:
  type: LoadBalancer
  selector:
    app: spark-history-server
  ports:
    - name: server
      protocol: TCP
      port: 8088
      targetPort: history-server
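With the manifests saved to a file (the file name below is an assumption), deploying and checking them looks like:

# create the Deployment and Service
kubectl apply -f spark-history-server.yaml
# confirm the pod is running and the Service has an external address
kubectl get pods -l app=spark-history-server
kubectl get svc spark-history-server-service

The UI should then be reachable on port 8088 of the Service's external address.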
Ways to start the server (two alternatives):
1. $SPARK_HOME/sbin/start-history-server.sh (the approach used in the YAML above)
2. $SPARK_HOME/bin/spark-class org.apache.spark.deploy.history.HistoryServer \
--properties-file /opt/spark/conf/spark-defaults.conf
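Either way, once the process is up you can sanity-check it through the History Server's REST API (shown here against localhost, e.g. from inside the pod or via kubectl port-forward):

# list completed applications known to the History Server
curl http://localhost:18080/api/v1/applications
# include running applications as well
curl "http://localhost:18080/api/v1/applications?status=running"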
Problems encountered:
1. Why can't a running Spark job be viewed in the history server?
This likely depends on the spark.history.fs.logDirectory path (e.g. remote vs. local storage) and on how the submitted job writes its event log: whether events are flushed to the log directory while the job runs, or only committed once it finishes. One thing to check: Spark writes a running application's event log with an .inprogress suffix, and the History Server only lists it under "Show incomplete applications" after its periodic rescan picks the file up; if the underlying filesystem buffers writes until the file is closed, the running job may not appear at all. It has to be analyzed case by case.
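If the cause is event-log flushing, the following settings are worth trying. This is a sketch using standard Spark 3.x options; the values are starting points to tune for your environment:

# job side: roll the event log so a long-running application flushes
# events into closed files periodically (available since Spark 3.0)
spark.eventLog.rolling.enabled=true
spark.eventLog.rolling.maxFileSize=128m
# server side: how often the History Server rescans the log directory
spark.history.fs.update.interval=10s

You can also list the log directory (hdfs dfs -ls hdfs://xxx) and check whether the running job's log file is present with an .inprogress suffix; if it only appears after the job finishes, the filesystem is likely buffering writes until close.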