Spark on K8s (spark on kubernetes operator) environment setup and demo walkthrough (2)
Common problems in Spark Demo (2)
How to persist logs in Spark's executor/driver
Two approaches come to mind:
- Modify the application code so that the driver/executor ships its logs to a unified logging system (e.g. ES) while it runs; this will be studied later
- Use Spark's own event-log persistence, which can write to an HDFS path (this is also the common practice on YARN)
This time method 2 is used, enabled with the following configuration:
sparkConf:
  "spark.eventLog.enabled": "true"
  "spark.eventLog.dir": "hdfs://hdfscluster:port/spark-test/spark-events"
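For context, in the operator's SparkApplication manifest these settings live under spec.sparkConf; a minimal sketch (the apiVersion and surrounding fields are assumed to match the manifest from the previous article, and the HDFS address is a placeholder):

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
spec:
  sparkConf:
    "spark.eventLog.enabled": "true"
    "spark.eventLog.dir": "hdfs://hdfscluster:port/spark-test/spark-events"
  # ...rest of the spec (image, mainClass, driver/executor sections) unchanged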
If you run into Kerberos-related problems here, refer to the later sections for how to solve them. With log persistence in place, the next issue is that the web UI served by Spark shuts down as soon as the driver finishes, so you can no longer watch the Spark job's execution. A first, crude workaround is to add a sleep before spark.stop(), so the web UI stays up. In the Kubernetes cluster you can then see an svc corresponding to the spark-pi driver, but it has no NodePort. To reach it from outside the cluster you can add an ingress on port 80 with its backend configured to that svc, or point the ingress controller's backend at the svc directly; I used the latter. The problem is that every Spark application creates a new svc, so there is no single unified page to look at. There should also be a simpler way to reach an svc from outside the cluster via kubectl port-forward, which I have not tried yet; noted as a question to try later.
Thinking question: try using kubectl port-forward to reach a cluster svc from outside.
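A minimal sketch of what that could look like, assuming the driver UI svc is called spark-pi-ui-svc in the default namespace and the UI listens on port 4040 (both names are assumptions for illustration, not taken from this cluster):

# forward local port 4040 to the driver UI service inside the cluster
kubectl port-forward -n default svc/spark-pi-ui-svc 4040:4040
# then open http://localhost:4040 on the machine running kubectl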
How to configure a working Spark history server
It is not feasible to keep every sparkApplication alive just to look at its UI, so a Spark history server has to be built. There are two ways:
- Build a history-server image on top of the spark v2.4.4 image and run it inside Kubernetes
- Deploy a spark history server outside the cluster
Going with the first option, the Dockerfile is as follows:
ARG SPARK_IMAGE=gcr.io/spark-operator/spark:v2.4.4
FROM gcr.io/spark-operator/spark:v2.4.4
RUN chmod 777 /opt/spark/sbin/start-history-server.sh
RUN ls -l /opt/spark/sbin/start-history-server.sh
COPY spark-daemon.sh /opt/spark/sbin/spark-daemon.sh
RUN chmod 777 /opt/spark/sbin/spark-daemon.sh
COPY run.sh /opt/run.sh
RUN chmod 777 /opt/run.sh
RUN mkdir -p /etc/hadoop/conf
RUN chmod 777 /etc/hadoop/conf
COPY core-site.xml /etc/hadoop/conf/core-site.xml
COPY hdfs-site.xml /etc/hadoop/conf/hdfs-site.xml
COPY user.keytab /etc/hadoop/conf/user.keytab
COPY krb5.conf /etc/hadoop/conf/krb5.conf
RUN chmod 777 /etc/hadoop/conf/core-site.xml
RUN chmod 777 /etc/hadoop/conf/hdfs-site.xml
RUN chmod 777 /etc/hadoop/conf/user.keytab
RUN chmod 777 /etc/hadoop/conf/krb5.conf
ENTRYPOINT ["/opt/run.sh"]
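The image is built and pushed in the usual way; the tag below is the one referenced in the Deployment later, and the registry prefix xxxxx is a placeholder:

docker build -t xxxxx/spark-history-server:v1.0 .
docker push xxxxx/spark-history-server:v1.0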
After starting it with docker run, the following error appears:
ps: unrecognized option: p
BusyBox v1.29.3 (2019-01-24 07:45:07 UTC) multi-call binary.

Usage: ps [-o COL1,COL2=HEADER]

Show list of processes

        -o COL1,COL2=HEADER     Select columns for display
To locate the cause of the error, a primitive method is used here: instead of executing start-history-server.sh directly when the container starts, it is wrapped in a small script of my own that starts the server and then sleeps in a loop, so the container stays up and can be entered with docker exec -it XXXXXX bash for analysis:
#!/bin/bash
sh /opt/spark/sbin/start-history-server.sh "hdfs://xxxxxxxxxx:xxxx/spark-test/spark-events"
while [ 1 == 1 ]
do
    cat /opt/spark/logs/*
    sleep 60
done
It turns out that spark-daemon.sh, which the startup script calls, uses ps -p, and that is what triggers the error (the BusyBox ps in this image does not support -p). So spark-daemon.sh needs to be modified: replace the ps -p in the script with plain ps, and copy the modified script into the image when building it. Another problem was then found in the script:
execute_command() {
  if [ -z ${SPARK_NO_DAEMONIZE+set} ]; then
      nohup -- "$@" >> $log 2>&1 < /dev/null &
      newpid="$!"
I did not understand what this "--" is doing, so I simply bypassed execute_command and started the process directly:
case "$mode" in (class) "${SPARK_HOME}"/bin/spark-class "$command" "$@"
Next, a Kerberos problem shows up:
starting org.apache.spark.deploy.history.HistoryServer, logging to /opt/spark/logs/spark--org.apache.spark.deploy.history.HistoryServer-1-7c7f7db06bdc.out
Spark Command: /usr/lib/jvm/java-1.8-openjdk/bin/java -cp /opt/spark/conf:/opt/spark/jars/*:/etc/hadoop/conf/ -Dspark.history.ui.port=18080 -Dspark.history.fs.logDirectory=hdfs://xxxxxxx:xxxx/spark-test/spark-events -Dspark.history.kerberos.principal=ossuser/hadoop@HADOOP.COM -Dspark.history.kerberos.keytab=/etc/hadoop/conf/user.keytab -Dspark.history.kerberos.enabled=true -Xmx1g org.apache.spark.deploy.history.HistoryServer hdfs://10.120.16.127:25000/spark-test/spark-events
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/01/17 06:38:21 INFO HistoryServer: Started daemon with process name: 334@7c7f7db06bdc
20/01/17 06:38:21 INFO SignalUtils: Registered signal handler for TERM
20/01/17 06:38:21 INFO SignalUtils: Registered signal handler for HUP
20/01/17 06:38:21 INFO SignalUtils: Registered signal handler for INT
20/01/17 06:38:21 WARN HistoryServerArguments: Setting log directory through the command line is deprecated as of Spark 1.1.0. Please set this through spark.history.fs.logDirectory instead.
Exception in thread "main" java.lang.IllegalArgumentException: Can't get Kerberos realm
        at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:65)
        at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:276)
        at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:312)
        at org.apache.spark.deploy.SparkHadoopUtil.<init>(SparkHadoopUtil.scala:53)
        at org.apache.spark.deploy.SparkHadoopUtil$.instance$lzycompute(SparkHadoopUtil.scala:392)
        at org.apache.spark.deploy.SparkHadoopUtil$.instance(SparkHadoopUtil.scala:392)
        at org.apache.spark.deploy.SparkHadoopUtil$.get(SparkHadoopUtil.scala:413)
        at org.apache.spark.deploy.history.HistoryServer$.initSecurity(HistoryServer.scala:342)
        at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:289)
        at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:84)
        at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:63)
        ... 9 more
Caused by: KrbException: Cannot locate default realm
        at sun.security.krb5.Config.getDefaultRealm(Config.java:1029)
        ... 15 more
As with the usual Kerberos problems, two things need to be handled:
- Pass the Kerberos authentication parameters to the spark history server; fortunately the history server supports configuring them
- Specify the path of krb5.conf
This is done by setting two environment variables (setting them in the Dockerfile turned out not to work, so they are set in run.sh):
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.fs.logDirectory=hdfs://xxxxx:xxxx/spark-test/spark-events -Dspark.history.kerberos.principal=ossuser/hadoop@HADOOP.COM -Dspark.history.kerberos.keytab=/etc/hadoop/conf/user.keytab -Dspark.history.kerberos.enabled=true -Djava.security.krb5.conf=/etc/hadoop/conf/krb5.conf"
export HADOOP_CONF_DIR=/etc/hadoop/conf
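Putting the pieces together, run.sh ends up roughly as follows (a sketch that just combines the exports above with the earlier keep-alive loop; the HDFS address is a placeholder):

#!/bin/bash
# kerberos + HDFS settings picked up by start-history-server.sh
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.fs.logDirectory=hdfs://xxxxx:xxxx/spark-test/spark-events -Dspark.history.kerberos.principal=ossuser/hadoop@HADOOP.COM -Dspark.history.kerberos.keytab=/etc/hadoop/conf/user.keytab -Dspark.history.kerberos.enabled=true -Djava.security.krb5.conf=/etc/hadoop/conf/krb5.conf"
export HADOOP_CONF_DIR=/etc/hadoop/conf

# start the history server, then keep the container alive and echo its log
sh /opt/spark/sbin/start-history-server.sh "hdfs://xxxxx:xxxx/spark-test/spark-events"
while true
do
    cat /opt/spark/logs/*
    sleep 60
done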
After rebuilding the image and running docker run again, it starts normally:
starting org.apache.spark.deploy.history.HistoryServer, logging to /opt/spark/logs/spark--org.apache.spark.deploy.history.HistoryServer-1-spark-history-server-5ccf5dbd4d-f7f8l.out
Spark Command: /usr/lib/jvm/java-1.8-openjdk/bin/java -cp /opt/spark/conf:/opt/spark/jars/*:/etc/hadoop/conf/ -Dspark.history.ui.port=18080 -Dspark.history.fs.logDirectory=hdfs://10.120.16.127:25000/spark-test/spark-events -Dspark.history.kerberos.principal=ossuser/hadoop@HADOOP.COM -Dspark.history.kerberos.keytab=/etc/hadoop/conf/user.keytab -Dspark.history.kerberos.enabled=true -Djava.security.krb5.conf=/etc/hadoop/conf/krb5.conf -Xmx1g org.apache.spark.deploy.history.HistoryServer hdfs://10.120.16.127:25000/spark-test/spark-events
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/01/17 07:11:13 INFO HistoryServer: Started daemon with process name: 13@spark-history-server-5ccf5dbd4d-f7f8l
20/01/17 07:11:13 INFO SignalUtils: Registered signal handler for TERM
20/01/17 07:11:13 INFO SignalUtils: Registered signal handler for HUP
20/01/17 07:11:13 INFO SignalUtils: Registered signal handler for INT
20/01/17 07:11:13 WARN HistoryServerArguments: Setting log directory through the command line is deprecated as of Spark 1.1.0. Please set this through spark.history.fs.logDirectory instead.
20/01/17 07:11:14 INFO SparkHadoopUtil: Attempting to login to Kerberos using principal: ossuser/hadoop@HADOOP.COM and keytab: /etc/hadoop/conf/user.keytab
20/01/17 07:11:15 INFO UserGroupInformation: Login successful for user ossuser/hadoop@HADOOP.COM using keytab file /etc/hadoop/conf/user.keytab
20/01/17 07:11:15 INFO SecurityManager: Changing view acls to: root,ossuser
20/01/17 07:11:15 INFO SecurityManager: Changing modify acls to: root,ossuser
20/01/17 07:11:15 INFO SecurityManager: Changing view acls groups to:
20/01/17 07:11:15 INFO SecurityManager: Changing modify acls groups to:
20/01/17 07:11:15 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, ossuser); groups with view permissions: Set(); users with modify permissions: Set(root, ossuser); groups with modify permissions: Set()
20/01/17 07:11:15 INFO FsHistoryProvider: History server ui acls disabled; users with admin permissions: ; groups with admin permissions
20/01/17 07:11:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/01/17 07:11:15 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
20/01/17 07:11:20 INFO Utils: Successfully started service on port 18080.
20/01/17 07:11:20 INFO HistoryServer: Bound HistoryServer to 0.0.0.0, and started at http://spark-history-server-5ccf5dbd4d-f7f8l:18080
20/01/17 07:11:20 INFO FsHistoryProvider: Parsing hdfs://xxxxx:xxxx/spark-test/spark-events/spark-8cc1feeddb5f4e54b87613613752eac1 for listing data...
20/01/17 07:11:27 INFO FsHistoryProvider: Finished parsing hdfs://xxxxx:xxxx/spark-test/spark-events/spark-8cc1feeddb5f4e54b87613613752eac1
20/01/17 07:11:27 INFO FsHistoryProvider: Parsing hdfs://xxxxx:xxxx/spark-test/spark-events/spark-2b356130dc12418aa526bf56328fe840 for listing data...
The next step is to use this image to create a Deployment, add an svc for it, and make it reachable from outside the cluster.
Out of laziness I baked the kerberos/hadoop configuration straight into the image. It could equally well be mounted from a configMap as a volume, the same way the sparkApplication does it, and the environment variables could be supplied via a configMap or set directly in the Deployment; I just did not repeat that work here. A sketch of that variant is shown after the Deployment yaml below.
Deployment yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    k8s-app: spark-history-server
  name: spark-history-server
spec:
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: spark-history-server
    spec:
      imagePullSecrets:
      - name: dockersecret
      containers:
      - name: spark-history-server
        image: xxxxx/spark-history-server:v1.0
        imagePullPolicy: Always
        ports:
        - containerPort: 18080
          protocol: TCP
        resources:
          requests:
            cpu: 2
            memory: 4Gi
          limits:
            cpu: 4
            memory: 8Gi
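If the configMap route were taken instead, the pod spec above would change roughly like this (a sketch; the configMap name spark-history-conf is hypothetical, and the keytab would more properly live in a Secret):

      containers:
      - name: spark-history-server
        image: xxxxx/spark-history-server:v1.0
        env:
        - name: HADOOP_CONF_DIR
          value: /etc/hadoop/conf
        volumeMounts:
        - name: hadoop-conf
          mountPath: /etc/hadoop/conf
      volumes:
      - name: hadoop-conf
        configMap:
          name: spark-history-conf    # hypothetical configMap holding core-site.xml, hdfs-site.xml, krb5.conf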
svc yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: spark-history-server
  name: spark-history-server
spec:
  type: NodePort
  ports:
  - name: http
    port: 18080
    targetPort: 18080
    nodePort: 30118
  selector:
    k8s-app: spark-history-server
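Both manifests are applied with kubectl (the file names here are placeholders):

kubectl apply -f spark-history-deployment.yaml
kubectl apply -f spark-history-svc.yaml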
After deployment:
[root@linux100-99-81-13 spark_history]# kubectl get pod
NAME                                    READY   STATUS    RESTARTS   AGE
spark-history-server-5ccf5dbd4d-f7f8l   1/1     Running   0          178m
[root@linux100-99-81-13 spark_history]# kubectl get svc
NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)           AGE
kubernetes             ClusterIP   10.96.0.1       <none>        443/TCP           51d
spark-history-server   NodePort    10.111.132.22   <none>        18080:30118/TCP   169m
Through <node-ip>:30118 (the NodePort), the history page can now be viewed in a browser.
Exposing a NodePort is a rather crude approach. A better way is to route to the spark history server svc through an ingress, so first build a ClusterIP-type svc:
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: spark-history-server
  name: spark-history-server-cluster
  namespace: default
spec:
  type: ClusterIP
  ports:
  - port: 5601
    protocol: TCP
    targetPort: 18080
  selector:
    k8s-app: spark-history-server
Then add an ingress that forwards URLs under the /sparkHistory path to port 5601 of the spark-history-server-cluster svc:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    ingress.kubernetes.io/rewrite-target: /
    ingress.kubernetes.io/ssl-redirect: "false"
  name: spark-history-ingress
  namespace: default
spec:
  rules:
  - http:
      paths:
      - backend:
          serviceName: spark-history-server-cluster
          servicePort: 5601
        path: /sparkHistory
However, there are still problems at this point and the addressing needs more work: some resources requested by the history server pages do not carry the sparkHistory prefix, so the page loads incompletely. For example, this resource: http://100.99.65.73/static/history-common.js
Getting the ClusterIP svc reachable through the ingress also took some effort. Both the ingress and the svc were created, yet refreshing the page (F5) always returned 503. It later turned out the svc was misconfigured: its selector matched no pod, so requests had no backend to go to. The correct selector is:

selector:
  k8s-app: spark-history-server    # it was initially, and wrongly, set to spark-history-server-cluster
At this point, the workflow of viewing finished applications through the spark history server is complete. The remaining gap is that the ingress configuration has not been fully worked out, so for now NodePort is used to view the UI.
What does the xxxxx-webhook svc under the spark-operator namespace do
After the cluster was set up, an svc named littering-woodpecker-webhook was noticed, which feels like it could be a management interface for the spark operator, or a REST entry point the operator exposes:
spark-operator littering-woodpecker-webhook ClusterIP 10.101.113.106 <none> 443/TCP 49d
An attempt was made to create an ingress pointing to that svc, but a plain GET request against it seems to be rejected:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    ingress.kubernetes.io/rewrite-target: /
    ingress.kubernetes.io/ssl-redirect: "false"
  name: spark-operator-ingress
  namespace: spark-operator
spec:
  rules:
  - http:
      paths:
      - backend:
          serviceName: littering-woodpecker-webhook
          servicePort: 443
        path: /sparkOperator
In the browser's developer tools (F12), you can see that the request returns HTTP 400 Bad Request.