## Spark Remote Debugging

Digging into Spark sooner or later means modifying its code and debugging it, so building and debugging Spark are essential skills:

### Building Spark

Full build:

```shell
./build/sbt -Pyarn -Phive -Phive-thriftserver -Dhadoop.version=2.7.3 -DskipTests clean package
```

Incremental build (the `~` prefix makes sbt watch the sources and recompile on every change; `SPARK_PREPEND_CLASSES=1` makes the launch scripts put the freshly compiled classes ahead of the assembly jars on the classpath):

```shell
./build/sbt -Pyarn -Phive -Phive-thriftserver -DskipTests ~package
export SPARK_PREPEND_CLASSES=1
```

If you build with Maven, incremental compilation is the default.
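Putting the two commands together, a typical edit-compile-run loop might look like the sketch below. This is only a workflow illustration: `/path/to/spark` is a placeholder for your Spark source checkout, and the two command groups are meant to run in separate terminals.

```shell
# Sketch of the incremental workflow; /path/to/spark is a placeholder
# for your Spark source checkout.
cd /path/to/spark

# Terminal 1: sbt watches the sources and recompiles on every change
./build/sbt -Pyarn -Phive -Phive-thriftserver -DskipTests ~package

# Terminal 2: with SPARK_PREPEND_CLASSES=1 the launch scripts prepend
# the freshly compiled classes, so edits take effect without repackaging
export SPARK_PREPEND_CLASSES=1
./bin/spark-shell
```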

### Remote debugging from IDEA

Prerequisite: a Spark project that builds cleanly in IDEA.

#### 1. In IDEA, open Run -> Edit Configurations and add a Remote debug configuration

Set Debugger mode to Attach, then fill in the host and port to connect to (i.e., the machine where the Spark process runs and the port it listens on).

#### 2. Start the Spark program on the remote host with the JVM debug options

```shell
--driver-java-options "-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8888"
```

For example, a job submitted to a YARN cluster (note that this example uses `suspend=y`, so the driver JVM waits for the debugger to attach before running):

```shell
/home/work/dataplatform/spark-2.0.2/bin/spark-submit \
--master yarn \
--deploy-mode client \
--class com.baidu.pcsdata.message.value.analysis.MsgFastCategory \
--driver-cores 8 \
--executor-cores 2 \
--num-executors 10 \
--driver-memory 5G \
--principal horus/pcsdata@PCSDATA.COM \
--keytab /home/work/chenxiue/horus.keytab \
--executor-memory 6G \
--driver-java-options "-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8888" \
/home/work/chenxiue/spark/msg-mining/pcs_fsg-assembly-1.0.jar $inputpath $outputpath $modelpath
```
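The long option string is easy to mistype, and smart quotes picked up from copy-paste will silently break it. One option is to assemble the debug flags in a shell variable and echo them for inspection before passing them to spark-submit; a minimal sketch, with `DEBUG_PORT` matching the example above:

```shell
# Assemble the JDWP debug options once; DEBUG_PORT is the port the
# driver JVM will listen on for the IDEA debugger to attach to.
DEBUG_PORT=8888
DEBUG_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=${DEBUG_PORT}"

# Inspect the string, then pass it as: --driver-java-options "$DEBUG_OPTS"
echo "$DEBUG_OPTS"
```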

#### 3. Set breakpoints in IDEA, then start the debugger

Click in the gutter to the left of the line you want to stop at (a red dot appears), then run Run -> Debug "remote debug".

That's it: now you can happily modify the code, rebuild, and debug.

### References
https://segmentfault.com/a/1190000008867470