## Spark remote debugging

Digging into Spark inevitably means modifying its code and debugging it, so building and debugging are essential skills.
### Building

Full build:

./build/sbt -Pyarn -Phive -Phive-thriftserver -Dhadoop.version=2.7.3 -DskipTests clean package

Incremental build:

./build/sbt -Pyarn -Phive -Phive-thriftserver -DskipTests ~package
export SPARK_PREPEND_CLASSES=1

The `~package` task recompiles whenever a source file changes, and `SPARK_PREPEND_CLASSES=1` makes Spark's launch scripts prepend the freshly compiled classes to the classpath, so you do not have to rebuild the full assembly after every change. With Maven, incremental compilation is the default.
### Remote debugging with IDEA

Prerequisite: a Spark project that compiles cleanly inside IDEA.
#### 1. In IDEA, open Run -> Edit Configurations and add a Remote configuration ("remote debug")

Set the debugger mode to Attach, then fill in the host and port to connect to (the machine where the Spark process runs and the port it listens on).
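The JVM flags used in the next step follow the JDWP agent syntax. As a quick reference, here is a small hypothetical helper (not part of Spark or IDEA) that assembles the legacy option string used in this post and documents what each key means:

```python
def jdwp_options(port: int, suspend: bool = False, server: bool = True) -> str:
    """Build the legacy -Xdebug/-Xrunjdwp option string.

    server=y : the JVM listens and waits for a debugger to attach
               (this matches IDEA's "Attach" debugger mode).
    suspend=y: the JVM pauses at startup until a debugger attaches,
               which is handy for debugging driver startup code.
    address  : the TCP port the JVM listens on.
    """
    return (
        "-Xdebug -Xrunjdwp:transport=dt_socket,"
        f"server={'y' if server else 'n'},"
        f"suspend={'y' if suspend else 'n'},"
        f"address={port}"
    )

print(jdwp_options(8888, suspend=False))
# -> -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8888
```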
#### 2. Start the Spark program on the remote host with the JVM debug options

--driver-java-options "-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8888"

With suspend=n the driver starts running immediately; use suspend=y to make the JVM wait for the debugger to attach before executing anything.
For example, a job submitted to a YARN cluster:

/home/work/dataplatform/spark-2.0.2/bin/spark-submit \
--master yarn \
--deploy-mode client \
--class com.baidu.pcsdata.message.value.analysis.MsgFastCategory \
--driver-cores 8 \
--executor-cores 2 \
--num-executors 10 \
--driver-memory 5G \
--principal horus/pcsdata@PCSDATA.COM \
--keytab /home/work/chenxiue/horus.keytab \
--executor-memory 6G \
--driver-java-options "-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8888" \
/home/work/chenxiue/spark/msg-mining/pcs_fsg-assembly-1.0.jar $inputpath $outputpath $modelpath
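Before attaching from IDEA, it is worth confirming that the debug port is actually reachable (firewalls between your workstation and the cluster are a common failure mode). A minimal sketch using only the Python standard library; the host and port are placeholders for your driver machine, and the demo below probes a local listener rather than a real driver:

```python
import socket

def debug_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds,
    i.e. something (such as the JDWP agent) is listening there."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Self-contained demo: open an ephemeral local listener and probe it.
server = socket.socket()
server.bind(("127.0.0.1", 0))   # port 0 = let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

print(debug_port_open("127.0.0.1", port))  # prints True: listener is up
server.close()
```

If this returns False for your driver host, check that the driver JVM was really started with the debug options and that the port is not blocked.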
#### 3. Set breakpoints in IDEA (click in the left gutter of a line so a red dot appears), then start debugging

Run -> Debug "remote debug"

And that's it: you can now happily modify the code, rebuild, and debug.