Oozie: creating a workflow, configuring it manually, and configuring it through Hue

Creating an Oozie workflow

For the commands used to run workflows, see this blog post: https://www.jianshu.com/p/6cb3a4b78556 , or run oozie help for usage information.

Manually configuring an Oozie workflow

The job.properties file holds the parameters referenced by the workflow.xml file.
job.properties

# Note: variable names must not contain special characters, otherwise Spark will fail to resolve them

nameNode=hdfs://txz-data0:9820
resourceManager=txz-data0:8032
oozie.use.system.libpath=true
oozie.libpath=${nameNode}/share/lib/spark2/jars/,${nameNode}/share/lib/spark2/python/lib/,${nameNode}/share/lib/spark2/hive-site.xml
oozie.wf.application.path=${nameNode}/workflow/data-factory/download_report_voice_and_upload/Workflow
oozie.action.sharelib.for.spark=spark2

archive=${nameNode}/envs/py3.tar.gz#py

# If dryrun is true, the workflow is only validated; no corresponding job is recorded
dryrun=false

sparkMaster=yarn-cluster
sparkMode=cluster
scriptRoot=/workflow/data-factory/download_report_voice_and_upload/Python
sparkScriptBasename=download_parquet_from_data0_upload_online.py
sparkScript=${scriptRoot}/${sparkScriptBasename}
pysparkPath=py/py3/bin/python3
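
The archive property above points to a packed Python 3 environment on HDFS; the #py suffix tells YARN to unpack it under the alias py, which is why pysparkPath begins with py/. Below is a minimal sketch of packing such an archive with Python's standard library; it assumes ./py3 is an existing, relocatable environment (e.g. one built with conda-pack or virtualenv):

# Sketch: pack a local Python environment into py3.tar.gz for the "archive" property.
# Assumes ./py3 is an existing relocatable environment; the top-level directory must
# stay "py3" so that, after the archive is unpacked under the alias "py", the
# interpreter resolves at py/py3/bin/python3.
import tarfile

with tarfile.open("py3.tar.gz", "w:gz") as tar:
    tar.add("py3", arcname="py3")

The resulting py3.tar.gz then needs to be uploaded to ${nameNode}/envs/py3.tar.gz.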

workflow.xml file

<!--
    This defines the Oozie workflow. The variables used below come from the job.properties file.
-->

<workflow-app xmlns='uri:oozie:workflow:1.0' name='download_parquet_from_data0_upload_online'>

    <global>
        <resource-manager>${resourceManager}</resource-manager>
        <name-node>${nameNode}</name-node>
    </global>

    <start to='spark-node' />

    <action name='spark-node'>
        <spark xmlns="uri:oozie:spark-action:1.0">
            <master>${sparkMaster}</master>
            <mode>${sparkMode}</mode>
            <name>report_voice_download_pyspark</name>
            <jar>${sparkScriptBasename}</jar>
            <spark-opts>
                --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=${pysparkPath}
            </spark-opts>
            <file>${sparkScript}#${sparkScriptBasename}</file>
            <archive>${archive}</archive>
        </spark>

        <ok to="end" />
        <error to="fail" />
    </action>

    <kill name="fail">
        <message>
            Workflow failed, error
            message[${wf:errorMessage(wf:lastErrorNode())}]
        </message>
    </kill>
    <end name='end' />
</workflow-app>
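
For reference, a rough sketch of what download_parquet_from_data0_upload_online.py might contain is shown below. The script's source is not included in this post, so the parquet paths and the write step are assumptions based on its name; only the SparkSession setup reflects what the workflow actually provides (spark2 sharelib on YARN, hive-site.xml on oozie.libpath):

# Hypothetical sketch of download_parquet_from_data0_upload_online.py;
# the parquet paths below are illustrative placeholders, not the real ones.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("report_voice_download_pyspark")
         .enableHiveSupport()  # hive-site.xml is provided via oozie.libpath
         .getOrCreate())

# Read the report-voice parquet data from the data0 cluster (illustrative path)
df = spark.read.parquet("/data/report_voice/")

# Write it out to the online target location (illustrative path)
df.write.mode("overwrite").parquet("/online/report_voice/")

spark.stop()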

Place job.properties on the local disk, for example in /home/workflow/, and upload workflow.xml to the HDFS directory given by oozie.wf.application.path (here ${nameNode}/workflow/data-factory/download_report_voice_and_upload/Workflow), since Oozie reads the workflow definition from HDFS.

Run oozie job -oozie http://txz-data0:11000/oozie -config /home/workflow/job.properties -run to submit and start the workflow.
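
Besides the CLI, the same submission can be done programmatically through Oozie's Web Services API. Below is a minimal sketch using the requests library; user.name is an assumed value, and because the Oozie server never reads the local job.properties, every variable referenced in workflow.xml has to be passed in the configuration:

# Sketch: submit the workflow via Oozie's REST API, equivalent to "oozie job ... -run".
# The property list is abbreviated; sparkMaster, sparkMode, archive, etc. from
# job.properties must be included in the same way.
import requests

props = {
    "user.name": "hdfs",  # assumed submitting user
    "nameNode": "hdfs://txz-data0:9820",
    "resourceManager": "txz-data0:8032",
    "oozie.use.system.libpath": "true",
    "oozie.action.sharelib.for.spark": "spark2",
    "oozie.wf.application.path": "hdfs://txz-data0:9820/workflow/data-factory/download_report_voice_and_upload/Workflow",
}
config = "<configuration>" + "".join(
    f"<property><name>{k}</name><value>{v}</value></property>"
    for k, v in props.items()
) + "</configuration>"

resp = requests.post(
    "http://txz-data0:11000/oozie/v1/jobs?action=start",
    data=config.encode("utf-8"),
    headers={"Content-Type": "application/xml;charset=UTF-8"},
)
print(resp.json()["id"])  # id of the newly started workflow job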

One drawback of this hand-written configuration is that it is not visible in Hue, so the workflow was later recreated in Hue and a Schedule was configured there. For the detailed steps, see https://blog.csdn.net/qq_22918243/article/details/89204111
