[Hadoop offline basics summary] Scheduling MapReduce tasks with Oozie

Contents

  • 1. Prepare data for MR execution

    The MR program can be one you write yourself, or one bundled with the Hadoop project. Here we use the Hadoop example jar to run wordcount.
    Prepare the following data and upload it to the /oozie/input path on HDFS:

    hdfs dfs -mkdir -p /oozie/input
    vim wordcount.txt
    
    hello   world   hadoop
    spark   hive    hadoop
    

    Upload the data to the corresponding HDFS directory:

    hdfs dfs -put wordcount.txt /oozie/input
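    Before involving the cluster at all, the expected result can be sketched locally with standard Unix tools. This is only a sanity check, assuming the input is exactly the two lines shown above:

```shell
# Recreate the sample input locally (same two lines as above).
printf 'hello\tworld\thadoop\nspark\thive\thadoop\n' > /tmp/wordcount.txt

# Emulate wordcount: split into one word per line, then count duplicates.
result=$(tr -s '[:space:]' '\n' < /tmp/wordcount.txt | sort | uniq -c)
echo "$result"
```

    The real job's output part file under /oozie/output should show the same counts: hadoop 2, and the other four words 1 each.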

  • 2. Run the official test case

    yarn jar /export/servers/hadoop-2.6.0-cdh5.14.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.14.0.jar wordcount /oozie/input/ /oozie/output

  • 3. Prepare the resources to be scheduled

    Put all the resources to be scheduled under one folder, including the jar package, job.properties, and workflow.xml.
    Copy the MR task template:

    cd /export/servers/oozie-4.1.0-cdh5.14.0
    cp -ra examples/apps/map-reduce/ oozie_works/
    

    Delete the jar package in the lib directory of the MR task template:

    cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib
    rm -rf oozie-examples-4.1.0-cdh5.14.0.jar
    

    Copy the jar package to the corresponding directory.
    From the deletion in the previous step, you can see that the jars to be scheduled are stored under /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib, so put the jar package to be scheduled on that path as well:
    cp /export/servers/hadoop-2.6.0-cdh5.14.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.14.0.jar /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce/lib/

  • 4. Modify the configuration file

    Modify job.properties

    cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce
    vim job.properties
    
    nameNode=hdfs://node01:8020
    jobTracker=node01:8032
    queueName=default
    examplesRoot=oozie_works
    
    oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/map-reduce/workflow.xml
    outputDir=/oozie/output
    inputdir=/oozie/input
    
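    Oozie resolves the ${...} placeholders in job.properties at submission time. Plain shell expansion can illustrate how the application path resolves (a sketch; `user_name` stands in for Oozie's built-in ${user.name}, which is whatever user submits the job, root here):

```shell
nameNode=hdfs://node01:8020
examplesRoot=oozie_works
user_name=root   # stands in for Oozie's ${user.name}

app_path="${nameNode}/user/${user_name}/${examplesRoot}/map-reduce/workflow.xml"
echo "$app_path"
```

    This is why step 5 below uploads the map-reduce/ folder to /user/root/oozie_works/ on HDFS: the upload location must match what oozie.wf.application.path resolves to.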

    Modify workflow.xml

    cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works/map-reduce
    vim workflow.xml
    
    <?xml version="1.0" encoding="UTF-8"?>
    <!--
      Licensed to the Apache Software Foundation (ASF) under one
      or more contributor license agreements.  See the NOTICE file
      distributed with this work for additional information
      regarding copyright ownership.  The ASF licenses this file
      to you under the Apache License, Version 2.0 (the
      "License"); you may not use this file except in compliance
      with the License.  You may obtain a copy of the License at
      
           http://www.apache.org/licenses/LICENSE-2.0
      
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License.
    -->
    <workflow-app xmlns="uri:oozie:workflow:0.5" name="map-reduce-wf">
        <start to="mr-node"/>
        <action name="mr-node">
            <map-reduce>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <prepare>
                    <delete path="${nameNode}/${outputDir}"/>
                </prepare>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                    <!-- Comment out the original configuration -->
                    <!--
                    <property>
                        <name>mapred.mapper.class</name>
                        <value>org.apache.oozie.example.SampleMapper</value>
                    </property>
                    <property>
                        <name>mapred.reducer.class</name>
                        <value>org.apache.oozie.example.SampleReducer</value>
                    </property>
                    <property>
                        <name>mapred.map.tasks</name>
                        <value>1</value>
                    </property>
                    <property>
                        <name>mapred.input.dir</name>
                        <value>/user/${wf:user()}/${examplesRoot}/input-data/text</value>
                    </property>
                    <property>
                        <name>mapred.output.dir</name>
                        <value>/user/${wf:user()}/${examplesRoot}/output-data/${outputDir}</value>
                    </property>
                    -->

                    <!-- Enable the new MapReduce API -->
                    <property>
                        <name>mapred.mapper.new-api</name>
                        <value>true</value>
                    </property>
    
                    <property>
                        <name>mapred.reducer.new-api</name>
                        <value>true</value>
                    </property>
    
                    <!-- Specify the MR output key type -->
                    <property>
                        <name>mapreduce.job.output.key.class</name>
                        <value>org.apache.hadoop.io.Text</value>
                    </property>
    
                    <!-- Specify the MR output value type -->
                    <property>
                        <name>mapreduce.job.output.value.class</name>
                        <value>org.apache.hadoop.io.IntWritable</value>
                    </property>
    
                    <!-- Specify input path -->
                    <property>
                        <name>mapred.input.dir</name>
                        <value>${nameNode}/${inputdir}</value>
                    </property>
    
                    <!-- Specify output path -->
                    <property>
                        <name>mapred.output.dir</name>
                        <value>${nameNode}/${outputDir}</value>
                    </property>
    
                    <!-- Specify the map class -->
                    <property>
                        <name>mapreduce.job.map.class</name>
                        <value>org.apache.hadoop.examples.WordCount$TokenizerMapper</value>
                    </property>
    
                    <!-- Specify the reduce class -->
                    <property>
                        <name>mapreduce.job.reduce.class</name>
                        <value>org.apache.hadoop.examples.WordCount$IntSumReducer</value>
                    </property>
                    <!-- Configure the number of map tasks -->
                    <property>
                        <name>mapred.map.tasks</name>
                        <value>1</value>
                    </property>
    
                </configuration>
            </map-reduce>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>
    
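    A malformed workflow.xml only fails at submission time, so checking well-formedness first can save a round trip. A self-contained sketch, shown against a minimal workflow-app written to /tmp (in practice, point the parser at the real workflow.xml instead):

```shell
# Minimal workflow-app standing in for the real workflow.xml.
cat > /tmp/workflow-check.xml <<'EOF'
<workflow-app xmlns="uri:oozie:workflow:0.5" name="map-reduce-wf">
    <start to="end"/>
    <end name="end"/>
</workflow-app>
EOF

# Parse it; any well-formedness error raises and prints a traceback.
python3 -c 'import xml.etree.ElementTree as ET; ET.parse("/tmp/workflow-check.xml"); print("well-formed")'
```

    Note this only checks XML syntax; Oozie still validates the workflow schema and action definitions at submission.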
  • 5. Upload the scheduling task to the corresponding HDFS directory
    cd /export/servers/oozie-4.1.0-cdh5.14.0/oozie_works
    hdfs dfs -put map-reduce/ /user/root/oozie_works/
    
  • 6. Run the scheduled task

    Run the scheduled task, then check its status and results through the Oozie web UI on port 11000:

    cd /export/servers/oozie-4.1.0-cdh5.14.0
    bin/oozie job -oozie http://node03:11000/oozie -config oozie_works/map-reduce/job.properties -run
    
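    After submission the client prints a job id, and status can also be polled from the CLI instead of the web UI. Exporting OOZIE_URL saves repeating the -oozie flag, since the Oozie client falls back to this environment variable when -oozie is not given:

```shell
# The Oozie CLI reads OOZIE_URL when -oozie is omitted.
export OOZIE_URL=http://node03:11000/oozie
echo "$OOZIE_URL"

# With it set, the usual status commands shorten to:
#   bin/oozie jobs                 # list recent workflow jobs
#   bin/oozie job -info <job-id>   # detailed status of one job
```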

Posted on Sun, 15 Mar 2020 20:23:55 -0700 by Rother2005