Installation and deployment of Sqoop

I. overview

  1. What is Sqoop?
    Sqoop (pronunciation: skup) is an open source tool, mainly used in Hadoop(Hive) and traditional databases (mysql, postgresql )For data transfer, you can import data from a relational database (such as mysql, Oracle, Postgres, etc.) into HDFS of Hadoop, or import data from HDFS into relational database. (for details, please refer to Baidu Encyclopedia, https://baike.baidu.com/item/sqoop/5033853?fr=aladdin)

  2. The principle of Sqoop
    Translate the import or export command into mapreduce program.
    In the translated mapreduce, the main task is to customize the input format and output format.

2, Deployment environment preparation

  1. Operating system: Centos 6.8
  2. JDK version: 1.8.0_
  3. Haooop version: 2.7.2
  4. Shell version: XShell 6

3, Installation and deployment

  1. Download and upload the installation package to the virtual machine on the official website of Sqoop (http://archive.apache.org/dist/sqoop/1.4.6/)
  2. Open the virtual machine and unzip the sqoop installation package to the corresponding directory
[root@cos100 soft]# tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /usr/local/cluster/
  1. Rename the installation file for easy use
[root@cos100 cluster]# mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha/ sqoop
  1. Modify profile
[root@cos100 conf]# mv sqoop-env-template.sh sqoop-env.sh
[root@cos100 conf]# vim sqoop-env.sh

Edit the file and add the following:
export HADOOP_COMMON_HOME=/usr/local/cluster/hadoop
export HADOOP_MAPRED_HOME=/usr/local/cluster/hadoop
export HBASE_HOME=/usr/local/cluster/hbase
export HIVE_HOME=/usr/local/cluster/hive
export ZOOKEEPER_HOME=/usr/local/cluster/zookeeper
export ZOOCFGDIR=/usr/local/cluster/zookeeper

  1. Copy JDBC driver to the lib directory of sqoop
    My flume/lib directory has been imported, so I directly connect to the driver package through the soft link
[root@cos100 lib]# ln -s /usr/local/cluster/flume/lib/mysql-connector-java-5.1.47-bin.jar
  1. Verify Sqoop configuration
[root@cos100 sqoop]# bin/sqoop help

The following warning and help commands indicate that the configuration is complete

Warning: /usr/local/cluster/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/cluster/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/02/26 05:00:21 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.
  1. Connect database test
[root@cos100 sqoop]# bin/sqoop list-databases --connect jdbc:mysql://cos100:3306/ --username root --password 000000

As the name implies, this command lists all databases in mysql, and the output is as follows:

Warning: /usr/local/cluster/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/cluster/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/02/26 05:14:19 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
20/02/26 05:14:19 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/02/26 05:14:19 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schem
hive
mysql
mysqlsource
test

At this point, it means that the database connection is successful.

Complete!

Published 24 original articles, won praise 11, visited 689
Private letter follow

Tags: Database MySQL hive Hadoop

Posted on Tue, 10 Mar 2020 03:58:20 -0700 by drath