What is Sqoop?
Sqoop (pronunciation: skup) is an open source tool, mainly used in Hadoop(Hive) and traditional databases (mysql, postgresql )For data transfer, you can import data from a relational database (such as mysql, Oracle, Postgres, etc.) into HDFS of Hadoop, or import data from HDFS into relational database. (for details, please refer to Baidu Encyclopedia, https://baike.baidu.com/item/sqoop/5033853?fr=aladdin)
The principle of Sqoop
Translate the import or export command into mapreduce program.
In the translated mapreduce, the main task is to customize the input format and output format.
- Operating system: Centos 6.8
- JDK version: 1.8.0_
- Haooop version: 2.7.2
- Shell version: XShell 6
- Download and upload the installation package to the virtual machine on the official website of Sqoop (http://archive.apache.org/dist/sqoop/1.4.6/)
- Open the virtual machine and unzip the sqoop installation package to the corresponding directory
[root@cos100 soft]# tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /usr/local/cluster/
- Rename the installation file for easy use
[root@cos100 cluster]# mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha/ sqoop
- Modify profile
[root@cos100 conf]# mv sqoop-env-template.sh sqoop-env.sh [root@cos100 conf]# vim sqoop-env.sh
Edit the file and add the following:
- Copy JDBC driver to the lib directory of sqoop
My flume/lib directory has been imported, so I directly connect to the driver package through the soft link
[root@cos100 lib]# ln -s /usr/local/cluster/flume/lib/mysql-connector-java-5.1.47-bin.jar
- Verify Sqoop configuration
[root@cos100 sqoop]# bin/sqoop help
The following warning and help commands indicate that the configuration is complete
Warning: /usr/local/cluster/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation. Warning: /usr/local/cluster/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 20/02/26 05:00:21 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6 usage: sqoop COMMAND [ARGS] Available commands: codegen Generate code to interact with database records create-hive-table Import a table definition into Hive eval Evaluate a SQL statement and display the results export Export an HDFS directory to a database table help List available commands import Import a table from a database to HDFS import-all-tables Import tables from a database to HDFS import-mainframe Import datasets from a mainframe server to HDFS job Work with saved jobs list-databases List available databases on a server list-tables List available tables in a database merge Merge results of incremental imports metastore Run a standalone Sqoop metastore version Display version information See 'sqoop help COMMAND' for information on a specific command.
- Connect database test
[root@cos100 sqoop]# bin/sqoop list-databases --connect jdbc:mysql://cos100:3306/ --username root --password 000000
As the name implies, this command lists all databases in mysql, and the output is as follows:
Warning: /usr/local/cluster/sqoop/bin/../../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation. Warning: /usr/local/cluster/sqoop/bin/../../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. 20/02/26 05:14:19 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6 20/02/26 05:14:19 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead. 20/02/26 05:14:19 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset. information_schem hive mysql mysqlsource test
At this point, it means that the database connection is successful.