0495 - How to enable Kerberos in CDH6.1

1 Purpose of this document

In the previous article <0491 - How to install CDH6.1 in RedHat 7.4>, Fayson described installing CDH6.1; here we enable Kerberos on top of that environment. Kerberos is a third-party security-authentication protocol: it is not specific to Hadoop and can be used to secure other systems as well. Developed and implemented at MIT, it follows the client/server model and uses shared secret keys to authenticate communication between client and server over a network that is not necessarily secure. With Cloudera Manager, Kerberos integration can be completed easily through the web interface. In this article, Fayson walks through enabling Kerberos in a CDH6.1 environment on RedHat 7.4.

  • Content overview:

1. How to install and configure KDC service

2. How to enable Kerberos through CDH

3. How to log in to Kerberos and access Hadoop related services

4. Summary

  • Test environment:

1. Operating system: RedHat 7.4

2. CDH6.1

3. Operations are performed as the root user

2 KDC service installation and configuration

In this document, the KDC service is installed on the same server as Cloudera Manager Server (the KDC can be installed on a separate server if your environment requires it).

1. Install the KDC service on the Cloudera Manager server

[root@ip-172-31-6-83 ~]# yum -y install krb5-server krb5-libs krb5-auth-dialog krb5-workstation

2. Modify the /etc/krb5.conf configuration

[root@ip-172-31-6-83 ~]# vim /etc/krb5.conf
# Configuration snippets may be placed in this directory as well
includedir /etc/krb5.conf.d/

[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 dns_lookup_realm = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true
 rdns = false
 default_realm = FAYSON.COM
 #default_ccache_name = KEYRING:persistent:%{uid}

[realms]
 FAYSON.COM = {
  kdc = ip-172-31-6-83.ap-southeast-1.compute.internal
  admin_server = ip-172-31-6-83.ap-southeast-1.compute.internal
 }

[domain_realm]
 .ap-southeast-1.compute.internal = FAYSON.COM
 ap-southeast-1.compute.internal = FAYSON.COM

The values to modify for your environment are default_realm in [libdefaults], the kdc and admin_server hostnames in [realms], and the domain mappings in [domain_realm].

3. Modify the /var/kerberos/krb5kdc/kadm5.acl configuration

[root@ip-172-31-6-83 ~]# vim /var/kerberos/krb5kdc/kadm5.acl
*/admin@FAYSON.COM      *
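
The single entry above grants all privileges ("*") to every principal whose instance is admin. For illustration only, a hedged example of a more restrictive ACL using the standard MIT kadm5.acl permission letters (the auditor principal is hypothetical):

# */admin principals get full privileges (as configured above)
*/admin@FAYSON.COM      *
# hypothetical read-only entry: l = list, i = inquire
auditor@FAYSON.COM      li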

4. Modify the /var/kerberos/krb5kdc/kdc.conf configuration

[root@ip-172-31-6-83 ~]# vim /var/kerberos/krb5kdc/kdc.conf
[root@ip-172-31-6-83 ~]# cat /var/kerberos/krb5kdc/kdc.conf
[kdcdefaults]
 kdc_ports = 88
 kdc_tcp_ports = 88

[realms]
 FAYSON.COM = {
  #master_key_type = aes256-cts
  max_renewable_life= 7d 0h 0m 0s
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
  supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal camellia256-cts:normal camellia128-cts:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
 }

The values to modify for your environment are the realm name and, if needed, the max_renewable_life and supported_enctypes settings.

5. Create the Kerberos database

[root@ip-172-31-6-83 ~]# kdb5_util create -r FAYSON.COM -s
Loading random data
Initializing database '/var/kerberos/krb5kdc/principal' for realm 'FAYSON.COM',
master key name 'K/M@FAYSON.COM'
You will be prompted for the database Master Password.
It is important that you NOT FORGET this password.
Enter KDC database master key: 
Re-enter KDC database master key to verify:

Here you are prompted to set the master password for the Kerberos database. The -s flag stashes the master key (by default in /var/kerberos/krb5kdc/.k5.FAYSON.COM) so the KDC can start without prompting for it.

6. Create the Kerberos administrator account

[root@ip-172-31-6-83 ~]# kadmin.local
Authenticating as principal root/admin@FAYSON.COM with password.
kadmin.local:  addprinc admin/admin@FAYSON.COM
WARNING: no policy specified for admin/admin@FAYSON.COM; defaulting to no policy
Enter password for principal "admin/admin@FAYSON.COM": 
Re-enter password for principal "admin/admin@FAYSON.COM": 
Principal "admin/admin@FAYSON.COM" created.
kadmin.local:  exit

admin/admin@FAYSON.COM is the Kerberos administrator account; you are prompted to set its password.

7. Enable the Kerberos services so they start at boot, then start the krb5kdc and kadmin services

[root@ip-172-31-6-83 ~]# systemctl enable krb5kdc
Created symlink from /etc/systemd/system/multi-user.target.wants/krb5kdc.service to /usr/lib/systemd/system/krb5kdc.service.
[root@ip-172-31-6-83 ~]# systemctl enable kadmin
Created symlink from /etc/systemd/system/multi-user.target.wants/kadmin.service to /usr/lib/systemd/system/kadmin.service.
[root@ip-172-31-6-83 ~]# systemctl start krb5kdc
[root@ip-172-31-6-83 ~]# systemctl start kadmin

8. Test the Kerberos administrator account

[root@ip-172-31-6-83 ~]# kinit admin/admin@FAYSON.COM
Password for admin/admin@FAYSON.COM: 
[root@ip-172-31-6-83 ~]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: admin/admin@FAYSON.COM

Valid starting       Expires              Service principal
12/27/2018 22:05:56  12/28/2018 22:05:56  krbtgt/FAYSON.COM@FAYSON.COM
        renew until 01/03/2019 22:05:56
[root@ip-172-31-6-83 ~]#

9. Install the Kerberos client on all cluster nodes, including the Cloudera Manager node

Use a batch script to install the Kerberos client on all nodes of the cluster (a sketch of the helper script follows the command):

[root@ip-172-31-6-83 shell]# sh ssh_do_all.sh node.list 'yum -y install krb5-libs krb5-workstation'
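
ssh_do_all.sh is a helper from Fayson's own toolkit and its contents are not shown here. A minimal sketch of an equivalent script, assuming node.list holds one hostname per line and passwordless SSH as root is configured:

#!/bin/bash
# ssh_do_all.sh <node_list_file> "<command>"
# Runs the given command on every host listed in the node file.
node_list=$1
cmd=$2
while read -r node; do
  echo "==== ${node} ===="
  ssh -n -o StrictHostKeyChecking=no "${node}" "${cmd}"
done < "${node_list}"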

10. Install an additional package on the Cloudera Manager Server node

[root@ip-172-31-6-83 shell]# yum -y install openldap-clients

11. Copy the krb5.conf file from the KDC server to all Kerberos clients

Use the batch copy script to push the Kerberos server's krb5.conf to the /etc directory of every node in the cluster (a sketch of the helper follows the command):

[root@ip-172-31-6-83 shell]# sh bk_cp.sh node.list /etc/krb5.conf /etc/
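
bk_cp.sh is likewise a custom batch copy helper; a minimal sketch under the same assumptions (arguments: node list, source file, destination directory):

#!/bin/bash
# bk_cp.sh <node_list_file> <src_file> <dest_dir>
# Copies the source file to the destination directory on every host in the list.
node_list=$1
src=$2
dest=$3
while read -r node; do
  echo "==== ${node} ===="
  scp -o StrictHostKeyChecking=no "${src}" "${node}:${dest}"
done < "${node_list}"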

3 Enable Kerberos for the CDH cluster

1. In the KDC, create an administrator account for Cloudera Manager

[root@ip-172-31-6-83 shell]# kadmin.local
Authenticating as principal admin/admin@FAYSON.COM with password.
kadmin.local:   addprinc cloudera-scm/admin@FAYSON.COM
WARNING: no policy specified for cloudera-scm/admin@FAYSON.COM; defaulting to no policy
Enter password for principal "cloudera-scm/admin@FAYSON.COM": 
Re-enter password for principal "cloudera-scm/admin@FAYSON.COM": 
Principal "cloudera-scm/admin@FAYSON.COM" created.
kadmin.local:  exit

2. Enter the "management" - > "security" interface of Cloudera Manager

3. Click "Enable Kerberos" to launch the setup wizard

4. Confirm that all of the listed prerequisites have been completed, then check every item

5. Click "continue" to configure the relevant KDC information, including the type, KDC server, KDC Realm, encryption type and the update life of the Service Principal to be created (hdfs, yarn, hbase, hive, etc.)

6. It is not recommended to let Cloudera Manager manage krb5.conf; click "Continue"

7. Enter the Kerberos administrator account for Cloudera Manager; it must match the cloudera-scm/admin principal created earlier. Click "Continue"

8. Click "Continue" to start enabling Kerberos

9. Once Kerberos has been enabled, click "Continue"

10. Check "Restart cluster" and click "Continue"

11. After the cluster has restarted, click "Continue"

12. Click "continue"

Click Finish to enable Kerberos successfully.

13. Return to the home page: everything should be healthy. Open Administration > Security again; the page now shows that Kerberos has been successfully enabled.
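
To confirm that the wizard generated the service principals, you can list them on the KDC (listprincs is a standard kadmin query; the exact list depends on the services installed):

[root@ip-172-31-6-83 ~]# kadmin.local -q "listprincs" | egrep "hdfs|yarn|hive|hbase"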

4 Kerberos usage

To run MapReduce jobs and work with Hive as the fayson user, the fayson OS user must exist on every node of the cluster.

1. Create a fayson principal using kadmin

[root@ip-172-31-6-83 shell]# kadmin.local
Authenticating as principal admin/admin@FAYSON.COM with password.
kadmin.local:  addprinc fayson@FAYSON.COM
WARNING: no policy specified for fayson@FAYSON.COM; defaulting to no policy
Enter password for principal "fayson@FAYSON.COM": 
Re-enter password for principal "fayson@FAYSON.COM": 
Principal "fayson@FAYSON.COM" created.
kadmin.local:  exit
You have new mail in /var/spool/mail/root

2. Log in to Kerberos as the fayson user

[root@ip-172-31-6-83 ~]# kdestroy
You have new mail in /var/spool/mail/root
[root@ip-172-31-6-83 ~]# kinit fayson
Password for fayson@FAYSON.COM: 
[root@ip-172-31-6-83 ~]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: fayson@FAYSON.COM

Valid starting       Expires              Service principal
12/27/2018 22:24:20  12/28/2018 22:24:20  krbtgt/FAYSON.COM@FAYSON.COM
        renew until 01/03/2019 22:24:20
[root@ip-172-31-6-83 ~]#
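
For scripted or scheduled jobs it is common to authenticate with a keytab instead of an interactive password. A hedged sketch using standard kadmin.local commands; the keytab path is arbitrary, and -norandkey (available in kadmin.local) exports the key without invalidating the existing password:

[root@ip-172-31-6-83 ~]# kadmin.local -q "xst -norandkey -k /root/fayson.keytab fayson@FAYSON.COM"
[root@ip-172-31-6-83 ~]# kinit -kt /root/fayson.keytab fayson@FAYSON.COM
[root@ip-172-31-6-83 ~]# klist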

3. Add the fayson user to all nodes of the cluster

Use the batch script to add the fayson OS user on every node:

[root@ip-172-31-6-83 shell]#  sh ssh_do_all.sh node.list "useradd fayson"

4. Run a MapReduce job

[root@ip-172-31-6-83 hadoop-mapreduce]# hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 1
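
With Kerberos enabled, the same command only succeeds with a valid ticket. A quick way to see the effect (the exact error text may vary by version):

[root@ip-172-31-6-83 ~]# kdestroy
[root@ip-172-31-6-83 ~]# hdfs dfs -ls /user
# fails with an authentication error such as "GSSException: No valid credentials provided"
[root@ip-172-31-6-83 ~]# kinit fayson
[root@ip-172-31-6-83 ~]# hdfs dfs -ls /user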

5. Use Beeline to connect to Hive for testing

[root@ip-172-31-6-83 75-hdfs-NAMENODE]# beeline
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/jars/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/jars/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Beeline version 2.1.1-cdh6.1.0 by Apache Hive
beeline>  !connect jdbc:hive2://localhost:10000/;principal=hive/ip-172-31-6-83.ap-southeast-1.compute.internal@FAYSON.COM
Connecting to jdbc:hive2://localhost:10000/;principal=hive/ip-172-31-6-83.ap-southeast-1.compute.internal@FAYSON.COM
Connected to: Apache Hive (version 2.1.1-cdh6.1.0)
Driver: Hive JDBC (version 2.1.1-cdh6.1.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000/> show tables;
INFO  : Compiling command(queryId=hive_20181227222823_efd7db98-0a9f-4645-a30a-810b51d1281b): show tables
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20181227222823_efd7db98-0a9f-4645-a30a-810b51d1281b); Time taken: 1.13 seconds
INFO  : Executing command(queryId=hive_20181227222823_efd7db98-0a9f-4645-a30a-810b51d1281b): show tables
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20181227222823_efd7db98-0a9f-4645-a30a-810b51d1281b); Time taken: 0.051 seconds
INFO  : OK
+-----------+
| tab_name  |
+-----------+
+-----------+
No rows selected (1.69 seconds)
0: jdbc:hive2://localhost:10000/> create table t1 (s1 string,s2 string);
INFO  : Compiling command(queryId=hive_20181227222837_6a93eb8b-b323-4d72-957c-cd390a9f6947): create table t1 (s1 string,s2 string)
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20181227222837_6a93eb8b-b323-4d72-957c-cd390a9f6947); Time taken: 0.078 seconds
INFO  : Executing command(queryId=hive_20181227222837_6a93eb8b-b323-4d72-957c-cd390a9f6947): create table t1 (s1 string,s2 string)
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20181227222837_6a93eb8b-b323-4d72-957c-cd390a9f6947); Time taken: 0.107 seconds
INFO  : OK
No rows affected (0.216 seconds)
0: jdbc:hive2://localhost:10000/>

Insert data into the test table:

0: jdbc:hive2://localhost:10000/> insert into t1 values('1','2');
0: jdbc:hive2://localhost:10000/> select * from t1;

Execute a count statement:

0: jdbc:hive2://localhost:10000/> select count(*) from t1;

5 FAQs

1. Running a MapReduce job as a Kerberos user fails with the following error:

main : run as user is fayson
main : requested yarn user is fayson
Requested user fayson is not whitelisted and has id 501,which is below the minimum allowed 1000

Failing this attempt. Failing the application.
17/09/02 20:05:04 INFO mapreduce.Job: Counters: 0
Job Finished in 6.184 seconds
java.io.FileNotFoundException: File does not exist: hdfs://ip-172-31-6-148:8020/user/fayson/QuasiMonteCarlo_1504382696029_1308422444/out/reduce-out
        at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1266)
        at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1258)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1258)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1820)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1844)
        at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
        at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Cause: YARN refuses to launch containers for users whose UID is below 1000 and who are not whitelisted.

Solution: lower YARN's min.user.id setting to at or below the user's UID (or recreate the user with a UID of at least 1000) and restart YARN. A quick UID check with the batch helper is shown below.
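
Before changing min.user.id, you can confirm the UID on every node with the same batch helper; any UID of 1000 or above avoids this error:

[root@ip-172-31-6-83 shell]# sh ssh_do_all.sh node.list "id -u fayson"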

2. After kinit, running an MR job fails with "User fayson not found"

Cause: the fayson OS user does not exist on the cluster nodes.

Solution: add the fayson user to every node of the cluster (see section 4, step 3).

6 Summary

  • The process of enabling Kerberos is essentially the same in CDH6 as in CDH5; only some of the CDH6 wizard screens have changed.
  • To enable Kerberos in a CDH cluster, you first need to install the Kerberos server (the krb5kdc and kadmin services).
  • Every node in the cluster needs the Kerberos client installed so it can communicate with the KDC service.
  • The openldap-clients package must additionally be installed on the Cloudera Manager Server node.
  • After Kerberos is enabled on the CDH cluster, submitting jobs as a self-defined user such as fayson requires that the fayson OS user exist on every node of the cluster; otherwise the jobs fail to execute.
