1. Built-in functions of Hive

1.1 Numerical functions

1. Rounding function: round

Syntax: round(double a)
Return value: BIGINT
Description: returns the double rounded to an integer (BIGINT), with halves rounded away from zero

```
hive> select round(3.1415926) from tableName;
3
hive> select round(3.5) from tableName;
4
hive> create table new_table as select round(9542.158) from tableName;
```
2. Specified precision rounding function: round

Syntax: round(double a, int d)
Return value: DOUBLE
Description: returns a rounded to d decimal places, as a double

```
hive> select round(3.1415926,4) from tableName;
3.1416
```
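Hive's round() rounds halves away from zero, which differs from the banker's rounding many language runtimes use. As an illustration (a Python sketch, not Hive itself), the same behavior can be reproduced with the decimal module:

```python
from decimal import Decimal, ROUND_HALF_UP

def hive_round(x, d=0):
    # Hive rounds ties away from zero (3.5 -> 4); Python's built-in
    # round() instead rounds ties to even (2.5 -> 2), so we use Decimal.
    quantum = Decimal(1).scaleb(-d)          # d=4 -> Decimal('0.0001')
    result = Decimal(str(x)).quantize(quantum, rounding=ROUND_HALF_UP)
    return int(result) if d == 0 else float(result)

print(hive_round(3.1415926))     # 3
print(hive_round(3.5))           # 4
print(hive_round(3.1415926, 4))  # 3.1416
```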
3. Rounding down function: floor

Syntax: floor(double a)
Return value: BIGINT
Description: returns the largest integer equal to or less than the double variable

```
hive> select floor(3.1415926) from tableName;
3
hive> select floor(25) from tableName;
25
```
4. Round up function: ceil

Syntax: ceil(double a)
Return value: BIGINT
Description: returns the smallest integer equal to or greater than the double variable

```
hive> select ceil(3.1415926) from tableName;
4
hive> select ceil(46) from tableName;
46
```
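Outside of Hive, the same floor/ceil semantics can be checked with Python's math module (a quick sketch, not Hive code); note what happens with negative numbers:

```python
import math

print(math.floor(3.1415926))  # 3
print(math.floor(-3.14))      # -4 (floor moves toward negative infinity)
print(math.ceil(3.1415926))   # 4
print(math.ceil(46))          # 46 (integers pass through unchanged)
```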
5. Rounding up function: ceiling

Syntax: ceiling(double a)
Return value: BIGINT
Note: same function as ceil

```
hive> select ceiling(3.1415926) from tableName;
4
hive> select ceiling(46) from tableName;
46
```
6. Take random number function: rand

Syntax: rand(),rand(int seed)
Return value: double
Description: returns a random number in the range 0 to 1. If a seed is specified, the function returns a stable (repeatable) sequence of random numbers

```
hive> select rand() from tableName;
0.5577432776034763
hive> select rand() from tableName;
0.6638336467363424
hive> select rand(100) from tableName;
0.7220096548596434
hive> select rand(100) from tableName;
0.7220096548596434
```
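The seeded-versus-unseeded behavior is the same idea as seeding any pseudo-random generator; a Python sketch (not Hive) of why rand(100) returns the same value twice above:

```python
import random

# Two generators with the same seed produce identical sequences,
# which is why rand(100) above yields the same number both times.
a, b = random.Random(100), random.Random(100)
print(a.random() == b.random())  # True

# An unseeded generator still stays within [0, 1).
x = random.Random().random()
print(0.0 <= x < 1.0)            # True
```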

1.2 date function

1. UNIX timestamp to date function: from_unixtime

Syntax: from_unixtime(bigint unixtime[, string format])
Return value: string
Description: converts a UNIX timestamp (seconds since 1970-01-01 00:00:00 UTC) to a time string in the current time zone

```
hive> select from_unixtime(1323308943,'yyyyMMdd') from tableName;
20111208
```
2. Get the current UNIX timestamp function: unix_timestamp

Syntax: unix_timestamp()
Return value: bigint
Description: gets the UNIX timestamp of the current time zone

```
hive> select unix_timestamp() from tableName;
1323309615
```
3. Date to UNIX timestamp function: unix_timestamp

Syntax: unix_timestamp(string date)
Return value: bigint
Description: converts a date in the format "yyyy-MM-dd HH:mm:ss" to a UNIX timestamp. If the conversion fails, 0 is returned.

```
hive> select unix_timestamp('2011-12-07 13:01:03') from tableName;
1323234063
```
4. Specified-format date to UNIX timestamp function: unix_timestamp

Syntax: unix_timestamp(string date, string pattern)
Return value: bigint
Description: converts the date in pattern format to UNIX timestamp. If the conversion fails, 0 is returned.

```
hive> select unix_timestamp('20111207 13:01:03','yyyyMMdd HH:mm:ss') from tableName;
1323234063
```
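The same conversions can be sketched in Python (illustration only; Hive interprets timestamps in the session time zone, so this example pins the zone to UTC to stay deterministic — with a non-UTC session the numbers differ from the Hive output above):

```python
from datetime import datetime, timezone

# from_unixtime(1323308943, 'yyyyMMdd'): format the epoch seconds
dt = datetime.fromtimestamp(1323308943, tz=timezone.utc)
print(dt.strftime('%Y%m%d'))  # 20111208

# unix_timestamp('2011-12-07 13:01:03'): parse, then take epoch seconds
parsed = datetime.strptime('2011-12-07 13:01:03', '%Y-%m-%d %H:%M:%S')
print(int(parsed.replace(tzinfo=timezone.utc).timestamp()))
```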
5. Date-time to date function: to_date

Syntax: to_date(string timestamp)
Return value: string
Description: returns the date part of the date time field.

```
hive> select to_date('2011-12-08 10:03:01') from tableName;
2011-12-08
```
6. Date year function: year

Syntax: year(string date)
Return value: int
Description: returns the year in the date.

```
hive> select year('2011-12-08 10:03:01') from tableName;
2011
hive> select year('2012-12-08') from tableName;
2012
```
7. Date to month function: month

Syntax: month (string date)
Return value: int
Description: returns the month in the date.

```
hive> select month('2011-12-08 10:03:01') from tableName;
12
hive> select month('2011-08-08') from tableName;
8
```
8. Date to day function: day

Syntax: day (string date)
Return value: int
Description: returns the days in the date.

```
hive> select day('2011-12-08 10:03:01') from tableName;
8
hive> select day('2011-12-24') from tableName;
24
```
9. Date to hour function: hour

Syntax: hour (string date)
Return value: int
Description: returns the hour in the date.

```
hive> select hour('2011-12-08 10:03:01') from tableName;
10
```
10. Date to minute function: minute

Syntax: minute (string date)
Return value: int
Description: returns the minutes in the date.

```
hive> select minute('2011-12-08 10:03:01') from tableName;
3
```
11. Date to second function: second

Syntax: second (string date)
Return value: int
Description: returns the seconds in the date.

```
hive> select second('2011-12-08 10:03:01') from tableName;
1
```
12. Week-of-year function: weekofyear

Syntax: weekofyear (string date)
Return value: int
Note: returns the week of the year in which the date falls.

```
hive> select weekofyear('2011-12-08 10:03:01') from tableName;
49
```
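The field-extraction functions above map directly onto ordinary date/time APIs; a Python sketch (not Hive) reproducing the examples:

```python
from datetime import datetime

dt = datetime.strptime('2011-12-08 10:03:01', '%Y-%m-%d %H:%M:%S')
print(dt.year, dt.month, dt.day)      # 2011 12 8  (year/month/day)
print(dt.hour, dt.minute, dt.second)  # 10 3 1     (hour/minute/second)
print(dt.isocalendar()[1])            # 49         (ISO week, like weekofyear)
```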
13. Date comparison function: datediff

Syntax: datediff(string enddate, string startdate)
Return value: int
Description: returns the number of days from the end date minus the start date.

```
hive> select datediff('2012-12-08','2012-05-09') from tableName;
213
```

14. Date addition function: date_add

Syntax: date_add(string startdate, int days)
Return value: string
Description: returns the date after the start date is increased by days.

```
hive> select date_add('2012-12-08',10) from tableName;
2012-12-18
```
15. Date subtraction function: date_sub

Syntax: date_sub(string startdate, int days)
Return value: string
Description: returns the date after the start date is reduced by days.

```
hive> select date_sub('2012-12-08',10) from tableName;
2012-11-28
```
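datediff/date_add/date_sub are plain calendar arithmetic; a Python sketch (illustration, not Hive) that reproduces the three examples:

```python
from datetime import date, timedelta

def datediff(enddate, startdate):
    # days of enddate minus startdate, like Hive's datediff
    return (date.fromisoformat(enddate) - date.fromisoformat(startdate)).days

def date_add(startdate, days):
    # date_sub(d, n) is simply date_add(d, -n)
    return (date.fromisoformat(startdate) + timedelta(days=days)).isoformat()

print(datediff('2012-12-08', '2012-05-09'))  # 213
print(date_add('2012-12-08', 10))            # 2012-12-18
print(date_add('2012-12-08', -10))           # 2012-11-28
```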

1.3 Conditional functions

1. If function: if

Syntax: if(boolean testCondition, T valueTrue, T valueFalseOrNull)
Return value: T
Description: returns valueTrue when testCondition is TRUE; otherwise returns valueFalseOrNull

```
hive> select if(1=2,100,200) from tableName;
200
hive> select if(1=1,100,200) from tableName;
100
```
2. Non-null lookup function: COALESCE

Syntax: COALESCE(T v1, T v2, …)
Return value: T
Description: returns the first non-NULL value among the parameters; if all values are NULL, returns NULL

```
hive> select COALESCE(null,'100','50') from tableName;
100
```
3. Condition judgment function: CASE

Syntax: CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END
Return value: T
Note: if a is equal to b, then c is returned; if a is equal to d, then e is returned; otherwise, f is returned

```
hive> select case 100 when 50 then 'tom' when 100 then 'mary' else 'tim' end from tableName;
mary
hive> select case 200 when 50 then 'tom' when 100 then 'mary' else 'tim' end from tableName;
tim
```
4. Condition judgment function: CASE

Syntax: CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END
Return value: T
Note: if a is TRUE, b is returned; if c is TRUE, d is returned; otherwise, e is returned

```
hive> select case when 1=2 then 'tom' when 2=2 then 'mary' else 'tim' end from tableName;
mary
hive> select case when 1=1 then 'tom' when 2=2 then 'mary' else 'tim' end from tableName;
tom
```

1.4 String functions

1. String length function: length

Syntax: length(string A)
Return value: int
Description: returns the length of string A

```
hive> select length('abcedfg') from tableName;
7
```
2. String inversion function: reverse

Syntax: reverse(string A)
Return value: string
Description: returns the inversion result of string A

```
hive> select reverse('abcedfg') from tableName;
gfdecba
```
3. String connection function: concat

Syntax: concat(string A, string B, …)
Return value: string
Description: returns the result after connecting the input string. Any input string is supported

```
hive> select concat('abc','def','gh') from tableName;
abcdefgh
```
4. String concatenation with a specified separator: concat_ws

Syntax: concat_ws(string SEP, string A, string B, …)
Return value: string
Description: returns the concatenation of the input strings, with SEP as the separator between them

```
hive> select concat_ws(',','abc','def','gh') from tableName;
abc,def,gh
```
5. String truncation function: substr,substring

Syntax: substr(string A, int start), substring(string A, int start)
Return value: string
Description: returns the substring of A from position start to the end of the string

```
hive> select substr('abcde',3) from tableName;
cde
hive> select substring('abcde',3) from tableName;
cde
hive> select substr('abcde',-1) from tableName;   -- same as Oracle
e
```
6. String truncation function: substr,substring

Syntax: substr(string A, int start, int len),substring(string A, int start, int len)
Return value: string
Description: returns the string A starting from the start position, with the length of len

```
hive> select substr('abcde',3,2) from tableName;
cd
hive> select substring('abcde',3,2) from tableName;
cd
hive> select substring('abcde',-2,2) from tableName;
de
```
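Hive's substr is 1-based and accepts negative start positions; Python slicing is 0-based, so a small adapter makes the correspondence explicit (a sketch, not Hive code):

```python
def substr(s, start, length=None):
    # Hive counts from 1; a negative start counts back from the end,
    # matching the Oracle-style behaviour shown above.
    begin = start - 1 if start > 0 else len(s) + start
    return s[begin:] if length is None else s[begin:begin + length]

print(substr('abcde', 3))      # cde
print(substr('abcde', -1))     # e
print(substr('abcde', 3, 2))   # cd
print(substr('abcde', -2, 2))  # de
```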
7. String capitalization function: upper,ucase

Syntax: upper(string A) ucase(string A)
Return value: string
Description: returns the uppercase format of string A

```
hive> select upper('abSEd') from tableName;
ABSED
hive> select ucase('abSEd') from tableName;
ABSED
```
8. String to lowercase function: lower,lcase

Syntax: lower(string A) lcase(string A)
Return value: string
Description: returns the lowercase format of string A

```
hive> select lower('abSEd') from tableName;
absed
hive> select lcase('abSEd') from tableName;
absed
```
9. Trim function: trim

Syntax: trim(string A)
Return value: string
Description: removes the spaces from both ends of the string

```
hive> select trim(' abc ') from tableName;
abc
```
10. Parse url

Syntax:
parse_url(string urlString, string partToExtract [, string keyToExtract])
Return value: string
Description: returns the specified part of the URL. Valid values for partToExtract are: HOST, PATH,
QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO.

```
hive> select parse_url('https://www.tableName.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') from tableName;
www.tableName.com
hive> select parse_url('https://www.tableName.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1') from tableName;
v1
```
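The HOST/QUERY extraction corresponds to standard URL parsing; in Python it can be sketched with urllib.parse (note that the hostname attribute is lowercased, unlike Hive's parse_url):

```python
from urllib.parse import urlparse, parse_qs

url = 'https://www.tableName.com/path1/p.php?k1=v1&k2=v2#Ref1'
parts = urlparse(url)
print(parts.hostname)                  # www.tablename.com (HOST, lowercased)
print(parts.path)                      # /path1/p.php      (PATH)
print(parts.fragment)                  # Ref1              (REF)
print(parse_qs(parts.query)['k1'][0])  # v1                (QUERY, key 'k1')
```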
11. Get json object

Syntax: get_json_object(string json_string, string path)
Return value: string
Description: parse the string json_string of JSON and return the content specified by path. NULL if the JSON string entered is invalid.

```
hive> select get_json_object('{"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}], "bicycle":{"price":19.95,"color":"red"} },"email":"amy@only_for_json_udf_test.net","owner":"amy"}','$.owner') from tableName;
amy
```
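The path expression walks the parsed JSON tree; a Python sketch (not Hive's JsonPath support) of the same kind of lookups on the sample document:

```python
import json

s = ('{"store":{"fruit":[{"weight":8,"type":"apple"},'
     '{"weight":9,"type":"pear"}],'
     '"bicycle":{"price":19.95,"color":"red"}},'
     '"email":"amy@only_for_json_udf_test.net","owner":"amy"}')
doc = json.loads(s)
print(doc['owner'])                      # amy    ($.owner)
print(doc['store']['fruit'][1]['type'])  # pear   ($.store.fruit[1].type)
print(doc['store']['bicycle']['price'])  # 19.95  ($.store.bicycle.price)
```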
12. Repeat string function: repeat

Syntax: repeat(string str, int n)
Return value: string
Description: return str string after n repetitions

```
hive> select repeat('abc',5) from tableName;
abcabcabcabcabc
```
13. Split string function: split

Syntax: split(string str, string pat)
Return value: array
Note: splitting str according to pat string will return the string array after splitting

```
hive> select split('abtcdtef','t') from tableName;
["ab","cd","ef"]
```

1.5 Aggregate functions

1. Number statistics function: count

Syntax: count(*), count(expr), count(DISTINCT expr[, expr])
Return value: Int

Note: count(*) counts all retrieved rows, including rows with NULL values; count(expr) returns the number of non-NULL values of the specified field; count(DISTINCT expr[, expr…]) returns the number of distinct non-NULL values of the specified field

```
hive> select count(*) from tableName;
20
hive> select count(distinct t) from tableName;
10
```
2. sum statistic function: sum

Syntax: sum(col), sum(DISTINCT col)
Return value: double
Note: sum(col) sums col over the result set; sum(DISTINCT col) sums the distinct values of col

```
hive> select sum(t) from tableName;
100
hive> select sum(distinct t) from tableName;
70
```
3. Average statistical function: avg

Syntax: avg(col), avg(DISTINCT col)
Return value: double
Note: avg(col) returns the average value of col over the result set; avg(DISTINCT col) returns the average of the distinct values of col

```
hive> select avg(t) from tableName;
50
hive> select avg(distinct t) from tableName;
30
```
4. Statistical function of minimum value: min

Syntax: min(col)
Return value: double
Description: the minimum value of col field in the statistical result set

```
hive> select min(t) from tableName;
20
```
5. max statistical function: max

Syntax: max(col)
Return value: double
Description: the maximum value of col field in the statistics result set

```
hive> select max(t) from tableName;
120
```

1.6 Complex type constructors

1. map type construction: map

Syntax: map(key1, value1, key2, value2, …)
Description: builds a map from the input key/value pairs

```
create table score_map(name string, score map<string,int>)
row format delimited fields terminated by '\t'
collection items terminated by ',' map keys terminated by ':';
```

Create the data content as follows and load the data:

```
cd /kkb/install/hivedatas/
vim score_map.txt

zhangsan    mathematics:80,chinese:89,english:95
lisi    chinese:60,mathematics:80,english:99
```

`load data local inpath '/kkb/install/hivedatas/score_map.txt' overwrite into table score_map;`

Accessing map data:

```
-- get all values
select name,map_values(score) from score_map;

-- get all keys
select name,map_keys(score) from score_map;

-- get a value by key
select name,score['mathematics'] from score_map;

-- view the number of map elements
select name,size(score) from score_map;
```
2. Struct type construction: struct

Syntax: struct(val1, val2, val3 )
Note: constructs a struct from the input parameters, similar to a struct in C; internal fields are accessed with dot notation (x.y). Suppose our data looks like this: the movie ABC has been rated by 1254 people, with a score of 7.4.

Create the struct table:

```
hive> create table movie_score(name string, info struct<number:int,score:float>) row format delimited fields terminated by "\t" collection items terminated by ":";
```

```
cd /kkb/install/hivedatas/
vim struct.txt

ABC 1254:7.4
DEF 256:4.9
XYZ 456:5.4
```

`load data local inpath '/kkb/install/hivedatas/struct.txt' overwrite into table movie_score;`

Query the data in hive:

```
hive> select * from movie_score;
hive> select info.number,info.score from movie_score;
OK
1254    7.4
256     4.9
456     5.4
```
3. Array type construction: array

Syntax: array(val1, val2 )
Description: build array type according to input parameters

```
hive> create table person(name string, work_locations array<string>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ',';
```

The format of the data content is as follows:

```
cd /kkb/install/hivedatas/
vim person.txt

biansutao   beijing,shanghai,tianjin,hangzhou
linan   changchu,chengdu,wuhan
```

`hive> load data local inpath '/kkb/install/hivedatas/person.txt' overwrite into table person;`

```
-- query all data
hive> select * from person;

-- query by array index
hive> select work_locations[0] from person;

-- query the whole collection
hive> select work_locations from person;

-- number of elements
hive> select size(work_locations) from person;
```

1.7 Complex type length functions

1. Map type length function: size(map<K,V>)

Syntax: size(map<K,V>)
Return value: int
Description: returns the number of elements in the map

```
hive> select size(t) from map_table2;
2
```
2. Array type length function: size(array<T>)

Syntax: size(array<T>)
Return value: int
Description: returns the length of the array

```
hive> select size(t) from arr_table2;
4
```
3. Type conversion function: cast

Syntax: cast(expr as <type>)
Return value: <type>
Description: returns expr converted to the specified type

```
hive> select cast('1' as bigint) from tableName;
1
```

1.8 explode function

1. Use the explode function to split Map and Array field data in a hive table

lateral view is used together with UDTFs such as split and explode. It can split one row of data into multiple rows, and on that basis the split data can be aggregated. The UDTF is called for each row of the original table and produces one or more rows; lateral view combines the results into a virtual table that supports table aliases.
explode can also be used on its own to split a complex array or map column into multiple rows

Requirement: the data format is as follows

```
zhangsan    child1,child2,child3,child4 k1:v1,k2:v2
lisi    child5,child6,child7,child8 k3:v3,k4:v4
```

Fields are separated by \t. All the children need to be split out into a single column:

```
+----------+--+
| mychild  |
+----------+--+
| child1   |
| child2   |
| child3   |
| child4   |
| child5   |
| child6   |
| child7   |
| child8   |
+----------+--+
```

Separate the keys and values of the map, with the following result:

```
+-----------+-------------+--+
| mymapkey  | mymapvalue  |
+-----------+-------------+--+
| k1        | v1          |
| k2        | v2          |
| k3        | v3          |
| k4        | v4          |
+-----------+-------------+--+
```
Step 1: create the hive database

```
hive (default)> create database hive_explode;
hive (default)> use hive_explode;
```
Step 2: create the hive table, then use explode to split the map and array

```
create table hive_explode.t3(
  name string,
  children array<string>,
  address map<string,string>
)
row format delimited fields terminated by '\t'
collection items terminated by ','
map keys terminated by ':'
stored as textFile;
```

node03: execute the following command to create the table data file. The format of the data content is as follows:

```
cd /kkb/install/hivedatas/

vim maparray

zhangsan    child1,child2,child3,child4 k1:v1,k2:v2
lisi    child5,child6,child7,child8 k3:v3,k4:v4
```

Step 3: load the data into the table

`hive (hive_explode)> load data local inpath '/kkb/install/hivedatas/maparray' into table hive_explode.t3;`
Step 4: use explode to separate the data in hive

Separate the data in the array

`hive (hive_explode)> SELECT explode(children) AS myChild FROM hive_explode.t3;`

Separate the data in the map

`hive (hive_explode)> SELECT explode(address) AS (myMapKey, myMapValue) FROM hive_explode.t3;`
2. Using explode to split json strings

Requirements: now there are some data formats as follows:

`a:shandong,b:beijing,c:hebei|1,2,3,4,5,6,7,8,9|[{"source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"},{"source":"jd","monthSales":2090,"userCount":78981,"score":"9.8"},{"source":"jdmart","monthSales":6987,"userCount":1600,"score":"9.0"}]`

The separator between the fields is |.

We need to parse all the corresponding values of monthSales into the following column (row to column)

```
4900
2090
6987
```
Step 1: create hive table
```
hive (hive_explode)>
create table hive_explode.explode_lateral_view (
area string,
goods_id string,
sale_info string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS textfile;
```
Step 2: prepare the data and load it

Prepare the data as follows

```
cd /kkb/install/hivedatas
vim explode_json

a:shandong,b:beijing,c:hebei|1,2,3,4,5,6,7,8,9|[{"source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"},{"source":"jd","monthSales":2090,"userCount":78981,"score":"9.8"},{"source":"jdmart","monthSales":6987,"userCount":1600,"score":"9.0"}]
```

`hive (hive_explode)> load data local inpath '/kkb/install/hivedatas/explode_json' overwrite into table hive_explode.explode_lateral_view;`
Step 3: use explode to split the Array
`hive (hive_explode)> select explode(split(goods_id,',')) as goods_id from hive_explode.explode_lateral_view;`
Step 4: use explode to disassemble the Map
`hive (hive_explode)> select explode(split(area,',')) as area from hive_explode.explode_lateral_view;`
Step 5: disassemble the json field
`hive (hive_explode)> select explode(split(regexp_replace(regexp_replace(sale_info,'\\[\\{',''),'}]',''),'},\\{')) as  sale_info from hive_explode.explode_lateral_view;`

Then we want to use get_json_object to get the data whose key is monthSales:

```
hive (hive_explode)> select get_json_object(explode(split(regexp_replace(regexp_replace(sale_info,'\\[\\{',''),'}]',''),'},\\{')),'$.monthSales') as sale_info from hive_explode.explode_lateral_view;
```

This fails with FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions. A UDTF such as explode cannot be written inside another function.
Likewise, trying to select two fields, as in select explode(split(area,',')) as area, goods_id from explode_lateral_view;, fails with FAILED: SemanticException 1:40 Only a single expression in the SELECT clause is supported with UDTF's. Error encountered near token 'goods_id'.
When a UDTF is used, only a single expression is supported in the select clause; lateral view is required.
3. Use with lateral view

Query multiple fields together with lateral view

`hive (hive_explode)> select goods_id2,sale_info from explode_lateral_view LATERAL VIEW explode(split(goods_id,','))goods as goods_id2;`

Here, lateral view explode(split(goods_id, ',')) goods is equivalent to a virtual table, which is joined with the original table explode_lateral_view as a Cartesian product.

It can also be used multiple times

`hive (hive_explode)> select goods_id2,sale_info,area2 from explode_lateral_view  LATERAL VIEW explode(split(goods_id,','))goods as goods_id2 LATERAL VIEW explode(split(area,','))area as area2;`

This is likewise the result of a Cartesian product of the three tables.

Finally, we can use the following statement to completely transform a row of JSON-format data into a two-dimensional table:

`hive (hive_explode)> select get_json_object(concat('{',sale_info_1,'}'),'$.source') as source, get_json_object(concat('{',sale_info_1,'}'),'$.monthSales') as monthSales, get_json_object(concat('{',sale_info_1,'}'),'$.userCount') as userCount, get_json_object(concat('{',sale_info_1,'}'),'$.score') as score from explode_lateral_view LATERAL VIEW explode(split(regexp_replace(regexp_replace(sale_info,'\\[\\{',''),'}]',''),'},\\{')) sale_info as sale_info_1;`

Conclusion:

lateral view usually appears together with a UDTF, to work around the restriction that a UDTF cannot appear among other select fields.
Multiple lateral views can produce a Cartesian-like product.
The OUTER keyword keeps rows for which the UDTF produces no output, emitting NULL instead, to prevent data loss.
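The mechanics of lateral view explode — join each source row with every element of its exploded column — can be mimicked with a nested loop; a Python sketch over assumed sample rows:

```python
# Each original row is paired with every element of its array column,
# which is what LATERAL VIEW explode(...) does row by row.
rows = [
    {'name': 'zhangsan', 'children': ['child1', 'child2']},
    {'name': 'lisi',     'children': ['child5', 'child6']},
]

exploded = [(r['name'], c) for r in rows for c in r['children']]
print(exploded)
# [('zhangsan', 'child1'), ('zhangsan', 'child2'),
#  ('lisi', 'child5'), ('lisi', 'child6')]
```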

1.9 Column-row conversion functions

1.9.1 Rows to a column

1. Description of related functions

CONCAT(string A/col, string B/col…): returns the concatenation of the input strings; any number of input strings is supported;

CONCAT_WS(separator, str1, str2, …): a special form of CONCAT(). The first parameter is the separator placed between the remaining parameters; it can be any string. If the separator is NULL, the return value is also NULL. This function skips any NULL or empty strings after the separator parameter; the separator is added between the connected strings;

COLLECT_SET(col): accepts only primitive types; it de-duplicates the values of a field and aggregates them into an array-typed field.

2. Data preparation

Table 6-6 data preparation

name    constellation    blood_type
Sun Wukong    Aries    A
Lao Wang    Sagittarius    A
Song Song    Aries    B
Zhu Bajie    Aries    A
Bing Bing    Sagittarius    A
3. demand

Group people with the same constellation and blood type. The results are as follows:

```
Sagittarius,A    Lao Wang|Bing Bing
Aries,A    Sun Wukong|Zhu Bajie
Aries,B    Song Song
```
4. Create a local constellation.txt and import the data

The node03 server executes the following command to create the file. Note that the fields are separated by \t.

```
cd /kkb/install/hivedatas
vim constellation.txt
```

```
Sun Wukong    Aries    A
Lao Wang    Sagittarius    A
Song Song    Aries    B
Zhu Bajie    Aries    A
Bing Bing    Sagittarius    A
```
5. Create hive table and import data

Create hive table and load data

`hive (hive_explode)> create table person_info(  name string,  constellation string,  blood_type string)  row format delimited fields terminated by "\t";`

`hive (hive_explode)> load data local inpath '/kkb/install/hivedatas/constellation.txt' into table person_info;`
6. Query data by demand
`hive (hive_explode)> select t1.base, concat_ws('|', collect_set(t1.name)) name from    (select name, concat(constellation, "," , blood_type) base from person_info) t1 group by  t1.base;`
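The query above first builds base = constellation + "," + blood_type for each row, then groups by base and joins the de-duplicated names. The same pipeline sketched in Python (sample names taken from the data above):

```python
people = [
    ('Sun Wukong', 'Aries', 'A'),
    ('Lao Wang', 'Sagittarius', 'A'),
    ('Song Song', 'Aries', 'B'),
    ('Zhu Bajie', 'Aries', 'A'),
    ('Bing Bing', 'Sagittarius', 'A'),
]

groups = {}
for name, constellation, blood in people:
    base = f'{constellation},{blood}'    # concat(constellation, ',', blood_type)
    bucket = groups.setdefault(base, [])
    if name not in bucket:               # collect_set de-duplicates
        bucket.append(name)

for base, names in groups.items():
    print(base, '|'.join(names))         # concat_ws('|', collect_set(name))
```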

1.9.2 Column to rows

1. Function description

EXPLODE(col): split the complex array or map structure in the hive column into multiple rows.

LATERAL VIEW

Usage: lateral view udtf(expression) tableAlias as columnAlias

Explanation: used together with UDTFs such as split and explode, it can split one column of data into multiple rows of data; on that basis, the split data can be aggregated.

2. Data preparation

The data content is as follows. The fields are divided by using \ t

```
cd /kkb/install/hivedatas

vim movie.txt

Suspect Tracking    suspense,action,science fiction,plot
Lie to me    suspense,police,action,psychology,plot
Wolf Warriors 2    war,action,disaster
```
3. demand

Expand the array data in the movie category column. The results are as follows:

```
Suspect Tracking    suspense
Suspect Tracking    action
Suspect Tracking    science fiction
Suspect Tracking    plot
Lie to me    suspense
Lie to me    police
Lie to me    action
Lie to me    psychology
Lie to me    plot
Wolf Warriors 2    war
Wolf Warriors 2    action
Wolf Warriors 2    disaster
```
4. Create hive table and import data

Create hive table

```
hive (hive_explode)> create table movie_info(
movie string,
category array<string>
)
row format delimited fields terminated by "\t"
collection items terminated by ",";
```

`load data local inpath "/kkb/install/hivedatas/movie.txt" into table movie_info;`
5. Query data by demand
```
hive (hive_explode)>
select movie, category_name
from movie_info
lateral view explode(category) table_tmp as category_name;
```

1.10. reflect function

The reflect function allows SQL to call Java methods by reflection.

Use max in java.lang.Math to find the larger value of two columns.

Create the hive table:

```
hive (hive_explode)>
create table test_udf(col1 int,col2 int)
row format delimited fields terminated by ',';
```

```
cd /kkb/install/hivedatas

vim test_udf

1,2
4,3
6,4
7,5
5,6
```

`hive (hive_explode)> load data local inpath '/kkb/install/hivedatas/test_udf' overwrite into table test_udf;`

Use Max in java.lang.Math to find the maximum value in two columns

`hive (hive_explode)> select reflect("java.lang.Math","max",col1,col2) from test_udf;`
Different records perform different java built-in functions

Create hive table

`hive (hive_explode)> create table test_udf2(class_name string,method_name string,col1 int , col2 int) row format delimited fields terminated by ',';`

Preparation data

```
cd /kkb/install/hivedatas

vim test_udf2

java.lang.Math,min,1,2
java.lang.Math,max,2,3
```

`hive (hive_explode)> load data local inpath '/kkb/install/hivedatas/test_udf2' overwrite into table test_udf2;`

Execution query

`hive (hive_explode)> select reflect(class_name,method_name,col1,col2) from test_udf2;`
Judge whether a string is a number

Use the functions in Apache Commons: the commons jars are already included in hadoop's classpath, so they can be used directly.

Use as follows:

`hive (hive_explode)> select reflect("org.apache.commons.lang.math.NumberUtils","isNumber","123");`
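reflect dispatches to a method chosen by name at run time; Python's getattr gives the same effect. A sketch (not Hive) mirroring the test_udf2 example, using the builtin min/max in place of java.lang.Math:

```python
import builtins

# Each row names the function to apply, like reflect(class, method, col1, col2)
rows = [('min', 1, 2), ('max', 2, 3)]
results = [getattr(builtins, method)(a, b) for method, a, b in rows]
print(results)  # [1, 3]
```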

1.11 Analytic functions

1. Introduction to analytic functions

For some complex data retrieval needs, such as grouped top-N queries, computing percentages, or slicing data into buckets, we can use analytic functions.

2. Common analytic functions

1,ROW_NUMBER():

Generates a sequence number for the records within each group, starting from 1 in order — for example, ordering by pv descending to number each day's records within the group. row_number() has many application scenarios, such as getting the first-ranked record in a group, or the first referer in a session.

2,RANK() :

Generate the ranking of data items in the group. If the ranking is equal, the vacancy will be left in the ranking

3,DENSE_RANK() :

Generate the ranking of data items in the group. If the ranking is equal, no vacancy will be left in the ranking

4,CUME_DIST :

Rows less than or equal to the current value / total rows within the group. For example, count the number of people less than or equal to the current salary as a percentage of the total number of people

5,PERCENT_RANK :

RANK value of the current line in the group / total number of lines in the group

6,NTILE(n) :

It is used to cut the grouped data into n slices in order and returns the slice number of the current row. If the slices are not even, the earlier slices get one extra row by default. NTILE does not support ROWS BETWEEN; for example, NTILE(2) OVER (PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) is not allowed.
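The differences between ROW_NUMBER, RANK and DENSE_RANK show up only on ties; a Python sketch over one partition ordered by pv descending (sample values assumed, not from the data set below):

```python
pvs = [7, 5, 5, 3]                 # one cookie's pv values, already sorted desc

row_number, rank, dense_rank = [], [], []
prev = None
for i, pv in enumerate(pvs, start=1):
    row_number.append(i)           # always increments
    if pv != prev:
        rank.append(i)             # RANK leaves gaps after ties
        dense_rank.append(dense_rank[-1] + 1 if dense_rank else 1)
    else:                          # tie: repeat the previous rank
        rank.append(rank[-1])
        dense_rank.append(dense_rank[-1])
    prev = pv

print(row_number)  # [1, 2, 3, 4]
print(rank)        # [1, 2, 2, 4]
print(dense_rank)  # [1, 2, 2, 3]
```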

3. Requirement description

The existing data has the following format, with three fields: cookieid, createtime, pv. The requirement is to obtain the top three records by pv for each cookie — a grouped top-N, taking the first three values in each group.

```
cookie1,2015-04-10,1
```
Step 1: create database tables

Create database tables in hive

```
CREATE EXTERNAL TABLE cookie_pv (
cookieid string,
createtime string,
pv INT
) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
```
Step 2: prepare the data and load it

node03 execute the following command to create data and load it into the hive table

```
cd /kkb/install/hivedatas
vim cookiepv.txt
```

`load data local inpath '/kkb/install/hivedatas/cookiepv.txt' overwrite into table cookie_pv;`
Step 3: use the analysis function to get the first three records of each cookie accessing PV
```
SELECT * FROM (
  SELECT
    cookieid,
    createtime,
    pv,
    RANK() OVER (PARTITION BY cookieid ORDER BY pv DESC) AS rn1,
    DENSE_RANK() OVER (PARTITION BY cookieid ORDER BY pv DESC) AS rn2,
    ROW_NUMBER() OVER (PARTITION BY cookieid ORDER BY pv DESC) AS rn3
  FROM cookie_pv
) t
WHERE rn1 <= 3;
```

2. Hive custom function

2.1 basic introduction of user defined function

1) Hive comes with some functions, such as max/min, but the number is limited. You can easily extend it by customizing UDF.

2) When the built-in functions provided by Hive cannot meet your business processing needs, you can consider using user-defined function s.

3) According to the user-defined function categories, there are three types:

​ (1)UDF(User-Defined-Function)

One in one out

​ (2)UDAF(User-Defined Aggregation Function)

Aggregate function, multiple in and one out

Similar to: count/max/min

​ (3)UDTF(User-Defined Table-Generating Functions)

One in and more out

For example, lateral view explode()

https://cwiki.apache.org/confluence/display/Hive/HivePlugins

5) Programming steps:

6) Precautions

(1) UDF must have a return type, which can return null, but the return type cannot be void;

(2) Text/LongWritable and other types are commonly used in UDF, and java type is not recommended;

2.2 user defined function development

Custom function development
Step 1: create maven java project and import jar package
```
<repositories>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
</repositories>
<dependencies>
<dependency>
<version>2.6.0-cdh5.14.2</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>1.1.0-cdh5.14.2</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.0</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
<encoding>UTF-8</encoding>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<version>2.2</version>
<executions>
<execution>
<phase>package</phase>
<goals>
</goals>
<configuration>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
```
Step 2: develop java classes to inherit UDF and overload the evaluate method
```
public class MyUDF extends UDF {
    public Text evaluate(final Text s) {
        if (null == s) {
            return null;
        }
        return new Text(s.toString().toUpperCase());
    }
}
```
Step 3: package our project and upload it to hive's lib directory

Use maven's package to package and upload our packed jar package to node03 server under the path of / kkb/install/hive-1.1.0-cdh5.14.2/lib

Step 4: add our jar package

Rename our jar package name

```
cd /kkb/install/hive-1.1.0-cdh5.14.2/lib
mv original-day_hive_udf-1.0-SNAPSHOT.jar udf.jar
```

hive's client adds our jar package

`0: jdbc:hive2://node03:10000> add jar /kkb/install/hive-1.1.0-cdh5.14.2/lib/udf.jar;`
Step 5: set function association with our custom function
`0: jdbc:hive2://node03:10000> create temporary function touppercase as 'com.kkb.udf.MyUDF';`
Step 6: use the custom function
`0: jdbc:hive2://node03:10000> select touppercase('abc');`

How to create permanent function in hive

A temporary function must be added again every time we enter the hive client, and it disappears when the client exits. We can instead create a permanent function so that it does not go away.

Create permanent function

```
1. Specify the database, and create our function under that database
0: jdbc:hive2://node03:10000> use myhive;

2. List all the jar packages
0: jdbc:hive2://node03:10000> list jars;

3. Create a permanent function associated with our class
0: jdbc:hive2://node03:10000> create function myuppercase as 'com.kkb.udf.MyUDF';

4. View our permanent function
0: jdbc:hive2://node03:10000> show functions like 'my*';

5. Use the permanent function
0: jdbc:hive2://node03:10000> select myhive.myuppercase('helloworld');

6. Delete the permanent function
0: jdbc:hive2://node03:10000> drop function myhive.myuppercase;
```

Posted on Wed, 05 Feb 2020 04:14:30 -0800 by AndrewBacca