hiveSQL basic statement 2 -- common functions (timestamp, time interval, if, case)

View all functions
show functions;
View a function
desc function extended from_unixtime;

1, The timestamp is converted to the function from "unixtime" in the specified format

Format: from "unixtime (bigint unixtime, string format)

format

  • yyyy-MM-dd hh:mm:ss
  • Yyyy MM DD HH 12 hour system
  • Yyyy MM DD HH 24-hour system
  • yyyy-MM-dd hh:mm
  • yyyyMMdd
    Note that yyyy should be in lowercase, never in uppercase*****
select pay_time,from_unixtime(pay_time,'yyyy-MM-dd hh-mm-ss')
from user_trade
where dt = '2019-04-09' limit 1;

2, Date is converted to the time stamp function UNIX? Timestamp

3, Time interval function datediff

datediff(string enddate,string startdate)

The to date function changes the format to MM DD YY

select user_name,datediff('2019-05-01',to_date(firstactivetime))
from user_info limit 1;

4, Date increase / decrease

date_add(string startdate,int days)
date_sub(string startdate,int days)

5, Conditional function case when

case when ... then ...
Example: number of users under 20 years old and over 20 years old

select case when age<20 then '20 Under age'
else '20 Above' end as age_type,
count(distinct user_id) as user_num
from user_info
group by case when age<20 then '20 Under age'
else '20 Above' end;

6, Judge function if

if(A>0,B,C)
If the condition is satisfied, the result is B; if not, the result is C
E.g. number of people with grades greater than 5 and less than 5

select if(level>5,'yes','no') as level,
count(distinct user_id) as user_num
from user_info
group by if(level>5,'yes','no');

7, String function - specifies the target length string substr()

substr(string A,int start,int len)
If the length is not specified, it will be cut to the last;
Not necessarily from 1;
Closed interval

Scenario: number of new users per month

select substr(firstactivetime,1,7),
count(distinct user_id) as user_num
from user_info
group by substr(firstactivetime,1,7)
limit 4;

8, How to get the value in the key value pair - function get JSON object (a, '$. B')

Very important
Example: number of users of different mobile phone brands

Method 1: get JSON object

1. View data types

desc user_info;


extra1 is of type string
extra2 is a map type

2. View the specific data format of extra1

select extra1 from user_info limit 1;

Results
extra1
{"systemtype": "android", "education": "doctor", "marriage_status": "1", "phonebrand": "VIVO"}
3. Get JSON object (a, '$. B')
A is the column name, B is the target

select get_json_object(extra1,'$.phonebrand') as phone_brand,
count(distinct user_id) as user_num
from user_info
group by get_json_object(extra1,'$.phonebrand') limit 2;

Method 2: map type, take A ['B ']

1. View the specific data format of extra2

select extra2 from user_info limit 1;

Results
extra2
{"systemtype":"android","education":"doctor","marriage_status":"1","phonebrand":"VIVO"}
2. A ['b '], a is the column name, B is the target value

select extra2['phonebrand'] as phone_brand,
count(distinct user_id) as user_num
from user_info
group by extra2['phonebrand'] limit 2;

Practice
First look at the structure:
hive (kaikeba)> select * from user_info limit 1;
OK
user_info.user_id user_info.user_name user_info.sex user_info.age user_info.city user_info.firstactivetime user_info.level user_info.extra1 user_info.extra2
10001 Abby female 38 hangzhou 2018-04-13 01:06:07 2 {"systemtype": "android", "education": "doctor", "marriage_status": "1", "phonebrand": "VIVO"} {"systemtype":"android","education":"doctor","marriage_status":"1","phonebrand":"VIVO"}

Task 1: distribution of men and women whose activation days are more than 300 days ago (user_info)

select sex,count(distinct user_name)
from user_info 
where datediff('2020-02-02',to_date(firstactivetime)) >300
group by sex;

Assignment 2: distribution of different genders and educational levels (user_info)

select sex,get_json_object(extra1,'$.education') as user_edu,count(user_name) as user_num
from user_info
group by sex,get_json_object(extra1,'$.education');

Assignment 3: from January 1, 2019 to April 30, 2019, different types of purchase amount distribution in each period (user trade)
I think it's complicated. I want to divide the time according to the morning, afternoon and evening. Group time periods and categories together when grouping
Remember that if you have groups and want to query other results, you must aggregate them
In addition, the time format is HH, 24-hour system

select substr(from_unixtime(pay_time,'yyyy-MM-dd HH'),12) as user_time,goods_category,sum(pay_amount)
from user_trade
where dt between '2019-01-01' and '2019-04-30'
group by substr(from_unixtime(pay_time,'yyyy-MM-dd HH'),12),goods_category;


In addition, I want to divide the time period according to the morning, afternoon, evening and early morning. Some of the results are not displayed correctly. How can I optimize it later?

select case when substr(from_unixtime(pay_time,'yyyy-MM-dd HH'),12) >=18 and substr(from_unixtime(pay_time,'yyyy-MM-dd HH'),12) <=23 then 'Night'
when substr(from_unixtime(pay_time,'yyyy-MM-dd HH'),12) >=0 and substr(from_unixtime(pay_time,'yyyy-MM-dd HH'),12) <=5 then 'Before dawn'
when substr(from_unixtime(pay_time,'yyyy-MM-dd HH'),12) >=6 and substr(from_unixtime(pay_time,'yyyy-MM-dd HH'),12) <=11 then 'morning'
else  'Afternoon' end as user_time,
goods_category,sum(pay_amount)
from user_trade
where dt between '2019-01-01' and '2019-04-30'
group by case when substr(from_unixtime(pay_time,'yyyy-MM-dd HH'),12) >=18 and substr(from_unixtime(pay_time,'yyyy-MM-dd HH'),12) <=23 then 'Night'
when substr(from_unixtime(pay_time,'yyyy-MM-dd HH'),12) >=0 and substr(from_unixtime(pay_time,'yyyy-MM-dd HH'),12) <=5 then 'Before dawn'
when substr(from_unixtime(pay_time,'yyyy-MM-dd HH'),12) >=6 and substr(from_unixtime(pay_time,'yyyy-MM-dd HH'),12) <=11 then 'morning'
else  'Afternoon' end,
goods_category;


Published 35 original articles, won praise 0, visited 390
Private letter follow

Tags: Android JSON Unix less

Posted on Wed, 05 Feb 2020 23:19:24 -0800 by spaggle