Row-column Conversion in Hive

Row-column Conversion in Hive

Row to column

  • Multi-row to multi-column

    Data sheet row2col
    
    col1   col2    col3
    a      c       1
    a      d       2
    a      e       3  
    b      c       4
    b      d       5
    b      e       6
    
    //Now we need to translate it into:
    col1   c      d      e
    a      1      2      3
    b      4      5      6
    
    //At this point, you need to use max(case... when... then... Other 0 end, which is limited to the case where the converted field is of numerical type and positive value
    
    Create table:
    create table row2col(col1 string,col2 string,col3 int)
    row format delimited
    fields terminated by ',';
    
    //Load data:
    load data local inpath '/root/hivedata/row2col.txt' into table row2col;
    a,c,1
    a,d,2
    a,e,3
    b,c,4
    b,d,5
    b,e,6
    
    select col1,
    max(case col2 when 'c' then col3 else 0 end) as c,
    max(case col2 when 'd' then col3 else 0 end) as d,
    max(case col2 when 'e' then col3 else 0 end) as e
    from row2col
    group by col1;
    
  • Multi-row to single column (important)

    Data table row2col_1:
    col1    col2    col3
    a       b       1
    a       b       2
    a       b       3
    c       d       4
    c       d       5
    c       d       6
    
    Convert it into:
    col1    col2    col3
    a       b       1,2,3
    c       d       4,5,6
    
    Two built-in functions are required:
    a) concat_ws (parameter 1, parameter 2) for character stitching 
    	Parameter 1 - Specifies a delimiter 
    	PARAMETER 2 - CONTENTS OF SPLITTING 
    b) collect_set(col3), whose main function is to de-aggregate the values of a field and generate array type fields.
       If you don't want to reuse collect_list()
    
    Create table:
    create table row2col_1(col1 string,col2 string,col3 int)
    row format delimited
    fields terminated by ',';
    
    //Load data:
    load data local inpath '/root/hivedata/row2col_1.txt' into table row2col_1;
    a,b,1
    a,b,2
    a,b,3
    c,d,4
    c,d,5
    c,d,6
    
    select col1, col2, concat_ws('|', collect_set(cast(col3 as string))) as col3
    from row2col_1
    group by col1, col2;
    

Column switching

  • Multi-column to multi-row

    Data sheet col2row: 
    col1   c      d      e
    a      1      2      3
    b      4      5      6
    
    //It is now to be translated into:
    col1   col2    col3
    a      c       1
    a      d       2
    a      e       3
    b      c       4
    b      d       5
    b      e       6
    
    //There is a need for union splicing.
    union You can combine multiple select Statement returns a common result set
    //Ensure that the number of data types returned by each select statement is consistent.
    
    Create table:
    create table col2row(col1 string,c int,d int,e int)
    row format delimited
    fields terminated by ',';
    
    //Load data:
    load data local inpath '/root/hivedata/col2row.txt' into table col2row;
    a,1,2,3
    b,4,5,6
    
    select col1, 'c' as col2, c as col3 from col2row
    UNION
    select col1, 'd' as col2, d as col3 from col2row
    UNION
    select col1, 'e' as col2, e as col3 from col2row
    order by col1, col2;
    
  • Single column to multi-row (important)

    Data sheet col2row_2: 
    col1    col2    col3
    a       b       1,2,3
    c       d       4,5,6
    
    //It is now to be translated into:
    col1    col2    col3
    a       b       1
    a       b       2
    a       b       3
    c       d       4
    c       d       5
    c       d       6
    
    //UDTF (table generating function) explode() is needed here, which accepts parameters of array type, and its effect is exactly the opposite of collect_set, which implements row-to-column transformation of array type data. Explode cooperates with lateral view to split a column of data into multiple rows.
    
    Create table:
    create table col2row_2(col1 string,col2 string,col3 Array<string>)
    row format delimited
    fields terminated by '\t'
    collection items terminated by ',';
    
    create table col2row_2(col1 string,col2 string,col3 string)
    row format delimited
    fields terminated by '\t';
    
    

Load data:
load data local inpath '/root/hivedata/col2row_2.txt' into table col2row_2;
a b 1,2,3
c d 4,5,6

  

select col1, col2, lv.col3 as col3
from col2row_2
lateral view explode(split(col3, ',')) lv as col3;

Tags: hive

Posted on Mon, 07 Oct 2019 01:40:00 -0700 by Russia