Hive later view and expand

explode(Official website link)

Expand is a UDTF (table generation function) that converts a single input row to multiple output rows. Generally, it is used in combination with general view, mainly in two ways:

Input type

Usage method

describe

T

explode(ARRAY<T> a)

Decompose the array into multiple rows, return a single column and multiple rows, each row represents an element of the array

Tkey,Tvalue

explode(MAP<Tkey,Tvalue> m)

Decompose the MAP into multiple rows. The returned rows have two columns (key value). Each row represents a key value pair in the input

Example:

explode(array)

hive (default)> select explode(array('A','B','C'));
OK
col
A
B
C
Time taken: 0.402 seconds, Fetched: 3 row(s)
hive (default)> select explode(array('A','B','C')) as col1;
OK
col1
A
B
C
Time taken: 0.145 seconds, Fetched: 3 row(s)
hive (default)> select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf;
OK
tf.col
A
B
C
Time taken: 0.191 seconds, Fetched: 3 row(s)
hive (default)> select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf as col1;
OK
tf.col1
A
B
C

explode(map)

hive (default)> select explode(map('A',10,'B',20,'C',30));
OK
key    value
A    10
B    20
C    30
Time taken: 0.153 seconds, Fetched: 3 row(s)
hive (default)> select explode(map('A',10,'B',20,'C',30)) as (my_key,my_value);
OK
my_key    my_value
A    10
B    20
C    30
Time taken: 0.137 seconds, Fetched: 3 row(s)
hive (default)> select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf;
OK
tf.key    tf.value
A    10
B    20
C    30
Time taken: 0.128 seconds, Fetched: 3 row(s)
hive (default)> select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf as my_key,my_value;
OK
tf.my_key    tf.my_value
A    10
B    20
C    30
Time taken: 0.109 seconds, Fetched: 3 row(s)

LateralView(Official website link)

grammar

lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*

fromClause: FROM baseTable (lateralView)*

describe

lateralview is used in conjunction with a user-defined table generation function (UDTF), such as expand, which generates zero or more output rows for each input row. lateralview first applies UDTF to each row of the underlying table, and then joins the resulting output row with the input row to form a virtual table with the provided table alias.

Example

The base table, pageads, has two columns: pageid (page name) and add list (an array of ads displayed on the page).

hive (test)> !cat pageads
           > ;
front_page    1,2,3
contact_page    3,4,5
hive (test)> create table pageads(
      > pageid string,
      > add_list array<int>)
      > ROW FORMAT delimited
      > fields terminated by '\t'
      > collection items terminated by ','
      > lines terminated by '\n'
     > ;
OK
Time taken: 0.099 seconds
hive (test)> load data local inpath '/home/hadoop/pageads' into table pageads;
Loading data to table test.pageads
OK
Time taken: 0.331 seconds
hive (test)> select * from pageads;
OK
pageads.pageid    pageads.add_list
front_page    [1,2,3]
contact_page    [3,4,5]
Time taken: 0.106 seconds, Fetched: 2 row(s)
hive (test)> select pageid,addid from pageads lateral view explode(add_list) adTable as addid;
OK
pageid    addid
front_page    1
front_page    2
front_page    3
contact_page    3
contact_page    4
contact_page    5
Time taken: 0.105 seconds, Fetched: 6 row(s)
hive (test)> select addid,count(1) from pageads lateral view explode(add_list) adTable as addid group by addid;
.......
OK
addid    _c1
1    1
2    1
3    2
4    1
5    1

Multiple lateralview s

The from clause can have multiple lateralview clauses. Subsequent lateralview clauses can refer to any column in the table to the left of lateralview. For example, the following query:

SELECT * FROM exampleTable

LATERAL VIEW explode(col1) myTable1 AS myCol1

LATERAL VIEW explode(myCol1) myTable2 AS myCol2;

Note that the later view clause is applied in the order in which it appears.

Example

hive (test)> !cat basetable;
1,2    a,b,c
3,4    d,e,f
hive (test)> create table basetable(
           > col1 array<int>,
           > col2 array<string>)
           > ROW FORMAT delimited
           > fields terminated by '\t'
           > collection items terminated by ','
           > lines terminated by '\n'
           > ;
OK
Time taken: 0.113 seconds
hive (test)> load data local inpath '/home/hadoop/basetable' into table basetable;
Loading data to table test.basetable
OK
Time taken: 0.329 seconds
hive (test)> select * from basetable;
OK
basetable.col1    basetable.col2
[1,2]    ["a","b","c"]
[3,4]    ["d","e","f"]
Time taken: 0.104 seconds, Fetched: 2 row(s)
hive (test)> SELECT myCol1, col2 FROM basetable
           > LATERAL VIEW explode(col1) myTable1 AS myCol1;
OK
mycol1    col2
1    ["a","b","c"]
2    ["a","b","c"]
3    ["d","e","f"]
4    ["d","e","f"]
Time taken: 0.089 seconds, Fetched: 4 row(s)
hive (test)> SELECT myCol1, myCol2 FROM baseTable
           > LATERAL VIEW explode(col1) myTable1 AS myCol1
           > LATERAL VIEW explode(col2) myTable2 AS myCol2;
OK
mycol1    mycol2
1    a
1    b
1    c
2    a
2    b
2    c
3    d
3    e
3    f
4    d
4    e
4    f
Time taken: 0.093 seconds, Fetched: 12 row(s)
Outer Lateral Views
hive (test)> SELECT * FROM basetable LATERAL VIEW explode(array()) C AS a limit 10;
OK
basetable.col1    basetable.col2    c.a
Time taken: 0.063 seconds
hive (test)> SELECT * FROM basetable LATERAL VIEW OUTER explode(array()) C AS a limit 10;
OK
basetable.col1    basetable.col2    c.a
[1,2]    ["a","b","c"]    NULL
[3,4]    ["d","e","f"]    NULL
Time taken: 0.092 seconds, Fetched: 2 row(s)


Keywords: Big Data hive Hadoop

Added by leony on Sun, 08 Dec 2019 13:56:09 +0200