Impala, Hive, Phoenix, HBase mapping test

**Requirements:** maintain two wide tables in Phoenix: one for ad hoc queries holding the current day's data, and one for history.
The wide table is assembled from multiple source tables whose arrival times differ: some tables arrive first, while others may not arrive until the next day.
Goal: the ad hoc query wide table should load each source table's data as soon as that table arrives, then load the rest as it comes in. If the wide table is queried in the meantime, the result should show the updated fields (from source tables that have already arrived) alongside the non-updated fields (from source tables that have not).
The data in the ad hoc query wide table must be readable not only from Impala and Hive, but also from Phoenix and HBase.
Current scheme:
Suppose: the wide table consists of two tables: table1 and table2
1) Each small table (each source table that makes up the wide table) is loaded into Hive by the scheduler:
ods.table1 (partition='202108010'), ods.table2 (partition='202108010')
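
For reference, a minimal sketch of what the ODS-layer DDL for table1 might look like. The partition column name dt is a stand-in (the text above writes partition='202108010', but partition itself is a reserved word in HiveQL), and the storage format is left at the default:

   hive>:
      -- dt stands in for the partition column used by the scheduler
      create table if not exists ods.table1(
        `id`        string    comment 'user id',
        `name`      string    comment 'user name',
        `etl_date`  timestamp comment 'ETL execution time'
      ) comment 'User table 1'
      partitioned by (`dt` string);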
2) In Hive, create a Phoenix mapping table for each small table (same table structure as in step 1)
Purpose: to be able to manipulate Hive table data from Phoenix. When inserting into the wide table, Phoenix should be used instead of Hive: if Hive inserts only some of the fields, the remaining fields are set to null, whereas the requirement is to update the arrived fields and leave the others as they were.

   hive>:
        drop table if exists ods.table1_phoenix;
        create table ods.table1_phoenix(
          `id`        string    comment 'user id',
          `name`      string    comment 'user name',
          `etl_date`  timestamp comment 'ETL execution time'
        ) comment 'User table 1'
        STORED BY 'org.apache.phoenix.hive.PhoenixStorageHandler'
        TBLPROPERTIES (
          "phoenix.table.name" = "test.phoenix_table1",
          "phoenix.zookeeper.quorum" = "hdp1,hdp2,hdp3",
          "phoenix.zookeeper.znode.parent" = "/hbase",
          "phoenix.zookeeper.client.port" = "2181",
          "phoenix.rowkeys" = "id, name",
          "phoenix.column.mapping" = "id:ID, name:NAME, etl_date:ETL_DATE"
        );
   -- Load the data of ods.table1 into ods.table1_phoenix;
   -- at this point the Phoenix table test.phoenix_table1 will contain the data
   insert into table ods.table1_phoenix
   select id, name, etl_date from ods.table1 where partition='202108010';
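
As a quick sanity check (not part of the original flow), the Phoenix side can be queried directly to confirm that the data written through the Hive mapping table arrived:

   phoenix:>
    select id, name, etl_date from test.phoenix_table1 limit 10;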

Note: table2 goes through the same steps; a sketch of its mapping follows.
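
Assuming table2 contributes the sex and age columns that appear in the wide table in step 3 (that column assignment, and the single-column rowkey, are assumptions), its mapping might look like this:

   hive>:
        drop table if exists ods.table2_phoenix;
        create table ods.table2_phoenix(
          `id`        string    comment 'user id',
          `sex`       string    comment 'sex',
          `age`       string    comment 'age',
          `etl_date`  timestamp comment 'ETL execution time'
        ) comment 'User table 2'
        STORED BY 'org.apache.phoenix.hive.PhoenixStorageHandler'
        TBLPROPERTIES (
          "phoenix.table.name" = "test.phoenix_table2",
          "phoenix.zookeeper.quorum" = "hdp1,hdp2,hdp3",
          "phoenix.zookeeper.znode.parent" = "/hbase",
          "phoenix.zookeeper.client.port" = "2181",
          "phoenix.rowkeys" = "id",
          "phoenix.column.mapping" = "id:ID, sex:SEX, age:AGE, etl_date:ETL_DATE"
        );
   insert into table ods.table2_phoenix
   select id, sex, age, etl_date from ods.table2 where partition='202108010';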

3) Create the ad hoc query wide table in Phoenix

 phoenix:>
 create table test.user_hbase_wide1(
  id varchar primary key,
  "a".name varchar,
  "a".sex  varchar,
  "a".age  varchar,
  "a".etl_date varchar
) COLUMN_ENCODED_BYTES=0;
-- At this point the wide table can be queried in Phoenix.
-- Upsert the data of test.phoenix_table1 into the wide table:
upsert into test.user_hbase_wide1(id, name, etl_date)
select id, name, etl_date from test.phoenix_table1;

Note: 1: "a" is the name of the HBase column family. A table generally has one column family; the official site recommends no more than two, and it should certainly not be many.
2: COLUMN_ENCODED_BYTES=0 must be added. Without it, Phoenix stores encoded column names in HBase, and queries against HBase show them as hex values, i.e. the so-called garbled characters.
3: The field types are all varchar because Phoenix stores every type other than a string in a binary encoding; when Hive and Impala read such columns through the HBase mapping, the values come back as null.
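
When table2 arrives later, its columns are upserted the same way. Only the listed columns are written, so the name column loaded earlier keeps its value (a sketch, using the assumed test.phoenix_table2 mapping from step 2):

 phoenix:>
  -- upsert only table2's columns; name, written earlier, is untouched
  upsert into test.user_hbase_wide1(id, sex, age, etl_date)
  select id, sex, age, etl_date from test.phoenix_table2;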

4) In Hive, create the mapping for the test.user_hbase_wide1 wide table
Purpose: make this wide table queryable from Hive and Impala

hive:>
   drop table if exists ods.user_hbase_wide1;
   create external table ods.user_hbase_wide1(
    id       string,
    name     string,
    sex      string,
    age      int,
    etl_date string
   )
   STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'    
   WITH SERDEPROPERTIES (
   "hbase.columns.mapping" = ":key,a:NAME,a:SEX,a:AGE,a:ETL_DATE")    
   TBLPROPERTIES("hbase.table.name" = "TEST.USER_HBASE_WIDE1");
  -- Querying this table from Hive or Impala now returns the same data as HBase and Phoenix.
  -- Note: refresh Impala's metadata before querying, otherwise Impala cannot find the table:
   shell: >
    # hdp1 is the host alias of an impalad node
    impala-shell -i hdp1 -q "
     invalidate metadata ods.user_hbase_wide1;
     REFRESH ods.user_hbase_wide1;
    "

Note: 1: The field types here can be set according to the actual situation.
2: The column family name in "hbase.columns.mapping" = ":key,a:NAME,a:SEX,a:AGE,a:ETL_DATE" must match the column family used when creating the wide table in Phoenix.
3: "hbase.table.name" = "TEST.USER_HBASE_WIDE1" is the wide table's name in HBase, and it must be uppercase: Phoenix upper-cases unquoted identifiers while HBase is case-sensitive, so unless the name is wrapped in "" to keep it lowercase, the table test.user_hbase_wide1 created in Phoenix is actually TEST.USER_HBASE_WIDE1 in HBase.

So far, the mapping among HBase, Phoenix, Impala, and Hive is complete. Each time a source table lands in Hive, Phoenix can upsert its columns into the wide table without setting the other fields to null, and Hive and Impala see the wide table's data update dynamically.
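
To make the partial-update behavior concrete, here is a sketch of what queries return at two points in time (the row values are hypothetical):

 phoenix:>
  -- after only table1 has arrived and been upserted:
  select id, name, sex, age from test.user_hbase_wide1;
  --> 1001 | tom | null | null
  -- after table2 arrives and its columns are upserted:
  select id, name, sex, age from test.user_hbase_wide1;
  --> 1001 | tom | male | 25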
