Preface
Impala's core development language is SQL. SQL statements can be submitted to Impala through the shell command line, through JDBC, and by other means. For more complex analysis, you can write UDFs in C++ or Java.

Impala's SQL syntax is highly compatible with Apache Hive's HQL: Impala supports the data types that Hive supports, as well as some of Hive's built-in functions.
Note:

- Like Hive, Impala focuses on queries, so it is best not to use it for modification operations such as UPDATE and DELETE. For deleting data, you can use DROP TABLE and ALTER TABLE ... DROP PARTITION, and you can try INSERT OVERWRITE as a substitute for UPDATE;
- The usual way to use Impala is to store data files in the HDFS file system and to query and manage those files through Impala's table definitions;
- Impala is mostly the same as Hive; for example, Impala also supports internal and external tables and partitions, so Hive can be used as a reference.
1.Impala-Shell
Impala shell commands are divided into external commands and internal commands.
Impala shell external commands are options that can be executed without entering the impala-shell interactive command line. impala-shell can be started with many options, which are set at startup to modify the command execution environment.

`impala-shell -h` displays the help manual.
Several common options are as follows:
option | meaning |
---|---|
impala-shell -r | Refreshes Impala metadata after the connection is established; equivalent to issuing an INVALIDATE METADATA statement (use when metadata has changed) |
impala-shell -f file_path | Executes the SQL queries in the specified file |
impala-shell -i hostname | Specifies the host on which the impalad daemon runs; the default port is 21000, and you can connect to any host in the cluster that runs impalad |
impala-shell -o filename | Saves the execution results to a file |
Use the `impala-shell -r` command as follows:
```
[root@node03 ~]$ impala-shell -r
Starting Impala Shell without Kerberos authentication
Connected to node03:21000
Server version: impalad version 2.5.0-cdh5.7.6 RELEASE (build ecbba4f4e6d5eec6c33c1e02412621b8b9c71b6a)
Invalidating Metadata
***********************************************************************************
Welcome to the Impala shell. Copyright (c) 2015 Cloudera, Inc. All rights reserved.

(Impala Shell v2.5.0-cdh5.7.6 (ecbba4f) built on Tue Feb 21 14:54:50 PST 2017)

The '-B' command line flag turns off pretty-printing for query results. Use this
flag to remove formatting from results you want to save for later, or to benchmark
Impala.
***********************************************************************************
Query: invalidate metadata
Fetched 0 row(s) in 6.11s
[node03:21000] > exit;
Goodbye root
```
You can see that after executing `impala-shell -r`, the metadata is refreshed as the Impala command line starts.
Next, to try `impala-shell -f`, create a SQL file with `vim impala_data/test.sql`, with the content:
```sql
show databases;
```
Then proceed as follows:
```
[root@node03 ~]$ impala-shell -f impala_data/test.sql
Starting Impala Shell without Kerberos authentication
Connected to node03:21000
Server version: impalad version 2.5.0-cdh5.7.6 RELEASE (build ecbba4f4e6d5eec6c33c1e02412621b8b9c71b6a)
Query: show databases
+------------------+----------------------------------------------+
| name             | comment                                      |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| default          | Default Hive database                        |
| homework         |                                              |
| mydb             |                                              |
| mydb2            |                                              |
| sale             |                                              |
| tuning           |                                              |
+------------------+----------------------------------------------+
Fetched 7 row(s) in 0.02s
```
You can see that all databases are listed.

Among them, _impala_builtins is Impala's own system database.

You can inspect it as follows:
```
[node03:21000] > show functions;
Query: show functions
Fetched 0 row(s) in 0.02s
[node03:21000] > use _impala_builtins;
Query: use _impala_builtins
[node03:21000] > show functions;
Query: show functions
+--------------+-------------------------------------------------+-------------+---------------+
| return type  | signature                                       | binary type | is persistent |
+--------------+-------------------------------------------------+-------------+---------------+
| BIGINT       | abs(BIGINT)                                     | BUILTIN     | true          |
| DECIMAL(*,*) | abs(DECIMAL(*,*))                               | BUILTIN     | true          |
| DOUBLE       | abs(DOUBLE)                                     | BUILTIN     | true          |
...
| INT          | zeroifnull(INT)                                 | BUILTIN     | true          |
| SMALLINT     | zeroifnull(SMALLINT)                            | BUILTIN     | true          |
| TINYINT      | zeroifnull(TINYINT)                             | BUILTIN     | true          |
+--------------+-------------------------------------------------+-------------+---------------+
Fetched 592 row(s) in 0.16s
```
As you can see, all of Impala's built-in functions can be viewed in the system database _impala_builtins, but not in other databases.
Then use `impala-shell -i`, as follows:
```
[root@node03 ~]$ impala-shell -i node02
Starting Impala Shell without Kerberos authentication
Connected to node02:21000
Server version: impalad version 2.5.0-cdh5.7.6 RELEASE (build ecbba4f4e6d5eec6c33c1e02412621b8b9c71b6a)
***********************************************************************************
Welcome to the Impala shell. Copyright (c) 2015 Cloudera, Inc. All rights reserved.

(Impala Shell v2.5.0-cdh5.7.6 (ecbba4f) built on Tue Feb 21 14:54:50 PST 2017)

You can change the Impala daemon that you're connected to by using the CONNECT
command. To see how Impala will plan to run your query without actually executing
it, use the EXPLAIN command. You can change the level of detail in the EXPLAIN
output by setting the EXPLAIN_LEVEL query option.
***********************************************************************************
[node02:21000] > exit;
Goodbye root
```
You can see that the shell connected to the impalad server on another node; if this option is not specified, the impalad on the local machine is used by default.
Internal commands are statements that can be executed after entering the impala-shell interactive command line.
Common internal commands are as follows:
command | meaning |
---|---|
help | View common commands and help documents |
connect hostname | Connects to the impalad on the specified machine for execution |
refresh dbname.tablename | Incremental refresh; refreshes the metadata of one table, mainly used when the data of an existing table has changed in Hive |
invalidate metadata | Full refresh; costly in performance, mainly used after a new database or table has been created in Hive |
quit/exit | Exits the impala-shell |
explain | Shows the execution plan of a SQL statement; you can set explain_level (values 0 to 3) to control the amount of output, where level 3 is the highest and prints the most complete information |
profile | Issued after executing a SQL statement; prints the detailed execution steps, mainly used for examining query execution and cluster tuning |
Use the following:
```
[node03:21000] > help;

Documented commands (type help <topic>):
========================================
compute  exit     history  quit    shell  unset  version
connect  explain  insert   select  show   use    with
describe help     profile  set     tip    values

Undocumented commands:
======================
alter  create  desc  drop  load  summary

[node03:21000] > help compute;
Executes a COMPUTE STATS query.
Impala shell cannot get child query handle so it cannot
query live progress for COMPUTE STATS query. Disable live
progress/summary callback for COMPUTE STATS query.
[node03:21000] > help alter;
*** No help on alter
[node03:21000] > refresh mydb.t1;
Query: refresh mydb.t1
Fetched 0 row(s) in 0.98s
[node03:21000] > connect node01;
Connected to node01:21000
Server version: impalad version 2.5.0-cdh5.7.6 RELEASE (build ecbba4f4e6d5eec6c33c1e02412621b8b9c71b6a)
[node01:21000] > select * from t1;
Query: select * from t1
+--------------------+-----------+-----+--------+
| id                 | name      | age | gender |
+--------------------+-----------+-----+--------+
| 392456197008193000 | Zhang San | 20  | 0      |
| 267456198006210000 | Li Si     | 25  | 1      |
| 892456199007203000 | Wang Wu   | 24  | 1      |
| 492456198712198000 | Zhao Liu  | 26  | 2      |
| 392456197008193000 | Zhang San | 20  | 0      |
| 392456197008193000 | Zhang San | 20  | 0      |
+--------------------+-----------+-----+--------+
Fetched 6 row(s) in 6.99s
[node01:21000] > explain select * from t1;
Query: explain select * from t1
+------------------------------------------------------------------------------------+
| Explain String                                                                      |
+------------------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=32.00MB VCores=1                            |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.t1                                                                          |
|                                                                                     |
| 01:EXCHANGE [UNPARTITIONED]                                                         |
| |                                                                                   |
| 00:SCAN HDFS [default.t1]                                                           |
|    partitions=1/1 files=1 size=186B                                                 |
+------------------------------------------------------------------------------------+
Fetched 8 row(s) in 0.06s
[node01:21000] > set explain_level = 3;
EXPLAIN_LEVEL set to 3
[node01:21000] > explain select * from t1;
Query: explain select * from t1
+------------------------------------------------------------------------------------+
| Explain String                                                                      |
+------------------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=32.00MB VCores=1                            |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.t1                                                                          |
|                                                                                     |
| F01:PLAN FRAGMENT [UNPARTITIONED]                                                   |
|   01:EXCHANGE [UNPARTITIONED]                                                       |
|      hosts=1 per-host-mem=unavailable                                               |
|      tuple-ids=0 row-size=38B cardinality=unavailable                               |
|                                                                                     |
| F00:PLAN FRAGMENT [RANDOM]                                                          |
|   DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, UNPARTITIONED]                        |
|   00:SCAN HDFS [default.t1, RANDOM]                                                 |
|      partitions=1/1 files=1 size=186B                                               |
|      table stats: unavailable                                                       |
|      column stats: unavailable                                                      |
|      hosts=1 per-host-mem=32.00MB                                                   |
|      tuple-ids=0 row-size=38B cardinality=unavailable                               |
+------------------------------------------------------------------------------------+
Fetched 17 row(s) in 0.03s
[node01:21000] > select * from t1;
Query: select * from t1
+--------------------+-----------+-----+--------+
| id                 | name      | age | gender |
+--------------------+-----------+-----+--------+
| 392456197008193000 | Zhang San | 20  | 0      |
| 267456198006210000 | Li Si     | 25  | 1      |
| 892456199007203000 | Wang Wu   | 24  | 1      |
| 492456198712198000 | Zhao Liu  | 26  | 2      |
| 392456197008193000 | Zhang San | 20  | 0      |
| 392456197008193000 | Zhang San | 20  | 0      |
+--------------------+-----------+-----+--------+
Fetched 6 row(s) in 0.43s
[node01:21000] > profile;
Query Runtime Profile:
Query (id=944cd4a45137b96a:b8ab935b8d60adab):
  Summary:
    Session ID: e44a9c110e12d2cb:24ff32547ca3fc99
    Session Type: BEESWAX
    Start Time: 2021-10-11 11:20:45.844713000
    End Time: 2021-10-11 11:20:46.276056000
    Query Type: QUERY
    Query State: FINISHED
    Query Status: OK
    Impala Version: impalad version 2.5.0-cdh5.7.6 RELEASE (build ecbba4f4e6d5eec6c33c1e02412621b8b9c71b6a)
    User: root
    Connected User: root
    Delegated User:
    Network Address: ::ffff:192.168.31.157:59392
    Default Db: default
    Sql Statement: select * from t1
    Coordinator: node01:22000
    Query Options (non default): EXPLAIN_LEVEL=3
    Plan:
----------------
Estimated Per-Host Requirements: Memory=32.00MB VCores=1
WARNING: The following tables are missing relevant table and/or column statistics.
default.t1

01:EXCHANGE [UNPARTITIONED]
|  hosts=1 per-host-mem=unavailable
|  tuple-ids=0 row-size=38B cardinality=unavailable
|
00:SCAN HDFS [default.t1, RANDOM]
   partitions=1/1 files=1 size=186B
   table stats: unavailable
   column stats: unavailable
   hosts=1 per-host-mem=32.00MB
   tuple-ids=0 row-size=38B cardinality=unavailable
----------------
    Estimated Per-Host Mem: 33554432
    Estimated Per-Host VCores: 1
    Tables Missing Stats: default.t1
    Request Pool: default-pool
    ExecSummary:
Operator      #Hosts  Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
-------------------------------------------------------------------------------------------------------
01:EXCHANGE        1   8.275us    8.275us       6          -1         0        -1.00 B  UNPARTITIONED
00:SCAN HDFS       1  158.588ms  158.588ms      6          -1  61.00 KB       32.00 MB  default.t1
    Planner Timeline: 6.764ms
       - Analysis finished: 2.300ms (2.300ms)
       - Equivalence classes computed: 2.466ms (166.581us)
       - Single node plan created: 3.901ms (1.434ms)
       - Runtime filters computed: 3.917ms (16.218us)
       - Distributed plan created: 4.972ms (1.054ms)
       - Planning finished: 6.764ms (1.792ms)
    Query Timeline: 432.815ms
       - Start execution: 44.969us (44.969us)
       - Planning finished: 14.847ms (14.803ms)
       - Ready to start 1 remote fragments: 15.241ms (393.172us)
       - All 1 remote fragments started: 20.207ms (4.966ms)
       - Child queries finished: 20.823ms (615.552us)
       - Rows available: 344.381ms (323.558ms)
       - First row fetched: 428.666ms (84.284ms)
       - Unregister query: 431.338ms (2.672ms)
  ImpalaServer:
     - ClientFetchWaitTimer: 86.786ms
     - RowMaterializationTimer: 40.424us
  Execution Profile 944cd4a45137b96a:b8ab935b8d60adab:(Total: 329.230ms, non-child: 0.000ns, % non-child: 0.00%)
    Number of filters: 0
    Filter routing table:
 ID  Src. Node  Tgt. Node  Targets  Type  Partition filter
-----------------------------------------------------------
    Fragment start latencies: Count: 1, 25th %-ile: 3ms, 50th %-ile: 3ms, 75th %-ile: 3ms, 90th %-ile: 3ms, 95th %-ile: 3ms, 99.9th %-ile: 3ms
    Per Node Peak Memory Usage: node03:22000(71.66 KB) node01:22000(0)
     - FiltersReceived: 0 (0)
     - FinalizationTimer: 0.000ns
    Coordinator Fragment F01:(Total: 323.661ms, non-child: 107.505us, % non-child: 0.03%)
      MemoryUsage(500.000ms): 8.00 KB
       - AverageThreadTokens: 0.00
       - BloomFilterBytes: 0
       - PeakMemoryUsage: 12.05 KB (12336)
       - PerHostPeakMemUsage: 0
       - PrepareTime: 29.640us
       - RowsProduced: 6 (6)
       - TotalCpuTime: 90.045ms
       - TotalNetworkReceiveTime: 323.548ms
       - TotalNetworkSendTime: 0.000ns
       - TotalStorageWaitTime: 0.000ns
      BlockMgr:
         - BlockWritesOutstanding: 0 (0)
         - BlocksCreated: 0 (0)
         - BlocksRecycled: 0 (0)
         - BufferedPins: 0 (0)
         - BytesWritten: 0
         - MaxBlockSize: 8.00 MB (8388608)
         - MemoryLimit: 1.14 GB (1221100928)
         - PeakMemoryUsage: 0
         - TotalBufferWaitTime: 0.000ns
         - TotalEncryptionTime: 0.000ns
         - TotalIntegrityCheckTime: 0.000ns
         - TotalReadBlockTime: 0.000ns
      EXCHANGE_NODE (id=1):(Total: 323.554ms, non-child: 8.275us, % non-child: 0.00%)
        BytesReceived(500.000ms): 0
         - BytesReceived: 264.00 B (264)
         - ConvertRowBatchTime: 1.697us
         - DeserializeRowBatchTimer: 7.709us
         - FirstBatchArrivalWaitTime: 323.545ms
         - PeakMemoryUsage: 0
         - RowsReturned: 6 (6)
         - RowsReturnedRate: 18.00 /sec
         - SendersBlockedTimer: 0.000ns
         - SendersBlockedTotalTimer(*): 0.000ns
    Averaged Fragment F00:(Total: 161.640ms, non-child: 0.000ns, % non-child: 0.00%)
      split sizes: min: 186.00 B, max: 186.00 B, avg: 186.00 B, stddev: 0
      completion times: min:326.012ms  max:326.012ms  mean: 326.012ms  stddev:0.000ns
      execution rates: min:570.00 B/sec  max:570.00 B/sec  mean:570.00 B/sec  stddev:0.53 B/sec
      num instances: 1
       - AverageThreadTokens: 1.00
       - BloomFilterBytes: 0
       - PeakMemoryUsage: 71.66 KB (73384)
       - PerHostPeakMemUsage: 71.66 KB (73384)
       - PrepareTime: 132.318ms
       - RowsProduced: 6 (6)
       - TotalCpuTime: 325.383ms
       - TotalNetworkReceiveTime: 0.000ns
       - TotalNetworkSendTime: 684.107us
       - TotalStorageWaitTime: 24.116ms
      BlockMgr:
         - BlockWritesOutstanding: 0 (0)
         - BlocksCreated: 0 (0)
         - BlocksRecycled: 0 (0)
         - BufferedPins: 0 (0)
         - BytesWritten: 0
         - MaxBlockSize: 8.00 MB (8388608)
         - MemoryLimit: 1.14 GB (1221100928)
         - PeakMemoryUsage: 0
         - TotalBufferWaitTime: 0.000ns
         - TotalEncryptionTime: 0.000ns
         - TotalIntegrityCheckTime: 0.000ns
         - TotalReadBlockTime: 0.000ns
      CodeGen:(Total: 290.903ms, non-child: 290.903ms, % non-child: 100.00%)
         - CodegenTime: 4.578ms
         - CompileTime: 13.251ms
         - LoadTime: 0.000ns
         - ModuleBitcodeSize: 1.87 MB (1959112)
         - OptimizationTime: 149.882ms
         - PrepareTime: 123.054ms
      DataStreamSender (dst_id=1):(Total: 165.873us, non-child: 165.873us, % non-child: 100.00%)
         - BytesSent: 264.00 B (264)
         - NetworkThroughput(*): 3.65 MB/sec
         - OverallThroughput: 1.52 MB/sec
         - PeakMemoryUsage: 2.66 KB (2728)
         - RowsReturned: 6 (6)
         - SerializeBatchTime: 40.210us
         - TransmitDataRPCTime: 68.991us
         - UncompressedRowBatchSize: 460.00 B (460)
      HDFS_SCAN_NODE (id=0):(Total: 158.588ms, non-child: 158.588ms, % non-child: 100.00%)
         - AverageHdfsReadThreadConcurrency: 0.00
         - AverageScannerThreadConcurrency: 0.00
         - BytesRead: 186.00 B (186)
         - BytesReadDataNodeCache: 0
         - BytesReadLocal: 186.00 B (186)
         - BytesReadRemoteUnexpected: 0
         - BytesReadShortCircuit: 186.00 B (186)
         - DecompressionTime: 0.000ns
         - MaxCompressedTextFileLength: 0
         - NumDisksAccessed: 1 (1)
         - NumScannerThreadsStarted: 1 (1)
         - PeakMemoryUsage: 61.00 KB (62464)
         - PerReadThreadRawHdfsThroughput: 3.53 MB/sec
         - RemoteScanRanges: 0 (0)
         - RowsRead: 6 (6)
         - RowsReturned: 6 (6)
         - RowsReturnedRate: 37.00 /sec
         - ScanRangesComplete: 1 (1)
         - ScannerThreadsInvoluntaryContextSwitches: 0 (0)
         - ScannerThreadsTotalWallClockTime: 24.753ms
           - DelimiterParseTime: 51.707us
           - MaterializeTupleTime(*): 903.000ns
           - ScannerThreadsSysTime: 351.000us
           - ScannerThreadsUserTime: 0.000ns
         - ScannerThreadsVoluntaryContextSwitches: 3 (3)
         - TotalRawHdfsReadTime(*): 50.213us
         - TotalReadThroughput: 0.00 /sec
    Fragment F00:
      Instance 944cd4a45137b96a:b8ab935b8d60adad (host=node03:22000):(Total: 161.640ms, non-child: 0.000ns, % non-child: 0.00%)
        Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:1/186.00 B
        MemoryUsage(500.000ms): 0
        ThreadUsage(500.000ms): 1
         - AverageThreadTokens: 1.00
         - BloomFilterBytes: 0
         - PeakMemoryUsage: 71.66 KB (73384)
         - PerHostPeakMemUsage: 71.66 KB (73384)
         - PrepareTime: 132.318ms
         - RowsProduced: 6 (6)
         - TotalCpuTime: 325.383ms
         - TotalNetworkReceiveTime: 0.000ns
         - TotalNetworkSendTime: 684.107us
         - TotalStorageWaitTime: 24.116ms
        BlockMgr:
           - BlockWritesOutstanding: 0 (0)
           - BlocksCreated: 0 (0)
           - BlocksRecycled: 0 (0)
           - BufferedPins: 0 (0)
           - BytesWritten: 0
           - MaxBlockSize: 8.00 MB (8388608)
           - MemoryLimit: 1.14 GB (1221100928)
           - PeakMemoryUsage: 0
           - TotalBufferWaitTime: 0.000ns
           - TotalEncryptionTime: 0.000ns
           - TotalIntegrityCheckTime: 0.000ns
           - TotalReadBlockTime: 0.000ns
        CodeGen:(Total: 290.903ms, non-child: 290.903ms, % non-child: 100.00%)
           - CodegenTime: 4.578ms
           - CompileTime: 13.251ms
           - LoadTime: 0.000ns
           - ModuleBitcodeSize: 1.87 MB (1959112)
           - OptimizationTime: 149.882ms
           - PrepareTime: 123.054ms
        DataStreamSender (dst_id=1):(Total: 165.873us, non-child: 165.873us, % non-child: 100.00%)
           - BytesSent: 264.00 B (264)
           - NetworkThroughput(*): 3.65 MB/sec
           - OverallThroughput: 1.52 MB/sec
           - PeakMemoryUsage: 2.66 KB (2728)
           - RowsReturned: 6 (6)
           - SerializeBatchTime: 40.210us
           - TransmitDataRPCTime: 68.991us
           - UncompressedRowBatchSize: 460.00 B (460)
        HDFS_SCAN_NODE (id=0):(Total: 158.588ms, non-child: 158.588ms, % non-child: 100.00%)
          ExecOption: Expr Evaluation Codegen Disabled, Codegen enabled: 1 out of 1
          Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:1/186.00 B
          Hdfs Read Thread Concurrency Bucket: 0:0% 1:0% 2:0% 3:0% 4:0% 5:0%
          File Formats: TEXT/NONE:1
          BytesRead(500.000ms): 0
           - AverageHdfsReadThreadConcurrency: 0.00
           - AverageScannerThreadConcurrency: 0.00
           - BytesRead: 186.00 B (186)
           - BytesReadDataNodeCache: 0
           - BytesReadLocal: 186.00 B (186)
           - BytesReadRemoteUnexpected: 0
           - BytesReadShortCircuit: 186.00 B (186)
           - DecompressionTime: 0.000ns
           - MaxCompressedTextFileLength: 0
           - NumDisksAccessed: 1 (1)
           - NumScannerThreadsStarted: 1 (1)
           - PeakMemoryUsage: 61.00 KB (62464)
           - PerReadThreadRawHdfsThroughput: 3.53 MB/sec
           - RemoteScanRanges: 0 (0)
           - RowsRead: 6 (6)
           - RowsReturned: 6 (6)
           - RowsReturnedRate: 37.00 /sec
           - ScanRangesComplete: 1 (1)
           - ScannerThreadsInvoluntaryContextSwitches: 0 (0)
           - ScannerThreadsTotalWallClockTime: 24.753ms
             - DelimiterParseTime: 51.707us
             - MaterializeTupleTime(*): 903.000ns
             - ScannerThreadsSysTime: 351.000us
             - ScannerThreadsUserTime: 0.000ns
           - ScannerThreadsVoluntaryContextSwitches: 3 (3)
           - TotalRawHdfsReadTime(*): 50.213us
           - TotalReadThroughput: 0.00 /sec
```
Among them, the help command lists documented commands (whose usage can be viewed with help <topic>) and undocumented commands (whose usage cannot be viewed with help);

The refresh and invalidate metadata commands serve the same purpose of updating metadata, but the former refreshes a specified table (incremental update) while the latter refreshes everything (full update), so their performance characteristics differ;
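As a hedged illustration of the division of labor between the two (mydb.t1 appears earlier in this article; new_table is a made-up name):

```sql
-- After new data files land in an existing table via Hive:
-- an incremental, per-table refresh is enough.
refresh mydb.t1;

-- After a brand-new database or table is created in Hive:
-- Impala does not know it exists yet, so a full refresh is needed.
invalidate metadata;

-- INVALIDATE METADATA also accepts a table name, which limits
-- the expensive operation to a single (here hypothetical) table.
invalidate metadata mydb.new_table;
```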
When viewing query plans with explain, SCAN HDFS marks the table scan, partitions shows which partitions are read, and the number and size of the files are listed, which can be verified against HDFS:
```
[root@node03 ~]$ hdfs dfs -ls /user/impala/t1
Found 1 items
-rw-r--r--   3 root supergroup        186 2021-10-10 21:44 /user/impala/t1/user.csv
```
As you can see, the numbers match.

Also, when explain_level is set to 3, the explain output contains more information.

Comparing explain and profile:

explain: displays the execution plan of a task without actually executing it;

profile: after the task has been executed, lets you observe how Impala ran it at a lower and more detailed level, and tune accordingly.
2.Impala SQL syntax
(1) Database operation
The CREATE DATABASE statement is used to create a new database in Impala:

```sql
CREATE DATABASE IF NOT EXISTS database_name;
```

Here, IF NOT EXISTS is an optional clause: when it is used, a database with the given name is created only if no database with the same name already exists.
As follows:
```
[node03:21000] > show databases;
Query: show databases
+------------------+----------------------------------------------+
| name             | comment                                      |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| default          | Default Hive database                        |
| homework         |                                              |
| mydb             |                                              |
| mydb2            |                                              |
| sale             |                                              |
| tuning           |                                              |
+------------------+----------------------------------------------+
Fetched 7 row(s) in 0.01s
[node03:21000] > create database if not exists cltest;
Query: create database if not exists cltest
Fetched 0 row(s) in 0.79s
[node03:21000] > show databases;
Query: show databases
+------------------+----------------------------------------------+
| name             | comment                                      |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| cltest           |                                              |
| default          | Default Hive database                        |
| homework         |                                              |
| mydb             |                                              |
| mydb2            |                                              |
| sale             |                                              |
| tuning           |                                              |
+------------------+----------------------------------------------+
Fetched 8 row(s) in 0.00s
```
As you can see, the database is created.
After the database is created, a new folder named after the database is created by default under Hive's data warehouse path in HDFS, as follows:
```
[root@node03 ~]$ hdfs dfs -ls /user/hive/warehouse
Found 7 items
drwxrwxrwx   - impala    supergroup          0 2021-10-11 13:47 /user/hive/warehouse/cltest.db
drwxrwxrwx   - root      supergroup          0 2021-10-05 20:16 /user/hive/warehouse/homework.db
drwxrwxrwx   - root      supergroup          0 2021-10-10 22:13 /user/hive/warehouse/mydb.db
drwxrwxrwx   - anonymous supergroup          0 2021-09-28 20:30 /user/hive/warehouse/mydb2.db
drwxrwxrwx   - root      supergroup          0 2021-10-03 13:06 /user/hive/warehouse/sale.db
drwxrwxrwx   - root      supergroup          0 2021-09-28 21:00 /user/hive/warehouse/t2
drwxrwxrwx   - root      supergroup          0 2021-10-08 23:44 /user/hive/warehouse/tuning.db
```
Impala's DROP DATABASE statement is used to delete a database from Impala. It is recommended to delete all tables from a database before deleting it;

if the CASCADE keyword is used, a database that still contains tables and data can be deleted directly.
As follows:
```
[node03:21000] > drop database cltest cascade;
Query: drop database cltest cascade
[node03:21000] > show databases;
Query: show databases
+------------------+----------------------------------------------+
| name             | comment                                      |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| default          | Default Hive database                        |
| homework         |                                              |
| mydb             |                                              |
| mydb2            |                                              |
| sale             |                                              |
| tuning           |                                              |
+------------------+----------------------------------------------+
Fetched 7 row(s) in 0.01s
```
(2) Table operation
The CREATE TABLE statement is used to create a new table in the required database in Impala. You need to specify the table name and define its columns and the data type of each column.
Impala supports data types similar to Hive;
Impala's support for complex data types depends on the storage format: complex types are not supported in the Text storage format and should use the Parquet format.
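For example (a hedged sketch; the table and column names are invented), a table with complex-type columns has to be stored as Parquet:

```sql
-- Hypothetical table: complex types (ARRAY, MAP, STRUCT) require Parquet.
create table if not exists mydb.contacts (
    id     BIGINT,
    phones ARRAY<STRING>,                     -- a collection of values
    props  MAP<STRING, STRING>,               -- key/value pairs
    addr   STRUCT<city: STRING, zip: STRING>  -- a nested record
)
STORED AS PARQUET;
```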
The general syntax of CREATE TABLE is as follows:
```sql
create table IF NOT EXISTS database_name.table_name (
    column1 data_type,
    column2 data_type,
    column3 data_type,
    ...,
    columnN data_type
);
```
Examples are as follows:
```
[node03:21000] > create table if not exists mydb.student(
              > name string,
              > age int,
              > concat int
              > );
Query: create table if not exists mydb.student( name string, age int, concat int )
Fetched 0 row(s) in 0.31s
[node03:21000] > use mydb;
Query: use mydb
[node03:21000] > show tables;
Query: show tables
+-----------------+
| name            |
+-----------------+
| course          |
| emp             |
| goodtbl         |
| line2row        |
| row2line        |
| stu             |
| student         |
| studscore       |
| t1              |
| t2              |
| t3              |
| tab1            |
| taba            |
| tabb            |
| tabc            |
| tabd            |
| tabe            |
| temp2           |
| u1              |
| u2              |
| uaction_orc     |
| uaction_parquet |
| uaction_text    |
| ulogin          |
| userpv          |
| zxz_data        |
+-----------------+
Fetched 26 row(s) in 0.01s
```
The default data storage path for a new table is the same as in Hive; you can also specify a specific path with the LOCATION clause when creating the table.
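For instance (a hedged sketch; the column layout mirrors the user.csv file shown earlier, but the table name is invented), an external table can point at an existing HDFS directory:

```sql
-- Hypothetical example: keep the table's data at an explicit HDFS path.
create external table if not exists mydb.user_ext (
    id     BIGINT,
    name   STRING,
    age    INT,
    gender INT
)
row format delimited fields terminated by ','
location '/user/impala/t1';
```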
Impala's INSERT statement has two clauses, into and overwrite:
into is used to insert new record data;
Overwrite is used to overwrite existing records.
The syntax of the insert into statement is as follows:
```sql
-- Form 1
insert into table_name (column1, column2, column3, ..., columnN)
values (value1, value2, value3, ..., valueN);

-- Form 2
insert into table_name values (value1, value2, ..., valueN);
```
Here, column1, column2, ..., columnN are the names of the columns into which data is inserted;

you can also supply values without specifying column names, but then you need to ensure that the order of the values matches the order of the columns in the table.
Examples are as follows:
```
[node03:21000] > create table employee(
              > Id INT,
              > name STRING,
              > age INT,
              > address STRING,
              > salary BIGINT
              > );
Query: create table employee( Id INT, name STRING, age INT, address STRING, salary BIGINT )
Fetched 0 row(s) in 0.40s
[node03:21000] > insert into employee VALUES (1, 'Ramesh', 32, 'Ahmedabad', 20000);
Query: insert into employee VALUES (1, 'Ramesh', 32, 'Ahmedabad', 20000)
Inserted 1 row(s) in 2.46s
[node03:21000] > insert into employee values (2, 'Khilan', 25, 'Delhi', 15000);
Query: insert into employee values (2, 'Khilan', 25, 'Delhi', 15000 )
Inserted 1 row(s) in 0.32s
[node03:21000] > Insert into employee values (3, 'kaushik', 23, 'Kota', 30000);
Query: insert into employee values (3, 'kaushik', 23, 'Kota', 30000 )
Inserted 1 row(s) in 0.32s
[node03:21000] > Insert into employee values (4, 'Chaitali', 25, 'Mumbai', 35000);
Query: insert into employee values (4, 'Chaitali', 25, 'Mumbai', 35000 )
Inserted 1 row(s) in 0.21s
[node03:21000] > Insert into employee values (5, 'Hardik', 27, 'Bhopal', 40000);
Query: insert into employee values (5, 'Hardik', 27, 'Bhopal', 40000 )
Inserted 1 row(s) in 0.46s
[node03:21000] > Insert into employee values (6, 'Komal', 22, 'MP', 32000);
Query: insert into employee values (6, 'Komal', 22, 'MP', 32000)
Inserted 1 row(s) in 0.33s
```
overwrite overwrites all records in the table. The overwritten records will be permanently deleted from the table.
Examples are as follows:
```sql
insert overwrite employee values (1, 'Ram', 26, 'Vishakhapatnam', 37000);
```
The Impala SELECT statement is used to query data from the database; it returns the data in the form of a table.
As follows:
```
[node03:21000] > select * from t3;
Query: select * from t3
+--------------------+-----------+-----+--------+
| id                 | name      | age | gender |
+--------------------+-----------+-----+--------+
| 392456197008193000 | Zhang San | 20  | 0      |
| 392456197008193000 | Zhang San | 20  | 0      |
| 392456197008193000 | Zhang San | 20  | 0      |
+--------------------+-----------+-----+--------+
Fetched 3 row(s) in 2.85s
```
The describe/desc statement in Impala is used to provide a description of the table; the result contains information about the table, such as column names and their data types.
Use the following:
```
[node03:21000] > desc employee;
Query: describe employee
+---------+--------+---------+
| name    | type   | comment |
+---------+--------+---------+
| id      | int    |         |
| name    | string |         |
| age     | int    |         |
| address | string |         |
| salary  | bigint |         |
+---------+--------+---------+
Fetched 5 row(s) in 0.01s
[node03:21000] > desc formatted employee;
Query: describe formatted employee
+------------------------------+------------------------------------------------------------+------------+
| name                         | type                                                       | comment    |
+------------------------------+------------------------------------------------------------+------------+
| # col_name                   | data_type                                                  | comment    |
|                              | NULL                                                       | NULL       |
| id                           | int                                                        | NULL       |
| name                         | string                                                     | NULL       |
| age                          | int                                                        | NULL       |
| address                      | string                                                     | NULL       |
| salary                       | bigint                                                     | NULL       |
|                              | NULL                                                       | NULL       |
| # Detailed Table Information | NULL                                                       | NULL       |
| Database:                    | mydb                                                       | NULL       |
| Owner:                       | root                                                       | NULL       |
| CreateTime:                  | Mon Oct 11 13:58:02 CST 2021                               | NULL       |
| LastAccessTime:              | UNKNOWN                                                    | NULL       |
| Protect Mode:                | None                                                       | NULL       |
| Retention:                   | 0                                                          | NULL       |
| Location:                    | hdfs://node01:9000/user/hive/warehouse/mydb.db/employee    | NULL       |
| Table Type:                  | MANAGED_TABLE                                              | NULL       |
| Table Parameters:            | NULL                                                       | NULL       |
|                              | transient_lastDdlTime                                      | 1633931882 |
|                              | NULL                                                       | NULL       |
| # Storage Information        | NULL                                                       | NULL       |
| SerDe Library:               | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe         | NULL       |
| InputFormat:                 | org.apache.hadoop.mapred.TextInputFormat                   | NULL       |
| OutputFormat:                | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL       |
| Compressed:                  | No                                                         | NULL       |
| Num Buckets:                 | 0                                                          | NULL       |
| Bucket Columns:              | []                                                         | NULL       |
| Sort Columns:                | []                                                         | NULL       |
+------------------------------+------------------------------------------------------------+------------+
Fetched 28 row(s) in 0.03s
```
The ALTER TABLE statement in Impala is used to change a given table: you can add, delete, or modify columns in an existing table, or rename the table itself.

Note that although tables can be modified in Impala, doing so is not recommended; treat Impala as a dedicated query tool and perform update operations in Hive.
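A few hedged examples of what ALTER TABLE can express (staff, department, hired and salary_amount are invented names):

```sql
-- Rename a table.
alter table mydb.employee rename to mydb.staff;

-- Add new columns to an existing table.
alter table mydb.staff add columns (department STRING, hired TIMESTAMP);

-- Change a column's name and/or type.
alter table mydb.staff change salary salary_amount BIGINT;

-- Remove a column.
alter table mydb.staff drop column department;
```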
Impala's DROP TABLE statement is used to delete an existing table in Impala: it removes the complete table definition from the database, and for internal tables it also deletes the underlying HDFS files.
The syntax is as follows:
```sql
drop table database_name.table_name;
```
Note: use this command with care, because once a table is deleted, all the information available in it is lost forever.
Impala's Truncate Table statement is used to delete all records from an existing table and preserve the table structure.
The syntax format is as follows:
```sql
truncate table_name;
```
Examples of drop table and truncate are as follows:
```
[node03:21000] > drop table temp2;
Query: drop table temp2
[node03:21000] > show tables;
Query: show tables
+-----------------+
| name            |
+-----------------+
| course          |
| emp             |
| employee        |
| goodtbl         |
| line2row        |
| row2line        |
| stu             |
| student         |
| studscore       |
| t1              |
| t2              |
| t3              |
| tab1            |
| taba            |
| tabb            |
| tabc            |
| tabd            |
| tabe            |
| u1              |
| u2              |
| uaction_orc     |
| uaction_parquet |
| uaction_text    |
| ulogin          |
| userpv          |
| zxz_data        |
+-----------------+
Fetched 26 row(s) in 0.01s
[node03:21000] > select * from t3;
Query: select * from t3
+--------------------+-----------+-----+--------+
| id                 | name      | age | gender |
+--------------------+-----------+-----+--------+
| 392456197008193000 | Zhang San | 20  | 0      |
| 392456197008193000 | Zhang San | 20  | 0      |
| 392456197008193000 | Zhang San | 20  | 0      |
+--------------------+-----------+-----+--------+
Fetched 3 row(s) in 2.85s
[node03:21000] > truncate t3;
Query: truncate t3
Fetched 0 row(s) in 0.22s
[node03:21000] > select * from t3;
Query: select * from t3
Fetched 0 row(s) in 0.15s
```
A view is a named statement of the Impala query language stored in the database; in effect it is a predefined SQL query over one or more tables, presented as a table.

A view can contain all rows of a table or only selected rows.
The syntax for creating, modifying, and deleting views is as follows:
```sql
-- Create a view
create view if not exists view_name as select statement;

-- Modify a view
alter view database_name.view_name as select statement;

-- Delete a view
drop view database_name.view_name;
```
Use the following:
```
[node03:21000] > create view if not exists emp_view as select * from employee where salary > 20000;
Query: create view if not exists emp_view as select * from employee where salary > 20000
Fetched 0 row(s) in 0.14s
[node03:21000] > show tables;
Query: show tables
+-----------------+
| name            |
+-----------------+
| course          |
| emp             |
| emp_view        |
| employee        |
| goodtbl         |
| line2row        |
| row2line        |
| stu             |
| student         |
| studscore       |
| t1              |
| t2              |
| t3              |
| tab1            |
| taba            |
| tabb            |
| tabc            |
| tabd            |
| tabe            |
| u1              |
| u2              |
| uaction_orc     |
| uaction_parquet |
| uaction_text    |
| ulogin          |
| userpv          |
| zxz_data        |
+-----------------+
Fetched 27 row(s) in 0.02s
[node03:21000] > select * from emp_view;
Query: select * from emp_view
+----+----------+-----+---------+--------+
| id | name     | age | address | salary |
+----+----------+-----+---------+--------+
| 3  | kaushik  | 23  | Kota    | 30000  |
| 4  | Chaitali | 25  | Mumbai  | 35000  |
| 5  | Hardik   | 27  | Bhopal  | 40000  |
| 6  | Komal    | 22  | MP      | 32000  |
+----+----------+-----+---------+--------+
Fetched 4 row(s) in 4.14s
[node03:21000] > drop view emp_view;
Query: drop view emp_view
[node03:21000] > show tables;
Query: show tables
+-----------------+
| name            |
+-----------------+
| course          |
| emp             |
| employee        |
| goodtbl         |
| line2row        |
| row2line        |
| stu             |
| student         |
| studscore       |
| t1              |
| t2              |
| t3              |
| tab1            |
| taba            |
| tabb            |
| tabc            |
| tabd            |
| tabe            |
| u1              |
| u2              |
| uaction_orc     |
| uaction_parquet |
| uaction_text    |
| ulogin          |
| userpv          |
| zxz_data        |
+-----------------+
Fetched 26 row(s) in 0.01s
```
The Impala ORDER BY clause is used to sort data in ascending or descending order by one or more columns.
The syntax is as follows:
```sql
select * from table_name ORDER BY col_name [ASC|DESC] [NULLS FIRST|NULLS LAST];
```
By default, query results are sorted in ascending order; the keywords ASC (the default) and DESC sort the data in the table in ascending or descending order respectively.

If NULLS FIRST is used, all rows with null values are placed first;

if NULLS LAST is used, rows containing null values are placed last.
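A hedged illustration of the null-placement options (the employee table above happens to contain no NULL salaries, so take this as a syntax sketch):

```sql
-- Put rows whose salary is NULL at the top of the result...
select * from employee order by salary asc nulls first;

-- ...or at the bottom.
select * from employee order by salary desc nulls last;
```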
The Impala GROUP BY clause is used in conjunction with a SELECT statement to arrange the same data into groups.
The having clause is used to filter the data obtained after grouping and aggregation;

it is easily confused with where filtering. The difference is as follows:

where filters the original data, which already exists in the table;

having filters the query's result data after grouping.
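To make the contrast concrete (a sketch against the employee table from above):

```sql
-- WHERE filters raw rows before grouping;
-- HAVING filters the aggregated groups afterwards.
select age, max(salary)
from employee
where age > 22               -- applied to each original row
group by age
having max(salary) > 20000;  -- applied to each group's aggregate
```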
The limit clause in Impala restricts the number of rows in the result set to the required number; that is, the query's result set does not contain more records than the specified limit.

Generally speaking, the rows in the result set of a select query start from 0; with the offset clause, you can decide where the output should start.
order by, group by, having and limit ... offset are used as follows:
```
[node03:21000] > select * from employee order by salary desc;
Query: select * from employee order by salary desc
+----+----------+-----+-----------+--------+
| id | name     | age | address   | salary |
+----+----------+-----+-----------+--------+
| 5  | Hardik   | 27  | Bhopal    | 40000  |
| 4  | Chaitali | 25  | Mumbai    | 35000  |
| 6  | Komal    | 22  | MP        | 32000  |
| 3  | kaushik  | 23  | Kota      | 30000  |
| 1  | Ramesh   | 32  | Ahmedabad | 20000  |
| 2  | Khilan   | 25  | Delhi     | 15000  |
+----+----------+-----+-----------+--------+
Fetched 6 row(s) in 0.87s
[node03:21000] > select name, sum(salary) from employee group by name;
Query: select name, sum(salary) from employee group by name
+----------+-------------+
| name     | sum(salary) |
+----------+-------------+
| Ramesh   | 20000       |
| Hardik   | 40000       |
| Komal    | 32000       |
| Chaitali | 35000       |
| kaushik  | 30000       |
| Khilan   | 15000       |
+----------+-------------+
Fetched 6 row(s) in 1.28s
[node03:21000] > select age, max(salary) from employee group by age having max(salary) > 20000;
Query: select age, max(salary) from employee group by age having max(salary) > 20000
+-----+-------------+
| age | max(salary) |
+-----+-------------+
| 23  | 30000       |
| 25  | 35000       |
| 27  | 40000       |
| 22  | 32000       |
+-----+-------------+
Fetched 4 row(s) in 0.67s
[node03:21000] > select * from employee order by salarylimit 2 offset 0;
Query: select * from employee order by salarylimit 2 offset 0
ERROR: AnalysisException: Syntax error in line 1:
select * from employee order by salarylimit 2 offset 0
                                            ^
Encountered: INTEGER LITERAL
Expected: AND, AS, ASC, BETWEEN, CROSS, DESC, DIV, ELSE, END, FOLLOWING, FROM, FULL, GROUP, HAVING, ILIKE, IN, INNER, IREGEXP, IS, JOIN, LEFT, LIKE, LIMIT, LOCATION, NOT, NULLS, OFFSET, ON, OR, ORDER, PRECEDING, RANGE, REGEXP, RIGHT, RLIKE, ROWS, STRAIGHT_JOIN, THEN, UNION, USING, WHEN, WHERE, COMMA, IDENTIFIER

CAUSED BY: Exception: Syntax error
[node03:21000] > select * from employee order by salary limit 2 offset 0;
Query: select * from employee order by salary limit 2 offset 0
+----+--------+-----+-----------+--------+
| id | name   | age | address   | salary |
+----+--------+-----+-----------+--------+
| 2  | Khilan | 25  | Delhi     | 15000  |
| 1  | Ramesh | 32  | Ahmedabad | 20000  |
+----+--------+-----+-----------+--------+
Fetched 2 row(s) in 6.16s
[node03:21000] > select * from employee order by salary limit 2 offset 2;
Query: select * from employee order by salary limit 2 offset 2
+----+---------+-----+---------+--------+
| id | name    | age | address | salary |
+----+---------+-----+---------+--------+
| 3  | kaushik | 23  | Kota    | 30000  |
| 6  | Komal   | 22  | MP      | 32000  |
+----+---------+-----+---------+--------+
Fetched 2 row(s) in 2.53s
[node03:21000] > select * from employee order by salary limit 2 offset 4;
Query: select * from employee order by salary limit 2 offset 4
+----+----------+-----+---------+--------+
| id | name     | age | address | salary |
+----+----------+-----+---------+--------+
| 4  | Chaitali | 25  | Mumbai  | 35000  |
| 5  | Hardik   | 27  | Bhopal  | 40000  |
+----+----------+-----+---------+--------+
Fetched 2 row(s) in 2.45s
[node03:21000] > select * from employee order by salary limit 2 offset 6;
Query: select * from employee order by salary limit 2 offset 6
Fetched 0 row(s) in 1.47s
```
You can see that the limit ... offset ... statement achieves a paging effect;

note that using the offset keyword requires the result data to be sorted.
3. Import data and query Impala via JDBC
Like Hive, Impala has many ways to import data:
mode | meaning |
---|---|
insert into values | Similar to inserting data in an RDBMS: the data is specified by hand in the statement |
insert into select | The data inserted into the table comes from the result of the following select query (see the sketch after this table) |
create table as select | The table's fields (number and types) and its data all come from the following select query |
load data | Imports data from an HDFS file. This method is not recommended in Impala; instead, first load the data into a Hive table with load data, then insert it into the Impala table using one of the methods above |
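A hedged sketch of the last three methods (employee is the table created earlier; employee_backup, high_earners and the staging path are invented):

```sql
-- insert into select: copy rows returned by a query
-- (assumes employee_backup already exists with a matching schema).
insert into employee_backup
select * from employee where salary > 20000;

-- create table as select (CTAS): schema and data both come from the query.
create table high_earners as
select id, name, salary from employee where salary > 30000;

-- load data: move prepared files from an HDFS directory into a table
-- (in practice this is usually done in Hive instead, as noted above).
load data inpath '/user/impala/staging/employee' into table employee;
```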
In actual work, because Impala queries are relatively fast, Impala may be chosen for database queries, and such queries can be issued from Java code via JDBC.
Create a Maven project ImpalaJDBC with the following dependencies:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.bigdata.impala</groupId>
    <artifactId>ImpalaJDBC</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
    </properties>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.9.2</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-common -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-common</artifactId>
            <version>2.3.7</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-metastore -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-metastore</artifactId>
            <version>2.3.7</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-service -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-service</artifactId>
            <version>2.3.7</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-jdbc -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>2.3.7</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-exec -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>2.3.7</version>
        </dependency>
    </dependencies>
</project>
```
Wait for all the dependencies to download and load.
Create the package com.bigdata.impala, and create the class ImpalaTest in it, as follows:
```java
package com.bigdata.impala;

import java.sql.*;

/**
 * @author Corley
 * @date 2021/10/11 15:41
 * @description ImpalaJDBC-com.bigdata.impala
 */
public class ImpalaTest {
    public static void main(String[] args) throws ClassNotFoundException, SQLException {
        // Define the driver and the connection URL used to connect to Impala
        String driver = "org.apache.hive.jdbc.HiveDriver";
        String driverUrl = "jdbc:hive2://node02:21050/mydb;auth=noSasl";

        // SQL statement of the query
        String querySql = "select * from employee;";

        // Load the driver
        Class.forName(driver);
        // Get a connection through DriverManager
        final Connection connection = DriverManager.getConnection(driverUrl);
        final PreparedStatement ps = connection.prepareStatement(querySql);

        // Execute the query
        final ResultSet resultSet = ps.executeQuery();
        // Parse the returned results:
        // get the number of columns in each row
        final int columnCount = resultSet.getMetaData().getColumnCount();
        // Traverse the result set
        while (resultSet.next()) {
            for (int i = 1; i <= columnCount; i++) {
                final String string = resultSet.getString(i);
                System.out.print(string + "\t");
            }
            System.out.println();
        }

        // Close resources
        ps.close();
        connection.close();
    }
}
```
Output:
```
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/Users/LENOVO/.m2/repository/org/slf4j/slf4j-log4j12/1.7.25/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/Users/LENOVO/.m2/repository/org/apache/logging/log4j/log4j-slf4j-impl/2.6.2/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
1    Ramesh    32    Ahmedabad    20000
3    kaushik   23    Kota         30000
4    Chaitali  25    Mumbai       35000
5    Hardik    27    Bhopal       40000
6    Komal     22    MP           32000
2    Khilan    25    Delhi        15000
```
As you can see, the query results are obtained.
Summary

Impala is also an interactive query engine that enables efficient queries. When choosing between Impala and Hive, decide according to actual needs.