Introduction to TPC and TPC-H
According to TPC official website( http://tpc.org )TPC is a non-profit company, focusing on developing data centric benchmark standards and disseminating objective and verifiable data to the industry. TPC releases TPC-C, TPC-H and TPC-DS, which are widely used in the industry. TPC-C is the test benchmark for OLTP, while TPC-H and TPC-DS are usually the test benchmarks for OLAP.
TPC-H is a Decision Support Benchmark, which consists of a set of business oriented special queries and concurrent data modification. The data that queries and populates the database has a wide range of industry relevance. This benchmark demonstrates a decision support system that examines large amounts of data, performs highly complex queries, and answers key business questions. The performance indicators reported by TPC-H are called TPC-H hourly composite query performance indicators( QphH@Size ), which reflects many aspects of the system's query processing ability. These aspects include the database size selected when executing queries, the query processing capacity when submitting queries by a single stream, and the query throughput when submitting queries by multiple concurrent users.
On February 18, 2021, TPC-H Standard Specification released V3 version. Download link: http://tpc.org/tpc_documents_...
TPC-H table structure and relationship
TPC-H test standard simulates a system for online sales of parts and components, and defines 8 tables in total:
- REGION region table
- NATION country table
- SUPPLIER supplier table
- PART parts list
- PARTSUPP parts supply table
- CUSTOMER customer table
- ORDERS order form
- LINEITEM order details
The structure, data volume and mutual relationship of each table are shown in the figure below. The column name prefix of the table is in the brackets behind the table name, the arrow points to the direction of one to many relationship between tables, and the number or formula below the table name is the cardinality (row number) of the table. The cardinality in the LINEITEM table is an approximate value, and sf in the cardinality is a scale factor. It is used to generate data sets with different data sizes. sf must be selected from the following fixed values: 1 / 10 / 30 / 100 / 1000 / 3000 / 10000, which is equivalent to 1GB/10GB/30GB/100GB/1TB/3TB/1PB.
Pay attention to official account big data research society: BigDataRLC, first time to read the latest articles.
TPC-H query SQL statement
Q1: Pricing summary report query
Pricing summary report query is a single table query operation with grouping, sorting and aggregation. It reports the invoiced, shipped and returned business volume; Provides a summary pricing report for all order items shipped on a given date, by l_returnflag and l_linestatus groups and sorts, and lists the extended price, discount extended price, discount extended price plus tax, average quantity, average extended price, average discount, and the total number of rows in each group.
select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice*(1-l_discount)) as sum_disc_price, sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order from lineitem where l_shipdate <= date '1998-12-01' - interval '[DELTA]' day (3) //DELTA is a value within 60 ~ 120 group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus;
Q2: lowest cost supplier query
The lowest cost supplier query is a multi table query operation with sorting, aggregation and sub query. It is used to find the supplier that should be selected to place an order for a given part in a given region; In a given area, find suppliers who can supply at the lowest cost for each part of a specific type and size. If multiple suppliers in the region supply at the same (lowest) cost. The query lists the supplier's account balance, name and country, part number, manufacturer, supplier's address, telephone number and comment information.
select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment /*Query the supplier's account balance, name, country, part number, manufacturer, supplier's address, telephone number and remarks */ from part, supplier, partsupp, nation, region //Five meter connection where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = [SIZE] //Specify the size and select randomly in the interval [1, 50] and p_type like '%[TYPE]' //Specify the type and select randomly within the range specified by TPC-H standard and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = '[REGION]' //Designated areas are randomly selected within the range specified by TPC-H standard and ps_supplycost = ( //Subquery select min(ps_supplycost) //Aggregate function from partsupp, supplier, nation, region //Overlap with table of parent query where p_partkey = ps_partkey and s_suppkey = ps_suppkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = '[REGION]' ) order by //sort s_acctbal desc, n_name, s_name, p_partkey;
Q3: transportation priority query
Transportation priority query is a three table query operation with grouping, sorting and aggregation. It retrieves the unsealed orders with the highest value; Among the orders that have not been shipped on a given date, the transportation priority and potential revenue of the order with the largest revenue (the sum of l_extendedprice * (1-l_discount)), and the orders are sorted in descending order of revenue.
select l_orderkey, sum(l_extendedprice*(1-l_discount)) as revenue, //Potential revenue, aggregation operations o_orderdate, o_shippriority from customer, orders, lineitem //Three meter connection where c_mktsegment = '[SEGMENT]' //Randomly selected within the range specified by TPC-H standard and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < date '[DATE]' //Specify the date period and select it randomly in [1995-03-01, 1995-03-31] and l_shipdate > date '[DATE]' group by //Grouping operation l_orderkey, //Order ID o_orderdate, //Order date o_shippriority //Transportation priority order by //Sort operation revenue desc, //In descending order, the potential maximum income is listed first o_orderdate;
Q4: order priority query
Order priority query is a single table query operation with grouping, sorting, aggregation and sub query. It determines the work of the order priority system and evaluates customer satisfaction; The number of orders in a given quarter, where the customer receives at least one order item after the commitment log, listing the count of such orders for each order priority sorted in ascending order of priority.
select o_orderpriority, //Order Priority count(*) as order_count //Order priority count from orders //Single table query where o_orderdate >= date '[DATE]' and o_orderdate < date '[DATE]' + interval '3' month //Specify the time period of the order + three months. DATE is the first day of a month randomly selected between January 1993 and October 1997 and exists ( //Subquery select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by //Group by order priority o_orderpriority order by //Sort by order priority o_orderpriority;
Q5: local supplier revenue query
Local supplier revenue query is a multi table join query operation with grouping, sorting, aggregation and sub query. It lists the revenue completed by local suppliers; List the revenue generated by order item transactions for each country in a region to determine whether to establish a local distribution center in a given region.
select n_name, sum(l_extendedprice * (1 - l_discount)) as revenue //Aggregate operation from customer,orders,lineitem,supplier,nation,region //Six meter connection where c_custkey = o_custkey and l_orderkey = o_orderkey and l_suppkey = s_suppkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = '[REGION]' //Designated areas are randomly selected within the range specified by TPC-H standard and o_orderdate >= date '[DATE]' //DATE is January 1 of a randomly selected year from 1993 to 1997 and o_orderdate < date '[DATE]' + interval '1' year group by //Group by name n_name order by //Sort by income in descending order. Pay attention to the difference between grouping and sorting clauses revenue desc;
Q6: forecast revenue change query
Forecast revenue change query is a single table query operation with aggregation operation, which quantifies the revenue growth caused by eliminating discounts in some companies within a given percentage range in a given year, and is used to find ways to increase revenue; For all order items shipped in a given year, the DISCOUNT is between "count-0.01" and "count + 0.01", if l_ The increase in total revenue after the DISCOUNT of orders with quantity less than quantity is eliminated.
select sum(l_extendedprice*l_discount) as revenue //Potential revenue increase from lineitem //Single table query where l_shipdate >= date '[DATE]' //DATE is January 1 of a year randomly selected from [1993, 1997] and l_shipdate < date '[DATE]' + interval '1' year //Within one year and l_discount between [DISCOUNT] - 0.01 and [DISCOUNT] + 0.01 //between and l_quantity < [QUANTITY]; // QUANTITY is randomly selected in the interval [24, 25]
Q7: batch shipment query
Batch shipment query is a multi table query operation with grouping, sorting, aggregation and sub query operations. It determines the value of goods directly transported in some countries to help renegotiate transportation contracts; For two given countries, the total discount income obtained from the order items of suppliers in one country shipping parts to customers in the other country between 1995 and 1996.
select supp_nation, //Supplier country cust_nation, //Customer country l_year, sum(volume) as revenue //Annual and annual freight revenue from ( //Subquery select n1.n_name as supp_nation, n2.n_name as cust_nation, extract(year from l_shipdate) as l_year, l_extendedprice * (1 - l_discount) as volume from supplier,lineitem,orders,customer,nation n1,nation n2 //Six meter connection where s_suppkey = l_suppkey and o_orderkey = l_orderkey and c_custkey = o_custkey and s_nationkey = n1.n_nationkey and c_nationkey = n2.n_nationkey and ( // The values of NATION2 and NATION1 are different, which means that the query is about transnational freight transportation (n1.n_name = '[NATION1]' and n2.n_name = '[NATION2]') or (n1.n_name = '[NATION2]' and n2.n_name = '[NATION1]') ) and l_shipdate between date '1995-01-01' and date '1996-12-31' ) as shipping group by supp_nation, cust_nation, l_year order by supp_nation, cust_nation, l_year;
Q8: national market share query
The national market share query has the query operations of grouping, sorting, aggregation and sub query, so as to determine the change of the market share of a given country in a given region for a given part type within two years; The market share of a given country in a given region is defined as part of revenue, i.e. L_ The sum of extendedprice * (1-l_discount).
select o_year, //particular year sum(case when nation = '[NATION]'//Designated countries are selected randomly within the range specified in TPC-H standard then volume else 0 end) / sum(volume) as mkt_share //Market share: the percentage of revenue from a specific category of products; Aggregate operation from ( //Subquery select extract(year from o_orderdate) as o_year, //Break down the year l_extendedprice * (1-l_discount) as volume, //Revenue from specific types of products n2.n_name as nation from part,supplier,lineitem,orders,customer,nation n1,nation n2,region //Eight meter connection where p_partkey = l_partkey and s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name = '[REGION]' //Designated areas are randomly selected within the range specified by TPC-H standard and s_nationkey = n2.n_nationkey and o_orderdate between date '1995-01-01' and date '1996-12-31' //Only check the situation in 1995 and 1996 and p_type = '[TYPE]' //Specify the part type and select it randomly within the range specified in TPC-H standard ) as all_nations group by //Grouping by year o_year order by //Sort by year o_year;
Q9: product type profit measurement query
Product type profit measurement query is a query operation with grouping, sorting, aggregation and sub query operations to determine the profit of a given part series broken down by supplier country and year.
select nation, o_year, sum(amount) as sum_profit //Total profit of all ordered parts in each country in each year from( select n_name as nation, //country extract(year from o_orderdate) as o_year, //Take out the year l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount //profit from part,supplier,lineitem,partsupp,orders,nation //Six meter connection where s_suppkey = l_suppkey and ps_suppkey = l_suppkey and ps_partkey = l_partkey and p_partkey = l_partkey and o_orderkey = l_orderkey and s_nationkey = n_nationkey and p_name like '%[COLOR]%' //LIKE operation, query optimizer may optimize ) as profit group by //Grouping by country and year nation, o_year order by //Sorted by country and year, the older one comes first nation, o_year desc;
Q10: return report query
Return report query is a multi table join query operation with grouping, sorting and aggregation operations to identify customers who may have problems with the parts shipped to them; Identify the top ranked customers who have returned parts based on the impact on the loss of revenue for a given quarter.
select c_custkey, c_name, //Customer information sum(l_extendedprice * (1 - l_discount)) as revenue, //Loss of income c_acctbal, n_name, c_address, c_phone, c_comment //Country, address, telephone number, opinion information, etc from customer, orders, lineitem, nation where c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate >= date '[DATE]' // DATE is the first of any month from January 1993 to December 1994 and o_orderdate < date '[DATE]' + interval '3' month //Within 3 months and l_returnflag = 'R' //The goods were returned and c_nationkey = n_nationkey group by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment order by revenue desc;
Q11: inventory value query
Inventory value query is a multi table join query operation with grouping, sorting, aggregation and sub query operations to find the inventory value of suppliers in a given country / region; Scan the available inventory of suppliers in a given country to find all parts that account for a significant percentage of the total value of all available parts.
select ps_partkey, sum(ps_supplycost * ps_availqty) as value //Aggregate operation, total value of goods from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = '[NATION]' group by ps_partkey having //Grouping operation with HAVING clause sum(ps_supplycost * ps_availqty) > ( //A subquery is included in the HAVING clause select sum(ps_supplycost * ps_availqty) * [FRACTION] //Aggregation operation exists in sub query; FRACTION is 0.0001/SF1 from partsupp, supplier, nation //Consistent with the table join of the parent query where //Consistent with WHEWR condition of parent query ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = '[NATION]' //Designated country ) order by //Sort goods in descending order of value value desc;
Q12: shipping method and order priority query
Shipping method and order priority query are two table connection query operations with grouping, sorting and aggregation operations. Determine whether choosing a cheaper shipping method will have a negative impact on key priority orders, resulting in consumers receiving more goods after the contract date.
select l_shipmode, sum(//Aggregate operation case when o_orderpriority ='1-URGENT' //OR operation, one of the two can be satisfied, and the original OR HIGH can be selected or o_orderpriority ='2-HIGH' then 1 else 0 end) as high_line_count, sum( case when o_orderpriority <> '1-URGENT' //AND operation, both of which are not satisfied. It is not URGENT AND not HIGH and o_orderpriority <> '2-HIGH' then 1 else 0 end) as low_line_count from orders,lineitem where o_orderkey = l_orderkey and l_shipmode in ('[SHIPMODE1]', '[SHIPMODE2]') /* Specify the type of freight mode and select it randomly within the range specified in TPC-H standard. SHIPMODE2 must be different from SHIPMODE1 */ and l_commitdate < l_receiptdate and l_shipdate < l_commitdate and l_receiptdate >= date '[DATE]' //January 1st of any year from 1993 to 1997 and l_receiptdate < date '[DATE]' + interval '1' year //Within 1 year group by //Grouping operation l_shipmode order by //Sort operation l_shipmode;
Q13: customer distribution query
Customer distribution query is a query operation with grouping, sorting, aggregation, sub query and left external connection; Determine the customer distribution based on the customer's order quantity, calculate and report how many customers have no orders, and how many customers have 1 / 2 / 3 orders.
select c_count, count(*) as custdist //Aggregate operation to count the number of each group from (//Subquery select c_custkey, count(o_orderkey) from customer left outer join orders on //The subquery includes left outer join operations c_custkey = o_custkey and o_comment not like '%[WORD1]%[WORD2]%' //LIKE operation //WORD1 is any of the following four possible values: special, pending, unused, express //WORD2 is any of the following four possible values: packages, requests, accounts and deposits group by //Grouping operations in subqueries c_custkey ) as c_orders (c_custkey, c_count) group by //Grouping operation c_count order by //Sort operation custdist desc, //Sort descending from large to small c_count desc;
Q14: promotion effect query
Promotion effect query is a query operation with grouping, sorting, aggregation, sub query and left outer connection to monitor the market's response to promotion; Determine what percentage of revenue in a given year and month is related to the promotion portion.
select 100.00 * sum( case when p_type like 'PROMO%' //Promotional parts then l_extendedprice*(1-l_discount) //Income at a particular time else 0 end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue from lineitem, part where l_partkey = p_partkey and l_shipdate >= date '[DATE]' // DATE is the first day of any month in any year from 1993 to 1997 and l_shipdate < date '[DATE]' + interval '1' month;
Q15: top supplier query
Top level supplier query is a connection operation between common tables and views with sub query operations of sorting, aggregation and aggregation. It determines top-level suppliers to reward them, give them more business or obtain special recognition; The supplier that contributes the most to the total shipment in a given quarter of a given annual fee.
create view revenue[STREAM_ID](supplier_no, total_revenue) as //Create complex views with grouping operations select l_suppkey, sum(l_extendedprice * (1 - l_discount)) //Obtain the total profit brought by the supplier to the company from lineitem where l_shipdate >= date '[DATE]' //DATE is the first of any month from January 1993 to October 1997 and l_shipdate < date '[DATE]' + interval '3' month //Within 3 months group by //The grouping key is the same as one of the query objects l_suppkey; //Query statement select s_suppkey, s_name, s_address, s_phone, total_revenue from supplier,revenue[STREAM_ID] //Join common tables to complex views where s_suppkey = supplier_no and total_revenue = (//Aggregate subquery select max(total_revenue) from revenue[STREAM_ID] //Aggregate subqueries get data from views ) order by s_suppkey; //Delete view drop view revenue[STREAM_ID];
TPC-H standard defines the variant SQL equivalent to Q15 statement, uses the WITH clause, and then uses the object of WITH to connect WITH the table. The deformed SQL statement is as follows:
WITH revenue (supplier_no, total_revenue) as ( SELECT l_suppkey, SUM(l_extendedprice * (1-l_discount)) FROM lineitem WHERE l_shipdate >= date ':1' AND l_shipdate < date ':1' + interval '3' month GROUP BY l_suppkey ) SELECT s_suppkey, s_name, s_address, s_phone, total_revenue FROM supplier, revenue WHERE s_suppkey = supplier_no AND total_revenue = ( SELECT MAX(total_revenue) FROM revenue ) ORDER BY s_suppkey;
Q16: parts / supplier relationship query
Component / supplier relationship query is a two table connection operation with grouping, sorting, aggregation, de duplication and NOT IN sub query; Find out how many suppliers can provide parts with given attributes.
select p_brand, p_type, p_size, count(distinct ps_suppkey) as supplier_cnt //Aggregation and de duplication from partsupp, part where p_partkey = ps_partkey and p_brand <> '[BRAND]' // Brand = brand MN, M and N are two letters, representing two values, which are independent of each other, and the value is between 1 and 5 and p_type not like '[TYPE]%' //Types and sizes that consumers are not interested in and p_size in ([SIZE1], [SIZE2], [SIZE3], [SIZE4], [SIZE5], [SIZE6], [SIZE7], [SIZE8]) //TYPEX is a set of eight different values arbitrarily selected between 1 and 50 and ps_suppkey not in ( //NOT IN sub query, consumers exclude some suppliers select s_suppkey from supplier where s_comment like '%Customer%Complaints%' ) group by //Grouping operation p_brand, p_type, p_size order by //Sort operation supplier_cnt desc, //In descending order of quantity, in ascending order of brand, type and size p_brand, p_type, p_size;
Q17: small order revenue query
Small order revenue query is a two table connection operation with aggregation and aggregation sub query; Determine how much revenue will be lost per year if orders are not completed for a small number of specific parts, and reduce management expenses by focusing on larger shipments; For parts of a given brand and container type, determine the average order quantity of such parts ordered by all orders in the database within 7 years. If no more orders below the average level of 20% are received, what is the average annual total revenue loss.
select sum(l_extendedprice) / 7.0 as avg_yearly //Aggregate operation from lineitem, part where p_partkey = l_partkey and p_brand = '[BRAND]' /*Designated BRAND. BRAND = 'Brand#MN', M and N are two letters, representing two values, which are independent of each other, and the value is between 1 and 5 */ and p_container = '[CONTAINER]' //Specify the packaging type. Randomly selected within the range specified by TPC-H standard and l_quantity < ( //Aggregate subquery select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey );
Q18: mass customer query
Mass customer query is a three table connection operation with grouping, sorting, aggregation and IN sub query. Customers are ranked according to their mass orders.
select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, //essential information sum(l_quantity) //Total orders from customer, orders, lineitem where o_orderkey in ( //IN subquery with grouping operation select l_orderkey from lineitem group by l_orderkey having sum(l_quantity) > [QUANTITY] // QUANTITY is any value between 312 and 315 ) and c_custkey = o_custkey and o_orderkey = l_orderkey group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice order by o_totalprice desc, o_orderdate;
Q19: discount revenue query
Discount revenue query is a three table connection operation with grouping, sorting, aggregation and IN sub query. It is attributed to the total discount revenue of selected parts processed IN a specific way. This query is an example of using data mining tools to generate formatting code; Total discount revenue for all orders of three different categories for some air or manual transportation parts. The selection of parts takes into account the specific brand, packaging and size range.
select sum(l_extendedprice * (1 - l_discount) ) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = '[BRAND1]' /*Specific brands. BRAND1, BRAND2 and BRAND3 = 'brand MN', M and N are two letters representing two values, which are independent of each other, and the value is between 1 and 5 */ and p_container in ( 'SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') //Packaging scope and l_quantity >= [QUANTITY1] and l_quantity <= [QUANTITY1] + 10 /* QUANTITY1 Is any value between 1 and 10 */ and p_size between 1 and 5 //Size range and l_shipmode in ('AIR', 'AIR REG') //For the transportation mode, the conditions indicated in bold with shadow below are the same, and there is the possibility of condition simplification and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = '[BRAND2]' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= [QUANTITY2] and l_quantity <= [QUANTITY2] + 10 /* QUANTITY2 Is any value between 10 and 20 */ and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = '[BRAND3]' and p_container in ( 'LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= [QUANTITY3] and l_quantity <= [QUANTITY3] + 10 /* QUANTITY3 Is any value between 20 and 30 */ and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' );
Q20: potential parts promotion query
Q20 statement query determines to find the supply goods in a specified country that can provide a more competitive price for a part commodity in a certain year. The so-called more competitive suppliers refer to those suppliers with excess parts. If they exceed 50% of the parts shipped by the supplier or supplier in a given country in a given year, it is excess.
Potential parts promotion query is a two table connection operation with sorting, aggregation, IN sub query and common sub query. It identifies suppliers IN specific countries. The parts they select may be candidates for promotion quotation; Identify suppliers who provide too many given parts, which is defined as more than 50% of the parts.
select s_name, s_address from supplier, nation where s_suppkey in ( //IN sub query of the first layer select ps_suppkey from partsupp where ps_partkey in ( //Second level nested IN subquery select p_partkey from part where p_name like '[COLOR]%' //COLOR is p_ Any value in the list of values for name ) and ps_availqty > (//Second level nested subquery select 0.5 * sum(l_quantity) //Aggregate subquery from lineitem where l_partkey = ps_partkey and l_suppkey = ps_suppkey and l_shipdate >= date('[DATE]') // DATE is January 1 of any year from 1993 to 1997 and l_shipdate < date('[DATE]') + interval '1' year //Within 1 year ) ) and s_nationkey = n_nationkey and n_name = '[NATION]' //Any value defined in TPC-H standard order by s_name;
Q21: supplier leaves a document for query
The supplier's document waiting query is a four table connection operation with grouping, sorting, aggregation, EXISTS sub query and NOT EXISTS sub query. It identifies the supplier who cannot deliver goods in time.
select s_name, count(*) as numwait from supplier, lineitem l1, orders, nation where s_suppkey = l1.l_suppkey and o_orderkey = l1.l_orderkey and o_orderstatus = 'F' and l1.l_receiptdate > l1.l_commitdate and exists ( //EXISTS subquery select * from lineitem l2 where l2.l_orderkey = l1.l_orderkey and l2.l_suppkey <> l1.l_suppkey ) and not exists ( //NOT EXISTS subquery select * from lineitem l3 where l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey and l3.l_receiptdate > l3.l_commitdate ) and s_nationkey = n_nationkey and n_name = '[NATION]' //Any value defined in TPC-H standard group by s_name order by numwait desc, s_name;
Q22: global opportunity query
Global sales opportunity query is a four table connection operation with grouping, sorting, aggregation, EXISTS sub query and NOT EXISTS sub query. It determines the geographical location of customers who are likely to purchase, and calculates how many customers within a specific country code have not placed orders for 7 years, but their "positive" account balance is higher than the average level.
select cntrycode, count(*) as numcust, sum(c_acctbal) as totacctbal from ( //First level subquery select substring(c_phone from 1 for 2) as cntrycode, c_acctbal from customer where // I1... I7 are any values that are not repeated among the possible values of the country code defined in TPC-H substring(c_phone from 1 for 2) in ('[I1]','[I2]','[I3]','[I4]','[I5]','[I6]','[I7]') and c_acctbal > (//Level 2 aggregate subquery select avg(c_acctbal) from customer where c_acctbal > 0.00 and substr (c_phone from 1 for 2) in ('[I1]','[I2]','[I3]','[I4]','[I5]','[I6]','[I7]') ) and not exists (//Tier 2 NOT EXISTS subquery select * from orders where o_custkey = c_custkey ) ) as custsale group by cntrycode order by cntrycode;
reference material
Pay attention to official account big data research society: BigDataRLC, first time to read the latest articles.