TPC-H table structure and 22 SQL query statements

Introduction to TPC and TPC-H

According to TPC official website( http://tpc.org )TPC is a non-profit company, focusing on developing data centric benchmark standards and disseminating objective and verifiable data to the industry. TPC releases TPC-C, TPC-H and TPC-DS, which are widely used in the industry. TPC-C is the test benchmark for OLTP, while TPC-H and TPC-DS are usually the test benchmarks for OLAP.

TPC-H is a Decision Support Benchmark, which consists of a set of business oriented special queries and concurrent data modification. The data that queries and populates the database has a wide range of industry relevance. This benchmark demonstrates a decision support system that examines large amounts of data, performs highly complex queries, and answers key business questions. The performance indicators reported by TPC-H are called TPC-H hourly composite query performance indicators( QphH@Size ), which reflects many aspects of the system's query processing ability. These aspects include the database size selected when executing queries, the query processing capacity when submitting queries by a single stream, and the query throughput when submitting queries by multiple concurrent users.

On February 18, 2021, TPC-H Standard Specification released V3 version. Download link: http://tpc.org/tpc_documents_...

TPC-H table structure and relationship

TPC-H test standard simulates a system for online sales of parts and components, and defines 8 tables in total:

  • REGION region table
  • NATION country table
  • SUPPLIER supplier table
  • PART parts list
  • PARTSUPP parts supply table
  • CUSTOMER customer table
  • ORDERS order form
  • LINEITEM order details

The structure, data volume and mutual relationship of each table are shown in the figure below. The column name prefix of the table is in the brackets behind the table name, the arrow points to the direction of one to many relationship between tables, and the number or formula below the table name is the cardinality (row number) of the table. The cardinality in the LINEITEM table is an approximate value, and sf in the cardinality is a scale factor. It is used to generate data sets with different data sizes. sf must be selected from the following fixed values: 1 / 10 / 30 / 100 / 1000 / 3000 / 10000, which is equivalent to 1GB/10GB/30GB/100GB/1TB/3TB/1PB.

Pay attention to official account big data research society: BigDataRLC, first time to read the latest articles.

TPC-H query SQL statement

Q1: Pricing summary report query

Pricing summary report query is a single table query operation with grouping, sorting and aggregation. It reports the invoiced, shipped and returned business volume; Provides a summary pricing report for all order items shipped on a given date, by l_returnflag and l_linestatus groups and sorts, and lists the extended price, discount extended price, discount extended price plus tax, average quantity, average extended price, average discount, and the total number of rows in each group.

select
  l_returnflag,
  l_linestatus,
  sum(l_quantity) as sum_qty,
  sum(l_extendedprice) as sum_base_price, 
  sum(l_extendedprice*(1-l_discount)) as sum_disc_price, 
  sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge, 
  avg(l_quantity) as avg_qty,
  avg(l_extendedprice) as avg_price,
  avg(l_discount) as avg_disc,
  count(*) as count_order
from
    lineitem
where
    l_shipdate <= date '1998-12-01' - interval '[DELTA]' day (3)  //DELTA is a value within 60 ~ 120
group by 
    l_returnflag, l_linestatus 
order by
    l_returnflag, l_linestatus;

Q2: lowest cost supplier query

The lowest cost supplier query is a multi table query operation with sorting, aggregation and sub query. It is used to find the supplier that should be selected to place an order for a given part in a given region; In a given area, find suppliers who can supply at the lowest cost for each part of a specific type and size. If multiple suppliers in the region supply at the same (lowest) cost. The query lists the supplier's account balance, name and country, part number, manufacturer, supplier's address, telephone number and comment information.

select
    s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment 
  /*Query the supplier's account balance, name, country, part number, manufacturer, supplier's address, telephone number and remarks */
from
    part, supplier, partsupp, nation, region //Five meter connection
where
    p_partkey = ps_partkey
    and s_suppkey = ps_suppkey
    and p_size = [SIZE] //Specify the size and select randomly in the interval [1, 50]
    and p_type like '%[TYPE]' //Specify the type and select randomly within the range specified by TPC-H standard
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = '[REGION]' //Designated areas are randomly selected within the range specified by TPC-H standard
    and ps_supplycost = ( //Subquery
        select
            min(ps_supplycost) //Aggregate function
        from
            partsupp, supplier, nation, region //Overlap with table of parent query
        where
            p_partkey = ps_partkey
            and s_suppkey = ps_suppkey
            and s_nationkey = n_nationkey
            and n_regionkey = r_regionkey
            and r_name = '[REGION]'
    )
order by //sort
    s_acctbal desc,
    n_name,
    s_name,
    p_partkey;

Q3: transportation priority query

Transportation priority query is a three table query operation with grouping, sorting and aggregation. It retrieves the unsealed orders with the highest value; Among the orders that have not been shipped on a given date, the transportation priority and potential revenue of the order with the largest revenue (the sum of l_extendedprice * (1-l_discount)), and the orders are sorted in descending order of revenue.

select
    l_orderkey,
    sum(l_extendedprice*(1-l_discount)) as revenue, //Potential revenue, aggregation operations
    o_orderdate,
    o_shippriority
from
    customer, orders, lineitem //Three meter connection
where
    c_mktsegment = '[SEGMENT]' //Randomly selected within the range specified by TPC-H standard
    and c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate < date '[DATE]' //Specify the date period and select it randomly in [1995-03-01, 1995-03-31]
    and l_shipdate > date '[DATE]'
group by //Grouping operation
    l_orderkey, //Order ID
    o_orderdate, //Order date
    o_shippriority //Transportation priority
order by //Sort operation
    revenue desc, //In descending order, the potential maximum income is listed first
    o_orderdate;

Q4: order priority query

Order priority query is a single table query operation with grouping, sorting, aggregation and sub query. It determines the work of the order priority system and evaluates customer satisfaction; The number of orders in a given quarter, where the customer receives at least one order item after the commitment log, listing the count of such orders for each order priority sorted in ascending order of priority.

select
    o_orderpriority, //Order Priority 
    count(*) as order_count //Order priority count
from orders //Single table query
where
    o_orderdate >= date '[DATE]'
    and o_orderdate < date '[DATE]' + interval '3' month //Specify the time period of the order + three months. DATE is the first day of a month randomly selected between January 1993 and October 1997
    and exists ( //Subquery
        select
            *
        from
            lineitem
        where
            l_orderkey = o_orderkey
            and l_commitdate < l_receiptdate
    )
group by //Group by order priority
    o_orderpriority
order by //Sort by order priority
    o_orderpriority;

Q5: local supplier revenue query

Local supplier revenue query is a multi table join query operation with grouping, sorting, aggregation and sub query. It lists the revenue completed by local suppliers; List the revenue generated by order item transactions for each country in a region to determine whether to establish a local distribution center in a given region.

select
    n_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue //Aggregate operation
from
    customer,orders,lineitem,supplier,nation,region //Six meter connection
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and l_suppkey = s_suppkey
    and c_nationkey = s_nationkey
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = '[REGION]' //Designated areas are randomly selected within the range specified by TPC-H standard
    and o_orderdate >= date '[DATE]' //DATE is January 1 of a randomly selected year from 1993 to 1997
    and o_orderdate < date '[DATE]' + interval '1' year
group by //Group by name
    n_name
order by //Sort by income in descending order. Pay attention to the difference between grouping and sorting clauses
    revenue desc;

Q6: forecast revenue change query

Forecast revenue change query is a single table query operation with aggregation operation, which quantifies the revenue growth caused by eliminating discounts in some companies within a given percentage range in a given year, and is used to find ways to increase revenue; For all order items shipped in a given year, the DISCOUNT is between "count-0.01" and "count + 0.01", if l_ The increase in total revenue after the DISCOUNT of orders with quantity less than quantity is eliminated.

select
    sum(l_extendedprice*l_discount) as revenue //Potential revenue increase
from
    lineitem //Single table query
where
    l_shipdate >= date '[DATE]' //DATE is January 1 of a year randomly selected from [1993, 1997]
    and l_shipdate < date '[DATE]' + interval '1' year //Within one year
    and l_discount between [DISCOUNT] - 0.01 and [DISCOUNT] + 0.01 //between
    and l_quantity < [QUANTITY]; // QUANTITY is randomly selected in the interval [24, 25]

Q7: batch shipment query

Batch shipment query is a multi table query operation with grouping, sorting, aggregation and sub query operations. It determines the value of goods directly transported in some countries to help renegotiate transportation contracts; For two given countries, the total discount income obtained from the order items of suppliers in one country shipping parts to customers in the other country between 1995 and 1996.

select
    supp_nation, //Supplier country
    cust_nation, //Customer country
    l_year, sum(volume) as revenue //Annual and annual freight revenue
from ( //Subquery
    select
        n1.n_name as supp_nation,
        n2.n_name as cust_nation,
        extract(year from l_shipdate) as l_year,
        l_extendedprice * (1 - l_discount) as volume
    from
        supplier,lineitem,orders,customer,nation n1,nation n2 //Six meter connection
    where
        s_suppkey = l_suppkey
        and o_orderkey = l_orderkey
        and c_custkey = o_custkey
        and s_nationkey = n1.n_nationkey
        and c_nationkey = n2.n_nationkey
        and ( // The values of NATION2 and NATION1 are different, which means that the query is about transnational freight transportation
            (n1.n_name = '[NATION1]' and n2.n_name = '[NATION2]')
            or (n1.n_name = '[NATION2]' and n2.n_name = '[NATION1]')
        )
        and l_shipdate between date '1995-01-01' and date '1996-12-31'
    ) as shipping
group by
    supp_nation,
    cust_nation,
    l_year
order by
    supp_nation,
    cust_nation,
    l_year;

Q8: national market share query

The national market share query has the query operations of grouping, sorting, aggregation and sub query, so as to determine the change of the market share of a given country in a given region for a given part type within two years; The market share of a given country in a given region is defined as part of revenue, i.e. L_ The sum of extendedprice * (1-l_discount).

select
    o_year, //particular year
    sum(case
    when nation = '[NATION]'//Designated countries are selected randomly within the range specified in TPC-H standard
    then volume
    else 0
    end) / sum(volume) as mkt_share //Market share: the percentage of revenue from a specific category of products; Aggregate operation
from ( //Subquery
    select
        extract(year from o_orderdate) as o_year, //Break down the year
        l_extendedprice * (1-l_discount) as volume, //Revenue from specific types of products
        n2.n_name as nation
    from
        part,supplier,lineitem,orders,customer,nation n1,nation n2,region //Eight meter connection
    where
        p_partkey = l_partkey
        and s_suppkey = l_suppkey
        and l_orderkey = o_orderkey
        and o_custkey = c_custkey
        and c_nationkey = n1.n_nationkey
        and n1.n_regionkey = r_regionkey
        and r_name = '[REGION]' //Designated areas are randomly selected within the range specified by TPC-H standard
        and s_nationkey = n2.n_nationkey
        and o_orderdate between date '1995-01-01' and date '1996-12-31' //Only check the situation in 1995 and 1996
        and p_type = '[TYPE]' //Specify the part type and select it randomly within the range specified in TPC-H standard
) as all_nations
group by //Grouping by year
    o_year
order by //Sort by year
    o_year;

Q9: product type profit measurement query

Product type profit measurement query is a query operation with grouping, sorting, aggregation and sub query operations to determine the profit of a given part series broken down by supplier country and year.

select
    nation,
    o_year,
    sum(amount) as sum_profit //Total profit of all ordered parts in each country in each year
from(
  select
        n_name as nation, //country
        extract(year from o_orderdate) as o_year, //Take out the year
        l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount //profit
    from
        part,supplier,lineitem,partsupp,orders,nation //Six meter connection
    where
        s_suppkey = l_suppkey
        and ps_suppkey = l_suppkey
        and ps_partkey = l_partkey
        and p_partkey = l_partkey
        and o_orderkey = l_orderkey
        and s_nationkey = n_nationkey
        and p_name like '%[COLOR]%' //LIKE operation, query optimizer may optimize
) as profit
group by //Grouping by country and year
    nation,
    o_year
order by //Sorted by country and year, the older one comes first
    nation,
    o_year desc;

Q10: return report query

Return report query is a multi table join query operation with grouping, sorting and aggregation operations to identify customers who may have problems with the parts shipped to them; Identify the top ranked customers who have returned parts based on the impact on the loss of revenue for a given quarter.

select
    c_custkey, c_name, //Customer information
    sum(l_extendedprice * (1 - l_discount)) as revenue, //Loss of income
    c_acctbal,
    n_name, c_address, c_phone, c_comment //Country, address, telephone number, opinion information, etc
from
    customer, orders, lineitem, nation
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate >= date '[DATE]' // DATE is the first of any month from January 1993 to December 1994
    and o_orderdate < date '[DATE]' + interval '3' month //Within 3 months
    and l_returnflag = 'R' //The goods were returned
    and c_nationkey = n_nationkey
group by
    c_custkey,
    c_name,
    c_acctbal,
    c_phone,
    n_name,
    c_address,
    c_comment
order by
    revenue desc;

Q11: inventory value query

Inventory value query is a multi table join query operation with grouping, sorting, aggregation and sub query operations to find the inventory value of suppliers in a given country / region; Scan the available inventory of suppliers in a given country to find all parts that account for a significant percentage of the total value of all available parts.

select
    ps_partkey,
    sum(ps_supplycost * ps_availqty) as value //Aggregate operation, total value of goods
from
    partsupp, supplier, nation
where
    ps_suppkey = s_suppkey
    and s_nationkey = n_nationkey
    and n_name = '[NATION]'
group by
    ps_partkey 
having //Grouping operation with HAVING clause
    sum(ps_supplycost * ps_availqty) > ( //A subquery is included in the HAVING clause
        select
            sum(ps_supplycost * ps_availqty) * [FRACTION] //Aggregation operation exists in sub query; FRACTION is 0.0001/SF1
        from
            partsupp, supplier, nation //Consistent with the table join of the parent query
        where //Consistent with WHEWR condition of parent query
            ps_suppkey = s_suppkey
            and s_nationkey = n_nationkey
            and n_name = '[NATION]' //Designated country
    )
order by //Sort goods in descending order of value
    value desc;

Q12: shipping method and order priority query

Shipping method and order priority query are two table connection query operations with grouping, sorting and aggregation operations. Determine whether choosing a cheaper shipping method will have a negative impact on key priority orders, resulting in consumers receiving more goods after the contract date.

select
    l_shipmode,
    sum(//Aggregate operation
        case when 
      o_orderpriority ='1-URGENT' //OR operation, one of the two can be satisfied, and the original OR HIGH can be selected
            or o_orderpriority ='2-HIGH'
        then 1
    else 0
        end) as high_line_count,
    sum(
    case when 
        o_orderpriority <> '1-URGENT' //AND operation, both of which are not satisfied. It is not URGENT AND not HIGH
            and o_orderpriority <> '2-HIGH'
        then 1
        else 0
        end) as low_line_count
from
    orders,lineitem
where
    o_orderkey = l_orderkey
    and l_shipmode in ('[SHIPMODE1]', '[SHIPMODE2]') 
  /* Specify the type of freight mode and select it randomly within the range specified in TPC-H standard. SHIPMODE2 must be different from SHIPMODE1 */
    and l_commitdate < l_receiptdate
    and l_shipdate < l_commitdate
    and l_receiptdate >= date '[DATE]' //January 1st of any year from 1993 to 1997
    and l_receiptdate < date '[DATE]' + interval '1' year //Within 1 year
group by //Grouping operation
    l_shipmode
order by //Sort operation
    l_shipmode;

Q13: customer distribution query

Customer distribution query is a query operation with grouping, sorting, aggregation, sub query and left external connection; Determine the customer distribution based on the customer's order quantity, calculate and report how many customers have no orders, and how many customers have 1 / 2 / 3 orders.

select
    c_count, count(*) as custdist //Aggregate operation to count the number of each group
from (//Subquery
    select
        c_custkey,
        count(o_orderkey)
    from
        customer left outer join orders 
  on //The subquery includes left outer join operations
        c_custkey = o_custkey
        and o_comment not like '%[WORD1]%[WORD2]%' //LIKE operation
            //WORD1 is any of the following four possible values: special, pending, unused, express
            //WORD2 is any of the following four possible values: packages, requests, accounts and deposits
    group by //Grouping operations in subqueries
        c_custkey
) as c_orders (c_custkey, c_count)
group by //Grouping operation
    c_count
order by //Sort operation
    custdist desc, //Sort descending from large to small
    c_count desc;

Q14: promotion effect query

Promotion effect query is a query operation with grouping, sorting, aggregation, sub query and left outer connection to monitor the market's response to promotion; Determine what percentage of revenue in a given year and month is related to the promotion portion.

select
    100.00 * sum(
    case when p_type like 'PROMO%' //Promotional parts
        then l_extendedprice*(1-l_discount) //Income at a particular time
        else 0
        end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
    lineitem, part
where
    l_partkey = p_partkey
    and l_shipdate >= date '[DATE]' // DATE is the first day of any month in any year from 1993 to 1997
    and l_shipdate < date '[DATE]' + interval '1' month;

Q15: top supplier query

Top level supplier query is a connection operation between common tables and views with sub query operations of sorting, aggregation and aggregation. It determines top-level suppliers to reward them, give them more business or obtain special recognition; The supplier that contributes the most to the total shipment in a given quarter of a given annual fee.

create view revenue[STREAM_ID](supplier_no, total_revenue) as //Create complex views with grouping operations
select
    l_suppkey,
    sum(l_extendedprice * (1 - l_discount)) //Obtain the total profit brought by the supplier to the company
from
    lineitem
where
    l_shipdate >= date '[DATE]' //DATE is the first of any month from January 1993 to October 1997
    and l_shipdate < date '[DATE]' + interval '3' month //Within 3 months
group by //The grouping key is the same as one of the query objects
    l_suppkey;


//Query statement
select
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
from
    supplier,revenue[STREAM_ID] //Join common tables to complex views
where
    s_suppkey = supplier_no
    and total_revenue = (//Aggregate subquery
        select
            max(total_revenue)
        from
            revenue[STREAM_ID] //Aggregate subqueries get data from views
    )
order by
    s_suppkey;
  
//Delete view
drop view revenue[STREAM_ID];

TPC-H standard defines the variant SQL equivalent to Q15 statement, uses the WITH clause, and then uses the object of WITH to connect WITH the table. The deformed SQL statement is as follows:

WITH revenue (supplier_no, total_revenue) as (
    SELECT
        l_suppkey,
        SUM(l_extendedprice * (1-l_discount))
    FROM
        lineitem
    WHERE
        l_shipdate >= date ':1'
        AND l_shipdate < date ':1' + interval '3' month
    GROUP BY
        l_suppkey
)

SELECT
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
FROM
    supplier,
    revenue
WHERE
    s_suppkey = supplier_no
    AND total_revenue = (
        SELECT
            MAX(total_revenue)
        FROM
            revenue
    )
ORDER BY
    s_suppkey;

Q16: parts / supplier relationship query

Component / supplier relationship query is a two table connection operation with grouping, sorting, aggregation, de duplication and NOT IN sub query; Find out how many suppliers can provide parts with given attributes.

select
    p_brand,
    p_type,
    p_size,
    count(distinct ps_suppkey) as supplier_cnt //Aggregation and de duplication
from
    partsupp,
    part
where
    p_partkey = ps_partkey
    and p_brand <> '[BRAND]'
        // Brand = brand MN, M and N are two letters, representing two values, which are independent of each other, and the value is between 1 and 5
    and p_type not like '[TYPE]%' //Types and sizes that consumers are not interested in
    and p_size in ([SIZE1], [SIZE2], [SIZE3], [SIZE4], [SIZE5], [SIZE6], [SIZE7], [SIZE8])
        //TYPEX is a set of eight different values arbitrarily selected between 1 and 50
    and ps_suppkey not in ( //NOT IN sub query, consumers exclude some suppliers
        select
            s_suppkey
        from
            supplier
        where
            s_comment like '%Customer%Complaints%'
    )
group by //Grouping operation
    p_brand,
    p_type,
    p_size
order by //Sort operation
    supplier_cnt desc, //In descending order of quantity, in ascending order of brand, type and size
    p_brand,
    p_type,
    p_size;

Q17: small order revenue query

Small order revenue query is a two table connection operation with aggregation and aggregation sub query; Determine how much revenue will be lost per year if orders are not completed for a small number of specific parts, and reduce management expenses by focusing on larger shipments; For parts of a given brand and container type, determine the average order quantity of such parts ordered by all orders in the database within 7 years. If no more orders below the average level of 20% are received, what is the average annual total revenue loss.

select
    sum(l_extendedprice) / 7.0 as avg_yearly //Aggregate operation
from
    lineitem, part
where
    p_partkey = l_partkey
    and p_brand = '[BRAND]' /*Designated BRAND. BRAND = 'Brand#MN', M and N are two letters, representing two values, which are independent of each other, and the value is between 1 and 5 */
    and p_container = '[CONTAINER]' //Specify the packaging type. Randomly selected within the range specified by TPC-H standard
    and l_quantity < ( //Aggregate subquery
        select
            0.2 * avg(l_quantity)
        from
            lineitem
        where
            l_partkey = p_partkey
);

Q18: mass customer query

Mass customer query is a three table connection operation with grouping, sorting, aggregation and IN sub query. Customers are ranked according to their mass orders.

select
    c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, //essential information
    sum(l_quantity) //Total orders
from
    customer, orders, lineitem
where
    o_orderkey in ( //IN subquery with grouping operation
        select
            l_orderkey
        from
            lineitem
        group by 
            l_orderkey 
    having
            sum(l_quantity) > [QUANTITY] // QUANTITY is any value between 312 and 315
    )
    and c_custkey = o_custkey
    and o_orderkey = l_orderkey 
group by
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice
order by
    o_totalprice desc,
    o_orderdate;

Q19: discount revenue query

Discount revenue query is a three table connection operation with grouping, sorting, aggregation and IN sub query. It is attributed to the total discount revenue of selected parts processed IN a specific way. This query is an example of using data mining tools to generate formatting code; Total discount revenue for all orders of three different categories for some air or manual transportation parts. The selection of parts takes into account the specific brand, packaging and size range.

select
    sum(l_extendedprice * (1 - l_discount) ) as revenue
from
    lineitem, part
where (
        p_partkey = l_partkey
        and p_brand = '[BRAND1]' /*Specific brands. BRAND1, BRAND2 and BRAND3 = 'brand MN', M and N are two letters representing two values, which are independent of each other, and the value is between 1 and 5 */
        and p_container in ( 'SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') //Packaging scope
        and l_quantity >= [QUANTITY1] and l_quantity <= [QUANTITY1] + 10 /* QUANTITY1 Is any value between 1 and 10 */
        and p_size between 1 and 5 //Size range
        and l_shipmode in ('AIR', 'AIR REG') //For the transportation mode, the conditions indicated in bold with shadow below are the same, and there is the possibility of condition simplification
        and l_shipinstruct = 'DELIVER IN PERSON'
    ) or (
        p_partkey = l_partkey
        and p_brand = '[BRAND2]'
        and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
        and l_quantity >= [QUANTITY2] and l_quantity <= [QUANTITY2] + 10 /* QUANTITY2 Is any value between 10 and 20 */
        and p_size between 1 and 10
    and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    ) or (
    p_partkey = l_partkey
    and p_brand = '[BRAND3]'
        and p_container in ( 'LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
        and l_quantity >= [QUANTITY3] and l_quantity <= [QUANTITY3] + 10 /* QUANTITY3 Is any value between 20 and 30 */
        and p_size between 1 and 15
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    );

Q20: potential parts promotion query

Q20 statement query determines to find the supply goods in a specified country that can provide a more competitive price for a part commodity in a certain year. The so-called more competitive suppliers refer to those suppliers with excess parts. If they exceed 50% of the parts shipped by the supplier or supplier in a given country in a given year, it is excess.
Potential parts promotion query is a two table connection operation with sorting, aggregation, IN sub query and common sub query. It identifies suppliers IN specific countries. The parts they select may be candidates for promotion quotation; Identify suppliers who provide too many given parts, which is defined as more than 50% of the parts.

select
    s_name, s_address
from
    supplier, nation
where
    s_suppkey in ( //IN sub query of the first layer
        select
            ps_suppkey
        from
            partsupp
        where
            ps_partkey in ( //Second level nested IN subquery
                select
                    p_partkey
                from
                    part
                where
                    p_name like '[COLOR]%' //COLOR is p_ Any value in the list of values for name
            )
            and ps_availqty > (//Second level nested subquery
                select
                    0.5 * sum(l_quantity) //Aggregate subquery
                from
                    lineitem
                where
                    l_partkey = ps_partkey
                    and l_suppkey = ps_suppkey
                    and l_shipdate >= date('[DATE]') // DATE is January 1 of any year from 1993 to 1997
                    and l_shipdate < date('[DATE]') + interval '1' year //Within 1 year
            )
    )
  and s_nationkey = n_nationkey
  and n_name = '[NATION]' //Any value defined in TPC-H standard
order by
    s_name;

Q21: supplier leaves a document for query

The supplier's document waiting query is a four table connection operation with grouping, sorting, aggregation, EXISTS sub query and NOT EXISTS sub query. It identifies the supplier who cannot deliver goods in time.

select
    s_name, count(*) as numwait
from
    supplier, lineitem l1, orders, nation
where
    s_suppkey = l1.l_suppkey
    and o_orderkey = l1.l_orderkey
    and o_orderstatus = 'F'
    and l1.l_receiptdate > l1.l_commitdate
    and exists ( //EXISTS subquery
        select
            *
        from
            lineitem l2
        where
            l2.l_orderkey = l1.l_orderkey
            and l2.l_suppkey <> l1.l_suppkey
    )
    and not exists ( //NOT EXISTS subquery
        select
            *
        from
            lineitem l3
        where
            l3.l_orderkey = l1.l_orderkey
            and l3.l_suppkey <> l1.l_suppkey
            and l3.l_receiptdate > l3.l_commitdate
    )
    and s_nationkey = n_nationkey
    and n_name = '[NATION]' //Any value defined in TPC-H standard
group by
    s_name
order by
    numwait desc,
    s_name;

Q22: global opportunity query

Global sales opportunity query is a four table connection operation with grouping, sorting, aggregation, EXISTS sub query and NOT EXISTS sub query. It determines the geographical location of customers who are likely to purchase, and calculates how many customers within a specific country code have not placed orders for 7 years, but their "positive" account balance is higher than the average level.

select
    cntrycode,
    count(*) as numcust,
    sum(c_acctbal) as totacctbal
from ( //First level subquery
    select
        substring(c_phone from 1 for 2) as cntrycode,
        c_acctbal
    from
        customer
    where
        // I1... I7 are any values that are not repeated among the possible values of the country code defined in TPC-H
        substring(c_phone from 1 for 2) in ('[I1]','[I2]','[I3]','[I4]','[I5]','[I6]','[I7]')
        and c_acctbal > (//Level 2 aggregate subquery
            select
                avg(c_acctbal)
            from
                customer
            where
                c_acctbal > 0.00
                and substr (c_phone from 1 for 2) in ('[I1]','[I2]','[I3]','[I4]','[I5]','[I6]','[I7]')
        )
        and not exists (//Tier 2 NOT EXISTS subquery
            select
                *
            from
                orders
            where
                o_custkey = c_custkey
        )
    ) as custsale
group by
    cntrycode
order by
    cntrycode;

reference material

Pay attention to official account big data research society: BigDataRLC, first time to read the latest articles.

Keywords: SQL

Added by sitestem on Sat, 22 Jan 2022 03:09:07 +0200