The result of your warehouse function is unstable. Maybe the attribute is specified incorrectly

Abstract: This paper briefly introduces the relevant knowledge of GaussDB(DWS) function push down attributes, and provides several typical cases related to function attributes for your reference.

This article is shared from Huawei cloud community< Introduction to GaussDB(DWS) function push down attribute >, original author: Arrow0lf.

When using GaussDB(DWS), users should specify function attributes correctly. Specifying function attributes incorrectly will not only lead to low efficiency of query statement execution, but also lead to unstable result set. This paper briefly introduces the relevant knowledge of GaussDB(DWS) function push down attributes, and provides several typical cases related to function attributes for your reference.

1. Introduction to function push down attribute

When creating a function, GaussDB(DWS) can specify many function attributes, among which the attributes related to function push down are volatile level {and push down attributes, among which:

Volatility:

  • IMMUTABLE: the function of this attribute will not modify the database, and it is guaranteed that the same input parameters will always return the same results in any case;
  • STABLE: the function of this attribute will not modify the database, and ensure that in the same query, for the same input parameters, the results returned by the function are the same;
  • VOLATILE: for the function of this attribute, the return result of the function may not be valid for the same input parameters, such as time of day. If it is not explicitly specified when creating the function, it will be VOLATILE by default;

Push down attribute:

  • SHIPPABLE: the function can be pushed down to DN for execution
  • NOT SHIPPABLE: the function cannot be pushed down to DN for execution. If it is not explicitly specified when creating the function, it defaults to NOT SHIPPABLE.

In GaussDB(DWS), the function of IMMUTABLE attribute can be pushed down to DN for execution. Whether the pushed down attribute is SHIPPABLE or not depends on the specified SHIPPABLE attribute. Therefore, if both IMMUTABLE and NOT SHIPPABLE attributes are specified when creating a function, the following prompt will appear when the function is created successfully:

NOTICE:  Immutable function will be shippable anyway.

2. Typical cases of function push down attribute

Case 1: the function is not pushed down because the volatility level of the function is not specified

The function is defined as follows:

create function try_cast_int(p_in text, p_default int default 0) returns int
as $$
begin
    begin
        return $1::int;
	exception
	when others then
        return p_default;
    end;
end;
$$
language plpgsql;

Since the volatile level and attribute of the function are not explicitly specified when creating the function, the function defaults to VOLATILE NOT SHIPPABLE. When using the function, the execution plan is as follows:

postgres=# explain verbose select try_cast_int(b) from test order by a;
                                      QUERY PLAN                                      
--------------------------------------------------------------------------------------
 Sort  (cost=13.91..14.04 rows=50 width=36)
   Output: (try_cast_int(test.b, 0)), test.a
   Sort Key: test.a
   ->  Data Node Scan on "__REMOTE_SORT_QUERY__"  (cost=0.00..12.50 rows=50 width=36)
         Output: try_cast_int(test.b, 0), test.a
         Node/s: All datanodes
         Remote query: SELECT a, b FROM ONLY public.test WHERE true ORDER BY 1
(7 rows)

It can be seen that the sql execution plan is not pushed down and the execution efficiency is low. After analyzing the function, it is found that the function can be specified as IMMUTABLE attribute so that the function can be pushed down. Therefore, it can be optimized in the following ways:

ALTER FUNCTION try_cast_int(text,int) IMMUTABLE;

Case 2: the function push down attribute is incorrectly specified, resulting in the instability of the result set

Push down functions can be pushed down to DN for execution, which has higher execution efficiency than non push down functions. Sometimes, in order to speed up the execution efficiency of functions, developers will specify SHIPPABLE when creating all user-defined functions. A function is defined as follows:

create function get_count() returns int
SHIPPABLE
as $$
declare
    result int;
begin
    result = (select count(*) from test);  --test Table is hash surface
	return result;
end;
$$
language plpgsql;

Calling this function found the following phenomena:

postgres=# select get_count();
 get_count 
-----------
      2106
(1 row)

postgres=# select get_count() from t_src;
 get_count 
-----------
      1032
(1 row)

It is found that the return value of the function has changed after adding the from table! Why does this happen? This is because this function specifies the function attribute of SHIPPABLE, so when generating a plan, the function will be pushed down to the DN for execution. After the function is pushed down to the DN, because the test table in the function definition is a hash table, there is only part of the data in each DN, so the result returned by select count(*) from test is not the result of the full data in the test table, It is the result of some data on each DN. Therefore, the expected return of the function changes after adding the from table. The optimization method is as follows:

(1) Change the function to not push down: alter function get_count() not shippable;

(2) Change the table used in the function into a copy table, so that each DN is a copy of the full data of the table. Even if it is pushed down to the DN for execution, it can ensure that the result set meets the expectations.

3. Summary

When creating a custom function, you should correctly specify the properties of the function to ensure that the function properties meet the expectations, so as to prevent performance degradation or unstable result set caused by incorrect setting of function properties.

 

Click follow to learn about Huawei's new cloud technology for the first time

Keywords: GaussDB

Added by matthijs on Tue, 08 Mar 2022 08:55:50 +0200