PostgreSQL escape, UNICODE, and SQL injection

Label

PostgreSQL, json, string escape, unicode, SQL injection, backslash_quote, escape_string_warning, standard_conforming_strings

background

Through this article, you can understand:

1. How to input special characters in a string, such as carriage return.

2. How to enter single quotation marks in a string.

3. What is an escape character? Usage.

4. Is backslash an escape character in the SQL standard? How does PostgreSQL use backslash escape? How about the writing format?

5. Why is there a risk of SQL injection when escaping single quotes? What parameters of the database can control this risk? What parameters can be warned?

6. How to input strings in the form of UNICODE.

7. Enter in UNICODE format, support all character sets? What character sets are currently supported and what CODE range are supported?

8. What's wrong with entering UNICODE value in Greenplum's JSON? Is there the same problem in PostgreSQL?

Escape configuration

Three configuration controls are escaped as follows:

1. (Related to SQL injection) Is backslash escape single quotation marks allowed?

backslash_quote = on allows, off does not allow, safe_encoding (only when client_encoding does not allow backslash\ to appear in multi-byte characters (most character sets are single-byte representations), then escaped single quotes are allowed.)

Why should we control the escape single quotation marks?

Because this may introduce the risk of SQL injection, such as when the end customer puts at the end, the single quotation mark can be escaped, and the string terminator that should have been typed normally does not exist.

backslash_quote (enum)    
    
This controls whether a quote mark can be represented by \' in a string literal.     
    
The preferred, SQL-standard way to represent a quote mark is by doubling it ('') but PostgreSQL has historically also accepted \'.     
    
However, use of \' creates security risks because in some client character set encodings, there are multibyte characters in which the last byte is numerically equivalent to ASCII \.     
    
If client-side code does escaping incorrectly then a SQL-injection attack is possible.     
    
This risk can be prevented by making the server reject queries in which a quote mark appears to be escaped by a backslash.     
    
The allowed values of backslash_quote are on (allow \' always), off (reject always), and safe_encoding (allow only if client encoding does not allow ASCII \ within a multibyte character).     
    
safe_encoding is the default setting.    
    
Note that in a standard-conforming string literal, \ just means \ anyway. This parameter only affects the handling of non-standard-conforming literals, including escape string syntax (E'...').    

Note that the configuration of backslash_quote only works when standard_conforming_strings=off or when E''is used.

Otherwise, when standard_conforming_strings=on, will be treated as a normal string.

Example 1: When standard_conforming_strings=on, is treated as a normal string

postgres=# set standard_conforming_strings=on;    
SET    
postgres=# select '\';                            
 ?column?     
----------    
 \    
(1 row)    

Example 2: The configuration of backslash_quote works when standard_conforming_strings=off or using the E''notation.

postgres=# set backslash_quote = off; -- escape is not allowed    
SET    
postgres=# select E'\';                  
';    
ERROR:  22P06: unsafe use of \' in a string literal    
LINE 1: select E'\';    
               ^    
HINT:  Use '' to write quotes in strings. \' is insecure in client-only encodings.    
LOCATION:  core_yylex, scan.l:641    
postgres=# set backslash_quote = off; -- escape is not allowed    
SET    
postgres=# set standard_conforming_strings=off;    
SET    
postgres=# select '\';                             
';    
ERROR:  22P06: unsafe use of \' in a string literal    
LINE 1: select '\';    
               ^    
HINT:  Use '' to write quotes in strings. \' is insecure in client-only encodings.    
LOCATION:  core_yylex, scan.l:641    
    
    
postgres=# set backslash_quote = on; -- allow escape'    
SET    
postgres=# select '\';                             
';    
WARNING:  22P06: nonstandard use of \' in a string literal    
LINE 1: select '\';    
               ^    
HINT:  Use '' to write quotes in strings, or use the escape string syntax (E'...').    
LOCATION:  check_string_escape_warning, scan.l:1483    
 ?column?     
----------    
 ';      +    
     
(1 row)    
    
    
postgres=# Setbackslash_quote = safe_encoding; -- When client_encoding does not allow the backslash ````` to appear in multi-byte characters (most character sets ````are single-byte representations'), it allows the use of ```escape single quotation marks'.    
SET    
postgres=# show client_encoding;    
 client_encoding     
-----------------    
 UTF8    
(1 row)    
    
postgres=# select '\';                             
';    
WARNING:  22P06: nonstandard use of \' in a string literal    
LINE 1: select '\';    
               ^    
HINT:  Use '' to write quotes in strings, or use the escape string syntax (E'...').    
LOCATION:  check_string_escape_warning, scan.l:1483    
 ?column?     
----------    
 ';      +    
     
(1 row)    

2. Whether to output a warning when standard_conforming_strings=off and include backslashes in the''string.

Because in the SQL standard, backslashes in strings are not escape characters, but ordinary characters.

If you want to escape, use the E'.

escape_string_warning (boolean)    
    
When on, a warning is issued if a backslash (\) appears in an ordinary string literal ('...' syntax) and standard_conforming_strings is off.     
    
The default is on.    
    
Applications that wish to use backslash as escape should be modified to use escape string syntax (E'...'),     
    
because the default behavior of ordinary strings is now to treat backslash as an ordinary character, per SQL standard.     
    
This variable can be enabled to help locate code that needs to be changed.    

Example 1

postgres=# set escape_string_warning=on;    
SET    
postgres=# set standard_conforming_strings=off;    
SET    
postgres=# select '\';                         
';    
WARNING:  22P06: nonstandard use of \' in a string literal  -- Receive warning message    
LINE 1: select '\';    
               ^    
HINT:  Use '' to write quotes in strings, or use the escape string syntax (E'...').    
LOCATION:  check_string_escape_warning, scan.l:1483    
 ?column?     
----------    
 ';      +    
     
(1 row)    

Example 2

postgres=# set escape_string_warning=on;    
SET    
postgres=# set standard_conforming_strings=off;    
SET    
postgres=# select E';; -- Use E',  as an escape character                    
';    
 ?column?     
----------    
 ';      +    
     
(1 row)    

3. Tell the database whether the backslash in the''string is a common character.

standard_conforming_strings = on backslashes are used as common characters (standard SQL usage), and off backslashes are used as escape characters.

standard_conforming_strings (boolean)    
    
This controls whether ordinary string literals ('...') treat backslashes literally, as specified in the SQL standard.     
    
Beginning in PostgreSQL 9.1, the default is on (prior releases defaulted to off).     
    
Applications can check this parameter to determine how string literals will be processed.     
    
The presence of this parameter can also be taken as an indication that the escape string syntax (E'...') is supported.     
    
Escape string syntax (Section 4.1.2.2) should be used if an application desires backslashes to be treated as escape characters.    

If standard_conforming_strings=on, what about using escape? Sampling E''is enough

postgres=# set standard_conforming_strings=on;    
SET    
postgres=# Select';'; - Standard Writing of SQL    
 ?column?     
----------    
 \    
(1 row)    
    
postgres=# select E'\;'; -- Escape writing, at this time as an escape character    
 ?column?     
----------    
 '    
(1 row)    

How to Transliterate

1. standard_conforming_strings=on(SQL standard), then the backslash in the string''is a common character. Using the E''notation, the backslash is the escape character.

2. Standard_conforming_strings = off (non-SQL standard), then the backslash in the string''is an escape character.

C Language Style String

Backslash Escape Sequence Interpretation
\b backspace
\f form feed
\n newline
\r carriage return
\t tab
\o, \oo, \ooo (o = 0 - 7) octal byte value
\xh, \xhh (h = 0 - 9, A - F) hexadecimal byte value
\uxxxx, \Uxxxxxxxx (x = 0 - 9, A - F) 16 or 32-bit hexadecimal Unicode character value
\\ Backslash
' Single quotation mark

unicode input

How do I enter unicode strings?

Users can input UNICODE encoding directly to the database, eliminating the encoding conversion process, but users must ensure that the encoding is consistent with the database server encoding, otherwise there may be coding overflow or scrambling problems. Once saved, it is the same as storing strings directly.

unicode input format 1

When the backslash is configured for escaped characters, enter UNICODE like this

(Note that this input format requires that the backslash must be an escape character, refer to the previous section, if the backslash becomes an escape character)

\uxxxx, \Uxxxxxxxx (x = 0 - 9, A - F)    
    
A lowercase letter    
    
16 or 32-bit hexadecimal Unicode character value    

Example

postgres=# set standard_conforming_strings =on;    
SET    
postgres=# SelectE'_ 00f7'; --_ is an escape character    
 ?column?     
----------    
 ÷    
(1 row)    
    
postgres=# Select'00f7'; -- is a common character, not a unicode, but a string.    
 ?column?     
----------    
 \u00f7    
(1 row)    
    
postgres=# set standard_conforming_strings =off; -- is an escape character    
SET    
postgres=# select '\u00f7';    
WARNING:  22P06: nonstandard use of escape in a string literal    
LINE 1: select '\u00f7';    
               ^    
HINT:  Use the escape string syntax for escapes, e.g., E'\r\n'.    
LOCATION:  check_escape_warning, scan.l:1508    
 ?column?     
----------    
 ÷    
(1 row)    
    
postgres=# create table u(id int, info text);    
CREATE TABLE    
postgres=# select E'\u00f0';    
 ?column?     
----------    
 e    
(1 row)    
    
postgres=# insert into u values (1, 'e');    
INSERT 0 1    
postgres=# insert into u values (2, E'\u00f0');    
INSERT 0 1    
postgres=# select * from u;    
 id | info     
----+------    
  1 | e    
  2 | e    
(2 rows)    

At present, only UTF8 character sets allow input of unicode greater than 007f, while other character sets can only input unicode in ascii range.

postgres=# \l    
                                      List of databases    
        Name        |  Owner   | Encoding  |  Collate   |   Ctype    |   Access privileges       
--------------------+----------+-----------+------------+------------+-----------------------    
 contrib_regression | postgres | UTF8      | C          | C          |     
 db                 | postgres | SQL_ASCII | C          | C          |     
 db1                | postgres | EUC_CN    | C          | C          |     
 postgres           | postgres | UTF8      | C          | C          |     
 template0          | postgres | UTF8      | C          | C          | =c/postgres          +    
                    |          |           |            |            | postgres=CTc/postgres    
 template1          | postgres | UTF8      | C          | C          | =c/postgres          +    
                    |          |           |            |            | postgres=CTc/postgres    
 test               | postgres | UTF8      | en_US.UTF8 | en_US.UTF8 |     
 test01             | postgres | UTF8      | C          | C          |     
 test02             | postgres | UTF8      | zh_CN.UTF8 | zh_CN.UTF8 |     
(9 rows)    
    
postgres=# \ C db1-EUC_CN Coding    
You are now connected to database "db1" as user "postgres".    
    
db1=# select U&'\00fe';    
ERROR:  42601: Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8 at or near "\00fe'"    
LINE 1: select U&'\00fe';    
                  ^    
LOCATION:  scanner_yyerror, scan.l:1086    
    
db1=# \ C Postgres - UTF8 encoding, allowing input of unicode larger than 007F    
You are now connected to database "postgres" as user "postgres".    
postgres=# select U&'\00fe';    
 ?column?     
----------    
 t    
(1 row)    

details

It is your responsibility that the byte sequences you create, especially when using the octal or hexadecimal escapes, compose valid characters in the server character set encoding.     
    
When the server encoding is UTF-8, then the Unicode escapes or the alternative Unicode escape syntax, explained in Section 4.1.2.3, should be used instead.     
    
(The alternative would be doing the UTF-8 encoding by hand and writing out the bytes, which would be very cumbersome.)    
    
The Unicode escape syntax works fully only when the server encoding is UTF8.     
    
When other server encodings are used, only code points in the ASCII range (up to \u007F) can be specified.     
    
Both the 4-digit and the 8-digit form can be used to specify UTF-16 surrogate pairs to compose characters with code points larger than U+FFFF, although the availability of the 8-digit form technically makes this unnecessary.     
    
(When surrogate pairs are used when the server encoding is UTF8, they are first combined into a single code point that is then encoded in UTF-8.)    
    
Caution    
    
If the configuration parameter standard_conforming_strings is off, then PostgreSQL recognizes backslash escapes in both regular and escape string constants.     
    
However, as of PostgreSQL 9.1, the default is on, meaning that backslash escapes are recognized only in escape string constants.     
    
This behavior is more standards-compliant, but might break applications which rely on the historical behavior, where backslash escapes were always recognized.     
    
As a workaround, you can set this parameter to off, but it is better to migrate away from using backslash escapes.     
    
If you need to use a backslash escape to represent a special character, write the string constant with an E.    
    
In addition to standard_conforming_strings, the configuration parameters escape_string_warning and backslash_quote govern treatment of backslashes in string constants.    
    
The character with the code zero cannot be in a string constant.    

unicode input format 2

Format 2, on the contrary, backslashes are not unicode when escaping characters.

This format supports 2 or 3 bytes of UNICODE.

The format is as follows.

U &' 4 [0-F]'    
    
U &'+6 [0-F]'    
    
u case insensitive    
    
Characters in other positions in the string are recognized as standard characters, while is an escape character of UNICODE.  must be followed by four [0-F] or + six [0-F]. If you want to enter, enter\.    

If you want to replace escape characters, use UESCAPE grammar.

Example 1

U&'d\0061t\+000061'    
    
U&'\0441\043B\043E\043D'    
    
U&'d!0061t!+000061' UESCAPE '!'  -- Change the escape character to!    
    
U&'\\'  -- Enter two escape characters and output the escape character itself    

Example 2, you can't use escape format

postgres=# set standard_conforming_strings =off; -- escape character mode,  in the''sign is an escape character, not an ordinary character    
SET    
postgres=# select U&'d\0061t\+000061';    
ERROR:  0A000: unsafe use of string constant with Unicode escapes    
LINE 1: select U&'d\0061t\+000061';    
               ^    
DETAIL:  String constants with Unicode escapes cannot be used when standard_conforming_strings is off.    
LOCATION:  core_yylex, scan.l:534    
postgres=# set standard_conforming_strings =on; -- non-escaped character mode, \ in''is a common character, at which point the use of u &'' is normal.    
SET    
postgres=# select U&'d\0061t\+000061';    
 ?column?     
----------    
 data    
(1 row)    

Example 3

postgres=# select U&'d\0061t\+000061';    
 ?column?     
----------    
 data    
(1 row)    
    
postgres=# select U&'\0441\043B\043E\043D';    
 ?column?     
----------    
 слон    
(1 row)    
    
postgres=# select U&'d!0061t!+000061' UESCAPE '!';    
 ?column?     
----------    
 data    
(1 row)    
    
postgres=# select U&'\\';    
 ?column?     
----------    
 \    
(1 row)    
    
postgres=# select U&'\\\+000061';    
 ?column?     
----------    
 \a    
(1 row)    
    
postgres=# select U&'\\!+000061' UESCAPE '!';    
 ?column?     
----------    
 \\a    
(1 row)    

details

PostgreSQL also supports another type of escape syntax for strings that allows specifying arbitrary Unicode characters by code point.     
    
A Unicode escape string constant starts with U& (upper or lower case letter U followed by ampersand) immediately before the opening quote, without any spaces in between, for example U&'foo'.     
    
(Note that this creates an ambiguity with the operator &. Use spaces around the operator to avoid this problem.)     
    
Inside the quotes, Unicode characters can be specified in escaped form by writing a backslash followed by the four-digit hexadecimal code point number     
    
or alternatively a backslash followed by a plus sign followed by a six-digit hexadecimal code point number.     
    
For example, the string 'data' could be written as    
    
U&'d\0061t\+000061'    
    
The following less trivial example writes the Russian word "slon" (elephant) in Cyrillic letters:    
    
U&'\0441\043B\043E\043D'    
    
If a different escape character than backslash is desired, it can be specified using the UESCAPE clause after the string, for example:    
    
U&'d!0061t!+000061' UESCAPE '!'    
    
The escape character can be any single character other than a hexadecimal digit, the plus sign, a single quote, a double quote, or a whitespace character.    
    
The Unicode escape syntax works only when the server encoding is UTF8. When other server encodings are used, only code points in the ASCII range (up to \007F) can be specified.     
    
Both the 4-digit and the 6-digit form can be used to specify UTF-16 surrogate pairs to compose characters with code points larger than U+FFFF,     
    
although the availability of the 6-digit form technically makes this unnecessary.     
    
(When surrogate pairs are used when the server encoding is UTF8, they are first combined into a single code point that is then encoded in UTF-8.)    
    
Also, the Unicode escape syntax for string constants only works when the configuration parameter standard_conforming_strings is turned on.     
    
This is because otherwise this syntax could confuse clients that parse the SQL statements to the point that it could lead to SQL injections and similar security issues.     
    
If the parameter is set to off, this syntax will be rejected with an error message.    
    
To include the escape character in the string literally, write it twice.    

String Writing

1. Single quotation marks

Example

'String content'

2. Dollar symbol

Example

$string content$$

unicode Problem in greenplum JSON

When a unicode native string is stored in the json of greenplum, an error may be reported by using - > extraction. (Estimates are exceptions caused by the conversion process)

Create test tables

postgres=# create table tbl(id int, info json);  
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.  
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.  
CREATE TABLE  

Currently it is an escape mode, that is to say, UNICODE will be converted and stored in advance.

postgres=# set standard_conforming_strings =off; -- Set to escape mode, i.e. \ is used as escape character, and UNICODE format can also be recognized and converted normally.  

Insertion, UNICODE conversion to final string is inserted, extraction is normal

postgres=#  Insert into TBL (id, info) values (1,'{activitytitle":"Hello People's Republic of China","giftname":"_________________________  
WARNING:  nonstandard use of escape in a string literal  
LINE 1: insert into tbl(id,info) values (1,'{"activitytitle":"How are you doing?...  
                                           ^  
HINT:  Use the escape string syntax for escapes, e.g., E'\r\n'.  
INSERT 0 1  
postgres=# select info->'activitytitle' from tbl;  
       ?column?         
----------------------  
 "Hello People's Republic of China"  
(1 row)  
  
postgres=# select * from tbl;  
 id |                                      info                                        
----+--------------------------------------------------------------------------------  
  1 | {"activitytitle":"Hello People's Republic of China","giftname":"Shake a red envelope","ruledesc":null}  
(1 row)  

Use non-escape mode, that is to say, UNICODE strings are inserted as they are

At this point, an exception occurs to the extraction.

postgres=# set standard_conforming_strings =on; -- In non-escape mode,  will be inserted as ordinary characters, so it will not be recognized as unicode mode, inserting the original characters.  
SET  
postgres=#  Insert into TBL (id, info) values (1,'{activitytitle":"Hello People's Republic of China","giftname":"_________________________  
INSERT 0 1  
postgres=# select * from tbl;  
 id |                                                info                                                  
----+----------------------------------------------------------------------------------------------------  
  1 | {"activitytitle":"Hello People's Republic of China","giftname":"Shake a red envelope","ruledesc":null}  
  1 | {"activitytitle":"Hello People's Republic of China","giftname":"\u6447\u4E00\u6447\u7EA2\u5305","ruledesc":null}  
(2 rows)  
  
-- extract unicode The corresponding field failed  
postgres=# \set VERBOSITY verbose  
postgres=# select info->'activitytitle' from tbl;  
ERROR:  22021: invalid byte sequence for encoding "UTF8": 0xe6  (seg0 slice1 digoal:39999 pid=22528)  
LOCATION:  cdbdisp_finishCommand, cdbdisp.c:1326  
  
postgres=# select info->'giftname' from tbl;  
ERROR:  22021: invalid byte sequence for encoding "UTF8": 0xe6  (seg0 slice1 digoal:39999 pid=27053)  
LOCATION:  cdbdisp_finishCommand, cdbdisp.c:1326  

Solution

Using escape mode, UNICODE is escaped and stored in JSON.

There will be no such problem in normal strings, just the JSON type of Greenplum.

The JSON of PG 9.4 has no such problem, as follows.

postgres=# create table tbl(id int, info json);  
CREATE TABLE  
postgres=# set standard_conforming_strings =on;   
SET  
postgres=# Insert into TBL (id, info) values (1,'{activitytitle":"Hello People's Republic of China","giftname":"_________________________  
INSERT 0 1  
postgres=# select * from tbl;  
 id |                                                info                                                  
----+----------------------------------------------------------------------------------------------------  
  1 | {"activitytitle":"Hello People's Republic of China","giftname":"\u6447\u4E00\u6447\u7EA2\u5305","ruledesc":null}  
(1 row)  
  
postgres=# select info->'activitytitle' from tbl;  
       ?column?         
----------------------  
 "Hello People's Republic of China"  
(1 row)  
  
postgres=# select info->'giftname' from tbl;  
             ?column?               
----------------------------------  
 "\u6447\u4E00\u6447\u7EA2\u5305"  
(1 row)  

Summary

1. How to input special characters in a string, such as carriage return.

1. Enter special characters by escaping
E'\?'

2. Use UNIDOCE Method to input special characters, standard_conforming_strings=off, escape_string_warning=off
U&'\xxxx'

2. How to enter single quotation marks in a string.

4 Input single quotation marks in strings in one way or another

U&'\????'

E'\''

$$'$$

''''

3. What is an escape character? How does PostgreSQL use backslash escape? How about the writing format?

Backslash as escape character, similar to C language style, can input special characters.

E'\?'

perhaps

standard_conforming_strings=off, escape_string_warning=off

'\?'

4. Is backslash an escape character in the SQL standard?

SQL In the standard, backslashes are common characters

standard_conforming_strings=on

'\' Medium\It's a common character.

5. Why is there a risk of SQL injection when escaping single quotes? What parameters of the database can control this risk? What parameters can be warned?

Escaped characters can convert single quotation marks into ordinary characters, which may result in normal string termination becoming unfinished.

Parameters allow you to control whether escaped single quotes are allowed
 backslash_quote = on allows, off does not allow, safe_encoding (only when client_encoding does not allow the backslash ```'to appear in multi-byte characters (most character sets `````are `single-byte') is allowed to use ``````escape single quotation marks').   

6. How to input strings in the form of UNICODE.

6 Kind of Writing

U&'\xxxx'

U&'\+xxxxxx'

E'\uxxxx'

E'\Uxxxxxxxx'

standard_conforming_strings=off
'\uxxxx'

'\Uxxxxxxxx'

7. Enter in UNICODE format, support all character sets? What character sets are currently supported and what CODE range are supported?

When the database encoding is UTF8, all legal characters except zero characters are supported

When the database encoding <> UTF8, only ASCII characters, namely CODE < 007F, are supported.

8. What's wrong with entering UNICODE value in Greenplum's JSON? Is there the same problem in PostgreSQL?

At present, if you use json in green plum, it is not recommended to store UNICODE in value, please save the target character, otherwise it will be wrong.

To store the target character, UNICODE is converted into a string corresponding to the character set.

Four Ways to Realize
 
When the input is'???', set standard_conforming_strings=off.

Or use the following formats

E'\u????'

perhaps

U&'\xxxx'

perhaps

U&'\+xxxxxx'

PostgreSQL is OK.

Reference resources

https://www.postgresql.org/docs/9.6/static/runtime-config-compatible.html#RUNTIME-CONFIG-COMPATIBLE-VERSION

https://www.postgresql.org/docs/9.6/static/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE

Keywords: SQL encoding PostgreSQL JSON

Added by primefalcon on Sun, 14 Jul 2019 00:59:38 +0300