Vehicle Networking Case, Track Cleaning - Aliyun RDS PostgreSQL Best Practices - Window Query

Label

PostgreSQL, window function, vehicle networking, trajectory, trajectory cleaning, lag, lead

background

A typical scenario in Vehicle Networking is to collect the trajectories of vehicles. Generally speaking, the trajectories of vehicles are not reported in real time, and may accumulate a number of trajectory records, or how often to report them at intervals.

A typical data structure is as follows

(car_id, pos geometry, crt_time timestamp)  

Vehicles will encounter traffic jams and traffic lights while driving, so the reported trajectory record may be like this.

Location 1,'2017-01-01 12:00'  
Location 1,'2017-01-01 12:00:05'  
Location 1,'2017-01-01 12:00:10'  
Location 1,'2017-01-01 12:00:15'  
Location 1,'2017-01-01 12:00:20'  
1, Position 2,'2017-01-01 12:00:30'  

That is to say, in the same location, because of traffic jams, red lights, etc., may lead to uploading multiple records.

Then it involves the need to clean unnecessary waiting records in the database. At one point, we keep up to two records to indicate that we have reached and left this location.

This operation can be implemented using window functions.

Of course, from the perspective of optimum efficiency, trajectory cleaning is a more reasonable thing to do at the terminal. There are only two starting points for one location.

Example

1. Design Table Structure

create table car_trace (cid int, pos point, crt_time timestamp);  

2. Generate 10 million test data, assuming there are 1,000 vehicles. (In order to make the data easier to duplicate, in order to test and see the effect, 25 points are used for the location.)

insert into car_trace select random()*999, point((random()*5)::int, (random()*5)::int), clock_timestamp() from generate_series(1,10000000);  

3. Creating Index

create index idx_car on car_trace (cid, crt_time);  

4. Query data layout

select * from car_trace where cid=1 order by crt_time limit 1000;  
  
   1 | (3,1) | 2017-07-22 21:30:09.84984  
   1 | (1,4) | 2017-07-22 21:30:09.850297  
   1 | (1,4) | 2017-07-22 21:30:09.852586  
   1 | (1,4) | 2017-07-22 21:30:09.854155  
   1 | (1,4) | 2017-07-22 21:30:09.854425  
   1 | (3,1) | 2017-07-22 21:30:09.854493  
  
//Several repetitions were observed.  

5. Use windows to filter single-location records, keeping at most two records that arrive and leave this location.

Two window functions are used here:

lag, which represents the previous record of the current record.

lead, which represents the next record of the current record.

The methods of judging the arrival and departure points are as follows:

  • The current POS is not equal to the previous pos, indicating that this record is the arrival point of the current location.

  • The current POS is not equal to the next pos, indicating that this record is the departure point of the current position.

  • The former pos is empty, indicating that this record is the first record.

  • The next pos is empty, indicating that this record is the last one.

select * from   
(  
select   
  *,   
  lag(pos) over (partition by cid order by crt_time) as lag,   
  lead(pos) over (partition by cid order by crt_time) as lead   
from car_trace   
  where cid=1   
  and crt_time between '2017-07-22 21:30:09.83994' and '2017-07-22 21:30:09.859735'  
) t  
  where pos <> lag  
  or pos <> lead  
  or lag is null  
  or lead is null;  
  
 cid |  pos  |          crt_time          |  lag  | lead    
-----+-------+----------------------------+-------+-------  
   1 | (2,1) | 2017-07-22 21:30:09.83994  |       | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.839953 | (2,1) | (5,2)  
   1 | (5,2) | 2017-07-22 21:30:09.840704 | (3,1) | (4,4)  
   1 | (4,4) | 2017-07-22 21:30:09.84179  | (5,2) | (5,2)  
   1 | (5,2) | 2017-07-22 21:30:09.843787 | (4,4) | (1,5)  
   1 | (1,5) | 2017-07-22 21:30:09.844165 | (5,2) | (0,5)  
   1 | (0,5) | 2017-07-22 21:30:09.84536  | (1,5) | (4,1)  
   1 | (4,1) | 2017-07-22 21:30:09.845896 | (0,5) | (3,3)  
   1 | (3,3) | 2017-07-22 21:30:09.846958 | (4,1) | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.84984  | (3,3) | (1,4)  
   1 | (1,4) | 2017-07-22 21:30:09.850297 | (3,1) | (1,4)  
   1 | (1,4) | 2017-07-22 21:30:09.854425 | (1,4) | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.854493 | (1,4) | (3,2)  
   1 | (3,2) | 2017-07-22 21:30:09.854541 | (3,1) | (2,0)  
   1 | (2,0) | 2017-07-22 21:30:09.855297 | (3,2) | (4,1)  
   1 | (4,1) | 2017-07-22 21:30:09.857592 | (2,0) | (4,1)  
   1 | (4,1) | 2017-07-22 21:30:09.857595 | (4,1) | (0,4)  
   1 | (0,4) | 2017-07-22 21:30:09.857597 | (4,1) | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.858996 | (0,4) | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.859735 | (3,1) |   
(20 rows)  

Without cleaning trajectory, the results are as follows:

select   
  *,   
  lag(pos) over (partition by cid order by crt_time) as lag,   
  lead(pos) over (partition by cid order by crt_time) as lead   
from car_trace   
  where cid=1   
  and crt_time between '2017-07-22 21:30:09.83994' and '2017-07-22 21:30:09.859735';  
  
 cid |  pos  |          crt_time          |  lag  | lead    
-----+-------+----------------------------+-------+-------  
   1 | (2,1) | 2017-07-22 21:30:09.83994  |       | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.839953 | (2,1) | (5,2)  
   1 | (5,2) | 2017-07-22 21:30:09.840704 | (3,1) | (4,4)  
   1 | (4,4) | 2017-07-22 21:30:09.84179  | (5,2) | (5,2)  
   1 | (5,2) | 2017-07-22 21:30:09.843787 | (4,4) | (1,5)  
   1 | (1,5) | 2017-07-22 21:30:09.844165 | (5,2) | (0,5)  
   1 | (0,5) | 2017-07-22 21:30:09.84536  | (1,5) | (4,1)  
   1 | (4,1) | 2017-07-22 21:30:09.845896 | (0,5) | (3,3)  
   1 | (3,3) | 2017-07-22 21:30:09.846958 | (4,1) | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.84984  | (3,3) | (1,4)  
   1 | (1,4) | 2017-07-22 21:30:09.850297 | (3,1) | (1,4)  
   1 | (1,4) | 2017-07-22 21:30:09.852586 | (1,4) | (1,4)  
   1 | (1,4) | 2017-07-22 21:30:09.854155 | (1,4) | (1,4)  
   1 | (1,4) | 2017-07-22 21:30:09.854425 | (1,4) | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.854493 | (1,4) | (3,2)  
   1 | (3,2) | 2017-07-22 21:30:09.854541 | (3,1) | (2,0)  
   1 | (2,0) | 2017-07-22 21:30:09.855297 | (3,2) | (4,1)  
   1 | (4,1) | 2017-07-22 21:30:09.857592 | (2,0) | (4,1)  
   1 | (4,1) | 2017-07-22 21:30:09.857595 | (4,1) | (0,4)  
   1 | (0,4) | 2017-07-22 21:30:09.857597 | (4,1) | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.858996 | (0,4) | (3,1)  
   1 | (3,1) | 2017-07-22 21:30:09.859735 | (3,1) |   
(22 rows)  

Use lag, lead to clean the record of stay.

Optimizing Scanning IO Amplification Caused by Tracked Object Scattering

Because there may be more vehicle ID s involved in the business, the data collected by different vehicles will be written into the database. If no optimization is done, the data of different vehicles may be stored staggeringly after entering the database, that is to say, in a data block, there may be data of different vehicles.

When querying the trajectory of a single vehicle, many data blocks (scanning IO enlargement) are scanned.

There are two ways to optimize.

1. Business-side aggregates packet sorting and writes to the database. For example, after receiving the data submitted by the vehicle terminal, the program grouped according to the vehicle ID, sorted by time, and wrote to the database (insert into tbl values (), (), ();). In this way, the data of the same vehicle may fall into the same data block as far as possible.

2. The database uses partition to reorganize data. For example, each vehicle, or vehicle HASH partition, is stored according to the vehicle ID.

The above two methods are to reorganize the data according to the query requirements, so as to reduce the scanning IO.

This method and "PostgreSQL Securities Industry Database Requirements Analysis and Application" The method is similar. Interested friends can refer to it.

Keywords: Database PostgreSQL Windows

Added by wiseoleweazel on Tue, 11 Jun 2019 21:15:30 +0300