PostgreSQL solves the shortest path

Some businesses need to solve the shortest path. A pgrouting plug-in in PostgreSQL has algorithms related to calculating the shortest path built in.
Here's an example

Table definition

postgres=# \d testpath
              Table "public.testpath"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 id     | integer |           |          | 
 source | integer |           |          | 
 target | integer |           |          | 
 cost   | integer |           |          | 

This is a business table. Each row represents an edge and its cost. There are more than 1000 records in total (actually corresponding to the result set size filtered by business conditions). All other business-related attributes are hidden.

Solving the shortest path between two points

postgres=# SELECT * FROM pgr_dijkstra(
  'SELECT id,source,target,cost FROM testpath',
  10524, 10379, directed:=true);
 seq | path_seq | node  |  edge   | cost | agg_cost 
-----+----------+-------+---------+------+----------
   1 |        1 | 10524 | 1971852 |    1 |        0
   2 |        2 |  7952 |   32256 |    1 |        1
   3 |        3 |  7622 |   76615 |    2 |        2
   4 |        4 | 44964 |   76616 |    1 |        4
   5 |        5 |  7861 |   19582 |    1 |        5
   6 |        6 |  7629 |   14948 |    2 |        6
   7 |        7 | 17135 |   14949 |    1 |        8
   8 |        8 | 10379 |      -1 |    0 |        9
(8 rows)

Time: 22.979 ms

Solve the shortest N paths between 2 points

postgres=# SELECT * FROM pgr_ksp(
  'SELECT id,source,target,cost FROM testpath',
  10524, 10379, 1000,directed:=true);
 seq | path_id | path_seq | node  |  edge   | cost | agg_cost 
-----+---------+----------+-------+---------+------+----------
   1 |       1 |        1 | 10524 | 1971852 |    1 |        0
   2 |       1 |        2 |  7952 |   32256 |    1 |        1
   3 |       1 |        3 |  7622 |   54740 |    2 |        2
   4 |       1 |        4 | 35389 |   54741 |    1 |        4
   5 |       1 |        5 |  7861 |   19582 |    1 |        5
   6 |       1 |        6 |  7629 |   14948 |    2 |        6
   7 |       1 |        7 | 17135 |   14949 |    1 |        8
   8 |       1 |        8 | 10379 |      -1 |    0 |        9
...(slightly)
 100 |      12 |        4 | 53179 |   95137 |    1 |        4
 101 |      12 |        5 |  7625 |   90682 |    2 |        5
 102 |      12 |        6 | 51211 |   90683 |    1 |        7
 103 |      12 |        7 |  7861 |   19582 |    1 |        8
 104 |      12 |        8 |  7629 | 1173911 |    2 |        9
 105 |      12 |        9 | 59579 | 1173917 |    1 |       11
 106 |      12 |       10 | 10379 |      -1 |    0 |       12
(106 rows)

Time: 201.223 ms

The shortest path of pure SQL solution

The previous shortest path is calculated through the pgrouting plug-in. Can we simply use the SQL of PG itself to calculate the shortest path?

In real business scenarios, the path length can be limited, for example, if we discard all paths with more than 7 sides.
Then we can use simple depth traversal to calculate the shortest path. The computing speed is also increased by five times.

postgres=# WITH RECURSIVE line AS(
SELECT source,target,cost from testpath
),
path(fullpath,pathseq,node,total_cost) AS (
    select ARRAY[10524],1,10524,0
  UNION ALL
    select array_append(fullpath,target),pathseq+1,target,total_cost+cost from path join line on(source=node) where node!=10379 and pathseq<=8
)
SELECT * FROM path where fullpath @> ARRAY[10379] order by total_cost limit 1;
                   fullpath                    | pathseq | node  | total_cost 
-----------------------------------------------+---------+-------+------------
 {10524,7952,7622,80465,7861,7629,17135,10379} |       8 | 10379 |          9
(1 row)

Time: 4.334 ms

If the cost of each side is the same, the order by total cost above can be removed, and the performance will be greatly improved on the big data set.

The shortest N paths for pure SQL

Using the previous SQL, only the limit value is modified, which improves the performance more than pgr_ksp function of pgrouting. 50 times better performance.

postgres=# WITH RECURSIVE line AS(
SELECT source,target,cost from testpath
),
path(fullpath,pathseq,node,total_cost) AS (
    select ARRAY[10524],1,10524,0
  UNION ALL
    select array_append(fullpath,target),pathseq+1,target,total_cost+cost from path join line on(source=node) where node!=10379 and pathseq<=8
)
SELECT * FROM path where fullpath @> ARRAY[10379] order by total_cost limit 1000;
                      fullpath                      | pathseq | node  | total_cost 
----------------------------------------------------+---------+-------+------------
 {10524,7952,7622,80465,7861,7629,17135,10379}      |       8 | 10379 |          9
 {10524,7952,7622,35389,7861,7629,17135,10379}      |       8 | 10379 |          9
 {10524,7952,7622,44964,7861,7629,17135,10379}      |       8 | 10379 |          9
 {10524,7952,7622,80465,7861,7629,59579,10379}      |       8 | 10379 |          9
 {10524,7952,7622,35389,7861,7629,59579,10379}      |       8 | 10379 |          9
 {10524,7952,7622,44964,7861,7629,59579,10379}      |       8 | 10379 |          9
 {10524,7952,7622,53179,7625,7861,7629,17135,10379} |       9 | 10379 |         10
 {10524,7952,7622,53179,7625,7861,7629,59579,10379} |       9 | 10379 |         10
(8 rows)

Time: 4.425 ms

Let's take a look at the implementation plan

postgres=# explain analyze
WITH RECURSIVE line AS(
SELECT source,target,cost from testpath
),
path(fullpath,pathseq,node,total_cost) AS (
    select ARRAY[10524],1,10524,0
  UNION ALL
    select array_append(fullpath,target),pathseq+1,target,total_cost+cost from path join line on(source=node) where node!=10379 and pathseq<=8
)
SELECT * FROM path where fullpath @> ARRAY[10379] order by total_cost limit 1000;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=277.18..277.18 rows=1 width=44) (actual time=9.992..10.001 rows=8 loops=1)
   CTE line
     ->  Seq Scan on testpath  (cost=0.00..16.45 rows=1045 width=12) (actual time=0.017..0.624 rows=1045 loops=1)
   CTE path
     ->  Recursive Union  (cost=0.00..257.09 rows=161 width=44) (actual time=0.003..9.889 rows=42 loops=1)
           ->  Result  (cost=0.00..0.01 rows=1 width=44) (actual time=0.001..0.002 rows=1 loops=1)
           ->  Hash Join  (cost=0.29..25.39 rows=16 width=44) (actual time=0.451..1.090 rows=5 loops=9)
                 Hash Cond: (line.source = path_1.node)
                 ->  CTE Scan on line  (cost=0.00..20.90 rows=1045 width=12) (actual time=0.003..0.678 rows=1045 loops=8)
                 ->  Hash  (cost=0.25..0.25 rows=3 width=44) (actual time=0.007..0.007 rows=3 loops=9)
                       Buckets: 1024  Batches: 1  Memory Usage: 8kB
                       ->  WorkTable Scan on path path_1  (cost=0.00..0.25 rows=3 width=44) (actual time=0.001..0.004 rows=3 loops=9)
                             Filter: ((node <> 10379) AND (pathseq <= 8))
                             Rows Removed by Filter: 1
   ->  Sort  (cost=3.63..3.64 rows=1 width=44) (actual time=9.991..9.994 rows=8 loops=1)
         Sort Key: path.total_cost
         Sort Method: quicksort  Memory: 26kB
         ->  CTE Scan on path  (cost=0.00..3.62 rows=1 width=44) (actual time=7.851..9.979 rows=8 loops=1)
               Filter: (fullpath @> '{10379}'::integer[])
               Rows Removed by Filter: 34
 Planning time: 0.234 ms
 Execution time: 10.111 ms
(22 rows)

Time: 10.973 ms

Comparison of big data sets

The data set of the above test is relatively small, with only more than 1000 sides. If the data set is 100w, what is the result?

Calculate shortest path Time (seconds)
pgr_dijkstra() 52 seconds
Recursive CTE (maximum depth 2 sides) 2 seconds
Recursive CTE (max depth 3 sides) 5 seconds
Recursive CTE (max depth 4 sides) 105 seconds
Recursive CTE (max depth 5 sides) I can't figure it out. Give up
Recursive CTE (the maximum depth is 7 sides, assuming that the cost of each side is equal and not sorted, the shortest path is 3 sides) 1.6 seconds

Summary

Simple depth traversal can be applied to small data sets or scenes with small depth.
In the scene that meets these conditions, the effect is good.

Tags: Database SQL Big Data PostgreSQL

Posted on Sun, 26 Apr 2020 19:41:13 -0700 by astoller