I had a fairly simple query that was running slow (165 secs) which I needed to tune:
SELECT id
FROM property_vw
WHERE id IN (list of approx 32K numbers)
AND baths in ('2')
AND beds in ('2', '3')
AND property_type = 'UNIT'
ORDER BY a, few, columns;
The first thing I noticed was the huge IN list of over 32,000 Ids. Frankly, I’m amazed that PostgreSQL even accepts such a query and doesn’t spit out an error message immediately.
The explain analyze for this query is:
(anonymised and colorised via Depesz’s awesome online explain tool)
Sort (cost=655585.950..655585.950 rows=1 width=81) (actual time=165765.458..165765.629 rows=2432 loops=1)
Sort Key: quebec_three.romeo_seven, quebec_three.mike_five, quebec_three.yankee_india, quebec_three.quebec_five, quebec_three.six_bravo, quebec_three.romeo_india, quebec_three.five, (lima_sierra(quebec_three.six_mike, 0)), (lima_sierra(quebec_three.charlie_hotel, ''alpha''::six_delta mike_seven))
Sort Method: quicksort Memory: 439kB
-> Nested Loop (cost=0.000..655585.940 rows=1 width=81) (actual time=22.486..165732.870 rows=2432 loops=1)
-> Nested Loop Left Join (cost=0.000..655576.530 rows=1 width=32) (actual time=22.473..165713.367 rows=2432 loops=1)
Join Filter: (lima_four.three_alpha = kilo.three_alpha)
-> Nested Loop (cost=0.000..655568.090 rows=1 width=36) (actual time=22.461..165692.927 rows=2432 loops=1)
-> Nested Loop (cost=0.000..655566.960 rows=2 width=40) (actual time=22.451..165680.208 rows=2496 loops=1)
-> Seq Scan on zulu_juliet zulu_kilo (cost=0.000..627393.710 rows=567 width=4) (actual time=0.021..5377.204 rows=642599 loops=1)
Filter: ((lima_sierra(((xray_xray -> 'six_kilo'::golf))::charlie_foxtrot, 0) = ANY ('oscar_victor'::charlie_foxtrot[])) AND (lima_sierra(((xray_xray -> 'tango'::golf))::charlie_foxtrot, 0) = 2))
-> Index Scan using xray_five on six_alpha kilo (cost=0.000..49.680 rows=1 width=36) (actual time=0.249..0.249 rows=0 loops=642599)
Index Cond: (kilo.three_alpha = zulu_kilo.three_alpha)
Filter: (kilo.three_alpha = ANY ('three_india'::charlie_foxtrot[]))
-> Index Scan using india on oscar_seven xray_sierra (cost=0.000..0.550 rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=2496)
Index Cond: (xray_sierra.oscar_juliet = kilo.oscar_juliet)
Filter: (xray_sierra.victor_echo = 'zulu_echo'::golf)
-> Index Scan using four on victor_yankee lima_four (cost=0.000..8.430 rows=1 width=8) (actual time=0.007..0.007 rows=1 loops=2432)
Index Cond: (lima_four.seven = kilo.three_sierra)
-> Index Scan using whiskey on yankee_zulu quebec_three (cost=0.000..9.390 rows=1 width=57) (actual time=0.006..0.006 rows=1 loops=2432)
Index Cond: (quebec_three.three_alpha = kilo.three_alpha)
It’s driving the query off an initial sequential scan of a large table. It’s estimating the scan will return 567 rows when in fact it returned 642599 rows. I can understand its limited ability to interpret the query when it has such a monster IN list. What I need is for the query to drive off an index lookup on a different table which is more selective.
Now I know what I want, but how to convince PostgreSQL that it’s the right choice?
So in Oracle, I would have tried adding an ORDERED hint (or more likely, a LEADING hint but then I wouldn’t have anything to compare with here!). However PostgreSQL does not support any hints, and most likely never will. What I did find in the PostgreSQL documentation is the join_collapse_limit query planner parameter:
“Setting it to 1 prevents any reordering of explicit JOINs. Thus, the explicit join order specified in the query will be the actual order in which the relations are joined.”
More info available in the documentation.
So join_collapse_limit seems to be the closest thing in PostgreSQL to the Oracle ordered hint. After setting this for my session:
set join_collapse_limit = 1;
I get the following new explain plan:
(anonymised and colorised via Depesz’s awesome online explain tool)
Sort (cost=1526179.540..1526179.550 rows=1 width=81) (actual time=281.272..281.445 rows=2432 loops=1)
Sort Key: quebec_three.romeo_seven, quebec_three.mike_five, quebec_three.yankee_india, quebec_three.quebec_five, quebec_three.six_bravo, quebec_three.romeo_india, quebec_three.five, (lima_sierra(quebec_three.six_mike, 0)), (lima_sierra(quebec_three.charlie_hotel, ''alpha''::six_delta mike_seven))
Sort Method: quicksort Memory: 439kB
-> Nested Loop Left Join (cost=90351.160..1526179.530 rows=1 width=81) (actual time=50.434..251.197 rows=2432 loops=1)
Join Filter: (lima_four.three_alpha = kilo.three_alpha)
-> Nested Loop (cost=90351.160..1526169.630 rows=1 width=85) (actual time=50.423..240.437 rows=2432 loops=1)
-> Nested Loop (cost=90351.160..1526160.230 rows=1 width=36) (actual time=50.413..232.088 rows=2432 loops=1)
-> Nested Loop (cost=90351.160..1499651.650 rows=2878 width=32) (actual time=50.158..144.098 rows=23920 loops=1)
-> Bitmap Heap Scan on six_alpha kilo (cost=90351.160..1481550.430 rows=32209 width=36) (actual time=50.145..59.728 rows=32255 loops=1)
Recheck Cond: (three_alpha = ANY ('victor_bravo'::charlie_foxtrot[]))
-> Bitmap Index Scan on xray_five (cost=0.000..90343.100 rows=32209 width=0) (actual time=49.636..49.636 rows=32255 loops=1)
Index Cond: (three_alpha = ANY ('victor_sierra'::charlie_foxtrot[]))
-> Index Scan using india on oscar_seven xray_sierra (cost=0.000..0.550 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=32255)
Index Cond: (xray_sierra.oscar_juliet = kilo.oscar_juliet)
Filter: (xray_sierra.victor_echo = 'zulu_echo'::golf)
-> Index Scan using papa on zulu_juliet zulu_kilo (cost=0.000..9.200 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=23920)
Index Cond: (zulu_kilo.three_alpha = kilo.three_alpha)
Filter: ((lima_sierra(((zulu_kilo.xray_xray -> 'six_kilo'::golf))::charlie_foxtrot, 0) = ANY ('oscar_victor'::charlie_foxtrot[])) AND (lima_sierra(((zulu_kilo.xray_xray -> 'tango'::golf))::charlie_foxtrot, 0) = 2))
-> Index Scan using whiskey on yankee_zulu quebec_three (cost=0.000..9.390 rows=1 width=57) (actual time=0.003..0.003 rows=1 loops=2432)
Index Cond: (quebec_three.three_alpha = kilo.three_alpha)
-> Index Scan using four on victor_yankee lima_four (cost=0.000..9.890 rows=1 width=8) (actual time=0.003..0.004 rows=1 loops=2432)
Index Cond: (lima_four.seven = kilo.three_sierra)
Which ran in only 281ms. Considering all the single page index lookups, I think this is quite fast.
I’m not happy with this solution though, as the view that is being queried has several other views inside it. If any of them were modified to join tables in a different order, the query would stop working again. Manually rewriting the views to make the query reference all tables would reduce the risk of this query breaking again, but of course you lose the benefit of having the views in the first place which is to encapsulate some of the complex logic from any calling queries.