Haki Benita
2 min readSep 16, 2019

--

Hey Mark,
I just ran a quick benchmark on my own and indeed the “University” method was faster and consumed less space.

db=# EXPLAIN (ANALYZE ON, BUFFERS ON)
db-# SELECT
db-# *
db-# FROM
db-# sale
db-# WHERE
db-# (user, created) IN (
db(# SELECT
db(# user,
db(# MAX(created)
db(# FROM
db(# sale
db(# GROUP BY
db(# user
db(# )
db-# ;
Hash Join (cost=2717.22..4196.43 rows=1 width=75) (actual time=94.962..114.427 rows=38639 loops=1)
Hash Cond: ((sale.user = sale_1.user) AND (sale.created = (max(sale_1.created))))
Buffers: shared hit=688 read=691
-> Seq Scan on sale (cost=0.00..1206.83 rows=51883 width=75) (actual time=0.529..4.452 rows=51280 loops=1)
Buffers: shared hit=687 read=1
-> Hash (cost=2181.09..2181.09 rows=35742 width=12) (actual time=94.198..94.199 rows=38639 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 2324kB
Buffers: shared hit=1 read=687
-> HashAggregate (cost=1466.24..1823.66 rows=35742 width=12) (actual time=79.480..87.120 rows=38639 loops=1)
Group Key: sale_1.user
Buffers: shared hit=1 read=687
-> Seq Scan on sale sale_1 (cost=0.00..1206.83 rows=51883 width=12) (actual time=0.011..29.354 rows=51280 loops=1)
Buffers: shared hit=1 read=687
Planning Time: 11.441 ms
Execution Time: 116.234 ms
db=# EXPLAIN (ANALYZE ON, BUFFERS ON)
db-# SELECT DISTINCT ON (user)
db-# *
db-# FROM
db-# sale
db-# ORDER BY
db-# user,
db-# created DESC;
Unique (cost=7576.54..7835.96 rows=35742 width=75) (actual time=96.598..113.452 rows=38639 loops=1)
Buffers: shared hit=691, temp read=544 written=546
-> Sort (cost=7576.54..7706.25 rows=51883 width=75) (actual time=96.596..104.272 rows=51280 loops=1)
Sort Key: user, created DESC
Sort Method: external merge Disk: 4352kB
Buffers: shared hit=691, temp read=544 written=546
-> Seq Scan on sale (cost=0.00..1206.83 rows=51883 width=75) (actual time=0.020..17.146 rows=51280 loops=1)
Buffers: shared hit=688
Planning Time: 1.512 ms
Execution Time: 115.921 ms

I think the main difference is that the “University” method uses a HashAggregate, which is more efficient in term of memory than the Sort used by the DISTINCT ON method.

In other words, the “University” method goes one row at a time and keeps only the max value for each group, while the DISTINCT ON method sort the entire table. Maintaining a hash is faster and smaller than sorting the entire table.

I was unable to use the “array trick” here because the benchmark fetch all the rows. But, if you only need some of the fields the performance and memory usage should be comparable to the “University” approach.

--

--

Haki Benita
Haki Benita

Written by Haki Benita

Full Stack Developer, Team Leader, Independent. More from me at https://hakibenita.com

No responses yet