While retrieving all or most rows from a table, the fastest way for this type of query typically is to aggregate / disambiguate first and join later:
SELECT *
FROM products p
JOIN (
SELECT DISTINCT ON (product_id) *
FROM meta
ORDER BY product_id, id DESC
) m ON m.product_id = p.id;
The more rows in meta
per row in products
, the bigger the impact on performance.
Of course, you'll want to add an ORDER BY
clause in the subquery do define which row to pick form each set in the subquery. @Craig and @Clodoaldo already told you about that. I am returning the meta
row with the highest id
.
SQL Fiddle.
Details for DISTINCT ON
:
Optimize performance
Still, this is not always the fastest solution. Depending on data distribution there are various other query styles. For this simple case involving another join, this one ran considerably faster in a test with big tables:
SELECT p.*, sub.meta_id, m.product_id, m.price, m.flag
FROM (
SELECT product_id, max(id) AS meta_id
FROM meta
GROUP BY 1
) sub
JOIN meta m ON m.id = sub.meta_id
JOIN products p ON p.id = sub.product_id;
If you wouldn't use the non-descriptive id
as column names, we would not run into naming collisions and could simply write SELECT p.*, m.*
. (I never use id
as column name.)
If performance is your paramount requirement, consider more options:
- a
MATERIALIZED VIEW
with pre-aggregated data from meta
, if your data does not change (much).
- a recursive CTE emulating a loose index scan for a big
meta
table with many rows per product (relatively few distinct product_id
).
This is the only way I know to use an index for a DISTINCT query over the whole table.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…