Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

sql - Why can't I exclude dependent columns from `GROUP BY` when I aggregate by a key?

If I have the following tables (as an example using PostgreSQL, but could be any other relational database), where car has two keys (id and vin):

create table car (
  id int primary key not null,
  color varchar(10),
  brand varchar(10),
  vin char(17) unique not null
);

create table appraisal (
  id int primary key not null,
  recorded date not null,
  car_id int references car (id),
  car_vin char(17) references car (vin),
  price int
);

I can successfully include c.color and c.brand in the select list without aggregating them, since they depend on c.id:

select 
  c.id, c.color, c.brand,
  min(price) as min_appraisal,
  max(price) as max_appraisal
from car c
left join appraisal a on a.car_id = c.id
group by c.id; -- c.color, c.brand are not needed here

However, the following query fails: it does not allow me to include c.color and c.brand in the select list, even though they depend on c.vin, which is also a key of the table.

select 
  c.vin, c.color, c.brand,
  min(price) as min_appraisal,
  max(price) as max_appraisal
from car c
left join appraisal a on a.car_vin = c.vin
group by c.vin; -- Why are c.color, c.brand needed here?

ERROR: column "c.color" must appear in the GROUP BY clause or be used in an aggregate function
Position: 18

Example in DB Fiddle.


1 Answer

Because PostgreSQL treats only the PK as covering all columns of its table in the GROUP BY clause. That is why your first query works. A UNIQUE constraint does not qualify.

The combination of a non-deferrable UNIQUE and a NOT NULL constraint would also qualify, but that is not implemented, and neither are some other functional dependencies known to the SQL standard. Peter Eisentraut, the principal author of the feature, had more in mind, but it was determined at the time that demand was low and the associated costs might be high. See the discussion about the feature on pgsql-hackers.

The manual:

When GROUP BY is present, or any aggregate functions are present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or when the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column.

And more explicitly:

PostgreSQL recognizes functional dependency (allowing columns to be omitted from GROUP BY) only when a table's primary key is included in the GROUP BY list. The SQL standard specifies additional conditions that should be recognized.

Since c.vin is UNIQUE NOT NULL, grouping by c.id produces the same groups as grouping by c.vin, so you can fix your second query by using the PK column instead:

...
group by c.id;
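
For completeness, the full second query with that fix might look like the sketch below. It assumes the car and appraisal tables exactly as defined in the question. The commented-out alternative lists every selected car column in GROUP BY explicitly, which also works without relying on the primary key.

```sql
-- Second query from the question, grouped by the primary key instead of vin.
-- c.vin, c.color and c.brand are all functionally dependent on c.id.
select
  c.vin, c.color, c.brand,
  min(a.price) as min_appraisal,
  max(a.price) as max_appraisal
from car c
left join appraisal a on a.car_vin = c.vin
group by c.id;
-- Alternative that does not rely on the PK at all:
-- group by c.vin, c.color, c.brand;
```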

As an aside: while referential integrity is enforced and the whole table is queried, both of the given queries can be made substantially cheaper by aggregating rows in appraisal before the join. This also removes the need for GROUP BY in the outer SELECT altogether. Like:

SELECT c.vin, c.color, c.brand
     , a.min_appraisal
     , a.max_appraisal
FROM   car c
LEFT   JOIN (
   SELECT car_vin
        , min(price) AS min_appraisal
        , max(price) AS max_appraisal
   FROM   appraisal
   GROUP  BY car_vin
   ) a ON a.car_vin = c.vin;
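
If only a few cars are selected rather than the whole table, aggregating all of appraisal first may do more work than necessary. One possible variant, sketched here under assumptions, is a LATERAL subquery (available since PostgreSQL 9.3) that aggregates only the appraisals of each selected car. The filter on c.brand and the value 'Ford' are purely illustrative and not part of the question. Whether this is actually faster depends on the data and on indexes such as one on appraisal (car_vin), which the question does not define.

```sql
-- Sketch: aggregate per selected car instead of over the whole appraisal table.
-- The WHERE clause on c.brand is a hypothetical filter for illustration only.
SELECT c.vin, c.color, c.brand
     , a.min_appraisal
     , a.max_appraisal
FROM   car c
LEFT   JOIN LATERAL (
   SELECT min(price) AS min_appraisal
        , max(price) AS max_appraisal
   FROM   appraisal
   WHERE  car_vin = c.vin          -- correlated to the current car row
   ) a ON true                     -- the aggregate always returns exactly one row
WHERE  c.brand = 'Ford';
```

Cars without any appraisal still appear in the result, with NULL for both aggregates, just as with the LEFT JOIN in the queries above.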
