sorting - How does PostgreSQL perform ORDER BY with a b-tree index on the field?

Question

Welcome To Ask or Share your Answers For Others

sorting - How does PostgreSQL perform ORDER BY with a b-tree index on the field?

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

sorting - How does PostgreSQL perform ORDER BY with a b-tree index on the field?

I have a table bsort:

CREATE TABLE bsort(a int, data text);

Here data may be incomplete. In other words, some tuples may not have data value.

And then I build a b-tree index on the table:

CREATE INDEX ON bsort USING BTREE(a);

Now if I perform this query:

SELECT * FROM bsort ORDER BY a;

Does PostgreSQL sort tuples with nlogn complexity, or does it get the order directly from the b-tree index?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

2.1m questions

2.1m answers

60 comments

57.0k users

Most popular tags

javascript python c# java How android c++ php ios html sql r c node.js .net iphone asp.net css reactjs jquery ruby What Android objective mysql linux Is git Python windows Why regex angular swift amazon excel algorithm macos Java visual how bash Can multithreading PHP Using scala angularjs typescript apache spring performance postgresql database flutter json rust arrays C# dart vba django wpf xml vue.js In go Get google jQuery xcode jsf http Google mongodb string shell oop powershell SQL C++ security assembly docker Javascript Android: Does haskell Convert azure debugging delphi vb.net Spring datetime pandas oracle math Django

联盟问答网站-Union QA website

Xstack问答社区

生活宝问答社区

OverStack问答社区

Ostack问答社区

在这了问答社区

在哪了问答社区

Xstack问答社区

无极谷问答社区

TouSu问答社区

SQlite问答社区

Qi-U问答社区

MLink问答社区

Jonic问答社区

Jike问答社区

16892问答社区

Vigges问答社区

55276问答社区

OGeek问答社区

深圳家问答社区

深圳家问答社区

深圳家问答社区

Vigges问答社区

Vigges问答社区

在这了问答社区

DevDocs API Documentations

Xstack问答社区

生活宝问答社区

OverStack问答社区

Ostack问答社区

在这了问答社区

在哪了问答社区

Xstack问答社区

无极谷问答社区

TouSu问答社区

SQlite问答社区

Qi-U问答社区

MLink问答社区

Jonic问答社区

Jike问答社区

16892问答社区

Vigges问答社区

55276问答社区

OGeek问答社区

深圳家问答社区

深圳家问答社区

深圳家问答社区

Vigges问答社区

Vigges问答社区

在这了问答社区

在这了问答社区

DevDocs API Documentations

Xstack问答社区

生活宝问答社区

OverStack问答社区

Ostack问答社区

在这了问答社区

在哪了问答社区

Xstack问答社区

无极谷问答社区

TouSu问答社区

SQlite问答社区

Qi-U问答社区

MLink问答社区

Jonic问答社区

Jike问答社区

16892问答社区

Vigges问答社区

55276问答社区

OGeek问答社区

深圳家问答社区

深圳家问答社区

深圳家问答社区

Vigges问答社区

Vigges问答社区

在这了问答社区

DevDocs API Documentations

广告位招租

深蓝 · Answer 1 · 2021-10-17T00:50:17+0000

For a simple query like this Postgres will use an index scan and retrieve readily sorted tuples from the index in order. Due to its MVCC model Postgres had to always visit the "heap" (data pages) additionally to verify entries are actually visible to the current transaction. Quoting the Postgres Wiki on index-only scans:

PostgreSQL indexes do not contain visibility information. That is, it is not directly possible to ascertain if any given tuple is visible to the current transaction, which is why it has taken so long for index-only scans to be implemented.

Which finally happened in version 9.2: index-only scans. The manual:

If the index stores the original indexed data values (and not some lossy representation of them), it is useful to support index-only scans, in which the index returns the actual data not just the TID of the heap tuple. This will only avoid I/O if the visibility map shows that the TID is on an all-visible page; else the heap tuple must be visited anyway to check MVCC visibility. But that is no concern of the access method's.

The visibility map decides whether index-only scans are possible. Only an option if all involved column values are included in the index. Else, the heap has to be visited (additionally) in any case. The sort step is still not needed.

That's why we sometimes append otherwise useless columns to indexes now. Like the data column in your example:

CREATE INDEX ON bsort (a, data);  -- btree is the default index type

It makes the index bigger (depends) and a bit more expensive to maintain and use for other purposes. So only append the data column if you get index-only scans out of it. The order of columns in the index is important:

Since Postgres 11, there are also "covering indexes" with the INCLUDE keyword. Like:

CREATE INDEX ON bsort (a) INCLUDE (data);

See:

Does a query with a primary key and foreign keys run faster than a query with just primary keys?

The benefit of an index-only scan, per documentation:

If it's known that all tuples on the page are visible, the heap fetch can be skipped. This is most noticeable on large data sets where the visibility map can prevent disk accesses. The visibility map is vastly smaller than the heap, so it can easily be cached even when the heap is very large.

The visibility map is maintained by VACUUM which happens automatically if you have autovacuum running (the default setting in modern Postgres). Details:

Are regular VACUUM ANALYZE still recommended under 9.1?

But there is some delay between write operations to the table and the next VACUUM run. The gist of it:

Read-only tables stay ready for index-only scans once vacuumed.
Data pages that have been modified lose their "all-visible" flag in the visibility map until the next VACUUM (and all older tansactions being finished), so it depends on the ratio between write operations and VACUUM frequency.

Partial index-only scans are still possible if some of the involved pages are marked all-visible. But if the heap has to be visited anyway, the access method "index scan" is a bit cheaper. So if too many pages are currently dirty, Postgres will switch to the cheaper index scan altogether. The Postgres Wiki again:

As the number of heap fetches (or "visits") that are projected to be needed by the planner goes up, the planner will eventually conclude that an index-only scan isn't desirable, as it isn't the cheapest possible plan according to its cost model. The value of index-only scans lies wholly in their potential to allow us to elide heap access (if only partially) and minimise I/O.

Categories

sorting - How does PostgreSQL perform ORDER BY with a b-tree index on the field?

sorting - How does PostgreSQL perform ORDER BY with a b-tree index on the field?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags