Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


0 votes
284 views
in Technique by (71.8m points)

sql - Slow simple update query on PostgreSQL database with 3 million rows

I am trying a simple UPDATE table SET column1 = 0 on a table with ~3 million rows on PostgreSQL 8.4, but it is taking forever to finish. It has been running for more than 10 minutes in my latest attempt.

Before that, I tried running VACUUM and ANALYZE on the table, and I also tried creating some indexes (although I doubt indexes will make any difference in this case), but none of it seems to help.

Any other ideas?

Thanks, Ricardo

Update:

This is the table structure:

CREATE TABLE myTable
(
  id bigserial NOT NULL,
  title text,
  description text,
  link text,
  "type" character varying(255),
  generalFreq real,
  generalWeight real,
  author_id bigint,
  status_id bigint,
  CONSTRAINT resources_pkey PRIMARY KEY (id),
  CONSTRAINT author_pkey FOREIGN KEY (author_id)
      REFERENCES users (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT c_unique_status_id UNIQUE (status_id)
);

I am trying to run UPDATE myTable SET generalFreq = 0;
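One workaround that sometimes helps with a long-running full-table update is to split it into smaller batches, so each transaction touches fewer rows and the dead row versions can be vacuumed between batches. This is only a sketch, assuming the id values are roughly dense and nothing else is writing to the table; the range bounds are made up for illustration:

```sql
-- Hypothetical batching of the full-table update; the id ranges are assumptions.
UPDATE myTable SET generalFreq = 0 WHERE id >=      1 AND id <  500000;
VACUUM myTable;  -- reclaim the dead row versions before the next batch
UPDATE myTable SET generalFreq = 0 WHERE id >= 500000 AND id < 1000000;
VACUUM myTable;
-- ... repeat up to the maximum id
```

Because every UPDATE in PostgreSQL writes a new row version, a single statement over 3 million rows roughly doubles the table's live size before vacuum can reclaim anything; batching keeps that bloat bounded.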


1 Answer

0 votes
by (71.8m points)

I have to update tables of 1 or 2 billion rows, with various values for each row. Each run makes ~100 million changes (10%). My first try was to group them into transactions of 300K updates directly on a specific partition, since PostgreSQL does not always optimize prepared queries when you use partitions.

  1. Transactions of batches of "UPDATE myTable SET myField=value WHERE myId=id".
    Gives 1,500 updates/sec, which means each run would take at least 18 hours.
  2. HOT updates, as described here, with FILLFACTOR=50. Gives 1,600 updates/sec. I use SSDs, so it's a costly improvement, as it doubles the storage size.
  3. Insert the updated values into a temporary table and merge them afterwards with UPDATE ... FROM. Gives 18,000 updates/sec if I do a VACUUM for each partition; 100,000 updates/sec otherwise. Cooool.
    Here is the sequence of operations:

CREATE TEMP TABLE tempTable (
  id BIGINT NOT NULL,
  <field(s) to be updated>,
  CONSTRAINT tempTable_pkey PRIMARY KEY (id)
);

Accumulate a bunch of updates in a buffer, sized according to the available RAM. When the buffer is full, or when you need to switch to another table/partition, or when the run is complete:

COPY tempTable FROM buffer;
UPDATE myTable a SET field(s)=value(s) FROM tempTable b WHERE a.id=b.id;
COMMIT;
TRUNCATE TABLE tempTable;
VACUUM FULL ANALYZE myTable;
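Applied to the table from the question, the sequence might look like the following. This is a sketch: the COPY source path is an assumption (COPY FROM a server-side file needs superuser rights; psql's client-side copy works otherwise), and here the staged value is simply 0 for every row:

```sql
-- Stage the new values in a temp table keyed by the primary key.
CREATE TEMP TABLE tempTable (
  id          BIGINT NOT NULL,
  generalFreq REAL,
  CONSTRAINT tempTable_pkey PRIMARY KEY (id)
);

-- Assumed staging file with "id,generalFreq" rows; use copy in psql if not superuser.
COPY tempTable FROM '/tmp/updates.csv' (FORMAT csv);

-- Merge the staged values into the target table in one set-based statement.
UPDATE myTable a
SET    generalFreq = b.generalFreq
FROM   tempTable b
WHERE  a.id = b.id;

COMMIT;
TRUNCATE TABLE tempTable;
VACUUM ANALYZE myTable;
```

The speedup comes from replacing millions of single-row statements with one bulk COPY plus one set-based UPDATE, which PostgreSQL can execute as a single join.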

That means a run now takes 1.5 hours instead of 18 hours for 100 million updates, vacuum included. To save time, it's not necessary to do a VACUUM FULL at the end, but even a fast regular VACUUM is useful to keep transaction IDs under control on the database and avoid unwanted autovacuum runs during rush hours.
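For the HOT-update approach in option 2 above, the fill factor is set per table. A sketch (the table must be rewritten, e.g. with VACUUM FULL, for existing pages to pick up the new setting):

```sql
-- Leave half of each page free so updated row versions fit on the same page.
ALTER TABLE myTable SET (fillfactor = 50);
VACUUM FULL myTable;  -- rewrite existing pages at the new fill factor
```

Note that HOT updates only apply when none of the updated columns is indexed, which is another reason the indexes mentioned in the question are unlikely to help here.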

