Postgres: Why is my delete query using up all available disk space?
I have a postgres table set up like so:
Table "public.facts"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-------------+--------------------------+-----------+----------+-----------------------------------+----------+--------------+-------------
id | integer | | not null | nextval('facts_id_seq'::regclass) | plain | |
value | character varying(100) | | | | extended | |
measured_at | timestamp with time zone | | not null | | plain | |
received_at | timestamp with time zone | | | | plain | |
written_at | timestamp with time zone | | | | plain | |
is_numeric | boolean | | | | plain | |
source_id | character varying(100) | | | | extended | |
site_id | integer | | | | plain | |
Indexes:
"facts_measured_at_idx" btree (measured_at DESC)
"facts_source_id_measured_at_idx" btree (source_id, measured_at)
"idx_source_id_measured_at" btree (source_id, measured_at)
Foreign-key constraints:
"facts_site_id_fkey" FOREIGN KEY (site_id) REFERENCES site(id)
"facts_source_id_fkey" FOREIGN KEY (source_id) REFERENCES source(id)
Triggers:
ts_insert_blocker BEFORE INSERT ON facts FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
Child tables: _timescaledb_internal._hyper_1_100_chunk,
_timescaledb_internal._hyper_1_101_chunk,
_timescaledb_internal._hyper_1_102_chunk,
_timescaledb_internal._hyper_1_103_chunk,
_timescaledb_internal._hyper_1_104_chunk,
_timescaledb_internal._hyper_1_105_chunk,
_timescaledb_internal._hyper_1_106_chunk,
_timescaledb_internal._hyper_1_107_chunk,
_timescaledb_internal._hyper_1_108_chunk,
_timescaledb_internal._hyper_1_109_chunk,
_timescaledb_internal._hyper_1_110_chunk,
_timescaledb_internal._hyper_1_111_chunk,
_timescaledb_internal._hyper_1_112_chunk,
_timescaledb_internal._hyper_1_113_chunk,
_timescaledb_internal._hyper_1_114_chunk,
_timescaledb_internal._hyper_1_115_chunk,
_timescaledb_internal._hyper_1_116_chunk,
_timescaledb_internal._hyper_1_117_chunk,
_timescaledb_internal._hyper_1_118_chunk,
_timescaledb_internal._hyper_1_119_chunk,
Recently a bug introduced an infinite loop that wrote many thousands of duplicate rows. I want to delete those duplicate rows, and I'm using the following query:
DELETE FROM facts a
USING facts b
WHERE a.id > b.id
  AND a.source_id = b.source_id
  AND a.measured_at = b.measured_at
  AND a.value = b.value;
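(For context, a rough way to size the problem before deleting, using the same key columns as the join above, would be something like the following; it's only a sanity-check sketch, not output from my system.)

-- Sanity-check sketch: how many extra copies exist per (source_id, measured_at, value)?
-- Groups with copies > 1 are the rows the DELETE above would thin down to one survivor (lowest id).
SELECT source_id, measured_at, value, count(*) AS copies
FROM facts
GROUP BY source_id, measured_at, value
HAVING count(*) > 1
ORDER BY copies DESC
LIMIT 20;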
Disk usage prior to running the query:
$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 32G 0 32G 0% /dev
tmpfs 6.3G 844K 6.3G 1% /run
/dev/nvme0n1p1 49G 8.5G 40G 18% /
tmpfs 32G 44K 32G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/loop0 29M 29M 0 100% /snap/amazon-ssm-agent/2012
/dev/loop1 98M 98M 0 100% /snap/core/10126
/dev/loop2 98M 98M 0 100% /snap/core/10185
/dev/nvme1n1p1 3.9T 181G 3.5T 5% /mnt <-- database on this volume
tmpfs 6.3G 0 6.3G 0% /run/user/1000
Disk usage 3 days after starting the query (it's still running):
$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 32G 0 32G 0% /dev
tmpfs 6.3G 836K 6.3G 1% /run
/dev/nvme0n1p1 49G 8.5G 40G 18% /
tmpfs 32G 44K 32G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/loop0 29M 29M 0 100% /snap/amazon-ssm-agent/2012
/dev/loop1 98M 98M 0 100% /snap/core/10126
/dev/loop2 98M 98M 0 100% /snap/core/10185
/dev/nvme1n1p1 3.9T 3.2T 527G 86% /mnt <-- database on this volume
tmpfs 6.3G 0 6.3G 0% /run/user/1000
Nothing is being written to or read from the database while this query is executing. Most of the extra storage is in /mnt/postgres/12/main/base/pgsql_tmp. Why does it take >3 TB to delete less than 200 GB?
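(For reference, the server's cumulative temp-file counters can also be read from the standard pg_stat_database view; this is just a sketch, not output from my system.)

-- temp_bytes is a running total since the last stats reset, not the current on-disk size
SELECT datname, temp_files, pg_size_pretty(temp_bytes) AS temp_bytes
FROM pg_stat_database
WHERE datname = current_database();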
Please let me know if there's any other info I can provide.
1 Answer
It turns out all the extra child tables/chunks created by TimescaleDB were making this query run wild; running EXPLAIN, as Schwern suggested, was what keyed me into this. So here's what I ended up doing: