Mar 18, 2011
Nathan Thom

Preventing filling the temporary tablespace by killing slow queries


I had a PostgreSQL database crash overnight due to the $PGDATA file system running out of disk space. Looking at the disk growth, some process had steadily increasing disk usage for hrs and consumed 170GB before dying and taking out several other overnight batch processes.

The log file contained:

EST:ERROR:  could not write block 22455034 of temporary file: No space left on device
EST:HINT:  Perhaps out of disk space?
EST:STATEMENT:  SELECT bunch of columns
          FROM big_ass_table a, another_huge_table b
          WHERE useless_criteria_that_filters_nothing
          AND no_table_joins
          GROUP BY lets_throw_a_group_by_in_here_too_just_to_fuck_it_up_some_more

The offending query was traced back to a single power user running an ad-hoc query from their desktop. Looking at the explain plan of that query revealed some rather excessive temp file usage due to cartesian joins on multi-GB tables.

So here is a way for a single user to crash the entire PostgreSQL database cluster. As a DBA this makes me shudder. What are my options to prevent this happening again:

1. Find some parameters that can avoid this. The only parameter remotely relevant was the temp_buffers which only governs space usage in memory.
2. Use a separate temporary tablespace. I could create a separate file system for a new tablespace, but that isn’t something I can do quickly on 40 different servers. I can’t point the new tablespace to an existing file system as there are no tablespace quotas in PostgreSQL and it would just fill up that file system if given the chance.
3. Write script to kill end user queries that run longer than reasonable before they have a chance to fill up the disk.

I can’t investigate all this without comparing to Oracle so here goes :) All Oracle temp files are stored in one or more temporary tablespaces, each of which have a fixed size. They can be made to autoextend in size but the DBA always has control over maximum space used. If a query uses all the allocated space for that user/tablespace, their query errors and rolls back. Other users are not affected (unless they were unlucky enough to try and grab temp space from the same tablespace just when it filled up). Most importantly – the database stays UP.

Naturally, I ended up implementing choice #3 – automatically kill slow user queries. I have done similar with Oracle in the past to minimise performance problems and it is a very effective solution. It also helps to train the end users into writing more efficient queries ;)

The basic steps are:

1. Connect to a database cluster and query for slow queries.

psql -U postgres postgres -Atc \
	"select procpid, datname, usename, client_addr, current_query
	from pg_stat_activity where current_query <> '<IDLE>'
	and client_addr::varchar like '%.140.%'
	and current_timestamp-query_start > interval '1 hr'" > /tmp/$$.out

2. For each query, send an alert email with details of who is running what then, if you’re brave enough, kill it.

while read LINE; do
	KILL=`echo $LINE | awk -F\| '{print $1}'`
	DATABASE=`echo $LINE | awk -F\| '{print $2}'`
	USER=`echo $LINE | awk -F\| '{print $3}'`
	IP=`echo $LINE | awk -F\| '{print $4}'`
	SQL=`echo $LINE | awk -F\| '{print $5}'`
	MSG="Killing $USER ($IP) in $DATABASE running $SQL"

	/var/lib/dba/alert.sh "ALERT: Killing query" "$MSG"

	`kill $KILL`

done < /tmp/$$.out

You can download the full script kill_old_queries.sh here.

Related posts:

  1. Killing a query
  2. PostgreSQL pg_xlog filling up disk
  3. Queries Running for Negative Time
  4. Top 10 Missing PostgreSQL Features
  5. PostgreSQL Database Capacity Planning 1 – Disk Space
  6. PostgreSQL Log Monitoring and Alerting
  7. Impacts of changing tablespaces on bloat
  8. Ident authentication failed issue in pg_hba.conf

1 Comment

Leave a comment

Article Series

Kick Ass PostgreSQL Books

Kick Ass Oracle Books

I've read lots of Oracle books, but these are by far the best I've encountered:

Categories