Dec 6, 2010
Nathan Thom

Parallel analyze for PostgreSQL


I often need to copy a PostgreSQL database from one server to another, which is easy to do with pg_dump. However when you load the data into a new PostgreSQL database via pg_restore, it will not automatically calculate the table statistics and there might be very bad performance until the autovacuum process eventually kicks in (might be hours).

select relname, last_analyze, last_autoanalyze from pg_stat_all_tables
where last_analyze is not null;

So you need to run it manually afterwards:

\c db
ANALYZE;

And wait forever if it’s a large database. Even though it only samples the big tables it still seems to run very slow.
Running multiple analyze commands in separate sessions doesn’t work well – they both run over the same tables and seem to do much of the same amount of work even if a table has just been analyzed. Solution? Script it to run in parallel over different tables :) The key to this is to create a separate analyze statement per table and then run each table-level analyze in parallel via a script:

NUM_PROCESSES=4
SCHEMA=''

if [ "$SCHEMA" != "" ] ; then
   SCHEMA="and schemaname = '$SCHEMA' "
fi

analyze_table() {
   SQL=$1
   psql $DB -c "$SQL"
   echo >&4
}

# Select the tables that require stats and put them in a file /tmp/$$.lst
psql $DB -At -o /tmp/$$.list -c "select 'analyze verbose ' || s.schemaname || '.' ||
   s.relname || ';' from pg_stat_all_tables s, pg_class c, pg_namespace n \
   where c.relname = s.relname and s.schemaname = n.nspname
   and n.oid = c.relnamespace and s.last_analyze is null and s.last_autoanalyze is null
   and s.schemaname != 'pg_toast' $SCHEMA order by c.relpages desc"

mkfifo /tmp/$$.pipe; exec 4<>/tmp/$$.pipe

# Read the file of tables and analyze each of them
LINE_NUM=0
while read TABLE
do
   # Allow multiple tables to be analyzed in parallel (as background tasks)
   if [ $LINE_NUM -lt $NUM_PROCESSES ] ; then
      analyze_table "$TABLE" &
   else
      while read; do
         break
      done <&4
      analyze_table "$TABLE" &
   fi
   let LINE_NUM=$LINE_NUM+1
done < /tmp/$$.list

wait

rm /tmp/$$.list
rm /tmp/$$.pipe

I use a pipe to communicate between the primary shell and the background processes and run up to $NUM_PROCESSES in parallel. The $SCHEMA is an optional argument to limit the analyze to one schema.

Of course if you have one huge table taking up the whole database, this won’t help you much.

Download the db_analyze.sh script.

Related posts:

  1. Finding missing PostgreSQL stats
  2. Experimenting with TOAST Compression and Performance
  3. PostgreSQL Database Capacity Planning 1 – Disk Space
  4. PSQL Not Running First Statement In Script
  5. Creating and dropping a read only user
  6. Compare pg_dump and gzip compression

Leave a comment

Article Series

Kick Ass PostgreSQL Books

Kick Ass Oracle Books

I've read lots of Oracle books, but these are by far the best I've encountered:

Categories