Dec 8, 2010
Nathan Thom

PostgreSQL Log Monitoring and Alerting


Something I really like in Oracle’s Enterprise Manager is the ability to see any errors/warnings being generated in the alert log.

With PostgreSQL I felt blind to what was going on so wanted to build some kind of log monitoring tool. I came across Tail_N_Mail which is a nifty little perl script to do exactly what I wanted.

The script will tail the end of the database log, remembering where it got up to last time it ran (I schedule it every 5 mins). For each line in the log it will then apply filters based on your config file, and any lines left at the end will be sent to you via email.

I call it like so:

perl tail_n_mail status

Where the status file contains all the stuff I’m not likely to change and gets automatically updated by the script after it runs (it adds the LASTFILE and OFFSET lines itself):

## Config file for the tail_n_mail program
## This file is automatically updated
## Last updated: Thu Sep 30 09:10:08 2010

MAILSUBJECT: (HOST) Postgres logfile errors (NUMBER)

FILE: /var/lib/dba/pgdata/pg_log/postgresql-%Y-%m-%d_000000.log
LASTFILE: /var/lib/dba/pgdata/pg_log/postgresql-2010-12-06_000000.log
OFFSET: 53729588
INHERIT: config

The config file contains all the matching rules. It first applies the INCLUDE lines then the EXCLUDE lines:

## Config file for the tail_n_mail program
## This file is automatically updated
## Last updated: Thu Sep 30 09:05:02 2010
EMAIL: alertme@domain.com

INCLUDE: ERROR:
INCLUDE: FATAL:
INCLUDE: PANIC:
INCLUDE: temporary file: path
INCLUDE: SIGHUP
INCLUDE: archive command failed
INCLUDE: could not be archived
INCLUDE: transaction ID wrap limit
INCLUDE: must be vacuumed within
INCLUDE: terminated
INCLUDE: Segmentation fault
INCLUDE: is missing

EXCLUDE: syntax
EXCLUDE: unrecognized configuration parameter
EXCLUDE: parameter ".+" cannot be changed now
EXCLUDE: outside the valid range for parameter
EXCLUDE: out of range
EXCLUDE: canceling statement due to user request
EXCLUDE: cannot be matched
EXCLUDE: cannot drop
EXCLUDE: current transaction is aborted
EXCLUDE: relation ".+" already exists
EXCLUDE: returned record type does not match expected record type
EXCLUDE: at character
EXCLUDE: duplicate key
EXCLUDE: must not return
EXCLUDE: cannot cast
EXCLUDE: has no field
EXCLUDE: does not exist
EXCLUDE: value too long
EXCLUDE: must have an alias
EXCLUDE: must appear in the
EXCLUDE: constraint
EXCLUDE: temporary file
EXCLUDE: must be type
EXCLUDE: not allowed in
EXCLUDE: invalid regular expression
EXCLUDE: there is no parameter
EXCLUDE: IMMUTABLE
EXCLUDE: already exists

One thing I found really annoying was that Postgres seems to treat this log file as a developer’s resource rather than a DBA’s. For example:

OMFG there’s an ERROR:

ERROR: canceling statement due to user request
ERROR: argument of WHERE must be type boolean, not type integer
ERROR: subquery has too many columns

Meh, just a LOG:

LOG: archive command failed with exit code 1
LOG: checkpoints are occurring too frequently (12 seconds apart)
LOG: received SIGHUP, reloading configuration files

Related posts:

  1. Change parameter via pg_ctl
  2. PostgreSQL pg_xlog filling up disk
  3. Setting Initial PostgreSQL Parameters
  4. PostgreSQL Database Capacity Planning 1 – Disk Space
  5. Preventing filling the temporary tablespace by killing slow queries
  6. Top 10 Missing PostgreSQL Features

Leave a comment

Article Series

Kick Ass PostgreSQL Books

Kick Ass Oracle Books

I've read lots of Oracle books, but these are by far the best I've encountered:

Categories