Browsing articles from "December, 2010"
Dec 23, 2010
Nathan Thom
Comments Off

PostgreSQL pg_xlog filling up disk


Let me walk you through an issue I just had.

A server had just been setup to start running a 100gb data mart which gets completely rebuilt every night. The first night, it had crashed and filled up two file systems, one of which I know holds all the database files. Lets take a look at the $PGDATA directory to see where the space is:

[postgres@host data]$ du -sk *
4 PG_VERSION
96757204 base
396 global
8 pg_clog
4 pg_hba.conf
4 pg_ident.conf
9204 pg_log
16 pg_multixact
16 pg_subtrans
0 pg_tblspc
0 pg_twophase
64684508 pg_xlog
20 postgresql.conf
4 postmaster.opts

So we have base and pg_xlog. I expect base to be that big as it contains all the data files, but pg_xlog should be tiny. Let’s see how many files there are in there:

[postgres@host pg_xlog]$ ls -l | wc –l
3950

Hmmm, seems a tad big. Let’s check some parameters to make sure I haven’t done anything dumb:

checkpoint_segments = 64

Ideally, there should be only 129 files (2*64+1). So why are there so many more? The files here are cleared out when a checkpoint occurs (after creating checkpoint_segments files or checkpoint_timeout is reached), or when they are archived somewhere else. But archive_mode should be disabled in this ser…. oh. Crap.Turns out that second file system that filled up was storing archived log files. The parameter archive_mode was on when it should have been off. We’ll remove all the archived log files and try again.

pg_ctl start

Nope, not starting. Log file:

EST:LOG: database system was not properly shut down; automatic recovery in progress
EST:LOG: redo starts at 1C/6CFDDF58
EST:FATAL: could not extend relation 1663/16385/21862: No space left on device
EST:HINT: Check free disk space.
EST:CONTEXT: xlog redo page_update: rel 1663/16385/21862; tid 28730/19; block number 8687
EST:LOG: startup process (PID 20128) exited with exit code 1
EST:LOG: aborting startup due to startup process failure

It’s trying to do some instance recovery because it did not shut down cleanly last time, but does not have enough free disk space on the database file system to apply the WAL redo changes.This could get tricky as the only files on there are the database ones. First I tried moving the pg_log files, but they were tiny and had no effect. That left pg_xlog. I moved the oldest xlog files to the /tmp file system and tried to start it up again.

EST:LOG: database system was interrupted while in recovery at 2010-12-23 08:51:01 EST
EST:HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
EST:LOG: could not open file "pg_xlog/000000010000001C000000D2" (log file 28, segment 210): No such file or directory
EST:LOG: invalid primary checkpoint record
EST:LOG: could not open file "pg_xlog/000000010000001C000000D2" (log file 28, segment 210): No such file or directory
EST:LOG: invalid secondary checkpoint record
EST:PANIC: could not locate a valid checkpoint record
EST:LOG: startup process (PID 20455) was terminated by signal 6: Aborted
EST:LOG: aborting startup due to startup process failure

Oops – one of the files I moved was needed. After a few attempts, I eventually got enough files back in place for it to start up correctly:

EST:LOG: database system was interrupted while in recovery at 2010-12-23 08:56:10 EST
EST:HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
EST:LOG: database system was not properly shut down; automatic recovery in progress
EST:LOG: redo starts at 1C/6CFDDF58
EST:LOG: could not read from log file 28, segment 209, offset 15663104: File exists
EST:LOG: redo done at 1C/D1EEFFA0
EST:LOG: last completed transaction was at log time 2010-12-23 02:34:32.273961+10
EST:LOG: database system is ready to accept connections

And we’re back. Checking the pg_xlog directory now, there are only 233 files. This is still more than 64*2+1 but much better. I then ran a checkpoint and checked again:

postgres=# checkpoint;
CHECKPOINT
postgres=# \q
[postgres@host pg_xlog]$ ls -l | wc –l
132

That’s better.

Pages:1234»

Article Series

Kick Ass PostgreSQL Books

Kick Ass Oracle Books

I've read lots of Oracle books, but these are by far the best I've encountered:

Categories