PostgreSQL pg_xlog filling up disk
Let me walk you through an issue I just had.
A server had just been setup to start running a 100gb data mart which gets completely rebuilt every night. The first night, it had crashed and filled up two file systems, one of which I know holds all the database files. Lets take a look at the $PGDATA directory to see where the space is:
[postgres@host data]$ du -sk * 4 PG_VERSION 96757204 base 396 global 8 pg_clog 4 pg_hba.conf 4 pg_ident.conf 9204 pg_log 16 pg_multixact 16 pg_subtrans 0 pg_tblspc 0 pg_twophase 64684508 pg_xlog 20 postgresql.conf 4 postmaster.opts
So we have base and pg_xlog. I expect base to be that big as it contains all the data files, but pg_xlog should be tiny. Let’s see how many files there are in there:
[postgres@host pg_xlog]$ ls -l | wc –l 3950
Hmmm, seems a tad big. Let’s check some parameters to make sure I haven’t done anything dumb:
checkpoint_segments = 64
Ideally, there should be only 129 files (2*64+1). So why are there so many more? The files here are cleared out when a checkpoint occurs (after creating checkpoint_segments files or checkpoint_timeout is reached), or when they are archived somewhere else. But archive_mode should be disabled in this ser…. oh. Crap.Turns out that second file system that filled up was storing archived log files. The parameter archive_mode was on when it should have been off. We’ll remove all the archived log files and try again.
pg_ctl start
Nope, not starting. Log file:
EST:LOG: database system was not properly shut down; automatic recovery in progress EST:LOG: redo starts at 1C/6CFDDF58 EST:FATAL: could not extend relation 1663/16385/21862: No space left on device EST:HINT: Check free disk space. EST:CONTEXT: xlog redo page_update: rel 1663/16385/21862; tid 28730/19; block number 8687 EST:LOG: startup process (PID 20128) exited with exit code 1 EST:LOG: aborting startup due to startup process failure
It’s trying to do some instance recovery because it did not shut down cleanly last time, but does not have enough free disk space on the database file system to apply the WAL redo changes.This could get tricky as the only files on there are the database ones. First I tried moving the pg_log files, but they were tiny and had no effect. That left pg_xlog. I moved the oldest xlog files to the /tmp file system and tried to start it up again.
EST:LOG: database system was interrupted while in recovery at 2010-12-23 08:51:01 EST EST:HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery. EST:LOG: could not open file "pg_xlog/000000010000001C000000D2" (log file 28, segment 210): No such file or directory EST:LOG: invalid primary checkpoint record EST:LOG: could not open file "pg_xlog/000000010000001C000000D2" (log file 28, segment 210): No such file or directory EST:LOG: invalid secondary checkpoint record EST:PANIC: could not locate a valid checkpoint record EST:LOG: startup process (PID 20455) was terminated by signal 6: Aborted EST:LOG: aborting startup due to startup process failure
Oops – one of the files I moved was needed. After a few attempts, I eventually got enough files back in place for it to start up correctly:
EST:LOG: database system was interrupted while in recovery at 2010-12-23 08:56:10 EST EST:HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery. EST:LOG: database system was not properly shut down; automatic recovery in progress EST:LOG: redo starts at 1C/6CFDDF58 EST:LOG: could not read from log file 28, segment 209, offset 15663104: File exists EST:LOG: redo done at 1C/D1EEFFA0 EST:LOG: last completed transaction was at log time 2010-12-23 02:34:32.273961+10 EST:LOG: database system is ready to accept connections
And we’re back. Checking the pg_xlog directory now, there are only 233 files. This is still more than 64*2+1 but much better. I then ran a checkpoint and checked again:
postgres=# checkpoint; CHECKPOINT postgres=# \q [postgres@host pg_xlog]$ ls -l | wc –l 132
That’s better.
Article Series
Kick Ass PostgreSQL Books
Recent Posts
Kick Ass Oracle Books
Popular Posts
- Top 10 Missing PostgreSQL Features 487 view(s)
- PostgreSQL vs Oracle Differences #4 - Shared Memory Usage 371 view(s)
- PostgreSQL vs Oracle Differences #3 - System Resources 331 view(s)
- pg_restore: [archiver] unsupported version (1.12) in file header 289 view(s)
- PostgreSQL - Dropping a template database 265 view(s)
- PostgreSQL Database Capacity Planning 1 - Disk Space 148 view(s)
- PostgreSQL vs Oracle Differences #1 - Clusters 145 view(s)
- PostgreSQL vs Oracle Differences #2 - Support 99 view(s)
Archives
- March 2012 (1)
- September 2011 (1)
- August 2011 (2)
- July 2011 (3)
- June 2011 (10)
- May 2011 (10)
- April 2011 (7)
- March 2011 (10)
- February 2011 (5)
- January 2011 (1)
- December 2010 (4)
- November 2010 (2)
- October 2010 (3)
Tags
Categories
- Linux (23)
- Oracle (6)
- PostgreSQL (55)
- Uncategorized (1)
- Windows (7)



