PostgreSQL Database Capacity Planning 1 – Disk Space
This is the first post in what will be a small series on database capacity planning. While equally applicable to all databases, this discussion will focus on PostgreSQL. This first post will cover setting up and reporting on database disk space growth. Later posts will cover server health and various database performance metrics.
All scripts I reference here can be found via the links at the bottom of this post.
Overview
I love statistics. One of the first things I do when setting up an environment is to make sure I have plenty of database health and performance metrics.
In Oracle, this is done automatically by the AWR processes and has been getter better with every new version. PostgreSQL does not have many built in statistics and what’s there is purely cumulative so you can’t see trends. If you want any really useful information you have to write some additional data capture tools that periodically log data points. There may be some third party ones already(?), but the wheel is never quite round enough for me.
I have 46 PostgreSQL servers which I need to monitor, over several different networks. I have one central management server which has access to all of them and appropriate rules are setup in every pg_hba.conf file to allow connectivity from this management server. There are about 250 databases on these servers which I monitor, so setting them all up consistently is incredibly important (I have written some posts about this: logging, installation, parameters).
On this central server I created a database called “performance” where I store all these additional statistics in a schema called “stats”. This schema includes data collected from various sources such as:
• PostgreSQL server up/down times
• Server health and performance
• Database connections, performance, locks
• Database and disk sizes
• Batch process/report runtimes
Each metric is collected at an appropriate frequency and retained for as long as I require. For example, I currently check database disk space usage once a day and keep it forever while I record database up/down status every minute but aggregate it after a few months. I tailor the details based on the environment and server/database being monitored.
Don’t worry too much about how much disk you will need to store all this data, as it is really insignificant compared to what you are typically monitoring. For example, the database_sizes table I mention below stores one row per database per day (approx 250 databases), and after 8 months consumes only 3MB. I would suggest you start recording anything and everything you think might be useful, and then later you can start reducing frequencies, aggregating old data, or purging as you see fit at the time.
Once you have a database that stores all of these stats, you have an incredibly valuable resource. I regularly use this data for:
• Automated performance/health monitoring and alerting
• Trend Analysis
• Compare two different timeframes to find changes/impacts of changes
• Capacity planning and forecasting (This is an awesome database forecasting book for Oracle, but the concepts apply equally to PostgreSQL. I was completely blown away by the possibilities after reading this.)
• Historical performance analysis (i.e. wtf went wrong last night?!)
• Web based GUI with real time performance graphs (flot or pChart are very nice to use)
• Weekly capacity/performance summary emails to management with nice graphical charts
Monitoring Disk Space Utilisation
In PostgreSQL there are four levels of disk usage I’m interested in:
• OS file system sizes
• Database sizes
• Tablespace sizes
• Table and Index sizes
You may also want schema sizes, but since I log individual table sizes (for a small subset of databases) I could just aggregate that up to calculate schema sizes easily.
The file system disk sizes are most likely already being monitored, perhaps by a separate storage/infrastructure team via some other system such as Zenoss or even a third party facilities management provider. However, I find that keeping my own copy of these stats is very handy for capacity planning, avoiding a round trip through the team bureaucracy and ending up with a screenshot of a graph which is not in any usable format.
OS File System Sizes
To monitor the file systems, I run the following command:
ssh -n $HOST "df -mx tmpfs -x iso9660 -x nfs | grep -v Filesystem"
I exclude temp and CD file system types which are often at 100% used, and I also exclude NFS to avoid recording the same disks multiple times from different servers. An example from one of my servers:
$ df -mx tmpfs -x iso9660 -x nfs | grep -v Filesystem /dev/sda5 19376 7250 11127 40% / /dev/sda7 64960 12704 48904 21% /opt /dev/sda6 9689 205 8984 3% /tmp /dev/sda3 19376 3177 15199 18% /var /dev/sda1 244 21 210 9% /boot /dev/mapper/mpath24 255875 250916 4960 99% /dbprd1/tablespc/db_corporate /dev/mapper/mpath23 501681 355255 146427 71% /dbprd1/tablespc/db_rawdata /dev/mapper/mpath22 51189 24188 27002 48% /var/lib/pgsql /dev/mapper/mpath21 255875 50282 205594 20% /dbprd2/archlogs
I then use awk to pull out the size, used and mount name fields to store in my disks table.
performance=> \d disks Table "stats.disks" Column | Type | Modifiers -----------+-----------------------------+----------- timestamp | timestamp without time zone | host | character varying(30) | mount | character varying(50) | size | integer | used | integer |
You can also use this script to send alerts if the file systems are too full (as my mon_disks.sh script does).
Database Sizes
To monitor database sizes, I run the following command:
select pg_database.datname, pg_database_size(pg_database.datname) AS size
FROM pg_database where datname not in ('template0','template1','postgres')
This uses the pg_database_size function which is in all versions of databases I use (8.1 to 9.0). I exclude the common three databases that are in every cluster, because they are of no interest to me.
I actually wrap the above SELECT in a COPY statement that pipes the output directly into the database_sizes table in my performance database as follows:
ssh -n $HOST ". ~/.bash_profile; psql -U postgres postgres -c \
\"copy ( select localtimestamp, '$HOST', pg_database.datname, pg_database_size(pg_database.datname) AS size FROM pg_database where datname not in ('template0','template1','postgres') ) to stdout\" " \
| psql -U stats performance -c "copy stats.database_sizes from stdin"
The table structure is:
performance=> \d stats.database_sizes Table "stats.database_sizes" Column | Type | Modifiers -----------+-----------------------------+----------- timestamp | timestamp without time zone | host | character varying(30) | database | character varying(64) | size | bigint |
Tablespace Sizes
To monitor individual tablespace sizes, I run the following command:
select spcname, pg_tablespace_size(oid) from pg_tablespace
This uses the pg_tablespace_size function which is in all versions of databases I use (8.1 to 9.0).
As with the database sizes, I wrap the above SELECT in a COPY statement that pipes the output directly into the database_sizes table in my performance database.
performance=> \d stats.tablespace_sizes
Table "stats.tablespace_sizes"
Column | Type | Modifiers
-----------------+-----------------------------+-----------
timestamp | timestamp without time zone |
host | character varying(30) |
tablespace_name | character varying(64) |
size | bigint |
Of course, if you don’t use multiple tablespaces this is not of much use to you.
Table and Index Sizes
For critical databases you might want to keep stats at the table level. I only do this for my most important data warehouse where nightly loads sometimes create large spikes of data and this allows me to quickly find out which tables are to blame.
Note that this script relies on accurate table statistics – i.e. autovacuum should be turned on, or regularly scheduled vacuums should be running to update the stats.
The primary query for this script is:
select n.nspname, c.relname, c.relkind, c.relpages, c.reltuples
from pg_class c, pg_namespace n
where c.relnamespace = n.oid
and n.nspname not in ('information_schema','pg_catalog')
and relkind != 'S' and relpages > 1
I ignore sequences, and am only interested in tables and indexes with more than a single page. As usual, I wrap this in a COPY statement to pipe it directly into the table_sizes table:
performance=> \d stats.table_sizes Table "stats.table_sizes" Column | Type | Modifiers -----------+-----------------------------+----------- timestamp | timestamp without time zone | host | character varying(30) | database | character varying(64) | schema | name | relname | name | relkind | character(1) | pages | bigint | rows | real |
Reporting – Example SQL
If you’re not sure what to do with this wealth of data, here are some suggestions along with some SQL examples to work with the table definitions I have described above. All of these queries run on PostgreSQL 8.3, so no awesome windowing functions have been used.
Biggest Databases
List your top 20 biggest databases across all monitored servers/databases.
select b.database, max(b.size) from ( select distinct database from database_sizes where timestamp > localtimestamp - interval '1 day' ) a, database_sizes b where a.database=b.database and b.timestamp > localtimestamp - interval '1 day' group by b.database order by 2 desc limit 20
Daily database size
Show a particular database’s size and rate of growth each day for the last X days.
select to_char(a.timestamp,'YYYYMMDD') as day, a.host, a.database, round(sum(a.size)/1024/1024) as "Size MB" , round((sum(a.size)-sum(b.size))/1024/1024) as "MB/day" from database_sizes a, database_sizes b where a.host = b.host and a.database = b.database and to_char(b.timestamp,'YYYYMMDD') = to_char(a.timestamp - interval '1 day','YYYYMMDD') and a.timestamp > current_timestamp - interval ' days' and a.host = '<HOST>' and a.database = '<DATABASE>' group by to_char(a.timestamp,'YYYYMMDD'), a.host, a.database, to_char(b.timestamp,'YYYYMMDD') order by 1,2
Example output:
day | host | database | Size MB | MB/day ----------+--------------+----------+---------+-------- 20110531 | epplin01-dwh | staging | 454730 | 4220 20110601 | epplin01-dwh | staging | 456756 | 2026 20110602 | epplin01-dwh | staging | 456851 | 95 20110603 | epplin01-dwh | staging | 456879 | 28 20110604 | epplin01-dwh | staging | 456897 | 19 20110605 | epplin01-dwh | staging | 457017 | 119 20110606 | epplin01-dwh | staging | 457090 | 74
Fastest growing databases
Show the databases with the highest average daily growth rate, calculated over the last 30 days. It will also show the forecasted database size in 30 days, 1 year and 3 years into the future at the current rate of growth.
select to_char(y.timestamp,'YYYYMMDD') as day, x.host, x.database, y.size/1024/1024 as "Current MB", x."Avg MB/day"
, y.size/1024/1024+30*x."Avg MB/day" as "Est +1mon MB", y.size/1024/1024+365*x."Avg MB/day" as "Est +1yr MB"
, y.size/1024/1024+3*365*x."Avg MB/day" as "Est +3yrs MB"
from (
-- get average daily growth rate
select x.host, x.database, round(avg(mbpd)) as "Avg MB/day" from (
-- get daily growth rate over the last month
select a.timestamp, a.host, a.database, (sum(a.size)-sum(b.size))/1024/1024 as mbpd
from database_sizes a, database_sizes b
where a.host = b.host
and a.database = b.database
and to_char(b.timestamp,'YYYYMMDD') = to_char(a.timestamp - interval '1 day','YYYYMMDD')
and a.timestamp > current_timestamp - interval '30 days'
group by 1, a.host, a.database
) x
group by x.host, x.database
) x, database_sizes y
where x.host = y.host
and x.database = y.database
and to_char(y.timestamp,'YYYYMMDD') = (
select to_char(max(z.timestamp),'YYYYMMDD') from database_sizes z
where z.host=y.host and z.database=y.database
)
and y.timestamp > current_timestamp - interval '1 day'
order by 5 desc
limit 20;
Example output:
day | host | database | Current MB | Avg MB/day | Est +1mon MB | Est +1yr MB | Est +3yrs MB ----------+----------------+---------------------+------------+------------+--------------+-------------+-------------- 20110606 | applin02-repo | corporate | 366002 | 3837 | 481112 | 1766507 | 4567517 20110606 | applin02-asx | asx_v2c_12 | 56306 | 1803 | 110396 | 714401 | 2030591 20110606 | applin02-asx | asx_v2c_13 | 44073 | 1377 | 85383 | 546678 | 1551888 20110606 | applin02-asx | asx_v2c_11 | 41073 | 1217 | 77583 | 485278 | 1373688 20110606 | epplin01-dwh | staging | 457090 | 780 | 480490 | 741790 | 1311190 20110606 | btplin01-asx | avmdaily_new | 73762 | 434 | 86782 | 232172 | 548992 20110606 | btplin03-etl | property_nz | 23959 | 397 | 35869 | 168864 | 458674 20110606 | btplin02-etl | dev_staging | 318648 | 381 | 330078 | 457713 | 735843 20110606 | btplin03-etl | property_au_support | 169143 | 194 | 174963 | 239953 | 381573 20110606 | epplin01-pos | property_au | 169482 | 192 | 175242 | 239562 | 379722 20110606 | applin03-postg | property_au | 169659 | 189 | 175329 | 238644 | 376614 20110606 | applin01-postg | property_au | 169392 | 189 | 175062 | 238377 | 376347 20110606 | asplin01-postg | property_au | 171896 | 188 | 177536 | 240516 | 377756 20110606 | asplin02-postg | property_au | 168959 | 180 | 174359 | 234659 | 366059 20110606 | btplin03-etl | property_au | 91892 | 137 | 96002 | 141897 | 241907 20110606 | epplin01-postg | asxdaily | 95988 | 132 | 99948 | 144168 | 240528 20110606 | epplin02-pos | rpp | 19492 | 123 | 23182 | 64387 | 154177 20110606 | epplin01-dwh | rawdata | 63975 | 76 | 66255 | 91715 | 147195 20110606 | btplin03-etl | rpp_dev_master | 12955 | 71 | 15085 | 38870 | 90700 20110606 | bpvlin03-pos | content | 47589 | 68 | 49629 | 72409 | 122049
Highest utilised file systems
Show the most used file systems, their rate of growth, and forecasted sizes in 30 days, 1 year and 3 years into the future at the current rate of growth.
select to_char(y.timestamp,'YYYYMMDD') as day, x.host, x.mount, y.size as "Size MB"
, round(100*y.used/y.size) as "Used %", x."Avg MB/day"
, round(100*((y.used+30*x."Avg MB/day")/y.size)) as "Est +1mon %"
, round(100*((y.used+365*x."Avg MB/day")/y.size)) as "Est +1yr %"
, round(100*((y.used+3*365*x."Avg MB/day")/y.size)) as "Est +3yrs %"
from (
-- get average daily growth rate
select x.host, x.mount, round(avg(mbpd)) as "Avg MB/day" from (
-- get daily growth rate over the last month
select a.timestamp, a.host, a.mount, (sum(a.used)-sum(b.used)) as mbpd
from disks a, disks b
where a.host = b.host
and a.mount = b.mount
and a.mount not in ('/', '/tmp')
--and a.host='epplin01-dwh'
and to_char(b.timestamp,'YYYYMMDD') = to_char(a.timestamp - interval '1 day','YYYYMMDD')
and a.timestamp > current_timestamp - interval '30 days'
group by 1, a.host, a.mount
) x
group by x.host, x.mount
having avg(mbpd) > 0
) x, disks y
where x.host = y.host
and x.mount = y.mount
and to_char(y.timestamp,'YYYYMMDD') = (
select to_char(max(z.timestamp),'YYYYMMDD') from disks z
where z.host=y.host and z.mount=y.mount
)
and y.timestamp > current_timestamp - interval '1 day'
order by 5 desc
limit 20;
Example output:
day | host | mount | Size MB | Used % | Avg MB/day | Est +1mon % | Est +1yr % | Est +3yrs % ----------+----------------------------+------------------------------+---------+--------+------------+-------------+------------+------------- 20110606 | applin03-postg | /opt | 235548 | 90 | 212 | 93 | 123 | 189 20110606 | applin01-postg | /opt | 243045 | 81 | 242 | 85 | 118 | 191 20110606 | epplin01-dwh | /dbprd1/tablespc/staging_tab | 307075 | 80 | 433 | 85 | 132 | 235 20110606 | brvlin01-pos | /backups | 100792 | 79 | 104 | 83 | 118 | 193 20110606 | btplin02-etl | /wholedsk2 | 511950 | 78 | 456 | 81 | 111 | 176 20110606 | btplin01-asx | /opt | 828646 | 77 | 1179 | 82 | 129 | 233 20110606 | epplin01-dwh | /datawarehouse | 501681 | 71 | 551 | 75 | 112 | 192 20110606 | epplin01-pos | /opt | 279637 | 70 | 216 | 73 | 98 | 155 20110606 | epplin03-dwh | /opt | 64960 | 69 | 0 | 69 | 69 | 69 20110606 | epplin03-postg | /opt | 157671 | 67 | 0 | 67 | 67 | 67 20110606 | bpplin01-olt.rpdata.net.au | /var | 129219 | 63 | 3 | 63 | 64 | 66 20110606 | epplin01-postg | /opt | 270682 | 63 | 483 | 69 | 129 | 259 20110606 | asplin02-postg | /opt | 548464 | 58 | 263 | 60 | 76 | 111 20110606 | bpvlin01-jira4 | /opt | 20153 | 55 | 77 | 67 | 195 | 474 20110606 | bpplin01-gis | /backup | 1140462 | 53 | 1587 | 58 | 104 | 206 20110606 | bpvlin03-pos | /backup | 1140462 | 53 | 1608 | 58 | 105 | 208 20110606 | epplin01-olt | /var | 129219 | 49 | 22 | 50 | 55 | 68 20110606 | applin02-asx | /opt | 473953 | 33 | 5896 | 71 | 488 | 1396 20110606 | bdplin01-gis | /aerial-photos | 274101 | 32 | 3 | 33 | 33 | 34 20110606 | epplin02-gis | /dbprd | 155272 | 32 | 0 | 32 | 32 | 32
What tables changed in the last 24 hrs?
For a specific database that has table level monitoring, list the tables that changed (according to their stats) in order of the largest growth.
select to_char(a.timestamp,'YYYYMMDD') as day, to_char(b.timestamp,'YYYYMMDD') as prevday , a.database, a.schema, a.relname, a.pages, a.rows, a.pages-b.pages as delta_pages , a.rows-b.rows as delta_rows from stats.table_sizes a, stats.table_sizes b where a.host = b.host and a.database = b.database and a.schema = b.schema and a.relname = b.relname and to_char(b.timestamp,'YYYYMMDD') = ( select max(to_char(c.timestamp,'YYYYMMDD')) from stats.table_sizes c where b.host = c.host and b.database = c.database and b.schema = c.schema and b.relname = c.relname and to_char(c.timestamp,'YYYYMMDD') <= to_char(a.timestamp - interval '1 days','YYYYMMDD') ) and a.relkind = 'r' and b.pages-a.pages != 0 and a.timestamp > current_timestamp - interval '1 day' and a.host = '<HOST>' and a.database = '<DATABASE>' order by 8 desc
Example:
day | prevday | database | schema | relname | pages | rows | delta_pages | delta_rows ----------+----------+----------+-----------+--------------------------------------------+---------+-------------+-------------+------------- 20110601 | 20110531 | staging | history | advertisement_pits | 5193827 | 1.18786e+08 | 625636 | 1.0395e+07 20110601 | 20110531 | staging | history | staging_update_advertisement | 1389812 | 1.48156e+08 | 62775 | 6.69901e+06 20110601 | 20110531 | staging | delta | vg_photo_trans | 452995 | 5.14313e+07 | 5462 | 377000 20110601 | 20110531 | staging | principal | property_photograph_resource | 607742 | 4.90559e+07 | 4919 | 398500 20110601 | 20110531 | staging | history | staging_update_property_attribute | 1287741 | 1.04467e+08 | 3440 | 278000 20110601 | 20110531 | staging | principal | advertisement | 2300927 | 4.77944e+07 | 2788 | 25700 20110601 | 20110531 | staging | principal | property_photo_resource_prop | 321052 | 5.03951e+07 | 2407 | 377300 20110601 | 20110531 | staging | principal | advertisement_sales_details | 206285 | 3.43265e+07 | 2309 | 190900 20110601 | 20110531 | staging | history | staging_update_owner | 809901 | 9.71881e+07 | 2234 | 268096 ...
The Scripts
I use three separate scripts to monitor disk usage, so each item can be run on a separate schedule.
mon_disks.sh – file system sizes
mon_size.sh – database and tablespace sizes
mon_tables.sh – table/index sizes
Grab them all from my scripts download page.
So if you are a DBA (of any flavour), do you have these stats and can you provide them at a moment’s notice? If not, what’s your excuse!
If you are eagerly waiting for the second instalment in this series, you could always subscribe to my feed
Related posts:
- PostgreSQL pg_xlog filling up disk
- Finding missing PostgreSQL stats
- Mixing client and server PSQL versions
- Experimenting with TOAST Compression and Performance
- PostgreSQL Quick Health Check
- Preventing filling the temporary tablespace by killing slow queries
- Compare pg_dump and gzip compression
- Queries Running for Negative Time
2 Comments
Leave a comment
Article Series
Kick Ass PostgreSQL Books
Recent Posts
Kick Ass Oracle Books
I've read lots of Oracle books, but these are by far the best I've encountered:
Popular Posts
- Top 10 Missing PostgreSQL Features 487 view(s)
- PostgreSQL vs Oracle Differences #4 - Shared Memory Usage 371 view(s)
- PostgreSQL vs Oracle Differences #3 - System Resources 331 view(s)
- pg_restore: [archiver] unsupported version (1.12) in file header 289 view(s)
- PostgreSQL - Dropping a template database 265 view(s)
- PostgreSQL Database Capacity Planning 1 - Disk Space 148 view(s)
- PostgreSQL vs Oracle Differences #1 - Clusters 145 view(s)
- PostgreSQL vs Oracle Differences #2 - Support 99 view(s)
Archives
- March 2012 (1)
- September 2011 (1)
- August 2011 (2)
- July 2011 (3)
- June 2011 (10)
- May 2011 (10)
- April 2011 (7)
- March 2011 (10)
- February 2011 (5)
- January 2011 (1)
- December 2010 (4)
- November 2010 (2)
- October 2010 (3)
Tags
Categories
- Linux (23)
- Oracle (6)
- PostgreSQL (55)
- Uncategorized (1)
- Windows (7)




If you are really interested in statistics check out splunk.
Very interesting. I’ve not understood why all the enterprise level DB monitoring stuff that is purchased but never actually used, when something like your scripts cover 95% of a DBA’s needs.
Refreshing. Thanks for sharing.