May 16, 2011
Nathan Thom

PostgreSQL vs Oracle Differences #4 – Shared Memory Usage


One of the first things that threw me when I switched from Oracle to PostgreSQL was how it read data from disk and cached it. Here I will briefly explain the relevant settings and how they work.

Note in terminology: Oracle buffer = PostgreSQL page

Parameters
There are two key memory parameters you must set and I will explain before continuing.

1. shared_buffers = How much memory is used for caching shared buffers that have been read from disk. It is allocated on startup and cannot be changed without a restart of the entire cluster. It is equivalent to the buffer cache in Oracle (the db_cache_size parameter).

2. effective_cache_size = How much memory is available for the OS to cache disk reads. This is only used by the query planner when generating execution plans, to estimate how likely a page will be in cache. Note that it does not actually cause PostgreSQL to allocate any memory, it is merely an FYI parameter.

Oracle’s total SGA size (sga_target parameter) would be equivalent to shared_buffers + effective_cache_size. The total shared_buffers + effective_cache_size should not exceed the amount of physical RAM available.

There are also other parameters such as work_mem and maintenance_work_mem which determine how much memory is allocated for per-user memory areas, and are similar to Oracle’s PGA settings.

So what effect do these parameters actually have on a server? For example, on a server with:
• 32GB RAM
shared_buffers = 8GB
effective_cache_size = 24GB

$ ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x0052e2c1 1146880    postgres  600        8865316864 96

No surprise, it has allocated 8GB of shared memory.

Buffer Cache
PostgreSQL uses OS calls which make use of OS level disk caching. It does NOT use direct access to disks like Oracle does (except in some cases with writing the WAL logs which can be adjusted via the wal_sync_method parameter). The PostgreSQL database cache uses the clock-sweep algorithm which is similar to Oracle’s touch count algorithm where basically the more frequently a page is referenced the less likely it is to be evicted. I believe most operating systems use some sort of LRU algorithm which is suitable to typical OS usage.

Having a dedicated buffer cache for the database makes a lot of sense as it can use algorithms designed to maximise database performance. For example, it knows that scanning entire tables into the cache might not always be a good idea as it can trash existing more popular data buffers.

However since the OS also keeps its own cache of disk blocks, there will often be redundant data stored in memory. For this reason, it is generally recommended not to give PostgreSQL more than about 25% to 30% of the total RAM for its shared buffer. One popular recommendation is that using more than about 8GB for shared buffers starts to hit scalability issues, but I have one specialised server with 32GB shared_buffer and 8GB effective_cache_size which works very well. So a compromise is required between giving the database more memory and reducing the wastage caused by redundantly storing the same data in two separate caches. This can be initially confusing to Oracle DBA’s who are used to giving the database every available scrap of memory to do with as it pleases.

Checkpoints
Checkpoints in PostgreSQL can be a problem with larger shared buffers settings. Every time it runs, it needs to scan the entire shared buffer looking for dirty pages to flush to disk. The checkpoints do not happen very often (every 5 mins by default, governed by checkpoint_timeout and checkpoint_segments) which means they may have a lot of work to do each time they run, depending on how busy the system is. However, given that it needs to scan the entire buffer each time, running it too frequently would cause a lot of overhead. By comparison, Oracle checkpoints occur no less than every 3 seconds and use several data structures to make finding the dirty buffers very fast (e.g. the Write/Dirty List).

Changes in recent PostgreSQL versions provide ways to reduce the large periodic hits in performance and try to stretch the I/O load out over a longer period of time to give a continual trickle of activity – in effect, achieving what Oracle does. In my 8.3 databases, I have had to increase the checkpoint_timeout and checkpoint_segments parameters on the databases that receive very heavy nightly write activity to avoid too frequent checkpoints – in some cases, leaving an hour in between checkpoints to prevent batch processing issues. PostgreSQL will helpfully log a warning for you, which may look something like (yeah I was seriously trashing this database!):

LOG:  checkpoints are occurring too frequently (2 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (2 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (1 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (1 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".

The risk of increasing the time between checkpoints is that in the event of a database crash, the recovery time will be so much longer as it needs to recreate all the missing data from the WAL logs. It can also cause disk space issues if you don’t allow for a potentially huge pg_xlog directory as the WAL log files build up in between checkpoints.

Monitoring
In the early days of Oracle there was a lot of emphasis on monitoring the cache hit ratio. Now, performance analysis is done mostly via the wait interface which is far superior. In PostgreSQL, we still have to rely on ratios to figure out what is happening.

Table level buffer cache hit ratio:

select t.schemaname, t.relname, heap_blks_hit*100/(heap_blks_hit+heap_blks_read) as BCHR
from pg_statio_user_tables t
where t.heap_blks_read > 0

Database level buffer cache hit ratio:

SELECT d.datname, pg_database_size(d.datname), SUM(pg_stat_get_db_blocks_hit(d.oid)) / SUM(pg_stat_get_db_blocks_fetched(d.oid)) AS hit_rate
FROM pg_database d
GROUP BY d.datname
HAVING SUM(pg_stat_get_db_blocks_fetched(d.oid)) > 0

From these we can see how often the tables/databases are being accessed directly from the PostgreSQL shared buffer cache, but we can’t get any useful information from the OS disk cache.

There is a contrib module called pg_buffercache which allows you to see exactly what is in the database buffer cache, but it should not be used on a busy database as it will lock out the memory contents while it reads them. This could be useful to see which pages are the most popular as it shows you the usage count of each page which is an indication of how popular it is. Personally I have not used it for anything other than experimentation within a testing database.

I’m not sure why the design decision was made to let PostgreSQL rely on the underlying OS to cache disk accesses. For sure it makes for a far simpler database implementation, but I wonder if there was some other reason. If not, perhaps they will eventually add more support for direct access and bypassing the OS cache which will then reduce the duplication of cached data and allow for more database specific caching algorithms to improve performance. On the other hand OS’s are getting smarter and RAM cheaper, so perhaps it’s better to not overcomplicate the code and let the OS handle the problem.

Other posts in this series:
PostgreSQL vs Oracle Differences #3 – System Resources
PostgreSQL vs Oracle Differences #2 – Support
PostgreSQL vs Oracle Differences #1 – Clusters

Related posts:

  1. PostgreSQL vs Oracle Differences #3 – System Resources
  2. PostgreSQL vs Oracle Differences #1 – Clusters
  3. PostgreSQL vs Oracle Differences #2 – Support
  4. Setting Initial PostgreSQL Parameters
  5. Top 10 Missing PostgreSQL Features
  6. PostgreSQL Log Monitoring and Alerting
  7. PostgreSQL Quick Health Check
  8. Change parameter via pg_ctl

3 Comments

  • Also, PostgreSQL doesn’t have the shared pool to cache object definitions and parsed SQL statements. That is why nothing like V$SQL_PLAN is possible in PgSQL.

    • In Postgres, you use EXPLAIN to get the SQL plan, same as Oracle:

      SQL> explain select count(*) from person;

      QUERY PLAN
      ——————————————————————————–
      Aggregate (cost=167694.15..167694.16 rows=1 width=0)
      -> Seq Scan on person (cost=0.00..163431.32 rows=1705132 width=0)

      EXPLAIN completed.

      • v$sql_plan shows the cached execution plans for all cursors in the shared pool, not just the one you are running on the command line.

Leave a comment

Article Series

Kick Ass PostgreSQL Books

Kick Ass Oracle Books

I've read lots of Oracle books, but these are by far the best I've encountered:

Categories