Introduction
Hello, this is a summary of the book PostgreSQL Query Optimization, Second Edition. After reading about MySQL, and now that I am professionally working with PostgreSQL, I want to learn more about optimizing queries in it.
These summaries are not really written for an audience but for me, so I remember what I read and can reference it later. AI tools these days are very good at this, but I like doing these summaries for my own learning. Each chapter in the book gets its own summary, so :)
Also, this is not a replacement for the book; I may skip things I already know or don’t find interesting.
Chapter 1: Why Optimize?
In this chapter the authors argue that we should optimize queries the first time we write them, rather than waiting until the database is slow. How much performance is good performance? The authors’ answer is that it depends on the application and its users. They recommend defining performance goals using the SMART framework:
- Specific
- Measurable
- Achievable
- Relevant
- Time-bound
It’s good to keep in mind that performance from the user’s perspective is the end-to-end time from request to response, not just the time it takes to execute the query. It’s also worth watching performance even after a query has been optimized, because the data changes and the query might not stay optimal; we should keep monitoring query performance with alerts and automated monitoring tools. Not all optimizations that apply to other databases carry over to PostgreSQL, because PostgreSQL does some things differently. For example, PostgreSQL has no optimizer hints; its developers deliberately left them out because their intention was to write an optimizer good enough not to need them. And they largely succeeded: the PostgreSQL optimizer is one of the best out there.
Chapter 2: Theory: Yes, We Need It!
This chapter covers some basics of relational theory (the same material covered in my university courses, so I will not repeat it here).
To produce query results, the database system must first parse the query, transform it into a logical plan, then optimize it, and finally execute it.
The database engine parses a SQL query into a logical plan, applies equivalent transformations to that plan, chooses algorithms to implement each logical operation, and finally executes the resulting physical plan. These operations are all grounded in relational theory.
Chapter 3: Even More Theory: Algorithms
Algorithm Cost Models
- Database systems use cost models to estimate query execution time
- Costs are typically measured in:
- I/O operations (disk reads/writes)
- CPU processing time
- Memory usage
- Network communication
Data Access Algorithms
Storage Structures
- Data is stored in pages/blocks on disk
- Tables are collections of pages
- Buffer cache is used to keep frequently accessed pages in memory
Full Scan
- Reads all pages of a table sequentially
- Best when:
- Large portion of table needed
- No useful indexes available
- Table is small
- Worst when:
- Only few rows needed
- Useful indexes exist
Index-Based Table Access
- Uses index to locate specific rows
- Two-step process:
- Search index to find row locations
- Fetch actual rows from table
- Effective when:
- Small portion of table needed
- Index matches query conditions
- Index is selective
Index-Only Scan
- Gets all needed data directly from index
- No table access required
- Most efficient when:
- All required columns are in index
- Index is well-maintained (minimal bloat)
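As a sketch of when an index-only scan kicks in (the table and index names here are hypothetical, not from the book):

```sql
-- Index contains every column the query touches.
CREATE INDEX orders_customer_total ON orders (customer_id, total);

-- PostgreSQL can answer this from the index alone; EXPLAIN shows
-- "Index Only Scan" (assuming the visibility map is current, e.g. after VACUUM).
EXPLAIN
SELECT customer_id, total
FROM orders
WHERE customer_id = 42;
```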
Index Structures
B-Tree Indexes
- Most common index type
- Balanced tree structure
- Efficient for:
- Equality searches
- Range queries
- Sorting
- Properties:
- Self-balancing
- Logarithmic search time
- Good for both reads and writes
Other Index Types
- Hash indexes
- GiST (Generalized Search Tree)
- GIN (Generalized Inverted Index)
- BRIN (Block Range Index)
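Illustrative DDL for these index types (table and column names are made up for the example):

```sql
CREATE INDEX accounts_email_hash ON accounts USING hash (email);   -- equality searches only
CREATE INDEX places_location_gist ON places USING gist (location); -- geometric and range types
CREATE INDEX docs_tags_gin ON documents USING gin (tags);          -- arrays, jsonb, full-text search
CREATE INDEX events_ts_brin ON events USING brin (created_at);     -- huge, naturally ordered tables
```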
Combining Relations (Joins)
Nested Loops
- Simple but can be inefficient
- Three variations:
- Simple nested loops
- Index nested loops
- Block nested loops
- Best for:
- Small tables
- When inner table has good indexes
Hash-Based Algorithms
- Builds hash table of smaller relation
- Probes hash table with larger relation
- Efficient for:
- Large datasets
- Equality joins
- Memory intensive
Sort-Merge Algorithm
- Sort both relations
- Merge sorted results
- Efficient for:
- Already sorted data
- Range conditions
- Large datasets that don’t fit in memory
Performance Comparisons
- Algorithm choice depends on:
- Data size
- Available indexes
- Query conditions
- Available resources
- Data distribution
Chapter 4: Understanding Execution Plans
How an Optimizer Builds an Execution Plan
To build a plan, the optimizer uses transformation rules, heuristics, and cost-based optimization algorithms.
Reading Execution Plans
To obtain the execution plan for a query we prepend EXPLAIN to the query.
EXPLAIN
SELECT f.flight_no, f.actual_departure, count(passenger_id) passengers
FROM flight f
JOIN booking_leg bl ON bl.flight_id = f.flight_id
JOIN passenger p ON p.booking_id = bl.booking_id
WHERE f.departure_airport = 'JFK'
  AND f.arrival_airport = 'ORD'
  AND f.actual_departure BETWEEN '2023-08-10' AND '2023-08-13'
GROUP BY f.flight_id, f.actual_departure;
Finalize GroupAggregate  (cost=594251.47..594253.03 rows=1 width=24)
  Group Key: f.flight_id
  ->  Gather Merge  (cost=594251.47..594253.01 rows=2 width=24)
        Workers Planned: 2
        ->  Partial GroupAggregate  (cost=593251.45..593252.76 rows=1 width=24)
              Group Key: f.flight_id
              ->  Sort  (cost=593251.45..593251.88 rows=173 width=20)
                    Sort Key: f.flight_id
                    ->  Parallel Hash Join  (cost=247265.49..593245.02 rows=173 width=20)
                          Hash Cond: (p.booking_id = bl.booking_id)
                          ->  Parallel Seq Scan on passenger p  (cost=0.00..312940.97 rows=8810097 width=8)
                          ->  Parallel Hash  (cost=247265.35..247265.35 rows=11 width=20)
                                ->  Parallel Hash Join  (cost=14301.16..247265.35 rows=11 width=20)
                                      Hash Cond: (bl.flight_id = f.flight_id)
                                      ->  Parallel Seq Scan on booking_leg bl  (cost=0.00..212763.30 rows=7695530 width=8)
                                      ->  Parallel Hash  (cost=14301.15..14301.15 rows=1 width=16)
                                            ->  Parallel Seq Scan on flight f  (cost=0.00..14301.15 rows=1 width=16)
                                                  Filter: ((actual_departure >= '2023-08-10 00:00:00+03'::timestamp with time zone) AND (actual_departure <= '2023-08-13 00:00:00+03'::timestamp with time zone) AND (departure_airport = 'JFK'::bpchar) AND (arrival_airport = 'ORD'::bpchar))
JIT:
  Functions: 30
  Options: Inlining true, Optimization true, Expressions true, Deforming true
A plan contains the expected number of output rows and the expected average width of an output row, which are calculated from database statistics, plus cost estimates, which are calculated from those estimates and configuration parameters. Cost values are cumulative: they include the cost of all previous operations. There are two cost estimates for each operation: the first is the cost of producing the first row of output, the second the cost of producing the complete result. Execution of a plan starts at the leaves and ends at the root, which means the operation executed first appears on the line with the rightmost offset.
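EXPLAIN alone shows only estimates; to compare them with reality we can ask for actual execution metrics. Note that EXPLAIN ANALYZE really executes the query:

```sql
-- ANALYZE runs the query and reports actual row counts and timings;
-- BUFFERS adds shared-buffer hit/read statistics.
EXPLAIN (ANALYZE, BUFFERS)
SELECT f.flight_no, f.actual_departure
FROM flight f
WHERE f.departure_airport = 'JFK';
```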
What Is Going On During Optimization?
- Determining the possible orders of operations
- Determining the possible execution algorithms for each operation
- Comparing the costs of different plans
- Selecting the optimal execution plan
Why Are There So Many Execution Plans to Choose From?
The large number of possible execution plans stems from the many combinations of:
- Table join ordering
- Join algorithms (nested loops, hash joins, etc.)
- Data access methods (indexes, full scans)
With just 3 tables, considering 3 join algorithms and 12 possible join orders, there are already 108 possible plans. Including data retrieval methods increases this to thousands.
PostgreSQL optimizes this by:
- Using the optimality principle - sub-plans of optimal plans must also be optimal
- Building optimal plans incrementally from smallest sub-plans
- Switching to the genetic query optimizer (GEQO) instead of exhaustive search once the number of tables exceeds the geqo_threshold parameter
- Using heuristics for queries with many joins
While heuristics help performance by reducing the search space, they may occasionally miss the absolute optimal plan. However, PostgreSQL will still find the best plan among those considered.
How Execution Costs Are Calculated
Execution plan costs depend on:
- Algorithm cost formulas
- Statistical data on tables/indexes and value distributions
- System settings and parameters
The optimizer uses these factors to calculate costs:
- Cost formulas from different algorithms (covered in Chapter 3)
- Table/result set sizes
- Configurable system parameters
Key points about execution costs:
- No single “best plan” exists for all similar queries
- Slight differences in statistics can change plan selection
- Statistics change based on data modifications
- Plans may differ for nearly identical queries based on:
- Value distributions
- Table statistics
- Current system state
For example, if we have many entries with country US and only 1 in CZ, a query filtering on different values (‘US’ vs ‘CZ’) can result in completely different execution plans, because:
- Queries returning many rows may favor full table scans
- Queries returning few rows may prefer index access
Regular statistics updates via ANALYZE are critical for optimal plan selection, as outdated statistics can lead to suboptimal execution plans.
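A minimal sketch of refreshing and inspecting statistics, using a hypothetical account table with a country column (pg_stats is a standard system view):

```sql
-- Recompute planner statistics for one table.
ANALYZE account;

-- See what the planner knows about a column: estimated number of
-- distinct values and the most common values with their frequencies.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'account' AND attname = 'country';
```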
The Optimizer’s Limitations
Despite the PostgreSQL optimizer’s sophistication, it can produce suboptimal execution plans for several reasons:
- Cost Estimation Limitations
- Mathematical models assume uniform data distribution
- Real data rarely follows uniform distribution
- Even advanced formulas are approximations
- Statistics Limitations
- Histograms help with stored data selectivity
- Cannot use histograms for intermediate results
- Intermediate result estimation errors are common
- Algorithmic Limitations
- Complex queries may exceed exact optimization capability
- Heuristics may exclude optimal plans
- Approximate algorithms used for complex cases
These limitations mean human intervention is sometimes needed, as humans can:
- Observe actual system behavior
- Apply contextual knowledge
- Guide the optimizer with additional information
The optimizer works well in most cases, but understanding its limitations helps identify when and how to assist it in finding better execution plans.
Chapter 5: Short Queries and Indexes
What Makes a Query “Short”?
A short query is not the query with the fewest lines, but one that reads only a small number of rows relative to table size. Most OLTP queries are short queries. Long queries, on the other hand, read and process a large portion of one or more tables; they are typical of OLAP systems.
Choosing Selection Criteria
Since short queries return a small number of rows from a much bigger set, indexes are very important for them. The index is used to find the rows that match the selection criteria, and then those rows are fetched from the table.
Index Selectivity
The lower the selectivity of an index, the better the index is. Say we have a table with 1,000 rows and an index on a column with 100 unique values: assuming a uniform distribution, each value matches about 10 rows, for a selectivity of 10/1000 = 0.01. If the selectivity is high (the condition matches a large fraction of the table), a sequential scan is better than using the index.
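Selectivity can be checked by hand as matching rows divided by total rows (hypothetical table and column names):

```sql
-- Fraction of the table matched by one equality condition.
SELECT (SELECT count(*) FROM account WHERE country = 'CZ')::numeric
       / (SELECT count(*) FROM account) AS selectivity;
```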
Unique Indexes and Constraints
An index is unique if for each indexed value there is at most one matching row in the table. Unique indexes have the best (lowest) selectivity. Note that unique constraints in PostgreSQL still allow NULL values.
Indexes and Non-equal Conditions
B-tree indexes can support searches by equality, greater than, less than, and BETWEEN conditions: all the searches that require comparison and ordering.
Indexes and Column Transformations
A column transformation occurs when the search criterion is applied to some modification of the values in a column. For example, lower(last_name) (converting the last_name value to lowercase) and update_ts::date (casting timestamp with time zone to date) are column transformations. If there is an index on last_name:
CREATE INDEX account_last_name ON account (last_name);
the following search won’t be able to take advantage of the index:
SELECT * FROM account
WHERE lower(last_name)='ayn';
One way to solve this is to cover all the cases explicitly:
SELECT * FROM account
WHERE last_name='ayn'
OR last_name='Ayn'
OR last_name ='AYN'
A better solution is to create a functional index:
CREATE INDEX account_lower_last_name ON account (lower(last_name));
Another example involves time:
SELECT *
FROM records
WHERE created_at::date
BETWEEN '2023-08-17' AND '2023-08-18'
This cast from timestamp to date prevents the query from using an index on created_at.
Another transformation involves the COALESCE function, which returns the first non-NULL value in a list of values:
SELECT *
FROM records
WHERE coalesce(created_at, updated_at)
BETWEEN '2025-01-19' AND '2025-01-21'
In this case, rewriting with OR is better than using COALESCE, because then the index can be used:
SELECT * FROM records
WHERE (created_at BETWEEN '2025-01-19' AND '2025-01-21')
OR (created_at IS NULL AND updated_at BETWEEN '2025-01-19' AND '2025-01-21')
Indexes and the LIKE Operator
By default, B-tree indexes don’t support searches with the LIKE operator:
SELECT * FROM records
WHERE column LIKE 'A%'
One way to work around this is to rewrite the prefix search as a range condition with AND:
SELECT * FROM records
WHERE column >= 'A' AND column < 'B'
A better solution is to use a pattern search index:
CREATE INDEX records_column_pattern ON records (column text_pattern_ops);
Whether a plain B-tree can serve such prefix searches depends on rules about character ordering, formatting, and similar things that vary by language and country, which are determined by the LOCALE of the database server; the text_pattern_ops operator class sidesteps this by comparing strings character by character.
Using Multiple Indexes
If a query can benefit from several indexes, PostgreSQL builds an in-memory bitmap of the results of each index scan and combines them, which allows the optimizer to use multiple indexes in a single query.
Compound Indexes
A compound index is an index created on multiple columns. In general, an index on (X, Y, Z) will be used for searches on X, (X, Y), and (X, Y, Z), and even (X, Z) (using X for the index search and filtering on Z), but not for Y alone and not for (Y, Z).
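A sketch of the rule (hypothetical names):

```sql
CREATE INDEX records_xyz ON records (x, y, z);

SELECT * FROM records WHERE x = 1 AND y = 2;  -- uses the index
SELECT * FROM records WHERE x = 1 AND z = 3;  -- uses the index on x, filters on z
SELECT * FROM records WHERE y = 2;            -- leading column missing: likely a full scan
```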
Using Indexes for Data Retrieval
When the index has all the data required by the query, an index-only scan is used.
CREATE INDEX idx_name
ON records (column1, column2, column3);
SELECT column1, column2, column3
FROM records
WHERE column1 = 1 AND column2 = 2;
Covering Indexes
A covering index is an index that contains all the columns required by the query.
CREATE INDEX idx_name
ON records (column1, column2)
INCLUDE (column3);
SELECT column1, column2, column3
FROM records
WHERE column1 = 1 AND column2 = 2;
The INCLUDE clause adds columns to the index that are not part of the index key but are required by the query; they are stored in the index as payload only.
The difference between the two approaches above is that in the plain compound index, column3 is part of the index key even though we never filter on it, while the covering index keeps column3 out of the key. As queries require more such non-filtered columns, covering indexes stay more compact than ever-wider compound indexes.
Partial Indexes
Partial index is the index that contains only a subset of the rows in the table.
CREATE INDEX idx_name
ON records (column1)
WHERE column2 = 1;
SELECT *
FROM records
WHERE column1 = 1;
Indexes and Join Order
Generally the optimizer joins the tables with the smallest numbers of rows first, but if there is an index on a column used in a join condition, the optimizer will join the table that has the index first. If two indexes have the same selectivity, the optimizer joins the smaller table first.
When Are Indexes Not Used
Avoiding Index Usage
We avoid using indexes when the table is small and fits into main memory, or when the query returns a large fraction of the table’s rows.
Why Does PostgreSQL Ignore My Index?
PostgreSQL ignores an index when the planner estimates that using it would cost more than a sequential scan.
To Build or Not to Build
When hardware was expensive, the standard advice was not to create indexes unless you really needed them, because they take up space and slow down write operations. With today’s cheap storage and fast disks, this is much less of a concern.
The recommendation for OLTP systems is to have the indexes in place up front, especially partial indexes and covering indexes where they make sense.
Redundant indexes should still be removed. To find them, we can query the pg_stat_all_indexes view to see which indexes are not used.
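A query along these lines lists indexes that have never been scanned since the statistics were last reset (the columns used here are the real ones of pg_stat_all_indexes):

```sql
SELECT schemaname,
       relname      AS table_name,
       indexrelname AS index_name,
       idx_scan
FROM pg_stat_all_indexes
WHERE idx_scan = 0
  AND schemaname NOT IN ('pg_catalog', 'pg_toast')
ORDER BY relname, indexrelname;
```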
Indexes and Short Query Scalability
As tables grow, query performance can degrade, because the indexes no longer fit into memory. What worked well in the past may not work well in the future, so we should keep monitoring query performance and address issues as they arise.
Chapter 6: Long Queries and Full Scans
Which Queries Are Considered Long?
A query is considered long when query selectivity is high for at least one large table
Long Queries and Full Scans
In long queries we favor full scans over indexes, because in this case index access would require more I/O. Ideally, a long query should not use any index on a table from which it reads a large number of rows.
Long Queries and Hash Joins
Recall the cost models from Chapter 3, where size(JA) is the number of distinct values of the join attribute:
nested_loop_cost(R, S) = size(R) * size(S) + size(R) * size(S) / size(JA)
hash_join_cost(R, S) = size(R) + size(S) + size(R) * size(S) / size(JA)
For large relations the hash join is our best bet: whenever a hash join is possible, the optimizer will pick it. It works best when the hash table built from the smaller relation fits in memory, so for long queries it can pay off to tune the memory settings.
Sometimes a merge join is used instead, when the join columns are unique or already pre-sorted.
Long Queries and the Order of Joins
In short queries we should execute the joins that have the best indexes first, but in long queries we should execute first the joins that produce the smallest number of rows.
What Is a Semi-join?
A semi-join between two tables R and S returns rows from table R for which there is at least one row from table S with matching values in the joining columns. We can see this semi-join in the optimizer plan.
Semi-joins and Join Order
Semi-joins never increase the size of the result set, so we should make sure they are executed first.
What Is an Anti-join?
An anti-join between two tables R and S returns rows from table R for which there are no rows from table S with a matching value in the joining column. Of the possible rewrites, only the NOT EXISTS version guarantees an anti-join in the execution plan.
Semi- and Anti-joins Using the JOIN Operator
An anti-join cannot create duplicates, which means that an OUTER JOIN with subsequent filtering for NULL values can be used instead:
SELECT t.x
FROM table1 t
LEFT OUTER JOIN table2 t2 USING (x)
WHERE t2.x IS NULL
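For comparison, the same pair written with EXISTS / NOT EXISTS (hypothetical table names):

```sql
-- Semi-join: rows of table1 with at least one match in table2.
SELECT t1.x
FROM table1 t1
WHERE EXISTS (SELECT 1 FROM table2 t2 WHERE t2.x = t1.x);

-- Anti-join: rows of table1 with no match in table2.
-- Unlike NOT IN, NOT EXISTS stays correct when table2.x contains NULLs.
SELECT t1.x
FROM table1 t1
WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.x = t1.x);
```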
When Is It Necessary to Specify Join Order?
Long queries appear mostly in OLAP systems, where many tables may be joined. The optimizer searches for the best join order only while the number of joined tables is below join_collapse_limit (8 by default); beyond that, explicit JOINs are executed in the order they are written.
There is no hard upper limit on this parameter, but the higher the value, the longer planning takes: the join-order search space grows factorially (N! orders for N tables).
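A sketch of forcing a specific join order in a session (table names are hypothetical):

```sql
-- With join_collapse_limit = 1, explicit JOINs are executed
-- exactly in the order they are written.
SET join_collapse_limit = 1;

SELECT s.id
FROM small_table s
JOIN medium_table m ON m.s_id = s.id
JOIN big_table b ON b.m_id = m.id;
```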
Grouping: Group First, Select Last
Where possible, grouping should be performed first to make the intermediate result set smaller, and the rest of the selection applied afterwards.
Using SET Operations
Often we can use EXCEPT instead of NOT EXISTS and NOT IN, INTERSECT instead of EXISTS and IN, and UNION instead of complex selection criteria with OR.
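For example, a single-column NOT IN can often be rewritten with EXCEPT (hypothetical names; note that set operations remove duplicates and treat NULLs as not distinct, so the semantics are not always identical):

```sql
-- Instead of:
SELECT customer_id FROM customers
WHERE customer_id NOT IN (SELECT customer_id FROM orders);

-- ...use the set operation:
SELECT customer_id FROM customers
EXCEPT
SELECT customer_id FROM orders;
```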
Avoiding Multiple Scans
When retrieving multiple attributes from an entity-attribute-value table, join to the table only once and use the FILTER clause on the MAX() aggregates in the SELECT list to return the appropriate value in each column:
SELECT x.field_1,
       x.field_2,
       coalesce(max(y.field_3) FILTER (WHERE y.field_4 = 'value_1'), '') AS result_1,
       coalesce(max(y.field_3) FILTER (WHERE y.field_4 = 'value_2'), '') AS result_2,
       coalesce(max(y.field_3) FILTER (WHERE y.field_4 = 'value_3'), '') AS result_3
FROM table_1 x
JOIN table_2 y USING (id)
WHERE y.id < 7000000
  AND x.id < 7000000
GROUP BY 1, 2;
Chapter 7: Long Queries: Additional Techniques
Temporary Tables and CTEs
Temporary Tables
To create a temporary table, we add TEMP to a normal CREATE TABLE statement:
CREATE TEMP TABLE temp_table ...
Temporary tables are visible only to the current session; once the session closes, they are destroyed.
Temporary tables have several drawbacks:
1- Indexes: indexes on the original tables are not reused here.
2- Statistics: no statistics are available for the table until you run ANALYZE on it.
3- Disk space: they are stored on disk, so in OLAP workloads overusing them takes up space.
4- Excessive I/O: because they are written to disk, writing and re-reading them takes time.
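A minimal sketch (hypothetical names), with an explicit ANALYZE since the fresh temporary table has no statistics:

```sql
CREATE TEMP TABLE recent_bookings AS
SELECT *
FROM booking
WHERE update_ts > now() - interval '7 days';

-- Without this, the planner knows nothing about the temp table.
ANALYZE recent_bookings;

SELECT count(*) FROM recent_bookings;
```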
Common Table Expressions (CTEs)
CTEs are defined as WITH cte_name AS (query) followed by the main query:
WITH cte_query AS(
SELECT ...
)
SELECT x
FROM cte_query
WHERE x = 10
Before PostgreSQL 12, a CTE was always materialized, acting as an optimization fence: the query was executed once, its result cached, and outer predicates could not be pushed into it. Since PostgreSQL 12, CTEs are inlined into the main query by default and are materialized only when referenced more than once or declared AS MATERIALIZED. CTE subqueries also do not count toward join_collapse_limit, so we can use as many of them as we need. When a CTE is materialized, its result is computed once and cached, and the plan shows a CTE Scan over it.
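Since PostgreSQL 12 the behavior can be controlled per CTE (table and column names are hypothetical):

```sql
-- Force one-time execution and caching (the pre-12 behavior).
WITH recent AS MATERIALIZED (
    SELECT * FROM booking WHERE update_ts > now() - interval '1 day'
)
SELECT * FROM recent WHERE booking_id = 100;

-- Explicitly allow inlining, so the booking_id filter
-- can be pushed down into the CTE and use an index.
WITH recent AS NOT MATERIALIZED (
    SELECT * FROM booking WHERE update_ts > now() - interval '1 day'
)
SELECT * FROM recent WHERE booking_id = 100;
```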
Views: To Use or Not to Use
Views are not physical tables; what is stored in the database is only the view definition. Each time the parser sees the view, it expands it into the underlying query. This can cause performance issues: whether outer filters can be pushed down into the expanded query varies from case to case, so a view is hard to optimize in isolation.
A good use of views is giving a group of users access to a subset of the data: we create a view for them so they have no direct access to the tables behind it. But views bring no performance improvement by themselves.
Materialized Views
Materialized views store the result of their query at creation (and refresh) time; they do not reflect an up-to-date view of the data and must be refreshed manually.
After creation they behave like a normal table that cannot be inserted into, updated, or deleted from; we can index them, and the optimizer treats them like tables.
To refresh a materialized view we run REFRESH MATERIALIZED VIEW view_name. During the refresh the view cannot be accessed; if it must stay available, we can use REFRESH MATERIALIZED VIEW CONCURRENTLY view_name instead, which requires a unique index on the view and takes longer.
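A sketch of the lifecycle, using the flight table from the earlier examples; remember that a concurrent refresh requires a unique index on the materialized view:

```sql
CREATE MATERIALIZED VIEW flight_stats AS
SELECT departure_airport, count(*) AS flights
FROM flight
GROUP BY departure_airport;

-- Required before REFRESH ... CONCURRENTLY is allowed.
CREATE UNIQUE INDEX ON flight_stats (departure_airport);

REFRESH MATERIALIZED VIEW flight_stats;              -- blocks readers
REFRESH MATERIALIZED VIEW CONCURRENTLY flight_stats; -- readers keep access, takes longer
```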
Should I Create a Materialized View?
It depends: if the query is complex and the data does not change frequently, then creating a materialized view is a good idea.
Do Materialized Views Need to Be Optimized?
Yes, just like a normal table: we optimize the query behind the materialized view using the same techniques.
Dependencies
If the data in the tables that a materialized view is based on changes, the view becomes stale and needs to be refreshed; structural changes to those tables may require recreating the view.
Partitioning
When we partition a table, we split it into smaller tables, each table is called a partition. The parent table is virtual, but all the partitions are physical tables.
Does Partitioning Improve Performance?
It depends: if the query filters on the partition key, only the relevant partitions are scanned (partition pruning), which improves performance.
Why Create a Partitioned Table?
The DROP command executes significantly faster than a bulk DELETE and does not require subsequent vacuuming. A typical use case is a table partitioned on date ranges (e.g., one partition per month): a new partition is added and the oldest one dropped at the end of every month. We can also detach a partition and attach it to another partitioned table, for example to move it into an archive table.
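A sketch of the monthly rotation described above (table names are illustrative):

```sql
CREATE TABLE events (
    event_id   bigint,
    created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2025_01 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

-- End of month: detach the oldest partition (it becomes a standalone
-- table that can be archived or dropped cheaply)...
ALTER TABLE events DETACH PARTITION events_2025_01;

-- ...and add the next one.
CREATE TABLE events_2025_02 PARTITION OF events
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
```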
Parallelism
PostgreSQL 10+ introduced parallel query execution, configured via the max_parallel_workers and max_parallel_workers_per_gather parameters. While parallelism generated excitement as a potential performance silver bullet, expectations should be tempered.
The concept splits query workload across multiple processors/cores. However, some query portions must run sequentially, and parallelization adds synchronization overhead. This makes it most beneficial for processing large amounts of data, particularly:
- Massive table scans
- Hash joins
- Long, complex queries
Short queries typically see minimal benefit from parallelization. While parallel execution of different queries may improve overall throughput, this differs from parallelizing individual queries.
One gotcha is that the optimizer may sometimes replace an index-based access with parallel table scan due to cost estimation issues. In these cases, parallel execution could actually be slower than sequential processing.
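A quick way to test whether a parallel plan actually helps is to toggle it per session and compare the plans (booking_leg is the table from the Chapter 4 example):

```sql
SET max_parallel_workers_per_gather = 4;
EXPLAIN SELECT count(*) FROM booking_leg;  -- look for Gather / Parallel Seq Scan nodes

-- Disable parallelism for this session and compare.
SET max_parallel_workers_per_gather = 0;
EXPLAIN SELECT count(*) FROM booking_leg;
```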