Introduction
Hello, this is a summary of the book PostgreSQL Query Optimization, Second Edition. After reading about MySQL, and now that I am professionally working with PostgreSQL, I want to learn more about optimizing queries in it.
These summaries are not really written for an audience but for me, so I remember what I read and can reference it later. AI tools these days are very good at this, but I like doing these summaries for my own learning. Each chapter in the book gets its own summary, so :)
Also, this is not a replacement for the book; I may skip things I already know or don’t find interesting.
Chapter 1: Why Optimize?
In this chapter the authors argue that we should optimize queries the first time we write them, rather than waiting until the database is slow. How much performance is good performance? The authors’ answer is that it depends on the application and its users. They recommend defining performance goals using the SMART framework:
- Specific
- Measurable
- Achievable
- Relevant
- Time-bound
It’s good to keep in mind that performance from the user’s perspective is the end-to-end time from request to response, not just the time it takes to execute the query. It’s also worth watching performance even after a query has been optimized, because the data changes and the query might not stay optimal; we should keep monitoring query performance with alerts and automated monitoring tools. Not all optimizations that apply to other databases carry over to PostgreSQL, because PostgreSQL does some things differently. For example, PostgreSQL has no optimizer hints; its developers deliberately left them out because their intention was to write an optimizer good enough not to need them. And they largely succeeded: the PostgreSQL optimizer is one of the best out there.
Chapter 2: Theory: Yes, We Need It!
This chapter covers some basics of relational theory (the same material covered in my university courses, so I will not repeat it here).
To produce query results, the database system must first parse the query, transform it into a logical plan, then optimize it, and finally execute it.
The database engine parses a SQL query into a logical plan, applies equivalent transformations to that plan, chooses algorithms to implement each logical operation, and finally executes the resulting physical plan. These operations are all grounded in relational theory.
Chapter 3: Even More Theory: Algorithms
Algorithm Cost Models
- Database systems use cost models to estimate query execution time
- Costs are typically measured in:
- I/O operations (disk reads/writes)
- CPU processing time
- Memory usage
- Network communication
Data Access Algorithms
Storage Structures
- Data is stored in pages/blocks on disk
- Tables are collections of pages
- Buffer cache is used to keep frequently accessed pages in memory
Full Scan
- Reads all pages of a table sequentially
- Best when:
- Large portion of table needed
- No useful indexes available
- Table is small
- Worst when:
- Only few rows needed
- Useful indexes exist
Index-Based Table Access
- Uses index to locate specific rows
- Two-step process:
- Search index to find row locations
- Fetch actual rows from table
- Effective when:
- Small portion of table needed
- Index matches query conditions
- Index is selective
Index-Only Scan
- Gets all needed data directly from index
- No table access required
- Most efficient when:
- All required columns are in index
- Index is well-maintained (minimal bloat)
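As a sketch of when an index-only scan kicks in (the table and index names here are hypothetical, not from the book):

```sql
-- Index contains every column the query touches.
CREATE INDEX orders_customer_total ON orders (customer_id, total);

-- PostgreSQL can answer this from the index alone; EXPLAIN shows
-- "Index Only Scan" (assuming the visibility map is current, e.g. after VACUUM).
EXPLAIN
SELECT customer_id, total
FROM orders
WHERE customer_id = 42;
```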
Index Structures
B-Tree Indexes
- Most common index type
- Balanced tree structure
- Efficient for:
- Equality searches
- Range queries
- Sorting
- Properties:
- Self-balancing
- Logarithmic search time
- Good for both reads and writes
Other Index Types
- Hash indexes
- GiST (Generalized Search Tree)
- GIN (Generalized Inverted Index)
- BRIN (Block Range Index)
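Illustrative DDL for these index types (table and column names are made up for the example):

```sql
CREATE INDEX accounts_email_hash ON accounts USING hash (email);   -- equality searches only
CREATE INDEX places_location_gist ON places USING gist (location); -- geometric and range types
CREATE INDEX docs_tags_gin ON documents USING gin (tags);          -- arrays, jsonb, full-text search
CREATE INDEX events_ts_brin ON events USING brin (created_at);     -- huge, naturally ordered tables
```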
Combining Relations (Joins)
Nested Loops
- Simple but can be inefficient
- Three variations:
- Simple nested loops
- Index nested loops
- Block nested loops
- Best for:
- Small tables
- When inner table has good indexes
Hash-Based Algorithms
- Builds hash table of smaller relation
- Probes hash table with larger relation
- Efficient for:
- Large datasets
- Equality joins
- Memory intensive
Sort-Merge Algorithm
- Sort both relations
- Merge sorted results
- Efficient for:
- Already sorted data
- Range conditions
- Large datasets that don’t fit in memory
Performance Comparisons
- Algorithm choice depends on:
- Data size
- Available indexes
- Query conditions
- Available resources
- Data distribution
Chapter 4: Understanding Execution Plans
How an Optimizer Builds an Execution Plan
To build a plan, the optimizer uses transformation rules, heuristics, and cost-based optimization algorithms.
Reading Execution Plans
To obtain the execution plan for a query we prepend EXPLAIN to the query.
EXPLAIN
SELECT f.flight_no, f.actual_departure, count(passenger_id) passengers
FROM flight f
JOIN booking_leg bl ON bl.flight_id = f.flight_id
JOIN passenger p ON p.booking_id = bl.booking_id
WHERE f.departure_airport = 'JFK'
  AND f.arrival_airport = 'ORD'
  AND f.actual_departure BETWEEN '2023-08-10' AND '2023-08-13'
GROUP BY f.flight_id, f.actual_departure;
Finalize GroupAggregate  (cost=594251.47..594253.03 rows=1 width=24)
  Group Key: f.flight_id
  ->  Gather Merge  (cost=594251.47..594253.01 rows=2 width=24)
        Workers Planned: 2
        ->  Partial GroupAggregate  (cost=593251.45..593252.76 rows=1 width=24)
              Group Key: f.flight_id
              ->  Sort  (cost=593251.45..593251.88 rows=173 width=20)
                    Sort Key: f.flight_id
                    ->  Parallel Hash Join  (cost=247265.49..593245.02 rows=173 width=20)
                          Hash Cond: (p.booking_id = bl.booking_id)
                          ->  Parallel Seq Scan on passenger p  (cost=0.00..312940.97 rows=8810097 width=8)
                          ->  Parallel Hash  (cost=247265.35..247265.35 rows=11 width=20)
                                ->  Parallel Hash Join  (cost=14301.16..247265.35 rows=11 width=20)
                                      Hash Cond: (bl.flight_id = f.flight_id)
                                      ->  Parallel Seq Scan on booking_leg bl  (cost=0.00..212763.30 rows=7695530 width=8)
                                      ->  Parallel Hash  (cost=14301.15..14301.15 rows=1 width=16)
                                            ->  Parallel Seq Scan on flight f  (cost=0.00..14301.15 rows=1 width=16)
                                                  Filter: ((actual_departure >= '2023-08-10 00:00:00+03'::timestamp with time zone) AND (actual_departure <= '2023-08-13 00:00:00+03'::timestamp with time zone) AND (departure_airport = 'JFK'::bpchar) AND (arrival_airport = 'ORD'::bpchar))
JIT:
  Functions: 30
  Options: Inlining true, Optimization true, Expressions true, Deforming true
A plan contains the expected number of output rows and the expected average width of an output row, which are calculated from database statistics, plus cost estimates, which are calculated from those estimates and configuration parameters. Cost values are cumulative: they include the cost of all previous operations. There are two cost estimates for each operation: the first is the cost of producing the first row of output, the second the cost of producing the complete result. Execution of a plan starts at the leaves and ends at the root, which means the operation executed first appears on the line with the rightmost offset.
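EXPLAIN alone shows only estimates; to compare them with reality we can ask for actual execution metrics. Note that EXPLAIN ANALYZE really executes the query:

```sql
-- ANALYZE runs the query and reports actual row counts and timings;
-- BUFFERS adds shared-buffer hit/read statistics.
EXPLAIN (ANALYZE, BUFFERS)
SELECT f.flight_no, f.actual_departure
FROM flight f
WHERE f.departure_airport = 'JFK';
```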
What Is Going On During Optimization?
- Determining the possible orders of operations
- Determining the possible execution algorithms for each operation
- Comparing the costs of different plans
- Selecting the optimal execution plan
Why Are There So Many Execution Plans to Choose From?
The large number of possible execution plans stems from the many combinations of:
- Table join ordering
- Join algorithms (nested loops, hash joins, etc.)
- Data access methods (indexes, full scans)
With just 3 tables, considering 3 join algorithms and 12 possible join orders, there are already 108 possible plans. Including data retrieval methods increases this to thousands.
PostgreSQL optimizes this by:
- Using the optimality principle - sub-plans of optimal plans must also be optimal
- Building optimal plans incrementally from smallest sub-plans
- Switching to the genetic query optimizer (GEQO) instead of exhaustive search once the number of tables exceeds the geqo_threshold parameter
- Using heuristics for queries with many joins
While heuristics help performance by reducing the search space, they may occasionally miss the absolute optimal plan. However, PostgreSQL will still find the best plan among those considered.
How Execution Costs Are Calculated
Execution plan costs depend on:
- Algorithm cost formulas
- Statistical data on tables/indexes and value distributions
- System settings and parameters
The optimizer uses these factors to calculate costs:
- Cost formulas from different algorithms (covered in Chapter 3)
- Table/result set sizes
- Configurable system parameters
Key points about execution costs:
- No single “best plan” exists for all similar queries
- Slight differences in statistics can change plan selection
- Statistics change based on data modifications
- Plans may differ for nearly identical queries based on:
- Value distributions
- Table statistics
- Current system state
For example, if we have many entries with country US and only 1 in CZ, a query filtering on different values (‘US’ vs ‘CZ’) can result in completely different execution plans, because:
- Queries returning many rows may favor full table scans
- Queries returning few rows may prefer index access
Regular statistics updates via ANALYZE are critical for optimal plan selection, as outdated statistics can lead to suboptimal execution plans.
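A minimal sketch of refreshing and inspecting statistics, using a hypothetical account table with a country column (pg_stats is a standard system view):

```sql
-- Recompute planner statistics for one table.
ANALYZE account;

-- See what the planner knows about a column: estimated number of
-- distinct values and the most common values with their frequencies.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'account' AND attname = 'country';
```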
The Optimizer’s Limitations
Despite the PostgreSQL optimizer’s sophistication, it can produce suboptimal execution plans for several reasons:
- Cost Estimation Limitations
- Mathematical models assume uniform data distribution
- Real data rarely follows uniform distribution
- Even advanced formulas are approximations
- Statistics Limitations
- Histograms help with stored data selectivity
- Cannot use histograms for intermediate results
- Intermediate result estimation errors are common
- Algorithmic Limitations
- Complex queries may exceed exact optimization capability
- Heuristics may exclude optimal plans
- Approximate algorithms used for complex cases
These limitations mean human intervention is sometimes needed, as humans can:
- Observe actual system behavior
- Apply contextual knowledge
- Guide the optimizer with additional information
The optimizer works well in most cases, but understanding its limitations helps identify when and how to assist it in finding better execution plans.
Chapter 5: Short Queries and Indexes
What Makes a Query “Short”?
A short query is not the query with the fewest lines, but one that reads only a small number of rows relative to table size. Most OLTP queries are short queries. Long queries, on the other hand, read and process a large portion of one or more tables; they are typical of OLAP systems.
Choosing Selection Criteria
Since short queries return a small number of rows from a much bigger set, indexes are very important for them. The index is used to find the rows that match the selection criteria, and then those rows are fetched from the table.
Index Selectivity
The lower the selectivity of an index, the better the index is. Say we have a table with 1,000 rows and an index on a column with 100 unique values: assuming a uniform distribution, each value matches about 10 rows, for a selectivity of 10/1000 = 0.01. If the selectivity is high (the condition matches a large fraction of the table), a sequential scan is better than using the index.
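Selectivity can be checked by hand as matching rows divided by total rows (hypothetical table and column names):

```sql
-- Fraction of the table matched by one equality condition.
SELECT (SELECT count(*) FROM account WHERE country = 'CZ')::numeric
       / (SELECT count(*) FROM account) AS selectivity;
```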
Unique Indexes and Constraints
An index is unique if for each indexed value there is at most one matching row in the table. Unique indexes have the best (lowest) selectivity. Note that unique constraints in PostgreSQL still allow NULL values.
Indexes and Non-equal Conditions
B-tree indexes can support searches by equality, greater than, less than, and BETWEEN conditions: all the searches that require comparison and ordering.
Indexes and Column Transformations
A column transformation occurs when the search criterion is applied to some modification of the values in a column. For example, lower(last_name) (converting the last_name value to lowercase) and update_ts::date (casting timestamp with time zone to date) are column transformations. If there is an index on last_name:
CREATE INDEX account_last_name ON account (last_name);
the following search won’t be able to take advantage of the index:
SELECT * FROM account
WHERE lower(last_name)='ayn';
One way to solve this is to cover all the cases explicitly:
SELECT * FROM account
WHERE last_name='ayn'
OR last_name='Ayn'
OR last_name ='AYN'
A better solution is to create a functional index:
CREATE INDEX account_lower_last_name ON account (lower(last_name));
Another example involves time:
SELECT *
FROM records
WHERE created_at::date
BETWEEN '2023-08-17' AND '2023-08-18'
This cast from timestamp to date prevents the query from using an index on created_at.
Another transformation involves the COALESCE function, which returns the first non-NULL value in a list of values:
SELECT *
FROM records
WHERE coalesce(created_at, updated_at)
BETWEEN '2025-01-19' AND '2025-01-21'
In this case, rewriting with OR is better than using COALESCE, because then the index can be used:
SELECT * FROM records
WHERE (created_at BETWEEN '2025-01-19' AND '2025-01-21')
OR (created_at IS NULL AND updated_at BETWEEN '2025-01-19' AND '2025-01-21')
Indexes and the LIKE Operator
By default, B-tree indexes don’t support searches with the LIKE operator:
SELECT * FROM records
WHERE column LIKE 'A%'
One way to work around this is to rewrite the prefix search as a range condition with AND:
SELECT * FROM records
WHERE column >= 'A' AND column < 'B'
A better solution is to use a pattern search index:
CREATE INDEX records_column_pattern ON records (column text_pattern_ops);
Whether a plain B-tree can serve such prefix searches depends on rules about character ordering, formatting, and similar things that vary by language and country, which are determined by the LOCALE of the database server; the text_pattern_ops operator class sidesteps this by comparing strings character by character.
Using Multiple Indexes
If a query can benefit from several indexes, PostgreSQL builds an in-memory bitmap of the results of each index scan and combines them, which allows the optimizer to use multiple indexes in a single query.
Compound Indexes
A compound index is an index created on multiple columns. In general, an index on (X, Y, Z) will be used for searches on X, (X, Y), and (X, Y, Z), and even (X, Z) (using X for the index search and filtering on Z), but not for Y alone and not for (Y, Z).
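A sketch of the rule (hypothetical names):

```sql
CREATE INDEX records_xyz ON records (x, y, z);

SELECT * FROM records WHERE x = 1 AND y = 2;  -- uses the index
SELECT * FROM records WHERE x = 1 AND z = 3;  -- uses the index on x, filters on z
SELECT * FROM records WHERE y = 2;            -- leading column missing: likely a full scan
```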
Using Indexes for Data Retrieval
When the index has all the data required by the query, an index-only scan is used.
CREATE INDEX idx_name
ON records (column1, column2, column3);
SELECT column1, column2, column3
FROM records
WHERE column1 = 1 AND column2 = 2;
Covering Indexes
A covering index is an index that contains all the columns required by the query.
CREATE INDEX idx_name
ON records (column1, column2)
INCLUDE (column3);
SELECT column1, column2, column3
FROM records
WHERE column1 = 1 AND column2 = 2;
The INCLUDE clause adds columns to the index that are not part of the index key but are required by the query; they are stored in the index as payload only.
The difference between the two approaches above is that in the plain compound index, column3 is part of the index key even though we never filter on it, while the covering index keeps column3 out of the key. As queries require more such non-filtered columns, covering indexes stay more compact than ever-wider compound indexes.
Partial Indexes
Partial index is the index that contains only a subset of the rows in the table.
CREATE INDEX idx_name
ON records (column1)
WHERE column2 = 1;
SELECT *
FROM records
WHERE column1 = 1;
Indexes and Join Order
Generally the optimizer joins the tables with the smallest numbers of rows first, but if there is an index on a column used in a join condition, the optimizer will join the table that has the index first. If two indexes have the same selectivity, the optimizer joins the smaller table first.
When Are Indexes Not Used
Avoiding Index Usage
We avoid using indexes when the table is small and fits into main memory, or when the query returns a large fraction of the table’s rows.
Why Does PostgreSQL Ignore My Index?
PostgreSQL ignores an index when the planner estimates that using it would cost more than a sequential scan.
To Build or Not to Build
When hardware was expensive, the standard advice was not to create indexes unless you really needed them, because they take up space and slow down write operations. With today’s cheap storage and fast disks, this is much less of a concern.
The recommendation for OLTP systems is to have the indexes in place up front, especially partial indexes and covering indexes where they make sense.
Redundant indexes should still be removed. To find them, we can query the pg_stat_all_indexes view to see which indexes are not used.
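A query along these lines lists indexes that have never been scanned since the statistics were last reset (the columns used here are the real ones of pg_stat_all_indexes):

```sql
SELECT schemaname,
       relname      AS table_name,
       indexrelname AS index_name,
       idx_scan
FROM pg_stat_all_indexes
WHERE idx_scan = 0
  AND schemaname NOT IN ('pg_catalog', 'pg_toast')
ORDER BY relname, indexrelname;
```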
Indexes and Short Query Scalability
As tables grow, query performance can degrade, because the indexes no longer fit into memory. What worked well in the past may not work well in the future, so we should keep monitoring query performance and address issues as they arise.
Chapter 6: Long Queries and Full Scans
Which Queries Are Considered Long?
A query is considered long when query selectivity is high for at least one large table
Long Queries and Full Scans
In long queries we favor full scans over indexes, because in this case index access would require more I/O. Ideally, a long query should not use any index on a table from which it reads a large number of rows.
Long Queries and Hash Joins
Recall the cost models from Chapter 3, where size(JA) is the number of distinct values of the join attribute:
nested_loop_cost(R, S) = size(R) * size(S) + size(R) * size(S) / size(JA)
hash_join_cost(R, S) = size(R) + size(S) + size(R) * size(S) / size(JA)
For large relations the hash join is our best bet: whenever a hash join is possible, the optimizer will pick it. It works best when the hash table built from the smaller relation fits in memory, so for long queries it can pay off to tune the memory settings.
Sometimes a merge join is used instead, when the join columns are unique or already pre-sorted.
Long Queries and the Order of Joins
In short queries we should execute the joins that have the best indexes first, but in long queries we should execute first the joins that produce the smallest number of rows.
What Is a Semi-join?
A semi-join between two tables R and S returns rows from table R for which there is at least one row from table S with matching values in the joining columns. We can see this semi-join in the optimizer plan.
Semi-joins and Join Order
Semi-joins never increase the size of the result set, so we should make sure they are executed first.
What Is an Anti-join?
An anti-join between two tables R and S returns rows from table R for which there are no rows from table S with a matching value in the joining column. Of the possible rewrites, only the NOT EXISTS version guarantees an anti-join in the execution plan.
Semi- and Anti-joins Using the JOIN Operator
An anti-join cannot create duplicates, which means that an OUTER JOIN with subsequent filtering for NULL values can be used instead:
SELECT t.x
FROM table1 t
LEFT OUTER JOIN table2 t2 USING (x)
WHERE t2.x IS NULL
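For comparison, the same pair written with EXISTS / NOT EXISTS (hypothetical table names):

```sql
-- Semi-join: rows of table1 with at least one match in table2.
SELECT t1.x
FROM table1 t1
WHERE EXISTS (SELECT 1 FROM table2 t2 WHERE t2.x = t1.x);

-- Anti-join: rows of table1 with no match in table2.
-- Unlike NOT IN, NOT EXISTS stays correct when table2.x contains NULLs.
SELECT t1.x
FROM table1 t1
WHERE NOT EXISTS (SELECT 1 FROM table2 t2 WHERE t2.x = t1.x);
```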
When Is It Necessary to Specify Join Order?
Long queries appear mostly in OLAP systems, where many tables may be joined. The optimizer searches for the best join order only while the number of joined tables is below join_collapse_limit (8 by default); beyond that, explicit JOINs are executed in the order they are written.
There is no hard upper limit on this parameter, but the higher the value, the longer planning takes: the join-order search space grows factorially (N! orders for N tables).
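A sketch of forcing a specific join order in a session (table names are hypothetical):

```sql
-- With join_collapse_limit = 1, explicit JOINs are executed
-- exactly in the order they are written.
SET join_collapse_limit = 1;

SELECT s.id
FROM small_table s
JOIN medium_table m ON m.s_id = s.id
JOIN big_table b ON b.m_id = m.id;
```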
Grouping: Group First, Select Last
Where possible, grouping should be performed first to make the intermediate result set smaller, and the rest of the selection applied afterwards.
Using SET Operations
Often we can use EXCEPT instead of NOT EXISTS and NOT IN, INTERSECT instead of EXISTS and IN, and UNION instead of complex selection criteria with OR.
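For example, a single-column NOT IN can often be rewritten with EXCEPT (hypothetical names; note that set operations remove duplicates and treat NULLs as not distinct, so the semantics are not always identical):

```sql
-- Instead of:
SELECT customer_id FROM customers
WHERE customer_id NOT IN (SELECT customer_id FROM orders);

-- ...use the set operation:
SELECT customer_id FROM customers
EXCEPT
SELECT customer_id FROM orders;
```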
Avoiding Multiple Scans
When retrieving multiple attributes from an entity-attribute-value table, join to the table only once and use the FILTER clause on the MAX() aggregates in the SELECT list to return the appropriate value in each column:
SELECT x.field_1,
       x.field_2,
       coalesce(max(y.field_3) FILTER (WHERE y.field_4 = 'value_1'), '') AS result_1,
       coalesce(max(y.field_3) FILTER (WHERE y.field_4 = 'value_2'), '') AS result_2,
       coalesce(max(y.field_3) FILTER (WHERE y.field_4 = 'value_3'), '') AS result_3
FROM table_1 x
JOIN table_2 y USING (id)
WHERE y.id < 7000000
  AND x.id < 7000000
GROUP BY 1, 2;
Chapter 7: Long Queries: Additional Techniques
Temporary Tables and CTEs
Temporary Tables
To create a temporary table, we add TEMP to a normal CREATE TABLE statement:
CREATE TEMP TABLE temp_table ...
Temporary tables are visible only to the current session; once the session closes, they are destroyed.
Temporary tables have several drawbacks:
1- Indexes: indexes on the original tables are not reused here.
2- Statistics: no statistics are available for the table until you run ANALYZE on it.
3- Disk space: they are stored on disk, so in OLAP workloads overusing them takes up space.
4- Excessive I/O: because they are written to disk, writing and re-reading them takes time.
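A minimal sketch (hypothetical names), with an explicit ANALYZE since the fresh temporary table has no statistics:

```sql
CREATE TEMP TABLE recent_bookings AS
SELECT *
FROM booking
WHERE update_ts > now() - interval '7 days';

-- Without this, the planner knows nothing about the temp table.
ANALYZE recent_bookings;

SELECT count(*) FROM recent_bookings;
```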
Common Table Expressions (CTEs)
CTEs are defined as WITH cte_name AS (query) followed by the main query:
WITH cte_query AS(
SELECT ...
)
SELECT x
FROM cte_query
WHERE x = 10
Before PostgreSQL 12, a CTE was always materialized, acting as an optimization fence: the query was executed once, its result cached, and outer predicates could not be pushed into it. Since PostgreSQL 12, CTEs are inlined into the main query by default and are materialized only when referenced more than once or declared AS MATERIALIZED. CTE subqueries also do not count toward join_collapse_limit, so we can use as many of them as we need. When a CTE is materialized, its result is computed once and cached, and the plan shows a CTE Scan over it.
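Since PostgreSQL 12 the behavior can be controlled per CTE (table and column names are hypothetical):

```sql
-- Force one-time execution and caching (the pre-12 behavior).
WITH recent AS MATERIALIZED (
    SELECT * FROM booking WHERE update_ts > now() - interval '1 day'
)
SELECT * FROM recent WHERE booking_id = 100;

-- Explicitly allow inlining, so the booking_id filter
-- can be pushed down into the CTE and use an index.
WITH recent AS NOT MATERIALIZED (
    SELECT * FROM booking WHERE update_ts > now() - interval '1 day'
)
SELECT * FROM recent WHERE booking_id = 100;
```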
Views: To Use or Not to Use
Views are not physical tables; what is stored in the database is only the view definition. Each time the parser sees the view, it expands it into the underlying query. This can cause performance issues: whether outer filters can be pushed down into the expanded query varies from case to case, so a view is hard to optimize in isolation.
A good use of views is giving a group of users access to a subset of the data: we create a view for them so they have no direct access to the tables behind it. But views bring no performance improvement by themselves.
Materialized Views
Materialized views store the result of their query at creation (and refresh) time; they do not reflect an up-to-date view of the data and must be refreshed manually.
After creation they behave like a normal table that cannot be inserted into, updated, or deleted from; we can index them, and the optimizer treats them like tables.
To refresh a materialized view we run REFRESH MATERIALIZED VIEW view_name. During the refresh the view cannot be accessed; if it must stay available, we can use REFRESH MATERIALIZED VIEW CONCURRENTLY view_name instead, which requires a unique index on the view and takes longer.
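A sketch of the lifecycle, using the flight table from the earlier examples; remember that a concurrent refresh requires a unique index on the materialized view:

```sql
CREATE MATERIALIZED VIEW flight_stats AS
SELECT departure_airport, count(*) AS flights
FROM flight
GROUP BY departure_airport;

-- Required before REFRESH ... CONCURRENTLY is allowed.
CREATE UNIQUE INDEX ON flight_stats (departure_airport);

REFRESH MATERIALIZED VIEW flight_stats;              -- blocks readers
REFRESH MATERIALIZED VIEW CONCURRENTLY flight_stats; -- readers keep access, takes longer
```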
Should I Create a Materialized View?
It depends: if the query is complex and the data does not change frequently, then creating a materialized view is a good idea.
Do Materialized Views Need to Be Optimized?
Yes, just like a normal table: we optimize the query behind the materialized view using the same techniques.
Dependencies
If the data in the tables that a materialized view is based on changes, the view becomes stale and needs to be refreshed; structural changes to those tables may require recreating the view.
Partitioning
When we partition a table, we split it into smaller tables, each table is called a partition. The parent table is virtual, but all the partitions are physical tables.
Does Partitioning Improve Performance?
It depends: if the query filters on the partition key, only the relevant partitions are scanned (partition pruning), which improves performance.
Why Create a Partitioned Table?
The DROP command executes significantly faster than a bulk DELETE and does not require subsequent vacuuming. A typical use case is a table partitioned on date ranges (e.g., one partition per month): a new partition is added and the oldest one dropped at the end of every month. We can also detach a partition and attach it to another partitioned table, for example to move it into an archive table.
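A sketch of the monthly rotation described above (table names are illustrative):

```sql
CREATE TABLE events (
    event_id   bigint,
    created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2025_01 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

-- End of month: detach the oldest partition (it becomes a standalone
-- table that can be archived or dropped cheaply)...
ALTER TABLE events DETACH PARTITION events_2025_01;

-- ...and add the next one.
CREATE TABLE events_2025_02 PARTITION OF events
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
```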
Parallelism
PostgreSQL 10+ introduced parallel query execution, configured via the max_parallel_workers and max_parallel_workers_per_gather parameters. While parallelism generated excitement as a potential performance silver bullet, expectations should be tempered.
The concept splits query workload across multiple processors/cores. However, some query portions must run sequentially, and parallelization adds synchronization overhead. This makes it most beneficial for processing large amounts of data, particularly:
- Massive table scans
- Hash joins
- Long, complex queries
Short queries typically see minimal benefit from parallelization. While parallel execution of different queries may improve overall throughput, this differs from parallelizing individual queries.
One gotcha is that the optimizer may sometimes replace an index-based access with parallel table scan due to cost estimation issues. In these cases, parallel execution could actually be slower than sequential processing.
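A quick way to test whether a parallel plan actually helps is to toggle it per session and compare the plans (booking_leg is the table from the Chapter 4 example):

```sql
SET max_parallel_workers_per_gather = 4;
EXPLAIN SELECT count(*) FROM booking_leg;  -- look for Gather / Parallel Seq Scan nodes

-- Disable parallelism for this session and compare.
SET max_parallel_workers_per_gather = 0;
EXPLAIN SELECT count(*) FROM booking_leg;
```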