by Alicia Cozine, Anna Headley, Francis Kayiwa, and Trey Pendragon
In our previous post, we described how we decided to migrate a very large database using logical replication. In this post we will detail the steps we followed to make it happen.
Before the migration really began, we set up logical replication. This recreated our very large database in an upgraded version of PostgreSQL.
To set up logical replication, we started with the AWS Database Migration Guide, then expanded and adapted the steps to our local environment.
Here’s how we did it:
First, set up the publisher.
Add a line to the publisher's pg_hba.conf so the subscriber is allowed to connect:
host all all <IP_of_subscriber/32> md5
Then edit /etc/postgresql/<version>/main/postgresql.conf to make the publisher keep detailed "write-ahead logs" (WALs) to support logical replication:
wal_level = 'logical'
max_replication_slots = 10
max_wal_senders = 10
Restart PostgreSQL to apply the new settings:
sudo systemctl restart postgresql
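If you want to confirm the new settings took effect, psql's SHOW command reports the running values:
SHOW wal_level;
SHOW max_replication_slots;
SHOW max_wal_senders;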
Then, in psql on the publisher, create a publication covering all tables:
CREATE PUBLICATION <project_name>_publication FOR ALL TABLES;
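You can verify the publication was created by querying the pg_publication catalog:
SELECT pubname, puballtables FROM pg_publication;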
Next, set up the subscriber (for us it was our new PostgreSQL 15 machine).
Dump the schema from the publisher:
pg_dump --schema-only -d <database_name> -h <publisher_IP> -U <postgres_admin_user> -f /tmp/<database_name>-schema.sql --no-owner
In psql on the subscriber, create the database user and the database:
CREATE USER <database_user_name> WITH PASSWORD '<password>';
CREATE DATABASE <database_name> OWNER <database_user_name>;
Load the schema into the new database:
psql -d <database_name> -U <postgres_admin_user> -f /tmp/<database_name>-schema.sql
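A quick sanity check is to list the tables, which should all exist but be empty at this point:
psql -d <database_name> -U <postgres_admin_user> -c '\dt'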
Finally, create the subscription, which starts copying data from the publisher:
CREATE SUBSCRIPTION <project_name>_subscription
CONNECTION 'host=<publisher_IP> port=5432 dbname=<database_name> user=<postgres_admin_user> password=<postgres_admin_password>'
PUBLICATION <project_name>_publication
WITH (copy_data = true);
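You can watch the initial table copy from the subscriber. Each table's state moves from 'i' (initialize) and 'd' (data copy) to 'r' (ready):
SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel
ORDER BY srsubstate;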
Once everything is set up, wait for full replication to happen. You can validate that replication is complete by generating row counts using psql in both databases and comparing those numbers:
WITH tbl AS (
  SELECT table_schema, table_name
  FROM information_schema.tables
  WHERE table_name NOT LIKE 'pg_%'
    AND table_schema IN ('public'))
SELECT table_schema,
       table_name,
       (xpath('/row/c/text()', query_to_xml(format('select count(*) as c from %I.%I', table_schema, table_name), FALSE, TRUE, '')))[1]::text::int AS rows_n
FROM tbl
ORDER BY rows_n DESC;
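While you wait, the publisher's pg_stat_replication view shows the replication connection and how far along the subscriber is:
SELECT client_addr, state, sent_lsn, replay_lsn
FROM pg_stat_replication;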
Once replication was complete, we wanted to be sure our upgraded database had resilience built in, so we set up a warm standby in addition to our usual backup process.
You will want to modify the process below to suit your local environment. Make each step specific so you can copy and paste.
This is how we did it:
Log in to both database servers:
ssh user@old-database
ssh user@new-database
Note the name of the database you are migrating: <database_name>
On both servers, connect to the database: sudo -u postgres psql -d <database_name>
Run the row count query (the same one used above) in both databases and compare the results to confirm replication is caught up:
WITH tbl AS (
  SELECT table_schema, table_name
  FROM information_schema.tables
  WHERE table_name NOT LIKE 'pg_%'
    AND table_schema IN ('public'))
SELECT table_schema,
       table_name,
       (xpath('/row/c/text()', query_to_xml(format('select count(*) as c from %I.%I', table_schema, table_name), FALSE, TRUE, '')))[1]::text::int AS rows_n
FROM tbl
ORDER BY rows_n DESC;
Announce the start of the maintenance window (we posted ours in Slack):
@here We're now starting scheduled maintenance of Figgy, it will be going into read only mode for up to 3 hours. We'll send another message here when it's ready for regular use again.
Logical replication does not copy sequence values, so on the new database run this query to generate the statements that will bring each sequence up to date:
SELECT
'SELECT SETVAL(' ||
quote_literal(quote_ident(sequence_namespace.nspname) || '.' || quote_ident(class_sequence.relname)) ||
', COALESCE(MAX(' ||quote_ident(pg_attribute.attname)|| '), 1) ) FROM ' ||
quote_ident(table_namespace.nspname)|| '.'||quote_ident(class_table.relname)|| ';'
FROM pg_depend
INNER JOIN pg_class AS class_sequence
ON class_sequence.oid = pg_depend.objid
AND class_sequence.relkind = 'S'
INNER JOIN pg_class AS class_table
ON class_table.oid = pg_depend.refobjid
INNER JOIN pg_attribute
ON pg_attribute.attrelid = class_table.oid
AND pg_depend.refobjsubid = pg_attribute.attnum
INNER JOIN pg_namespace as table_namespace
ON table_namespace.oid = class_table.relnamespace
INNER JOIN pg_namespace AS sequence_namespace
ON sequence_namespace.oid = class_sequence.relnamespace
ORDER BY sequence_namespace.nspname, class_sequence.relname;
This outputs a set of SETVAL statements; copy and paste them into psql on the new database and run them.
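For a hypothetical orders table with an id column, a generated line would look like this:
SELECT SETVAL('public.orders_id_seq', COALESCE(MAX(id), 1) ) FROM public.orders;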
Then drop the subscription on the new database:
DROP SUBSCRIPTION <project_name>_subscription;
Announce the end of maintenance:
We've finished maintenance on Figgy, feel free to do whatever work you need to with the load balancer.
The migration itself took one hour and 13 minutes, much better than our original estimate of two days. Some of this time was spent looking for connections in the old database and reassuring ourselves that the application was really using the new database. All our earlier work paid off hugely.
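If you need to make the same check, a query against pg_stat_activity on the old server lists any remaining connections (the database name placeholder is the same one used throughout):
SELECT pid, usename, application_name, client_addr
FROM pg_stat_activity
WHERE datname = '<database_name>';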
We got longer-term benefits from the work we put in as well. Now we can spin up a new database cluster with confidence when we need one. We can use Ansible to run SQL commands. And we can tune the PostgreSQL cluster for our Figgy application, optimizing memory usage for our largest database.
In our next post we discuss how to set up a warm standby with streaming replication.