Zero-downtime schema migrations in Postgres using Reshape
About six months back, I wrote a post on how to use views in Postgres to encapsulate tables and create a versioned interface to a database. At the end of the post, I mentioned working on a tool to automate zero-downtime migrations using this technique and I’ve finally released something, albeit experimental. It’s open-source and called Reshape.
Like many other migration tools, Reshape is a CLI tool which works with files that define changes to the database schema. The big difference is that Reshape makes non-destructive changes, ensuring that clients that still expect the old schema continue to work. This means that the database can be accessed in parallel using the old and the new schema, and application and database changes can be deployed gradually without any downtime.
That’s just about enough talk, let’s see an example!
Creating a table
Say we need a table called
users with two columns:
id (auto-generated) and
name which contains the full name of the user. With Reshape, we can create that table with a migration like this one:
To apply it, we run
reshape migrate. Reshape will create the table but it will also create a Postgres schema for the migration. Inside of this schema, it will create a view which maps directly to the table. Our application will not be interacting with the actual table, instead it will perform all queries and updates against the view!
This also means that the application needs to tell the database which schema it wants to use, which can be done by setting the
search_path. We only need to do this once when the application connects to the database, and Reshape can generate the query for us using
reshape generate-schema-query. Thanks to this, our application can make queries just as if the schema didn’t exist.
Renaming a column
Our application is now running nicely, creating, reading and updating users, but then we realize something.
name is a bad name for our column, it really should be called
full_name. How are we supposed to change the name of the column without stopping our application and causing some downtime? That’s where Reshape helps.
We’ll create a new migration to change the name:
and then we run
reshape migrate again. What happens now is that Reshape doesn’t actually change the name of the column, instead it creates a new schema and new views for the migration and changes the name inside the view. This means that the currently deployed application will continue to work and we can deploy our new application in parallel.
Once we are done deploying and have stopped the old application instances, we finish up the migration by running
reshape complete. This will perform the actual rename of the column and remove any old schemas. If the deployment were to fail, we can run
reshape abort which will non-destructively reset any changes.
Replacing one column with two
Changing the name of a column is a very basic change but what if we need to make more complex changes, like altering columms or adding new ones? Reshape automates that as well.
To illustrate this, let’s say we no longer want to store the full name but instead we want to store it separately as
last_name (which is a bad idea). Reshape makes this as simple as a single migration which creates two new columns and removes the old one. The trick is that we specify how to transform between the new and old schema using
If we run
reshape migrate now, Reshape will once again not actually remove the
full_name column. Instead, it will automatically set up triggers that transform between
last_name columns and
full_name depending on if the data is changed using the old or new schema. For any existing rows, the new columns will be automatically populated without locking up the database. Locking up the database for too long could also cause downtime, so Reshape makes sure that doesn’t happen.
Once again, we can deploy our new application which expects
last_name in parallel with the old version. Once the deployment is finished, we run
reshape complete which will delete
full_name for real and remove the triggers.
Reshape is still experimental and a proof-of-concept but there are some potential features I’m excited about. One is migrations involving changes to multiple tables. For example, splitting up an array column into a separate table. Thanks to the amazing support for transactions, triggers and views in Postgres, there are a lot of possibilities.