Foad Moha

Foad Moha

Python Developer

© 2021

Changing a column datatype without loosing data using SQLAlchemy and Alembic

We want to create a migration to change the datatype of a column while keeping the data that already exists in the database. We can’t just change the column datatype in-place and (auto-)create a migration for that. That would corrupt the existing data in the database. Here is how to do the change the datatype while preserving the existing data.

Suppose we have a database table called products that has a column called serial_number. Previously this column was defined as a string. So a serial number in the database could look like this: "1234567". Now we would like this column to be of type integer instead.

We can start by updating our Product model and changing the serial_number column datatype from sa.String to sa.Integer.

from sqlalchemy import Column, Integer, String, Float

class Product(Base):
    __tablename__ = 'products'
    id = Column(Integer, primary_key=True, index=True)
    name = Column(String, nullable=True, unique=True)
    picture = Column(String, nullable=True)
    # serial_number = Column(String, nullable=True) <-- Before it was like this
    serial_number = Column(Integer, nullable=True)

“Wait a sec! But if we change the model how does SQLAlchemy read the existing data in the database?? Isn’t this the same as changing the datatype in-place?”

Exactly! That is why in our migration script we will create a shadow definition of the Product model. We will use this shadow model to talk to the db through our ORM. Since this new model acts as a replacement, we can fix our real Product model without worrying about its impact.

The magic happens in the migration script. Here is an overview of the steps we need to take in the upgrade() method:

  1. Create a shadow Product model class.
  2. Create a new column with the desired datatype. We could call it serial_number_. The name doesn’t matter as we will change it later.
  3. Copy and convert the data from the original column to the new column.
  4. Remove the original column.
  5. Rename the new column to the original column’s name.

The downgrade() method follows the same process but in reverse. Here is the migration script:

"""Change employee ID datatype from string to integer."""
import sqlalchemy as sa
from alembic import op, context
from sqlalchemy.ext.declarative import DeclarativeMeta, declarative_base
from sqlalchemy.orm import Session


Base: DeclarativeMeta = declarative_base()


# 1. create a shadow Product model
class Product(Base):
    __tablename__ = 'products'
    id = sa.Column(sa.Integer, primary_key=True, index=True)
    serial_number = sa.Column(sa.String, unique=True, nullable=True)
    serial_number_ = sa.Column(sa.Integer, unique=True, nullable=True)
    

def upgrade():
    # 2. create a new column called serial_number_
    with op.batch_alter_table('products', schema=None) as batch_op:
        batch_op.add_column(sa.Column('serial_number_', sa.Integer, nullable=True))

    # 3. copy and convert data from serial_number to serial_number_ column
    db = Session(bind=op.get_bind())
    for product in db.query(Product).all():
        product.serial_number_ = int(product.serial_number)
    db.commit()

    with op.batch_alter_table('products', schema=None) as batch_op:
        # 4. drop the serial_number column
        batch_op.drop_column('serial_number')
        # 5. rename serial_number_ column to serial_number and recreate the uniqueness constraint
        batch_op.alter_column('serial_number_', new_column_name='serial_number')


def downgrade():
    with op.batch_alter_table('products', schema=None) as batch_op:
        # rename serial_number to serial_number_
        batch_op.alter_column('serial_number', new_column_name='serial_number_')
        # create a new serial_number column
        batch_op.add_column(sa.Column('serial_number', sa.String, nullable=True))

    # copy and convert data from serial_number_ to serial_number column
    db = Session(bind=op.get_bind())
    for product in db.query(Product).all():
        product.serial_number = str(product.serial_number_)
    db.commit()

    # drop the serial_number_ column
    with op.batch_alter_table('products', schema=None) as batch_op:
        batch_op.drop_column('serial_number_')

We use Session(bind=op.get_bind()) to start a new session so that we can copy and convert the data.

Remember that any constraints you had on the original column (for example unique serial numbers, etc.) needs to be recreated for the new column. You can add those before step 3. We should also take care of the edge cases for example if serial number is not convertible to an integer. This was a very simple example to show you the steps involved so I tried to keep it simple. I hope you find it useful.