The strange case of ActiveRecord ConcurrentMigrationError

Andrea Longhi

30 Jan 2019 Development, Migration, Ruby On Rails, Solidus


A few weeks ago, after deploying some new code to production, I received a notification about an error that was new to me:

ActiveRecord::ConcurrentMigrationError: Cannot run migrations because another migration process is currently running

The newly deployed code consisted only of Solidus and Rails minor version upgrades, so at first I suspected some unanticipated bug or incompatibility. But the error had not happened on CI or on the staging server, so I concluded that it was probably something specific to the production environment.

Let's get back to the error message, which happens to be quite self-explanatory. I started to reason about what could have gone wrong: while staging and CI run on single Amazon instances, the production environment is based on multiple AWS t2.large instances, and the Rails deploy procedure is basically concurrent on each of them... that's just how Opsworks does it.

Image of AWS instances

So, during the deploy process, one instance was still running migrations when another instance, uhura, tried to run them as well, which raised the error.

The error comes from a Rails feature called advisory locking, added in 2015:

"Attempting to run a migration while another one is in process will raise a ConcurrentMigrationError instead of attempting to run in parallel with undefined behavior. This could be rescued and the migration could exit cleanly instead. Perhaps as a configuration option?"

This error can be quite insidious: the deploy fails on that instance, which keeps running the previous application code. This time we were lucky, as the new code was functionally the same as before, but in other cases this could easily have caused a few headaches.
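To make the locking behaviour concrete, here is a minimal plain-Ruby sketch of what happens when two instances try to migrate at once. It is a simulation, not Rails' actual implementation: the Mutex stands in for the database advisory lock, and ConcurrentMigrationError, MIGRATION_LOCK, run_migrations and the instance names are all made up for the illustration.

```ruby
# Stand-in for ActiveRecord::ConcurrentMigrationError (illustration only).
class ConcurrentMigrationError < StandardError; end

# Stand-in for the database advisory lock: only one holder at a time.
MIGRATION_LOCK = Mutex.new

# Runs the given block as "the migrations" if the lock is free,
# otherwise raises immediately instead of waiting for the lock.
def run_migrations(instance_name)
  unless MIGRATION_LOCK.try_lock
    raise ConcurrentMigrationError,
          "#{instance_name}: another migration process is currently running"
  end

  begin
    yield
    "#{instance_name}: migrated"
  ensure
    MIGRATION_LOCK.unlock
  end
end

started = Queue.new
finish  = Queue.new

# The first instance takes the lock and keeps "migrating" until told to stop.
first = Thread.new do
  run_migrations('instance-1') do
    started << true
    finish.pop
  end
end

started.pop # wait until instance-1 actually holds the lock

# The second instance tries to migrate while the lock is still held.
second_result =
  begin
    run_migrations('instance-2') { }
  rescue ConcurrentMigrationError => e
    e.message
  end

finish << true
first_result = first.value

puts first_result
puts second_result
```

The second instance does not wait for the first one to finish: it fails right away, which is the behaviour we observed during the deploy.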

Imagine having one instance of the application serving outdated pages, not showing new Solidus features, or showing features that don't exist anymore. And this is not the worst-case scenario... the instance could be returning a 500 server error to every visitor if something more serious was wrong.

So, in order to fix the error (or, better said, ignore it and live happily) we customized the migration rake task as follows:

# lib/tasks/migrate_ignore_concurrent.rake
namespace :db do
  namespace :migrate do
    desc 'Run db:migrate but ignore ActiveRecord::ConcurrentMigrationError errors'
    task ignore_concurrent: :environment do
      begin
        Rake::Task['db:migrate'].invoke
      rescue ActiveRecord::ConcurrentMigrationError
        # Another process holds the migration lock and is already running
        # the migrations, so there is nothing left to do here.
      end
    end
  end
end

and ran this task instead of the usual db:migrate:

bundle exec rake db:migrate:ignore_concurrent

This solution is, as often happens in life, not without caveats.

Consider the rare case where the first instance, the one that got the lock, is running a long DB migration (I'm not judging here, but you should not have long-running migrations) which eventually fails. All the other instances, which failed to acquire the lock, would already be running the new code, which probably requires the new DB schema... what a day! So, all in all, it's just a matter of tradeoffs, and I chose mine :)
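One possible way to soften this caveat would be to wait and retry until the lock is free, instead of swallowing the error. What follows is only a sketch under stated assumptions: with_migration_retry, its parameters and FakeLockError are hypothetical names, and the Rails wiring shown in the comment is untested here.

```ruby
# Retries the given block while it raises error_class, sleeping between tries.
# Re-raises the error once `attempts` tries have been used up.
def with_migration_retry(error_class:, attempts: 10, wait: 5)
  tries = 0
  begin
    tries += 1
    yield
  rescue error_class
    raise if tries >= attempts
    sleep(wait)
    retry
  end
end

# Hypothetical wiring inside a rake task (assumes Rails and Rake are loaded):
#
#   with_migration_retry(error_class: ActiveRecord::ConcurrentMigrationError) do
#     Rake::Task['db:migrate'].reenable # allow Rake to invoke the task again
#     Rake::Task['db:migrate'].invoke
#   end

# Demo with a stand-in error: the "lock" is busy for the first two attempts.
class FakeLockError < StandardError; end

calls = 0
result = with_migration_retry(error_class: FakeLockError, attempts: 5, wait: 0) do
  calls += 1
  raise FakeLockError, 'lock is busy' if calls < 3
  :migrated
end

puts "#{result} after #{calls} attempts"
```

With this approach, an instance that lost the race would eventually get the lock and try the migrations itself. If the first instance succeeded, there would be nothing left to migrate. If it failed, the retry would most likely fail too, and the deploy on that instance would fail with it instead of silently running new code against the old schema. The tradeoff is a slower deploy.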
