jamiebarrow: 2015

Wednesday, December 2, 2015

Getting started with automating database changes using Flyway

Intro

For those of us that strive to get to a "One Click Deploy" - where we press a magical button that automatically deploys all of our changes - one of the stumbling blocks during deployment is if there is a relational database involved where schema and/or data changes still need to be run manually.

Sometimes this is unavoidable, but assuming it's not, how can we achieve a fully automated deployment when we still need to worry about the database? For the trolls out there, don't say "Just use NoSQL".

I spoke about the general idea behind database migration strategies at one of Entelect's dev days, and the concepts are still the same (see the end of this article for some references). Since then, there are two tools that I've become aware of: Liquibase and Flyway. This will go through using Flyway with SQL Server 2014.

Setup Flyway

Flyway is a Java based library, but also comes as a standalone command line tool. I think that means you don't need to install the Java Runtime Environment (JRE) to use it, but I already had it installed so haven't verified that yet.

Go to the Flyway command line page
Download flyway-commandline-3.2.1.zip
Unzip to C:\flyway-3.2.1

Since we're using SQL Server, we also need to make sure the right JDBC drivers are available for Flyway to use. Although the documentation does show support for using both the Microsoft and jTDS drivers, I found that the jTDS library doesn't support Windows Authentication mode for SQL Server 2014 (it might for older versions), and I had to download and set up the Microsoft driver in any case:

Go to the Microsoft JDBC driver download page
Download either sqljdbc_6.0.6629.101_enu.exe or the sqljdbc_6.0.6629.101_enu.tar.gz tarball (I chose the latter)
Unzip the tarball
Choose the right JAR depending on Java version and JDBC compliance, I went with sqljdbc42.jar, and copy this into C:\flyway-3.2.1\drivers
Since we need Windows Authentication, copy sqljdbc_auth.dll into a folder on the PATH

I copied it into C:\flyway-3.2.1 since I'll run the command line tool from that location

There are a few configuration options available, but for now we'll just need to setup our connection string and driver in C:\flyway-3.2.1\conf\flyway.conf:

flyway.url=jdbc:sqlserver://localhost;instanceName=sql2014;database=FlywayDemo;integratedSecurity=true
flyway.driver=com.microsoft.sqlserver.jdbc.SQLServerDriver

Technically you shouldn't need to specifiy the driver explicitly, but I prefer being explicit to avoid surprises. In the above connection, I'm connecting to a database FlywayDemo on a named instance called sql2014, and making use of integrated security (Windows Authentication).

You should now be able to run the command below to verify that Flyway can connect to the DB:

The above shows that it can connect to the database, and that there were no migrations found (we will start with that next). You shouldn't get any errors at this stage, though you may need to configure your database to allow TCP connections if it hasn't been configured to do so already:

Baseline

If you're lucky enough to work on a brand new project with a brand new database, you can just create an empty database, and each change to the database can be created as a new script, each run in order.

But most of us have an existing database we want to work from. For this demo, I'll just use a simple database with a single table as below:

We can use Flyway to baseline this version of the database, say, version 1.0:

This creates a new database table schema_version in the default schema of the database, and tags the database as version 1.0. We could create a backup of this database, and use it as our baseline database to work off of, and when another user restores the database backup, they can use flyway to inspect what version of the database they are using:

Database migration changes

With Flyway, each change to the database should be scripted as an incremental change. This allows us to migrate database changes forward from whatever current version our database is at. Flyway uses specifc conventions for database migration scripts, and we'll follow the default. Scripts are to be placed in the folder C:\flyway-3.2.1\sql, and follow the naming convention prefixVERSIONseparatorDESCRIPTIONsuffix, e.g. V1.1__Some description.sql.

Lets say we want to change our database structure to extract the Type column of the Person table into its own table. This would involve:

Creating a new table, PersonType
Inserting necessary data into PersonType
Adding a new PersonTypeId foreign key to Person
Updating the Person data for the new PersonTypeId
Dropping the old Type column on the Person table

We could lump all of these in a new V1.1 script, however the recommended practice is to split each DDL change into its own version, since some database vendors don't handle transactions very well with DDL statements, and Flyway tries to wrap each migration in a transaction so if any migration fails it doesn't leave the database in an inconsistent state.

Instead, we'll just make each change it's own script, by default:

V1.1__Create PersonType.sql
V1.2__Populate PersonType.sql
V1.3__Create Person.PersonTypeId.sql
V1.4__Create FK_Person_PersonType.sql
V1.5__Update Person.PersonTypeId.sql
V1.6__Drop Person.Type.sql
V1.7__Make Person.PersonTypeId NOT NULL.sql

After we've created these scripts in the sql folder, if we re-run flyway info, we will see that we now have pending changes that need to be applied to the baseline:

We can then use Flyway to migrate our database to include all of these changes:

You'll notice that the version and description columns are pulled from the filename convention. The checksum column is actually a checksum of the contents of the migration script. That is to avoid changes to scripts that were already run in - instead of changing existing scripts, prefer creating a new script with a new version to do the necessary change. If we did modify a script, and run validate, we should see the issue being highlighted:

Sometimes though, it may make sense to modify an existing script, as long as it's functionality doesn't change - lets say the script takes forever to run the migration, and we find a way to optimize it, then it makes sense to change the script later on. In this case, we can repair the metadata stored in the version table using the repair command:

Development time considerations

This way of working does mean you'll need to be more vigilant with how you make database changes. Changes should be done in small incremental changes. An issue when working on teams may be deciding on how to allocate version numbers - if we're all working in parallel, we may all choose to pick V1.1, whereas my changes might only be deployed after my colleague's changes, meaning his should be V1.1 and mine V1.2.

If you were using feature branches, then you could version your changes for your feature as V1, V2, etc. in a separate folder maybe, and only when merging with the common source repository, would you move the files into the sql migration folder with their actual version numbers, e.g. we merge the development branch, notice that someone has implemented V1.6, we can then decide to rename our scripts V1.7.1 and V1.7.2, etc.

Continuous Delivery

What this all enables is a truly continuous delivery option. This would be the typical flow:

Developer checks out latest version of the code
Developer gets latest database setup

either restore latest database backup, and migrate pending changes,
or rebuild the whole database using Flyway

Developer makes changes for their feature, both code and migration scripts
Developer pushes all changes to source control
Continuous Integration build runs

gets latest code
compiles
runs tests
gets Integration Test database setup as above in (2) using the tooling
runs integration tests

Continuous Deployment build runs

migrates database
deploys artifacts

This could be setup to happen automatically for every commit, maybe to deploy to a Quality Assurance or User Acceptance Testing environment (or both). This could even be done to drive Production environment deployments, though that may be more complicated, and may be better done by clicking a button to initiate the deployment rather than automatically deploying after a code change.

Wrap up

Continuous delivery is a part of the devops community that's getting a lot of focus in recent times. Hopefully this article has inspired someone to automate their own deployment process. For those completely new to the database side of this, I recommend reading the links below. For those that aren't new to this, have you tried this on an actual project, and if so maybe share your war stories in the comments! :)

References

Tooling

Flyway
Liquibase
Database Engine Scripting (sqlcmd and SQL Server PowerShell)
DbUp

Articles

Evolutionary Database Design, Martin Fowler & Pramod Sadalage
Three Rules for Database Work, K. Scott Allen
The Baseline, K. Scott Allen
Change Scripts, K. Scott Allen
Versioning Databases – Views, Stored Procedures, and the Like, K. Scott Allen
Versioning Database - Branching and Merging, K. Scott Allen

Tuesday, July 21, 2015

Visual Studio tip: Hide selected text

Today I noticed the menu item Edit > Outlining > Stop Hiding Current (CTRL+M CTRL+U), and wondered to myself what it does. Well, turns out you can hide selected areas of text and then later unhide them. But how do you hide them in the first place??

"The option to hide a selection only shows in the Outlining menu once you've selected text" - Captain Obvious

Thank you Captain Obvious!

So as an example, lets say we have one of the controllers for a sample ASP.NET MVC project, and just want to inspect it to see what it does, we see the below action with a long bunch of lines:

We go through the first few lines and realise that's all just code related to validation, followed by loading up an item and some more validation. We're not really interested in the validation just yet so we can just collapse it by selecting those lines, and going Edit > Outlining > Hide Selection (or if like me you like keyboard shortcuts, CTRL+M followed by CTRL+H). We now can see what the method's actually trying to do, and can easily ignore the validation code for now:

If later you do want to see the folded stuff, it's as easy as expanding the code again. It seems that VS remembers this custom code folding section for later on when I close and open the file, and even after closing VS itself (though I wouldn't rely on it to be there forever... I'm guessing it's stored in some local user settings file).

I've seen people use the #region directive a lot for this kind of thing - making the code easier to read - but then someone else who doesn't like #regions will delete it later, or someone will restructure it. The advantage of doing this is that it doesn't affect anyone else on your team.

To remove the custom code folded section, just do the opposite, Edit > Outlining > Stop Hiding Current (CTRL+M CTRL+U) ... H for hide, U for unhide.

Another use case for this would be doing code reviews, or maybe if you're presenting pieces of code that you don't want the audience to read and get too far ahead of the presentation.

Tuesday, May 26, 2015

DIFFERENCE in MS SQL

So, if I had two sets, A and B, and wanted to get all items that are in A but not B, as well as items in B but not A (i.e. the opposite of an intersection), I would have thought I could just do:

Maybe my memory is failing me, but I thought there was a standard SQL difference operator, but after briefly looking around it seems there isn't.

It is possible though, using:

EXCEPT
FULL OUTER JOIN

I just tried it with table variables, set up as below (seems we get the same result with actual tables, but table variables are easier for demo purposes :B)

EXCEPT way:

Quite verbose, but pretty easy to follow.

The FULL OUTER JOIN way is also pretty easy to follow:

This is probably less easy to maintain, because you have to specifically reference the columns to match on, as well as choose which set to select values from, as is being done in the case statement. This could probably become a bit unwieldly for a table with lots of columns, and hopefully you don't mess up the NULL checks.

I did try this out on a query on a customers database though, and found that the full outer join method can be faster. Below are the execution plans for the two queries:

So, if I'm reading that right, it's having to go through each table more than once in the EXCEPT case, and only once in the FULL OUTER JOIN, which can make quite the... DIFFERENCE... (sorry couldn't help but make that pun :D).

Friday, January 9, 2015

Removing NodeJS folder on Windows

When playing around with demo projects on NodeJS, and then later trying to throw them away and delete the folders, you've most likely encountered the error: The file name is too long. We'll ignore the question of if it's too long, how it got created in the first place...

To illustrate it, create a folder that will cause a long file path. I've had this problem with grunt-contrib-imagemin, so lets install that package too:

Now, if you try and delete this folder, you will get the pesky The file name is too long error:

Le sigh.

If you manually try and change into that directory, at some point you'll also get an issue where you can't even change into it because the filename is too long. But wait, if you look at the above, everything is using short file names except the first folder. Hmm, what if we try the short file name?

It works! Well, it did for me at least.