Aaronontheweb

Hacking .NET and Startups

How to Migrate Data between On-Premise SQL Server 2008 R2 and SQL Azure Without Getting a Migraine

November 28, 2011 17:33 by Aaronontheweb in Azure, SQL Server // Comments (2)

I love my job at Microsoft, but there are some times when we simply make it really damn hard for people to do business with us. Migrating data from an on-premise SQL Server to SQL Azure is sadly one of those lapses where, for whatever reason, we’ve left people who aren’t full time SQL Server DBAs totally lost in the wilderness with a set of poorly documented and often dysfunctional tools.

After spending a couple of hours shaving yaks myself tonight trying to move a ~250mb data set (small, but not trivially small) from one SQL Azure database to another, I thought it would be best if I documented what I did to make it work.

Scenario: I needed to download an expensive dataset on one SQL Azure account and migrate it to another on a different SQL Azure subscription, but I needed to make and test some schema changes against the data set locally before I exported it back to its final resting place.

Seems simple enough, right? I just need to:

  1. Download the SQL Azure Database to my local system;
  2. Make a back-up of the dataset locally in case I screw up the other one during my schema changes;
  3. Finish making / testing / integrating schema changes;
  4. Deploy to SQL Azure on new subscription.

Sounds pretty easy to me, as it would any sane developer who’s had some experience working with the Microsoft server stack since 2000.

Rather than tell you about all the things I spent hours trying that don't work, I'll just explain what does.

Before you do any of this, make sure your SQL Azure Firewall has an exception for whichever development / backup machine you’re using.

Backing Up SQL Azure to On-Premise SQL Server 2008 R2: RedGate’s SQL Azure Backup

SQL Azure Backup from RedGate solved my first problem painlessly. It's currently free, so grab a copy while you still can.

All you have to do is point it at your source database on SQL Azure and your destination on-premise SQL Server database (screenshot taken from RedGate).

That’s about as complicated as it gets.

Pushing from SQL Server 2008 R2 to SQL Azure: SQL Server Management Studio Data Export (Requires Magic)

One of the SQL Azure migration techniques Microsoft recommends is using the SQL Server Import and Export Wizard to push your data to SQL Azure. Unfortunately they leave one critical part out, which I will show you.

You begin by selecting your database in Management Studio (I’m using the Express edition.)

[Screenshot: selecting the source database in SQL Server Management Studio]

And for the second step you can leave all of the client / connection settings as-is for your on-premise SQL Server (shown below.)

[Screenshot: the wizard's source connection settings for the on-premise SQL Server]

And now we come to the part that the MSDN documentation totally left out… If you're like me, you'll naturally try to log in using the SQL Server Native Client 10.0 data source for your export target, because hey, it makes sense, right?

[Screenshot: the destination dialog with the SQL Server Native Client 10.0 data source selected - the wrong choice]

As it turns out, you need to use the .NET Framework Data Provider for SqlServer and change the following fields (the same settings expressed as an ADO.NET connection string are sketched after this list):

  1. Set Security > Encrypt to True;
  2. Set Security > Password to your SQL Azure login password;
  3. Set Security > User ID to your SQL Azure user ID without the @servername suffix;
  4. Set Source > Data Source to your servername.database.windows.net (no need to specify any of the TCP or port nonsense here); and finally
  5. Set Source > Initial Catalog to the SQL Azure database you've already created on your server but haven't necessarily set a schema for yet.
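For reference, here's roughly what those wizard settings amount to as a plain ADO.NET connection string - a sketch only, with placeholder server, database, and credential values:

using System.Data.SqlClient;

// Sketch: the wizard settings above expressed as an ADO.NET connection string.
// The server, database, user, and password values are placeholders.
var builder = new SqlConnectionStringBuilder
{
    DataSource = "yourservername.database.windows.net",  // Source > Data Source
    InitialCatalog = "YourTargetDatabase",                // Source > Initial Catalog
    UserID = "yourlogin",                                 // Security > User ID (no @servername suffix)
    Password = "yourpassword",                            // Security > Password
    Encrypt = true                                        // Security > Encrypt
};

using (var conn = new SqlConnection(builder.ConnectionString))
{
    // Will only connect if the SQL Azure firewall has an exception for this machine.
    conn.Open();
}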

Once you've cleared this hurdle it's pretty much smooth sailing.

If you have any questions about this process or why I didn’t mention some of the available alternatives, go ahead and ask them in the comments and I’ll get back to you.

If you enjoyed this post, make sure you subscribe to my RSS feed!



MongoDB vs. SQL Server 2008: A .NET Developer’s Perspective

June 30, 2011 16:01 by Aaronontheweb in ASP.NET, MongoDB, SQL Server // Comments (22)

One of the first projects I put together this year was Captain Obvious, a nifty little application that runs off of AppHarbor and ASP.NET MVC3. What made Captain Obvious special for me was that it was my first time using something other than a relational database[1] in production – I chose MongoDB because it stands out to me as a lightweight, easy-to-work-with store that's easier to use for most CRUD applications. Since then I've gone on to build other projects which depend on Mongo.

What I’ve learned since is that MongoDB and SQL Server are tools that aren’t 100% interchangeable and are more situational than dogmatists make them out to be.

My goal in writing this is to help you judge these two technologies as options for your ASP.NET / WebMatrix / WCF applications.

Relational Data Models vs. Document Models

The key to using Mongo or SQL Server effectively is understanding how the underlying data model works and how this impacts your ability to read / write what you want when you want to the datastore. Below are illustrations of the relational (SQL) model versus the document model.

[Diagram: the relational (SQL) model vs. the document model]

 

In a relational model all of your information is expressed in the form of tables, all of which contain keys and some of which contain foreign keys to other tables. All of the information you read from and write to the database is expressed as either adding rows to these tables or combining their values based on some keys.

In a document model you have a relatively flat collection of items all identified by one primary key[2], and instead of defining a relationship between two collections you simply embed one inside the other. So the relationship between the three tables in our relational model is expressed as a single, flat document in this model.
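To make the contrast concrete, here's a rough C# sketch of the same data shaped both ways - the class and property names are made up for illustration:

using System.Collections.Generic;

// Relational shape: three tables, related to each other through keys.
public class Thing
{
    public int ThingId { get; set; }        // primary key
    public string Name { get; set; }
    public int CategoryId { get; set; }     // foreign key -> Category
}

public class Category
{
    public int CategoryId { get; set; }     // primary key
    public string Title { get; set; }
}

public class Tag
{
    public int TagId { get; set; }          // primary key
    public int ThingId { get; set; }        // foreign key -> Thing
    public string Label { get; set; }
}

// Document shape: one flat document per Thing with the related data
// embedded directly inside it - no foreign keys, no join tables.
public class ThingDocument
{
    public object Id { get; set; }          // the document's single primary key
    public string Name { get; set; }
    public Category Category { get; set; }  // embedded
    public List<Tag> Tags { get; set; }     // embedded list
}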

It is important to note here that there's no schema telling MongoDB what fields should or shouldn't be expected in each document in the “Things” collection – in the relational universe each table is strongly, declaratively typed and every row inserted into a table must conform to all of the constraints imposed by the relational database management system (RDBMS). Anything goes in Mongo (although this can cause major problems, as we will see later).

What Do Relational and Document Models Have in Common?

So what do these two data models have in common?

  1. Both support the notion of primary keys and indexes, and MongoDB can support multiple indices if needed (see the sketch after this list);
  2. Both support queries and have models for sorting / limiting results;
  3. Both support the ability to reference other documents / tables (“wha? in Mongo? Yup.”)
  4. Both have a strong typing system; and
  5. Both support aggregation operations like SUM(), COUNT(), etc…
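For example, creating a secondary index and running a simple count both work fine in Mongo. The sketch below uses the official 10gen C# driver rather than NoRM (which the rest of this post uses), and the server address, collection, and field names are placeholders:

using MongoDB.Bson;
using MongoDB.Driver;
using MongoDB.Driver.Builders;

// Sketch with the official 10gen C# driver (not NoRM); names are placeholders.
var server = MongoServer.Create("mongodb://localhost");
var db = server.GetDatabase("demo");
var things = db.GetCollection<BsonDocument>("Things");

// A secondary index on a field, in addition to the automatic _id primary key.
things.EnsureIndex(IndexKeys.Ascending("DatePosted"));

// A simple aggregate: count the documents matching a query.
long total = things.Count(Query.EQ("AuthorId", 42));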

 

Seems pretty straightforward, right? But what’s up with documents being able to refer to each other?

As it turns out, implementing a database where every possible piece of information of interest to users is all embedded inside of its own distinct document comes with some drawbacks, so the creators of MongoDB added support for DbReference and the ability to do cross-collection references as well. A necessary evil in the name of practicality.

What’s Different between Relational and Document Models?

So what’s really different between the two models?

  1. Document models don't have SQL – their query tools are extremely primitive in contrast (but the models are also much simpler and don't require sophisticated queries);
  2. Fetching document references in the document model has to be done in separate queries[3], whereas in the relational model they can all be fetched in a single query;
  3. Document models don't have a schema – each document in a collection can have extra fields or fields with different types, whereas every row must conform to the same set of constraints in order to be inserted;
  4. In the document model, types are associated with data upon assignment rather than declared in advance;
  5. Document models have much more primitive tools for performing aggregations (edit: actually, not necessarily true for Mongo – it has built-in MapReduce, which is powerful and sophisticated); and
  6. Queries are defined on a per-collection basis in the document model, whereas a query in the relational model is an abstraction that can refer to any number of related tables.

 

The picture that should be starting to form in your head of MongoDB vs. SQL Server at this point is a flat, simple system on one side and a multi-dimensional, rich system on the other. This is how I look at the two technologies in contrast to each other.

What’s Different about Developing .NET Apps Against Mongo and SQL Server 2008?

So at the end of the day, what's the bottom line for .NET developers who want to use Mongo or SQL Server in a web application? What are the REAL trade-offs?

Mongo IS the OR/M

The core strength of Mongo is in its document-view of data, and naturally this can be extended to a "POCO" view of data. Mongo clients like the NoRM Project in .NET will seem astonishingly familiar to experienced Fluent NHibernate users, and this is no accident - your POCO data models are simply serialized to BSON and saved in Mongo 1:1. No mappings required.

I'm going to show you two similar pieces of source code I am using in production - one using NoRM and MongoDB on CaptainObvious and the other using Dapper and SQL Azure on XAPFest:

NoRM:

public IdeaResultSet GetIdeasByAuthor(int authorID, int offset = 0, int count = 10)
{
    using (var db = Mongo.Create(ConnectionString))
    {
        var ideas = db.GetCollection<Idea>().AsQueryable()
            .Where(x => x.AuthorReference.Id == authorID)
            .OrderByDescending(x => x.DatePosted)
            .Skip(offset)
            .Take(count)
            .ToList();

        var totalIdeas = db.GetCollection<Idea>().AsQueryable()
            .Where(x => x.AuthorReference.Id == authorID)
            .Count();

        //Fetch the referenced authors before we serve up the list again
        foreach (var idea in ideas)
        {
            idea.Author = idea.AuthorReference.Fetch(() => db);
        }

        var resultSet = new IdeaResultSet
        {
            Ideas = ideas,
            PaginationValues = new PaginationTuple
            {
                MaxPages = PageCounterHelper.GetPageCount(totalIdeas, count),
                CurrentPage = PageCounterHelper.GetPageCount(offset, count)
            }
        };

        return resultSet;
    }
}

Dapper:

public IList<XfstApp> GetAppsByUser(string userName)
        {
            using (var conn = new SqlConnection(ConnectionString))
            {
                try
                {
                    conn.Open();

                    var appResults = conn
                        .Query(@"SELECT * FROM Apps
                                            INNER JOIN AppOwners ON AppOwners.AppName = Apps.AppName
                                            WHERE LOWER(AppOwners.UserName) = LOWER(@UserName)
                                            ORDER BY Apps.DateCreated ASC", new { UserName = userName });

                    //Return whatever we were able to collect
                    return appResults.Select(x => x.ToApp()).ToList();
                }
                catch (SqlException ex)
                {
                    TraceError(ex);
                    return new List<XfstApp>();
                }
                finally
                {
                    //Close the connection when we're finished, regardless of what happened
                    conn.Close();
                }
            }
        }

 

Neither the amount of source code for these two technologies nor its nature is wholly different. What is drastically different is how I was thinking about the data when I wrote this code - I can save an instance of an Idea object to MongoDB on CaptainObvious without ever having created the collection first or defined a schema.

Whenever I want to look up an idea, I just pick one based off of a key value that I specify and I don't worry about any joins or anything (although I do have to load objects from the author collection if I need to display the author's name and contact info.)

In the SQL universe, I have to define my tables in advance, and each time I want to extract an object I have to think of it in terms of the combined relationships between my tables, which requires a more thoughtful approach.

Mongo, in other words, lends itself to rapid application development whereas SQL Server has some innate friction built into any schema-based system.
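To illustrate, here's a minimal sketch of what that schema-free save looks like with NoRM. It assumes a Save (upsert) method on the typed collection and reuses the ConnectionString from the code above; the Idea's property values are made up:

// Minimal sketch (assumes NoRM's Save/upsert method on the typed collection).
// No CREATE TABLE, no migration - the collection is created on first write,
// and the document's shape is simply whatever the POCO looks like.
using (var db = Mongo.Create(ConnectionString))
{
    var idea = new Idea
    {
        Title = "Ship the MVP first",   // illustrative property values
        DatePosted = DateTime.UtcNow
    };

    db.GetCollection<Idea>().Save(idea);
}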

In Mongo, the DBMS Isn't There to Protect Your Data's Integrity

One of the major advantages of a schema-based DBMS is that if a calling application tries to insert data that doesn't fit the schema, the operation always fails. In Mongo, this isn't true - you can have one record in a collection with extra fields or fields of an odd type, and it can totally screw up the BSON serializer when it tries to process the collection (depending upon how flexible the serializer is).

SQL users take this for granted, but when you have issues in Mongo along these lines they can be really frustrating to solve and difficult to debug.
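Here's a contrived sketch of that failure mode. It uses the official 10gen C# driver rather than NoRM, and the database, collection, and field names are all made up:

using System;
using MongoDB.Bson;
using MongoDB.Driver;

// Contrived sketch with the official 10gen C# driver (not NoRM); names are placeholders.
public class Thing
{
    public ObjectId Id { get; set; }
    public string Name { get; set; }
    public int Score { get; set; }
}

// ...

var db = MongoServer.Create("mongodb://localhost").GetDatabase("demo");
var raw = db.GetCollection<BsonDocument>("Things");

// Nothing stops two very differently-shaped documents from landing in the same collection...
raw.Insert(new BsonDocument { { "Name", "widget" }, { "Score", 10 } });
raw.Insert(new BsonDocument { { "Name", "gadget" }, { "Score", "ten" } });   // Score is a string here!

// ...but a typed read blows up when the driver tries to map the string "ten"
// into Thing.Score (an int).
foreach (var thing in db.GetCollection<Thing>("Things").FindAll())
{
    Console.WriteLine("{0}: {1}", thing.Name, thing.Score);
}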

In Mongo, Operations May Not Be Atomic

Operations are not atomic in Mongo by default, so all sorts of fun things can happen when you have multiple users changing properties on the same document. You can set an atomic flag to true, but even then operations still aren't really atomic (they're written from memory to disk in bulk).
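One common way to reduce the pain on single documents (a different mechanism from the atomic flag above) is to push the read-modify-write to the server with atomic update modifiers like $inc, so concurrent writers don't trample each other's changes. A sketch with the official 10gen C# driver rather than NoRM; the id and field names are made up:

using MongoDB.Bson;
using MongoDB.Driver;
using MongoDB.Driver.Builders;

// Sketch with the official 10gen C# driver (not NoRM); names are placeholders.
// Instead of read -> modify -> save (which races with other writers),
// let the server apply an atomic $inc to the single matching document.
var db = MongoServer.Create("mongodb://localhost").GetDatabase("demo");
var ideas = db.GetCollection<BsonDocument>("Ideas");

var ideaId = new ObjectId("4e5f6a7b8c9d0e1f2a3b4c5d");   // some existing document's _id
ideas.Update(Query.EQ("_id", ideaId), Update.Inc("VoteCount", 1));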

If you use Mongo and carry the same ACID assumptions that we learned on SQL Server, you might be in for a nasty surprise :p

Conclusion

Overall, the biggest difference between these two technologies is the model and how developers have to think about their data. Mongo is better suited to rapid application development, but in my opinion falls apart in scenarios where ACID-compliant systems are a must, like anything that goes anywhere near a financial transaction.

But, that's just my opinion :p


[1] Typically I've only used SQL Server / MySQL in the past.

[2] You can add indices to other fields in Mongo too, but that's outside the scope of this article.

[3] Unless I'm doing it wrong, which I may well be doing.

If you enjoyed this post, make sure you subscribe to my RSS feed!


