There are multiple ways to load data into a SQL Azure database, but in this post I am going to cover loading data using BCP and SSIS. Refer to the article below for the different ways to load data into SQL Azure:
http://blogs.msdn.com/b/sqlcat/archive/2010/07/30/loading-data-to-sql-azure-the-fast-way.aspx
While writing this post, I found that a simplified way to import and export data to and from SQL Azure is on its way: http://blogs.msdn.com/b/sqlazure/archive/2011/03/24/10145578.aspx
Loading data using BCP: The BCP utility offers a lot of flexibility for loading data into SQL Server, and if you are familiar with using it against a regular SQL Server database, then it is almost the same to use it against SQL Azure.
Below is the requirement we need to accomplish:
Create a database and tables in SQL Azure and copy the data from flat files into the newly created tables. This has to be implemented in such a way that the script can be executed multiple times or can be executed against a different SQL Azure instance.
1) Use a batch file, which accepts the connection string as a parameter, to create the Azure database using the sqlcmd utility. Refer to the sqlcmd utility documentation to learn more about it.
Eg: sqlcmd -U loginname@servername -P ******** -S servername.database.windows.net -d master -N -m-1 -V 1 -Q "CREATE DATABASE YourDataBaseName (MAXSIZE = 1 GB)"
To connect to SQL Azure, the user name has to be provided in loginname@servername notation, and the server name has to be the fully qualified name of the SQL Azure server (it can be copied from the Data Manager UI). I have used -N to encrypt the connection and set -V to 1 to capture any error. -m can be used with -V to return values to stdout; -m-1 returns all error and informational messages to stdout. -Q executes a single query and then exits, which is useful when you just want to run one query and leave sqlcmd. Note that sqlcmd parameters are case sensitive.
2) Using sqlcmd in the same batch file, you can create the database objects by executing a SQL script file with the -i parameter.
Eg: sqlcmd -U loginname@servername -P ******** -S servername.database.windows.net -d newdatabase -N -i CreateObjects.sql
The CreateObjects.sql file has the script to create the tables, views, etc.
Use the errorlevel captured from the previous command before executing this one: if there was an error while creating the database, there is no point in executing this step. When an error occurs, the ERRORLEVEL variable is set to the error number that sqlcmd returns via stdout.
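As a sketch (the login, password, server, and database names are placeholders), the two steps above can be chained in a batch file so that the object script only runs when the database was created successfully:

```bat
rem Step 1: create the database while connected to master (placeholder credentials)
sqlcmd -U loginname@servername -P ******** -S servername.database.windows.net -d master -N -m-1 -V 1 -Q "CREATE DATABASE YourDataBaseName (MAXSIZE = 1 GB)"

rem Step 2: run the object script only if step 1 succeeded
if %ERRORLEVEL% neq 0 (
    echo CREATE DATABASE failed with error %ERRORLEVEL%
    exit /b %ERRORLEVEL%
)
sqlcmd -U loginname@servername -P ******** -S servername.database.windows.net -d YourDataBaseName -N -V 1 -i CreateObjects.sql
```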
3) Once the database and the tables are created successfully, you can use the bcp utility to load the data. Refer to the bcp utility documentation to learn more about it.
Eg: bcp dbName.dbo.[TableName] in Filename -U loginname@servername -P ******** -S servername.database.windows.net -c
Note that I didn’t specify a field delimiter or row delimiter; \t and \r\n are used by default. -c performs the operation using the char data type. If your input files are more complicated than a simple tab-delimited file, use a format file to define the mapping between the flat file and the table. For this test I exported records from an existing SQL Server database as a tab-delimited file.
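For illustration, a minimal non-XML bcp format file for a hypothetical two-column table might look like the following (the column names, lengths, and collation are assumptions, not from my test files):

```
9.0
2
1       SQLCHAR       0       100     "\t"      1     CustomerName      SQL_Latin1_General_CP1_CI_AS
2       SQLCHAR       0       12      "\r\n"    2     Quantity          ""
```

The format file is passed to bcp with the -f switch in place of -c, e.g. bcp dbName.dbo.[TableName] in Filename -f Format.fmt with the same connection parameters as above.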
If you are able to execute these three commands successfully, then you have inserted data from a flat file into the new SQL Azure database.
You can combine all these commands into a single batch file that accepts parameters instead of hard-coding values, and the batch file can then be reused to meet our requirement.
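Pulling the three steps together, a parameterized batch file might look like the following sketch (the parameter order, variable names, and file names are my own choices, not from the original scripts):

```bat
@echo off
rem Usage: LoadAzure.bat loginname@servername password servername.database.windows.net dbname datafile
set LOGIN=%1
set PWD=%2
set SERVER=%3
set DB=%4
set DATAFILE=%5

rem 1) Create the database while connected to master
sqlcmd -U %LOGIN% -P %PWD% -S %SERVER% -d master -N -m-1 -V 1 -Q "CREATE DATABASE %DB% (MAXSIZE = 1 GB)"
if %ERRORLEVEL% neq 0 exit /b %ERRORLEVEL%

rem 2) Create the tables and views in the new database
sqlcmd -U %LOGIN% -P %PWD% -S %SERVER% -d %DB% -N -V 1 -i CreateObjects.sql
if %ERRORLEVEL% neq 0 exit /b %ERRORLEVEL%

rem 3) Bulk load the tab-delimited flat file in char mode
bcp %DB%.dbo.[TableName] in %DATAFILE% -U %LOGIN% -P %PWD% -S %SERVER% -c
```

Because the database is created only if it does not already exist on that server, the same batch file can be pointed at a different SQL Azure instance just by changing the parameters.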
Loading data using SSIS:
Loading data with SSIS is pretty simple once you are able to create the connections to your source and destination. You have to use the "loginname@servername" notation for the user name, as you did when executing the sqlcmd and bcp commands, to create a connection with the SQL Azure database.
1) Create the connection managers for the source and destination components.
2) Create a data flow task and map the source to the destination.
3) The beauty of SSIS is that you can manipulate the data before you load it, and you can have multiple kinds of sources, unlike bcp, which is restricted to flat files.
4) You can use the Execute SQL Task if you have to execute a query.
5) If you have to create the database and tables dynamically, build the package with all the tables already created so that you can map the source and destination, then drop the database and set the package's DelayValidation property to true. Below is a screenshot of my sample package:
Things to remember:
· CREATE DATABASE should be executed separately, since it will not run in a batch with any other statements.
· CREATE DATABASE has to be run before executing the create-table script, since we need to specify the newly created database in the connection string. SQL Azure doesn’t support the USE statement, so we cannot change databases after connecting to the master database.
· Use bcp if you have to load a large amount of data and don’t need to manipulate the source file; it was faster than SSIS in my test.
· Use SSIS if you have to manipulate the input or if you have different kinds of sources. It takes a little extra time to initiate the connection and execute the package because of validation.
· In the Execute SQL Task you cannot pass the database name as a parameter to the CREATE DATABASE query, since the statement would then be executed as part of a multi-statement batch, which is not supported in SQL Azure; CREATE DATABASE should be the only statement executed in the task. The best way to create the database dynamically is to use a variable to store the whole CREATE DATABASE statement.
· You can execute the tasks in parallel to get better performance; however, it also depends on your network speed.
· In SSIS, when you are working with a huge amount of data or executing multiple tasks in parallel, make sure you follow best practices, e.g. changing the temp location and buffer path from the defaults and breaking up the execution tree.
· There is a high possibility that SSIS works perfectly with a smaller amount of data in your test environment but fails or simply hangs in the production environment when you execute it with a large amount of data. It is always better to test with the highest anticipated volume of data if you choose to use SSIS.
· As suggested in the article linked at the beginning, try to split the input file for SSIS, and use the first-row (-F) and last-row (-L) attributes in bcp if you are dealing with a larger data file.
· Last but not least, since we are connecting to the cloud there is a high possibility of losing the network connection in the middle of a load, so always have a plan for restartability.
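For the last two points, bcp’s -F (first row) and -L (last row) switches make it possible to load a large file in restartable chunks; a sketch with placeholder credentials and row counts:

```bat
rem Load the file in two chunks so a failed chunk can be retried on its own
bcp dbName.dbo.[TableName] in BigFile.txt -U loginname@servername -P ******** -S servername.database.windows.net -c -F 1 -L 500000
bcp dbName.dbo.[TableName] in BigFile.txt -U loginname@servername -P ******** -S servername.database.windows.net -c -F 500001 -L 1000000
```

If the connection drops partway through, only the chunk that was in flight has to be reloaded rather than the whole file.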