Moving data from Hive table in MS Hadoop into SQL Server using SSIS Server.

 

I am going to dedicate some blogs to the Microsoft distribution of the HADOOP. If you interested in learning more about it you can get more information from the https://www.hadooponazure.com/

Some people believe that the data magically appear in data warehouses or in the HADOOP cluster. Since I have different opinion on that, I simply decided to learn how can I move data between HADOOP and SQL and the next couple of post will be dedicated to this subject.

Here is the step by step description of how can you move data from the Hive table residing in the HADOOP cluster into the SQL Server using SSIS Package.

I am using here HADOOP cluster in the https://www.hadooponazure.com/for storing HIVE table and SQL Server 2012 RC0 as relational Data base and ETL tools.

Preparing environment

 

1. Hadoop service in the https://www.hadooponazure.com/ comes with the Hive sample table called HiveSampleTable. For the simplicity I will use this table. If you want to create your own, you can follow the steps in appendix, which describe how you can create your own source table

2. In order to access your data from ‘outside’ the Hadoop cluster, make sure that you have all necessary ports opened. In the hadoopazure.com, make sure that you opened the ODBC port like this:

clip_image002

3. You will need to use Hive ODBC driver to connect to the Hive Table in Hadoop. Instructions on how to install and Configure this driver could be found here: How To Connect Excel to Hadoop on Azure via HiveODBC

 

Creating and configuring SSIS Package

 

4. Now it is time switch to the SSIS environment and create the package for loading Hive data into SQL table. For doing that, start SQL Server Data Tools and create new Package.

5. Right click in the Connection Manager filed and select New Connection

clip_image003

6. Chose ODBC Connection Manager, click Add, in the Configure ODBC Connection Manager click New. In the new window , specify the Data Source name as well as your HADOOP credentials.

IMPORTANT: in the Login information you should provide the account information which you used while creating cluster in the hadooponazure.com

clip_image004

7. Create new Data Flow Task by dragging and dropping the Data Flow Task component from the SSIS Toolbox into the Control Flow field.

8. Drag and drop ADO Net Source from the SSIS Toolbox into he Data Flow field

9. Edit the Ado Net Source: specify connection manager that we just created and then chose the HiveSampleTable from the drop down list

clip_image006

10. Assuming that you already created target database on your SQL Server, create destination in Connection Manager

clip_image007

11. You can add then some more transformation, but we will skip it in this document. To complete the exercise, drag and drop the OLEDB Destination into the Data Flow field.

clip_image009

You can use SSIS functionality to create the destination table by clicking on the New button and this will generate Create statement for you.

12. To verify that everything works, you can try to debug SSIS package you have just created:

clip_image010

And see how the rows are moving

Done.

Appendix

Creating Hive table in Hadoop cluster:

1. Browse to the folder C:\Apps\dist\examples\hive\ CreateSampleTable.hql of your Hadoop cluster installation

2. Start the Hadoop Command Shell

3. In the command shell switch to hive

clip_image012

4. Copy the CreateSampleTable.hql Create Table statement and past it into the hive command prompt and execute the statement:

clip_image014

5. After execution completed successfully, load HiveSampleTable with the data. To do that, copy LOAD DATA statement into the have command prompt and execute this statement:

clip_image016

Advertisements

2 Responses to Moving data from Hive table in MS Hadoop into SQL Server using SSIS Server.

  1. pjsm68 says:

    You state that you need to create an ODBC connection manager and then use an ADO.NET source adapter. I think you mean to create an ADO.NET connection manager that uses the ODBC data provider.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: