Setting up a multi-node HDP Cluster

As you may already be aware, Hortonworks announced support for the Hadoop cluster on the Windows platform. In the project we are involved in at the moment, we got lucky enough to play with a multi-node HDP cluster. Personally, I felt like I was back at the beginning of the '90s, with all the configuration files and so much that has to be run from the command line. However, HDP is really a lot of fun and easy to set up and run. Here is a list of things you may need to know in order to save time setting up HDP in your environment:

1. Even if it sounds very obvious, follow the documentation. The software listed in the SW requirements has to be installed, and parameters like JAVA_HOME have to be configured.

2. Pay attention to the firewall ports which have to be opened! HDP needs a lot of ports; in some cases I opened a range instead of single ports (e.g., 50000-60000 instead of each individual port in the 50K range).
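On Windows Server, a whole range can be opened with one firewall rule; a sketch (the rule name and the port range here are just examples, adjust them to the ports your HDP services actually use):

netsh advfirewall firewall add rule name="HDP ports" dir=in action=allow protocol=TCP localport=50000-60000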

3. Leave only one network card enabled and disable the rest that you have on your box.

4. Disable IPv6. You can do it by executing the following command for each transition interface:

netsh interface teredo set state disabled
netsh interface 6to4 set state disabled
netsh interface isatap set state disabled

5. In clusterproperties.txt, make sure you specify the paths for setting up HDP. Pay attention: a path with subfolders is not allowed, and no spaces in the folder name are allowed. The documentation is misleading here. Example of correct entries:

#Log directory
HDP_LOG_DIR=C:\HDP_LOG

#Data directory
HDP_DATA_DIR=C:\HDP_DATA

6. In the same configuration file, in the #Hosts section, the FULL UNC path will not work! Use short names instead, like below:

NAMENODE_HOST=server02

7. Edit the hosts file and add the node names with their IP addresses there.
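For example, an entry in C:\Windows\System32\drivers\etc\hosts could look like this (the IP address and host name here are just illustrative, use your own):

192.168.1.102    server02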

8. Now you can run the installation. Example of the setup string:

C:\Windows\system32>msiexec /i "C:\HDP_SETUP\hdp-1.1.0-160.winpkg.msi" /lv "C:\HDP_SETUP\installer.log" HDP_LAYOUT="c:\HDP_SETUP\clusterproperties.txt" HDP_DIR="C:\hdp" DESTROY_DATA="Yes"

You will need to run this script on every node of the cluster!

9. In case the setup failed, make sure you run the uninstall command before trying to re-run the installation process:

msiexec /x "C:\HDP_SETUP\hdp-1.1.0-160.winpkg.msi" /lv "C:\HDP_SETUP\installer.log" DESTROY_DATA="no"

10. Before running setup, you will need to enable Remote Scripting on every node. It doesn't seem to work correctly, though; even if setup was successful, you will see something like this in the error log:

CAQuietExec:  WINPKG: rd /s /q G:\HadoopInstallFiles\HadoopPackages\hdp-1.1.0-winpkg\resources\hadoop-1.1.0-SNAPSHOT.winpkg
CAQuietExec:  powershell.exe : powershell.exe : powershell.exe : Stop-Service : Cannot find
CAQuietExec:  any service with
CAQuietExec:  At G:\HadoopInstallFiles\HadoopSetupTools\winpkg.ps1:118 char:9
CAQuietExec:  +     $out = powershell.exe -InputFormat none -Command "$command" 2>&1
CAQuietExec:  +            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CAQuietExec:      + CategoryInfo          : NotSpecified: (powershell.exe …y service with
CAQuietExec:     :String) , RemoteException
CAQuietExec:      + FullyQualifiedErrorId : NativeCommandError
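Remote Scripting here means PowerShell remoting; assuming that is what your environment requires, it can be enabled from an elevated PowerShell prompt on each node like this (a sketch, check the HDP documentation for the exact prerequisites):

Enable-PSRemoting -Force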

11. On every node, manually start the services by calling start_local_hdp_services.cmd.

12. If everything is done right, you will be able to run a smoke test with no errors and enjoy your HDP cluster.
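The smoke test is started with a script that ships with the HDP for Windows package; in the build I worked with it was called Run-SmokeTests.cmd (check your HDP install folder for the exact name), run from the HDP directory:

C:\hdp>Run-SmokeTests.cmd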

Using SSIS in Azure solutions

We finished work on these papers some time before the beginning of 2013; however, I only just found the time to link to them from my blog.

Here we go:

Tips Tricks and Best Practices: SSIS Operational and Tuning Guide

and

SSIS Tips Tricks and Best Practices: SSIS for Azure and Hybrid Data Movement

Enjoy reading 🙂