Connect Cloudgene to a Hadoop Cluster¶
When you execute workflows with Hadoop steps, Cloudgene needs to know where the Namenode and Jobtracker are. These information are in the Hadoop config files in
$HADOOP_HOME/conf on your Namenode. If you start Cloudgene on the Namenode then all configuration files are already located on the machine.
Copy the Configuration Files from Cluster to Cloudgene¶
When you plan to run Cloudgene on a different server than the Namenode, then you have to copy the confioguration files from the cluster to your Cloudgene instance.
Copy the following configuration files from the
$HADOOP_HOME/conf on your Namenode to a directory on your Cloudgene instance (e.g.
Configure Cluster Connection¶
Next, you have to define the cluster and the path to its configuration files in your
cluster: # cluster name name: Test Cluster # the cluster type. Currently, the only cluster supported is hadoop. type: hadoop # configuration files from your cluster conf: /path/to/cloudgene/hadoop-conf # username which should be used for job execution user: hadoop
Validate Cluster Connection¶
You can use the
verify-cluster command to check if Cloudgene is able to establish a connection to your cluster:
You should see a message like
Hadoop cluster is ready to use.
Next, restart the Cloudgene webserver, open the Admin Panel and click on the Server tab. In the section Resources you will see if Cloudgene was able to establish a connection to your Hadoop cluster as well as all nodes were found: