Set up open source Distributed R on a three-node cluster with R and execute tests on workers

————————————————————————————————————————

———————-              DISTRIBUTED R              ————————-

————————————————————————————————————————


1. Prerequisite packages
# Install dependencies
$ sudo yum install -y libtool zlib-devel automake pkgconfig gcc gcc-c++ curl
$ sudo yum install -y make gcc gcc-c++ libxml2-devel rsync

# Install R
$ curl -O http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
$ sudo rpm -i epel-release-latest-7.noarch.rpm
$ sudo yum update
$ sudo yum install R R-devel

2. Move the installation archive to the target node. The source was copied from GitHub: https://github.com/vertica/DistributedR

scp -r disvert.tar disvert@10.0.0.5:/home/disvert/
ssh disvert@10.0.0.5

3. Remove any older version of the package and verify R installation

# Connect to the R console and make sure to remove any old versions
remove.packages('Rcpp')
remove.packages('RInside')

# Go to the Distributed R source directory and clean the build
make clean
whereis R
make distclean

# Remove any old installation
rm -rf /opt/hp/distributedR/

4. Update the environment for execution. This can be done towards the end of the installation.

Make sure you have password-less SSH access from this node to the other nodes in your cluster.
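The password-less access mentioned above can be set up roughly as follows. This is a minimal sketch, not part of the original procedure: it assumes you run it on node0001 (10.0.0.3) as the disvert user, that the other two nodes are 10.0.0.4 and 10.0.0.5, and that the standard OpenSSH tools are installed. The DRY_RUN switch and the helper names are hypothetical; by default the script only prints the commands it would run.

```shell
# Sketch: distribute an SSH public key to the other cluster nodes.
# NODES and SSH_USER follow the addresses used in this post (assumption:
# the script runs on 10.0.0.3). DRY_RUN is a hypothetical safety switch.
NODES="10.0.0.4 10.0.0.5"
SSH_USER="disvert"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
  # Print the command in dry-run mode, otherwise execute it
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_passwordless_ssh() {
  # Generate a key pair only if none exists yet
  if [ ! -f "$HOME/.ssh/id_rsa" ]; then
    run ssh-keygen -t rsa -b 4096 -N "" -f "$HOME/.ssh/id_rsa"
  fi
  # Copy the public key to every other node
  for node in $NODES; do
    run ssh-copy-id "$SSH_USER@$node"
  done
}

setup_passwordless_ssh
```

Run it once as printed to review the commands, then again with DRY_RUN=0 to apply them. Afterwards, ssh disvert@10.0.0.4 should log in without asking for a password.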

# Symlink the R binary into /bin and /sbin in case R is installed in a separate location
ln -s /opt/disvert/R/bin/R /bin/R
ln -s /opt/disvert/R/bin/R /sbin/R
# Add the library and executable paths to ~/.bashrc
export R_HOME=/opt/disvert/R
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.34.x86_64/jre
export DR_HOME=/opt/hp/distributedR
export PATH=/opt/disvert/bin:/opt/disvert/R/bin:$DR_HOME/bin:$PATH
export LD_LIBRARY_PATH=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.34.x86_64/jre/lib/amd64/server:$DR_HOME/lib:$DR_HOME/third_party/lib:$DR_HOME/third_party/lib/atomicio/:$LD_LIBRARY_PATH
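
After reloading ~/.bashrc it can help to confirm that the directories behind these variables actually exist, since a wrong JAVA_HOME or DR_HOME tends to show up only later as a build or load error. The snippet below is a small sketch; check_dir is a hypothetical helper, and the fallback paths are simply the values exported above.

```shell
# Sketch: report whether the directories referenced by the exports exist.
# check_dir is a hypothetical helper, not part of Distributed R.
check_dir() {
  # $1 = variable name (for display), $2 = directory to test
  if [ -d "$2" ]; then
    echo "ok $1=$2"
  else
    echo "missing $1=$2"
  fi
}

# Fall back to the values from this post if a variable is not set yet
check_dir R_HOME "${R_HOME:-/opt/disvert/R}"
check_dir JAVA_HOME "${JAVA_HOME:-/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.34.x86_64/jre}"
check_dir DR_HOME "${DR_HOME:-/opt/hp/distributedR}"
```

DR_HOME is expected to be reported as missing until the make install step has been run, which is consistent with doing this step towards the end of the installation.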

5. Install the following R packages from the $DR_HOME third-party library folder of the GitHub distribution.

Press Tab after the underscore to autocomplete the version from the package archive name in the folder.

R CMD INSTALL Rcpp_
R CMD INSTALL RInside_
R CMD INSTALL XML_
R CMD INSTALL randomForest_
R CMD INSTALL chron_
R CMD INSTALL data.table_
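
If you prefer not to rely on tab completion, the same installs can be scripted with a glob that resolves the version part of each archive name. This is only a sketch under two assumptions: the archives are named <package>_<version>.tar.gz, and the script is run from the folder that contains them (or PKG_DIR points at it). The function names are hypothetical.

```shell
# Sketch: install the bundled R packages in the order listed above,
# resolving the version suffix of each archive with a glob.
PKG_DIR="${PKG_DIR:-.}"   # assumption: current folder holds the archives
PACKAGES="Rcpp RInside XML randomForest chron data.table"

find_archive() {
  # Print the first file matching <name>_*.tar.gz; return 1 if none exists
  for f in "$PKG_DIR/$1"_*.tar.gz; do
    [ -e "$f" ] && echo "$f" && return 0
  done
  return 1
}

install_all() {
  for pkg in $PACKAGES; do
    archive="$(find_archive "$pkg")" || {
      echo "no archive found for $pkg" >&2
      return 1
    }
    # Stop at the first failure, since later packages depend on earlier ones
    R CMD INSTALL "$archive" || return 1
  done
}
```

Call install_all to start the installation. Rcpp comes first because RInside depends on it.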

6. Build the dependencies. Go to the DistributedR-master/third_party/ directory and run make:

make -j4 all

7. Build and install Distributed R itself from the DistributedR-master directory

make -j4
make install

———————————————————-
8. Test the library execution in the R console
———————————————————-

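library(distributedR)
distributedR_start() # start DR
distributedR_status()
B <- darray(dim=c(9,9), blocks=c(3,3), sparse=FALSE) # assumed: B was undefined in this listing
foreach(i, 1:npartitions(B), function(b = splits(B,i), index = i) {
  b <- matrix(index, nrow=nrow(b), ncol=ncol(b)) # fill each partition with its index
  update(b)
})
getpartition(B) # collect darray data
distributedR_shutdown() # stop DR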

————————————————————————————————————————
9. The cluster configuration for the nodes is available at /opt/hp/distributedR/conf/cluster_conf.xml
————————————————————————————————————————

  • node0001 = 10.0.0.3,/home/disvert
  • node0002 = 10.0.0.4,/home/disvert
  • node0003 = 10.0.0.5,/home/disvert

The following configuration is for node0001 and is replicated on the other nodes along with the ServerInfo settings.

<MasterConfig>
 <ServerInfo>
 <Hostname>10.0.0.3</Hostname>
 <StartPortRange>50000</StartPortRange>
 <EndPortRange>50100</EndPortRange>
 </ServerInfo>
 <Workers>
 <Worker>
 <Hostname>10.0.0.3</Hostname>
 <StartPortRange>50000</StartPortRange>
 <EndPortRange>50100</EndPortRange>
 <Executors>0</Executors>
 <SharedMemory>0</SharedMemory>
 </Worker>
 <Worker>
 <Hostname>10.0.0.4</Hostname>
 <StartPortRange>50000</StartPortRange>
 <EndPortRange>50100</EndPortRange>
 <SharedMemory>0</SharedMemory>
 <Executors>15</Executors>
 </Worker>
 <Worker>
 <Hostname>10.0.0.5</Hostname>
 <StartPortRange>50000</StartPortRange>
 <EndPortRange>50100</EndPortRange>
 <SharedMemory>0</SharedMemory>
 <Executors>15</Executors>
 </Worker>
 </Workers>
</MasterConfig>
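
Before starting Distributed R with this file, it can be useful to confirm that every host named in it accepts a password-less SSH login. The following is a rough sketch that pulls the Hostname values out with sed rather than a real XML parser, so it assumes one Hostname element per line as in the listing above. list_hosts and check_hosts are hypothetical helper names.

```shell
# Sketch: list the hosts in cluster_conf.xml and test SSH access to each.
CONF="${CONF:-/opt/hp/distributedR/conf/cluster_conf.xml}"

list_hosts() {
  # Print each distinct <Hostname> value found in the file given as $1
  sed -n 's:.*<Hostname>\(.*\)</Hostname>.*:\1:p' "$1" | sort -u
}

check_hosts() {
  for host in $(list_hosts "$1"); do
    # BatchMode makes ssh fail instead of prompting for a password
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
      echo "$host reachable"
    else
      echo "$host NOT reachable"
    fi
  done
}
```

Run check_hosts "$CONF" on the master node. Any host reported as NOT reachable needs its SSH key setup revisited before the workers can be started.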

This should get you started with the Distributed R tests. I hope this cluster configuration comes in handy for any data crunching you want to do with R.
