Spring REST API and MongoDB with GridFS

Reference: https://github.com/vishwakarmarhl


This is a draft of the MongoDB-based file data store service.


Apart from the master branch, there are two more branches that carry the file upload functionality:

  1. Check out the branch
  2. Pull the current branch
    git pull
  3. After making changes in either of them, stage and commit
    git add .
    git commit -m "Updated the code and added a message"
  4. Push changes to the github repository
    git push origin FileUploadGridFSSpring
    git push origin MultiPartFileUpload


Download : http://www.mongodb.org/dr/downloads.mongodb.org/win32/mongodb-win32-x86_64-2.4.5.zip/download

  1. Unzip the archive contents into C:\mongodb\
  2. Create C:\data\db
  3. Run C:\mongodb\bin\mongod.exe --dbpath C:\data\db




  1. Start the mongod.exe standalone server
  2. Import the source as a Maven project in the Eclipse STS IDE
  3. Deploy on a Tomcat instance to see the data from MongoDB
  4. Alternatively, the Tomcat Maven plugin enables Maven-based startup: mvn tomcat7:run
  5. Open http://localhost:8088/RDataServe to check out the grid of file data
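Under the hood, GridFS stores each file as a series of numbered chunks (in fs.chunks) alongside a metadata document (in fs.files). Here is a pure-Python sketch of the chunking idea only; it is not the driver API, and the 255 KB chunk size reflects the common modern default (older servers used 256 KB):

```python
# Sketch of how GridFS splits a file's bytes into fixed-size, numbered chunks.
CHUNK_SIZE = 255 * 1024  # common default chunk size in bytes

def chunk_bytes(data, chunk_size=CHUNK_SIZE):
    """Yield (n, chunk) pairs, numbered the way fs.chunks documents are."""
    for offset in range(0, len(data), chunk_size):
        yield offset // chunk_size, data[offset:offset + chunk_size]

payload = b"x" * (2 * CHUNK_SIZE + 10)   # a "file" just over two chunks long
chunks = list(chunk_bytes(payload))
print(len(chunks))                        # 3
```

Reassembling the file is just concatenating the chunks back in order of their chunk number.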


Mongo Shell Commands

show dbs
show collections

//This command creates a capped collection named files with a maximum size of 5 megabytes and a maximum of 5000 documents.
db.createCollection("files", { capped: true, size: 5242880, max: 5000 })

//The following command pre-allocates a 2 gigabyte, uncapped collection named files.
db.createCollection("files", { size: 2147483648 })

//Drop a capped collection
db.files.drop()

//Insert a document
db.files.insert( { _id: 1, fileId: 1234, fileName: 'R_XXX.EXE', filePath: '/opt/storage/rhldata', fileSizeInKB: 123412, fileExtensionType: 'EXE' } )

db.files.save( { fileId: '1235', fileName: 'V_XXX.EXE', filePath: '/opt/storage/rhldata', fileSizeInKB: 123342, fileExtensionType: 'EXE' } )

//Query by fileId
db.files.find({ fileId: 1234 })
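To see the equality-match semantics of these queries without a running mongod, here is a tiny pure-Python stand-in. The insert/find helpers below are hypothetical stand-ins for the shell commands, with plain dicts playing the role of BSON documents:

```python
# Pure-Python stand-in for db.files.insert / db.files.find; no server needed.
files = []

def insert(doc):
    files.append(doc)  # stand-in for db.files.insert({...})

def find(query):
    # stand-in for db.files.find({...}): equality match on every queried key
    return [d for d in files if all(d.get(k) == v for k, v in query.items())]

insert({"_id": 1, "fileId": 1234, "fileName": "R_XXX.EXE",
        "filePath": "/opt/storage/rhldata", "fileSizeInKB": 123412,
        "fileExtensionType": "EXE"})

print(find({"fileId": 1234})[0]["fileName"])   # R_XXX.EXE
```

Note that the shell examples above mix an integer fileId (1234) and a string fileId ('1235'); queries match by both value and type, so keeping the type consistent matters.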

vRODBC setup for DistributedR and Vertica database cluster


In this particular case, a Vertica database server is installed with the Vertica-R-UDTF setup. The R runtime is provided from within Vertica, hence the environment variables are changed as below.

Reference: http://www.vertica.com/wp-content/uploads/2014/06/vRODBC-Installation-Guide.pdf

1. Add the following variables in the .bashrc of root for library linking and source it.

export R_HOME=/opt/vertica/R
export DR_HOME=/opt/hp/distributedR
export LD_LIBRARY_PATH=/usr/local/lib/:/opt/vertica/lib64/:/usr/lib/jvm/java-1.6.0-openjdk-$DR_HOME/lib:$DR_HOME/third_party/lib:$DR_HOME/third_party/lib/atomicio/:$LD_LIBRARY_PATH

2. Install using the  DistributedR-master/third_party/unixODBC-2.3.1.tar.gz

$ tar -xvzf unixODBC-2.3.1.tar.gz
$ cd unixODBC-2.3.1
$ ./configure && make
$ make install

3. Install the vRODBC available in the DistributedR-master/ source folder


4. Create the config files for vRODBC

# Create a directory for the config files
$ mkdir /home/dbadmin/vRODBC/

5. Add the following content to vertica.ini in this folder

 DriverManagerEncoding = UTF-16
 ODBCInstLib = /usr/local/lib/libodbcinst.so
 ErrorMessagesPath = /opt/vertica/lib64
 LogLevel = 0
 LogPath = /tmp

6. Add the following content to odbc.ini in this folder

 Description = vRODBC test client
 Driver = /opt/vertica/lib64/libverticaodbc.so
 Database = test
 Servername =
 UserName = dbadmin
 Password = test
 Port = 5433
 ConnSettings =
 Locale = en_US
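These odbc.ini entries normally sit under a DSN section header, and that section name is what the client refers to when connecting. A quick Python sketch to sanity-check the file's syntax; the [Test] section name is an assumption matching the DSN used in step 8, and Servername is left blank as above:

```python
import configparser

# Hypothetical odbc.ini body; [Test] is the DSN name the client would use.
odbc_ini = """\
[Test]
Description = vRODBC test client
Driver = /opt/vertica/lib64/libverticaodbc.so
Database = test
Servername =
UserName = dbadmin
Password = test
Port = 5433
Locale = en_US
"""

cfg = configparser.ConfigParser()
cfg.read_string(odbc_ini)
print(cfg["Test"]["Driver"])   # /opt/vertica/lib64/libverticaodbc.so
```

If the file parses and the Driver path points at an existing libverticaodbc.so, the DSN is at least syntactically sound.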

7. Use the following environment variables in the user's .bashrc and source it

 export R_HOME=/opt/vertica/R
 export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-
 export DR_HOME=/opt/hp/distributedR
 export PATH=/opt/vertica/bin:/opt/vertica/R/bin:$DR_HOME/bin:$PATH
 export LD_LIBRARY_PATH=/usr/local/lib/:/opt/vertica/lib64/:/usr/lib/jvm/java-1.6.0-openjdk-$DR_HOME/lib:$DR_HOME/third_party/lib:$DR_HOME/third_party/lib/atomicio/:$LD_LIBRARY_PATH

 # ODBC Configurations
 export VERTICAINI=/home/dbadmin/vRODBC/vertica.ini
 export ODBCINI=/home/dbadmin/vRODBC/odbc.ini

8. Open the R console and test the connections using the following script

library(vRODBC)                       # load the vRODBC package first
connect <- odbcConnect("Test")        # "Test" is the DSN defined in odbc.ini
segment <- sqlQuery(connect, "SELECT * FROM role")

Setup open source Distributed R on a three node cluster with R and execute tests on workers





  1. Prerequisite packages
#Install dependencies
$ sudo yum install -y libtool zlib-devel automake pkgconfig gcc gcc-c++ curl
$ sudo yum install -y make gcc gcc-c++ libxml2-devel rsync

# Install R
$ curl -O http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
$ sudo rpm -i epel-release-latest-7.noarch.rpm
$ sudo yum update
$ sudo yum install R R-devel

2. Move the installation archive to the target machine. The source is copied from GitHub: https://github.com/vertica/DistributedR

scp -r disvert.tar disvert@
ssh disvert@

3. Remove any older version of the package and verify R installation

# Connect to the R console and make sure to remove any old versions

# Go to the Distributed R source directory and clean the build
make clean
whereis R
make distclean

# Remove any old installation
rm -rf /opt/hp/distributedR/

4. Update the environment for execution. This can be done towards the end of the installation.

Make sure you have password-less access to other nodes to your cluster nodes.

# Add the R runtime to the path just in case it's installed separately
ln -s /opt/disvert/R/bin/R /bin/R
ln -s /opt/disvert/R/bin/R /sbin/R
# Update the environment variables in ~/.bashrc file for the libraries and executables path
export R_HOME=/opt/disvert/R
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-
export DR_HOME=/opt/hp/distributedR
export PATH=/opt/disvert/bin:/opt/disvert/R/bin:$DR_HOME/bin:$PATH
export LD_LIBRARY_PATH=/usr/lib/jvm/java-1.6.0-openjdk-$DR_HOME/lib:$DR_HOME/third_party/lib:$DR_HOME/third_party/lib/atomicio/:$LD_LIBRARY_PATH

5. Install the following from the DistributedR-master/third_party/ folder of the GitHub distribution.

Press tab to autocomplete the version as per the package archive name in the folder.

R CMD INSTALL randomForest_
R CMD INSTALL data.table_

6. Build the dependencies. Go to the DistributedR-master/third_party/ directory and run:

make -j4 all

7. Build and install the actual code in the DistributedR-master

make -j4
make install

8. Test the library execution in the R console

library(distributedR)
distributedR_start()                              # start DR
B <- darray(dim=c(4,4), blocks=c(2,2), data=1)    # create a sample darray
getpartition(B)                                   # collect darray data
distributedR_shutdown()                           # stop DR

9. The cluster configuration for the nodes is available at /opt/hp/distributedR/conf/cluster_conf.xml

  • node0001 =,/home/disvert
  • node0002 =,/home/disvert
  • node0003 =,/home/disvert

The following configuration is for node0001 and will be replicated on the other nodes with their server info.
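A small script can keep the per-node configuration files consistent across the cluster. The element names in this sketch are illustrative, not the exact Distributed R schema; check the shipped /opt/hp/distributedR/conf/cluster_conf.xml for the real layout:

```python
import xml.etree.ElementTree as ET

# Hypothetical generator for a cluster_conf.xml-style file.
nodes = ["node0001", "node0002", "node0003"]
root = ET.Element("Cluster")
for name in nodes:
    worker = ET.SubElement(root, "Worker")
    ET.SubElement(worker, "Hostname").text = name
    ET.SubElement(worker, "SharePath").text = "/home/disvert"

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text.count("<Worker>"))   # 3
```

Generating the file once and copying it to every node avoids drift between the per-node configurations.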


This will get you started on the distributed R tests. I hope such a cluster configuration will be handy for any data crunching that you may want to do with R.

Secure an instance with SSH RSA based on a public and private key pair for access

Secure a public instance with SSH-2 RSA. This would be done for provisioning private key based SSH access to a user test on the remote machine.


A more detailed reference is available at the link above. I have tried the multi-node password-less test user setup.

  1. Create the user for secure access on the remote machine. If it is a cluster, do this on all the nodes.
 $ adduser test
 $ passwd  test

2. Generate a public (id_rsa.pub)/private (id_rsa) key pair without a passphrase. The keys are written to /home/test/.ssh

 $ ssh-keygen -t rsa

3. Add the public key string which looks something like below, in id_rsa.pub to the OpenSSH keys file at /home/test/.ssh/authorized_keys

ssh-rsa AAAAB3NzaC1 ... ... ...3PGVu4D+37RA0expQUJX1p++JtLlaw== rsa-key-20150623-test-user

4. Move the generated keys id_rsa, id_rsa.pub and authorized_keys to all the test nodes we want password-less access to.

Make sure the files have the right permissions on the remote machines:
 chmod 700 ~/.ssh
 chmod 600 ~/.ssh/*
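SSH silently ignores keys when these permissions are too open, so it is worth verifying them. Here is a small, self-contained Python sketch of that check (the ssh_perms_ok helper is hypothetical, demonstrated on a throwaway directory rather than a real ~/.ssh):

```python
import os
import stat
import tempfile

def ssh_perms_ok(ssh_dir):
    """Return True if ssh_dir is mode 0700 and every file inside is 0600."""
    if stat.S_IMODE(os.stat(ssh_dir).st_mode) != 0o700:
        return False
    return all(
        stat.S_IMODE(os.stat(os.path.join(ssh_dir, name)).st_mode) == 0o600
        for name in os.listdir(ssh_dir)
    )

# Demonstrate on a temporary directory instead of a real ~/.ssh
d = tempfile.mkdtemp()
os.chmod(d, 0o700)
key_path = os.path.join(d, "id_rsa")
open(key_path, "w").close()
os.chmod(key_path, 0o600)
print(ssh_perms_ok(d))   # True
```

Running the same check against the real ~/.ssh on each node quickly flags why a key-based login is falling back to password prompts.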

5. Access the nodes from another machine with the private key in the .ssh folder

$ ssh -v test@ # should log in without a password

6. Share the key with another user as a .ppk file for PuTTY-based access.
Move the private key to another system, load it in Windows PuTTYgen, and save it as a .ppk file. Use this private key file in Pageant to access the instance as the test user.

How to install NodeJS and use NVM for your node versioning with sailsjs framework for no reason at all

                     NodeJS (https://nodejs.org)

1.) Always use NVM to manage the right version of the Node installation. Use either curl or wget:

$ curl https://raw.githubusercontent.com/creationix/nvm/v0.23.3/install.sh | bash
$ wget -qO- https://raw.githubusercontent.com/creationix/nvm/v0.23.3/install.sh | bash

Use nvm to list the remote and locally installed Node versions:

$ nvm list-remote
$ nvm install 0.12.0
$ nvm list

2.) Install node manually
$ wget http://nodejs.org/dist/latest/node-v0.12.0.tar.gz
$ tar -xvzf node-v0.12.0.tar.gz
$ cd node-v0.12.0/
$ sudo yum groups install "C Development Tools and Libraries"

$ ./configure
$ sudo make install

Verify Installation:

$ node --version
$ npm --version
3.) Set up the Sails.js framework on Node
  http://sailsjs.org/
  https://github.com/balderdashy/sails-docs

Since everything runs on Node, use nvm to select the installed Node version for this Sails project
$ nvm current
$ nvm ls
$ nvm use v0.12.0
$ npm -g install sails

$ sails --version

4.) Start the server
$ sails new ships
$ cd ships
$ sails lift

The Sails.js framework and its simplified management app have made web development remarkably simple. The nvm system is more Linux-centric, but I am sure something like https://github.com/coreybutler/nvm-windows does the trick on Windows.

5.) Let's add some web APIs to the project, adding more to the Sails code.
Generate a controller and model specifically. The routes are automagically taken care of.

$ sails generate controller user
$ sails generate model User name:string email:string type:string id:int

Generate controller and model as an api
$ sails generate api vendor

$ sails lift

Now the default in-memory ORM kicks in and sets up the user and vendor models.

I will try to explore the Sails.js framework further so that I can really make use of it. Let's hope I can apply this in some of my planned work.

Develop on Heroku #PaaS for overly simplified build, deployment, scaling and management of your NodeJS application. NodeJS is not the limit though

Heroku is a cloud platform as a service (PaaS) supporting several programming languages. The developers have added support for Java, Node.js, Scala, Clojure, Python and PHP in Heroku.


It seamlessly takes care of versioning, build, packaging, deployment, scaling and management of the application. A developer can just focus on the code and not worry about the nightmares of release and deployment. This is how such a deployment looks.

This is the settings page where you can see the domain and application configuration for this deployment.


This is the activity and log page, which shows the event log of the PaaS system used for the application.

A.) Heroku Account

1.) Create a free account at https://id.heroku.com/login without using a credit card.

B.) Publish a Node.js application on Heroku

 1.) Set up the Heroku Toolbelt in order to access the Heroku PaaS interface. This is the client that shall be used for any communication and interaction with the Heroku system.

 2.) Get a Node.js application to deploy on Heroku.
     $ git clone https://github.com/vishwakarmarhl/ngboilerplate-heroku.git

 3.) Create a heroku app and make sure there is no nested subfolder structure hiding the app.js, package.json, Procfile files in your code.
     $ heroku create --app ngbp-rhl
       Creating ngbp-rhl... done, stack is cedar
       http://ngbp-rhl.herokuapp.com/ | git@heroku.com:ngbp-rhl.git
     $ heroku config:set BUILDPACK_URL=https://github.com/heroku/heroku-buildpack-nodejs#diet -a ngbp-rhl
       Setting config vars and restarting ngbp-rhl... done, v5
       BUILDPACK_URL: https://github.com/heroku/heroku-buildpack-nodejs#diet
4.) Push the repository to heroku for deployment
     $ git push heroku master
       Fetching repository, done.
       Counting objects: 5, done.
       Delta compression using up to 8 threads.
       Compressing objects: 100% (3/3), done.
       Writing objects: 100% (4/4), 522 bytes | 0 bytes/s, done.
       Total 4 (delta 1), reused 0 (delta 0)
       -----> Node.js app detected
       PRO TIP: Specify a node version in package.json
       See https://devcenter.heroku.com/articles/nodejs-support
       -----> Defaulting to latest stable node: 0.10.32
       -----> Downloading and installing node
       -----> Exporting config vars to environment
       -----> Installing dependencies
       -----> Cleaning up node-gyp and npm artifacts
       -----> Building runtime environment
       -----> Discovering process types
       Procfile declares types -> web
       -----> Compressing... done, 5.5MB
       -----> Launching... done, v4
       http://ngbp-rhl.herokuapp.com/ deployed to Heroku
       To git@heroku.com:ngbp-rhl.git
       c47ec59..2298a1a master -> master
 5.) Scale and manage the application using the Personal App in web console
    $ heroku ps:scale web=1
      Scaling dynos... done, now running web at 1:1X.
    $ heroku open
    $ heroku logs --tail
 6.) Finally, not all of us would like to push our apps into a black hole and wait for a crash before debugging.
     Foreman can help you initialize the app locally and test it out.
     Foreman comes bundled with the Heroku Toolbelt.

    $ foreman start web
      18:56:31 web.1  | started with pid 13824
      18:56:31 web.1  | Listening on 5000
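Foreman and Heroku both read the process types from the app's Procfile; a minimal one for a Node app might look like this (the app.js entry point is an assumption about the repository's layout):

```
web: node app.js
```

The `web` label here is what the earlier `heroku ps:scale web=1` and `foreman start web` commands refer to.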

This is just a glimpse of the simplified workflow followed in order to bring up a Node.js application on Heroku. However, the dependencies are to be managed by the application and verified for successful deployment via the log and management console.
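The PRO TIP in the build log above suggests pinning a Node version in package.json so the buildpack does not default to the latest stable release. A minimal engines entry might look like this (the version value is illustrative):

```json
{
  "engines": {
    "node": "0.10.x"
  }
}
```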

Let's write some REST web services and play for the first time with Python and Bottle

In order to start writing the web services, let's take a look at how to set up Python on Windows.


GITHUB : https://github.com/vishwakarmarhl/pythonservices.git


Setup Python

Tutorial : https://pypi.python.org/pypi
1. Setup Python
     - Install Python : http://www.python.org/download/
     - Setup path variables
        export PYTHON_HOME=C:\Python27
        export PATH=$PYTHON_HOME:$PYTHON_HOME\Scripts:$PATH
     - Setup MinGW compiler for libraries
2. Install PIP
    $ python ez_setup.py
    $ python get-pip.py
3. Libraries
    a. Bottle and Flask
        pip install bottle
        pip install flask
    b. MySQLdb (http://blog.mysqlboy.com/2010/08/installing-mysqldb-python-module.html, http://sourceforge.net/projects/mysql-python/)
        Installed an executable version for windows from sourceforge
        $ easy_install MySQL-python
4. Create a database with the *.sql schema file
5. Configure with the db credentials and host name in config.cfg and execute from commandline
        $ py main.py
6. Test GET: http://localhost:8080/users/
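Step 5's config.cfg can be read with the standard library config parser; the section and key names below are hypothetical stand-ins for whatever the repository's file actually uses (Python 3 configparser shown; Python 2 names the module ConfigParser):

```python
import configparser

# Hypothetical config.cfg contents; the real keys live in the repo's file.
cfg_text = """\
[database]
host = localhost
user = root
password = secret
name = data
"""

cfg = configparser.ConfigParser()
cfg.read_string(cfg_text)
print(cfg["database"]["host"])   # localhost
```

Keeping credentials in a config file like this rather than in the source is what lets the same main.py run against different database hosts.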

Sample REST Service

This is where we can start discussing the web services code segments. We are using the bottle implementation for the service end points.

Consider an example GET service that queries all the users in the MySQL database table data.user:

# Test using: curl -s http://localhost:8080/users
@route('/users', method='GET')
def getallusers():
    results = []
    try:
        mysql_cursor = getDbConnection()
        query = 'SELECT `user_id`,`user_name`,`first_name`,`last_name`,`email`,`password`,`organization`,`enabled`,`phone` FROM `data`.`user`'
        mysql_cursor.execute(query)
        results = mysql_cursor.fetchall()
    except MySQLdb.Error as e:
        print "MySQL Error %d: %s" % (e.args[0], e.args[1])
    return json.dumps(results, default=lambda o: o.__dict__)

The entry point to the code is main.py where DataService.py REST services are being imported and referenced on initialization. 

This is a sample app that can get you quickly started on the bottle based REST web service.