Tuesday, 23 December 2014

Using Elastic Search with mongodb

Elasticsearch (ES) is super duper fast. Once you integrate it with MongoDB and start redirecting your queries there, you will be able to scale much better.

But like everything else out there, ES has a learning curve. And sometimes when you get stuck it is difficult to figure out what's going wrong.

We ran into some issues while working with ES, so here goes our experience and learning with ES.

I. Installations:

    A. Install mongodb : http://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/

    B. Install Elastic Search: this is simple. Just download the latest package of ES and extract it.

    C. Install mongo connector: pip install mongo-connector #maybe with sudo

         This application is used to pull data out of mongo and push to ES

    D. Install plugin head on ES: [Optional but very helpful plugin]
         (if elastic-search is installed then plugin may be found in this directory : /usr/share/elasticsearch/bin)

        elasticsearch/bin/plugin -install mobz/elasticsearch-head

II. Start Replication log on Mongodb:

    You don't actually need to set up replication on mongo. You just need to enable the replication log. This is because mongo connector reads the incoming changes from mongo's replication log (the oplog).

    A. edit mongodb config file
     
            sudo gedit /etc/mongod.conf

        add line

            replSet=rs0

    B. Restart mongo

            sudo service mongod restart

    C. Login to mongo

            mongo

        and run this command

           rs.initiate()

    Now the replication log is set up.
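
    If you would rather script the config edit from step A, here is a minimal sketch. It assumes the old key=value style of /etc/mongod.conf used above (not the newer YAML format), and the helper name ensure_replset is made up for this example.

```python
def ensure_replset(config_text, name="rs0"):
    """Return config_text with a replSet line, adding one only if it is missing."""
    lines = config_text.splitlines()
    for line in lines:
        stripped = line.strip()
        # Commented-out lines do not count; an existing replSet setting is left untouched.
        if not stripped.startswith("#") and stripped.split("=")[0].strip() == "replSet":
            return config_text
    lines.append("replSet=" + name)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "dbpath=/var/lib/mongodb\nport=27017\n"
    print(ensure_replset(sample))
```

    You would still have to write the result back to the file with sudo rights and restart mongod as in step B.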

III. Start Elastic Search

    A. Go to the ES directory

            cd elasticsearch/bin/
         
            or

            cd /usr/share/elasticsearch/bin/

    B. run elasticsearch

             ./elasticsearch

IV. Setup mongo Connector

    A. Open a new tab and enter the following command

            mongo-connector -m localhost:27017 -t localhost:9200 -d elastic_doc_manager --oplog-ts oplogstatus.txt
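
    To make it clearer what each flag in that command does, here is a small sketch that builds the same argument list in Python. The function name connector_command is made up; the flags and values are the ones from the command above.

```python
import shlex


def connector_command(mongo="localhost:27017", es="localhost:9200",
                      oplog_file="oplogstatus.txt"):
    """Build the mongo-connector argument list used in step A."""
    return [
        "mongo-connector",
        "-m", mongo,                  # MongoDB address to read the oplog from
        "-t", es,                     # target Elasticsearch address
        "-d", "elastic_doc_manager",  # doc manager that writes to ES
        "--oplog-ts", oplog_file,     # file that remembers how far the oplog was read
    ]


if __name__ == "__main__":
    # Only prints the command; hand the list to subprocess.run() to really start it.
    print(" ".join(shlex.quote(arg) for arg in connector_command()))
```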

Your ES is now set up. All indexes for your database will be created automatically.

V. monitor all indexes [if you have installed head plugin]

        http://localhost:9200/_plugin/head/

    Indexes are going to be created with the following naming convention:
     
        Mongo-connector gives each MongoDB collection its own index in Elasticsearch. For example, documents from the collection kittens in the database animals will be put into the animals.kittens index in Elasticsearch.

        index naming convention
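
    The convention is easy to express in code. A tiny sketch (the function names are made up for illustration):

```python
def index_for(database, collection):
    """Index name that follows the <database>.<collection> convention."""
    return database + "." + collection


def namespace_of(index):
    """Split an index name back into (database, collection)."""
    # MongoDB database names cannot contain dots, so split on the first dot only.
    database, collection = index.split(".", 1)
    return database, collection


if __name__ == "__main__":
    print(index_for("animals", "kittens"))
    print(namespace_of("animals.kittens"))
```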

VI. Querying Elastic search:

    A. You can use elastic search clients:
       
         Official list of supported clients : See Clients & Integrations

         http://www.elasticsearch.org/guide/

    B. Simple rest calls:
     
         You can simply make calls to ES over HTTP

         ex: http://localhost:9200/animals.kittens/_search?pretty=1&q=Tom&size=2&fields=id,owner.name&sort=age
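
         If you build such URLs from code, something like the following sketch works with just the standard library. The helper name search_url is made up; the parameters are the ones from the example URL.

```python
from urllib.parse import urlencode


def search_url(index, host="http://localhost:9200", **params):
    """Build a URI search URL like the example above."""
    # safe="," keeps the comma separated field list readable instead of %2C.
    return "{}/{}/_search?{}".format(host, index, urlencode(params, safe=","))


if __name__ == "__main__":
    print(search_url("animals.kittens", pretty=1, q="Tom", size=2,
                     fields="id,owner.name", sort="age"))
```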

         Here you may have to experiment and try out different combinations. If you have a very complicated structure, building a query may turn out to be difficult, so do it step by step.
         Partition the query into pieces, build them separately, and then figure out a way to integrate them.

         eg (Python, where es is a client object from the elasticsearch package):

         result = es.search(index="animals.kittens", body=
               {
                    "from" : 0,
                    "size" : 10,
                    "query" : {
                        "filtered" : {
                            "query": {
                                "query_string": {
                                    "query": "mouse",
                                    "fields": ["product"]
                                }
                            },
                            "filter" : {
                                "bool" : {
                                    "must" : [
                                        {
                                            "terms" : {
                                                "address.name" : ["mumbai", "pune"]
                                            }
                                        },
                                        {
                                            "range": {
                                                "cost.inr": {
                                                    "gte" : 0,
                                                    "lte" : 400
                                                }
                                            }
                                        }
                                    ],
                                    "must_not" : {
                                        "terms" : {"id" : ["43832jd0dskf09123yhjdhf012u3j"]}
                                    }
                                }
                            }
                        }
                    },
                    "sort": [
                        {
                            "rating.avg_ratting": {
                                "order": "desc"
                            }
                        }
                    ]
               })

       With the above query you can figure out what the structure of the data looks like. Play around with ES, and if you get stuck, drop a comment or ask on Stack Overflow or Quora. :)
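
       One way to do the "build it in pieces" part is to write a small function per piece and combine them at the end. This is only a sketch: the helper names are made up, and it produces the same "filtered" query shape as the example above.

```python
def text_query(text, fields):
    """Full-text part of the query."""
    return {"query_string": {"query": text, "fields": fields}}


def terms_filter(field, values):
    """Match documents whose field equals any of the given values."""
    return {"terms": {field: values}}


def range_filter(field, low, high):
    """Match documents whose field lies between low and high (inclusive)."""
    return {"range": {field: {"gte": low, "lte": high}}}


def build_search(query, must, must_not, sort_field, start=0, size=10):
    """Glue the separately built pieces into one search body."""
    return {
        "from": start,
        "size": size,
        "query": {
            "filtered": {
                "query": query,
                "filter": {"bool": {"must": must, "must_not": must_not}},
            }
        },
        "sort": [{sort_field: {"order": "desc"}}],
    }


if __name__ == "__main__":
    body = build_search(
        query=text_query("mouse", ["product"]),
        must=[terms_filter("address.name", ["mumbai", "pune"]),
              range_filter("cost.inr", 0, 400)],
        must_not=terms_filter("id", ["43832jd0dskf09123yhjdhf012u3j"]),
        sort_field="rating.avg_ratting",
    )
    print(body)
```

       Each piece can be tried on its own first, and the finished body can then be passed to es.search(index="animals.kittens", body=body).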

LEARNING:

I. To create a custom mapping :

    If you are planning to use ES in production for a big data set, you will probably need to write your own mapping file. Note that mongo-connector creates the index with the name:

   http://localhost:9200/animals.kittens/

and the mapping is put under :

    http://localhost:9200/animals.kittens/string/

So to apply your own mapping you will need to do this:

    A. First clear the existing index by deleting it. [The head plugin gives you the option of making a DELETE call on the index.]

    B. Create a new index

            curl -XPUT 'http://localhost:9200/animals.kittens/'

    C. Apply the mapping file

            curl -XPUT 'http://localhost:9200/animals.kittens/string/_mapping/' -d '
              {
                  "string": {
                      "properties": {
                          "owner": {
                              "properties": {
                                  "name": {
                                      "type": "string"
                                  },
                                  "address": {
                                      "type": "string"
                                  },
                                  "tell": {
                                      "type": "string"
                                  }
                              }
                          },
                          "breed": {
                              "type": "string"
                          },
                          "id": {
                              "type": "string",
                              "index": "not_analyzed"
                          }
                      }
                  }
              }'

         not_analyzed is very important for fields which might contain spaces or special characters like "#$@" etc., because the value is then indexed as one exact term instead of being split into words.
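
         To see why this matters, here is a toy illustration. It is NOT how ES analyzes text internally; the regex is only a rough stand-in for what a default analyzer does to a string (lowercase it and keep the word-like pieces), and both function names are made up.

```python
import re


def toy_analyze(value):
    """Rough stand-in for an analyzed string field: lowercase word tokens."""
    return re.findall(r"\w+", value.lower())


def toy_not_analyzed(value):
    """A not_analyzed field keeps the whole value as one exact term."""
    return [value]


if __name__ == "__main__":
    sample = "user #42@home"
    print(toy_analyze(sample))       # several tokens, symbols dropped
    print(toy_not_analyzed(sample))  # one term, unchanged
```

         With the analyzed version an exact match on the full original value has nothing to match against, which is why ids and similar fields are better left not_analyzed.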

II. DB-structure limitation:

A. To efficiently use ES you have to ensure a few consistencies in your database.

All top-level properties must be present in all documents:

eg:

            [
                {
                    "name": "Vikash",
                    "dob": "23-11-1989",
                    "address": "Indiranagar, Bangalore"
                },
                {
                    "name": "Viki",
                    "dob": "22-10-1990"
                }
            ]

This is a bad idea because the absence of field address in the 2nd document may cause issues.

B. If you don't have a value for a particular field, set it to null, but still create the field.

                {
                    "name": "Viki",
                    "dob": "22-10-1990",
                    "address": null
                }
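
If you already have documents with missing fields, a clean-up pass could look like this sketch. The function name fill_missing_fields is made up, and it only looks at top-level keys.

```python
def fill_missing_fields(documents):
    """Return copies of the documents in which every top-level key is present."""
    all_keys = set()
    for doc in documents:
        all_keys.update(doc)
    # Keys a document lacks are filled with None, which becomes null in JSON.
    return [{key: doc.get(key) for key in sorted(all_keys)} for doc in documents]


if __name__ == "__main__":
    docs = [
        {"name": "Vikash", "dob": "23-11-1989", "address": "Indiranagar, Bangalore"},
        {"name": "Viki", "dob": "22-10-1990"},
    ]
    for doc in fill_missing_fields(docs):
        print(doc)
```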

C. This is more a json rule than a db schema rule.

  Values in an array should be of the same type, and I don't mean just the data type, I mean they should all stand for the same kind of thing.

eg: if you need to store an expiration time in years and months separately

        "expiration_time":
        [
            1, //year
            6 //months
        ]

you should rather store it as a map/dictionary:

        "expiration_time":
        {
            "years" : 1,
            "months" : 6
        }
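
Converting the array form into the dictionary form is a one-liner. A sketch, with a made-up function name and assuming the array is always [years, months]:

```python
def expiration_as_dict(expiration_time):
    """Turn a [years, months] array into a dictionary with named keys."""
    years, months = expiration_time
    return {"years": years, "months": months}


if __name__ == "__main__":
    print(expiration_as_dict([1, 6]))
```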

D. Another json rule:

    Keys that you use in your dictionaries should not be generated randomly. If possible, also avoid rule-based key generation. This will help you keep your database more structured.

     eg:

        {
            "january":
            {
                "sum": value
            },
            "december":
            {
                "sum": value
            }
        }
        // if you are not going to have all 12 months stored in there, then I suggest using month as a parameter in the data
     
        {
            "aggregated_data":
            [
                {
                    "month": 1,
                    "sum": value
     
                },
                {
                    "month":12,
                    "sum": value
                }
            ]
        }

 

Here it will be much easier to select the data for the earliest month than it is in the 1st case. You can also sort by month much more easily here.
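
A sketch of the conversion from the first shape to the second, with sorting and picking the earliest month. The function name and the numbers used for the sums are made up for illustration.

```python
MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]


def to_aggregated_data(by_month_name):
    """Turn {"january": {"sum": ...}, ...} into a list of {"month", "sum"} rows."""
    rows = [{"month": MONTHS.index(name) + 1, "sum": entry["sum"]}
            for name, entry in by_month_name.items()]
    # With month stored as a value, sorting is a plain sort on that field.
    return {"aggregated_data": sorted(rows, key=lambda row: row["month"])}


if __name__ == "__main__":
    data = {"december": {"sum": 7}, "january": {"sum": 3}}
    converted = to_aggregated_data(data)
    print(converted)
    # The earliest month is simply the row with the smallest month value.
    print(min(converted["aggregated_data"], key=lambda row: row["month"]))
```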

To conclude: ES is really really really fast. I am in love with its speed. Hope you enjoy working with it too.

Bye for now.

Sunday, 9 February 2014

Running Android application on device from eclipse in Ubuntu/Linux

While on Windows this process might be as simple as clicking the run button with debugging mode enabled on the device, on Ubuntu it needs a few configuration changes.

So, here is how you can do it on Ubuntu/Linux:

Step 1: Setup eclipse with Android development kit.

Download from here developer.android.com

Extract the zip file.

cd to the extracted directory and run eclipse.

Step 2: Create an android application.

how to do that : first android app

Step 3: Connect your android device over USB and enable the USB debugging option (under Settings -> Developer options).

Step 4: Run the application from the run button on top.


If you get a pop-up box like this, it means that eclipse is unable to access the USB device.

Step 5: Enable access to the USB device.

Open a terminal and run this command:
    lsusb

Bus 002 Device 001: ID 1abc:0abc Linux Foundation 2.0 root hub
Bus 001 Device 004: ID 1abc:0abc Ricoh Co., Ltd
Bus 001 Device 005: ID 1abc:0abc Foxconn / Hon Hai
Bus 001 Device 002: ID 1abc:0abc Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 001: ID 1abc:0abc Linux Foundation 2.0 root hub
Bus 004 Device 001: ID 1abc:0abc Linux Foundation 3.0 root hub
Bus 003 Device 010: ID 0bb4:0cac HTC (High Tech Computer Corp.) 
Bus 003 Device 002: ID 1abc:0abc Lite-On Technology Corp.
Bus 003 Device 001: ID 1abc:0abc Linux Foundation 2.0 root hub
 
 
Here the vendor id is 0bb4, and the product id is : 0cac
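
If you want to pull the ids out of the lsusb output with a script instead of by eye, here is a small sketch. It assumes lines in the format shown above; the function name parse_lsusb_line is made up.

```python
import re

# Matches lines such as: "Bus 003 Device 010: ID 0bb4:0cac HTC (High Tech Computer Corp.)"
LSUSB_LINE = re.compile(
    r"Bus (\d+) Device (\d+): ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\s*(.*)")


def parse_lsusb_line(line):
    """Return vendor id, product id and device name, or None if the line does not match."""
    match = LSUSB_LINE.match(line.strip())
    if match is None:
        return None
    _bus, _device, vendor, product, name = match.groups()
    return {"vendor": vendor, "product": product, "name": name.strip()}


if __name__ == "__main__":
    print(parse_lsusb_line(
        "Bus 003 Device 010: ID 0bb4:0cac HTC (High Tech Computer Corp.) "))
```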

Now open file

    sudo vim /etc/udev/rules.d/51-android.rules
 
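And add the following lines (replace <vendor id> and <product id> with the values from lsusb, and OWNER with your own username):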

    # fastboot protocol on manta (HTC Incredible S)
    SUBSYSTEM=="usb", ATTR{idVendor}=="<vendor id>", ATTR{idProduct}=="<product id>", MODE="0600", OWNER="viki"

Unplug the device and re-plug it.
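
The rule line can also be generated from the ids, which avoids typos in the quoting. A sketch with a made-up function name, using the same MODE and rule layout as above:

```python
def udev_rule(vendor_id, product_id, owner, mode="0600"):
    """Build one udev rule line for the given USB device and owner."""
    # Doubled braces are literal braces in str.format().
    return ('SUBSYSTEM=="usb", ATTR{{idVendor}}=="{}", ATTR{{idProduct}}=="{}", '
            'MODE="{}", OWNER="{}"').format(vendor_id, product_id, mode, owner)


if __name__ == "__main__":
    print(udev_rule("0bb4", "0cac", "viki"))
```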

Step 6: Open Eclipse and re-run the application.
It should open up in the device.

Any questions or suggestions? Please leave them in the comments below.

---
Thanks
viki

Sunday, 5 January 2014

Locating mysql sock file

Locating mysqld.sock file :

It's a simple step, but it may give you a hard time when you are a beginner.

Simply open : vim /etc/mysql/my.cnf

#Content of the file---

[client]
port            = 3306
socket          = /var/run/mysqld/mysqld.sock

Here you can see the default port and the default sock file.
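
If you need these values from a script, Python's configparser can read the [client] section. This is only a sketch: the function name is made up, and it assumes a simple my.cnf like the snippet above (lines starting with "!" such as !includedir are dropped because configparser cannot read them).

```python
import configparser


def client_settings(my_cnf_text):
    """Return (port, socket) from the [client] section of my.cnf text."""
    # Drop MySQL directives such as "!includedir", which are not INI syntax.
    lines = [line for line in my_cnf_text.splitlines()
             if not line.lstrip().startswith("!")]
    parser = configparser.ConfigParser(allow_no_value=True, strict=False,
                                       interpolation=None)
    parser.read_string("\n".join(lines))
    client = parser["client"]
    return client.getint("port"), client.get("socket")


if __name__ == "__main__":
    sample = (
        "[client]\n"
        "port            = 3306\n"
        "socket          = /var/run/mysqld/mysqld.sock\n"
    )
    print(client_settings(sample))
```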

==========

I needed this file when I had to connect to the MySQL db from my Rails app and had to configure database.yml:

development:
    adapter: mysql
    encoding: utf8
    database: application
    username: appuser
    password: password
    socket: /var/run/mysqld/mysqld.sock

Hope you found what you were looking for
--Cheers