Sunday, November 17, 2013

ElasticSearch Getting Started Tutorial with working example

Introduction

ElasticSearch is an open-source, distributed search engine that scales well and supports a wide range of enterprise search use cases. It is built on top of Lucene (just like Apache Solr 4). It supports real-time indexing and full-text search. You can read more about ElasticSearch at:
http://www.elasticsearch.org/
It exposes a Java and an HTTP API, which can be used for indexing, searching and most of the configuration.

The reason for writing this blog about ElasticSearch is that http://www.elasticsearch.org/ reads more like a reference, and I could not find a good, complete tutorial. I struggled to get anything beyond a basic hello world program up and running. I am sharing my experience so that it can save some time for readers who would like to try out ElasticSearch (which is a very powerful product suite). At the end of this tutorial you will have a very basic ElasticSearch example up and running. I will be sharing the link from my PC.

So let's get started.

1) I am assuming that you have Java already installed.
2) Download ElasticSearch from http://www.elasticsearch.org/download/. Most of what is written about it targets Linux and other non-Windows environments, but I will focus on a Windows 7 desktop environment. Please choose the installation package accordingly. For Windows it is a zip file, which you can extract into C:\elasticsearch-0.90.3\. It is very much like installing the Eclipse IDE.
3) I am new to curl and cygwin and wanted to cut down the time needed to learn them (most of the command references on ElasticSearch.org are for non-Windows platforms). You can install curl from http://curl.haxx.se/download.html and cygwin from http://cygwin.com/install.html

Now let's test what we have so far.
1) On the Windows 7 desktop, open a command prompt and run cd C:\elasticsearch-0.90.3\bin
2) Now execute elasticsearch.bat
This will start an ElasticSearch node on localhost. You will see logs somewhat like this
(Please don't worry if they are slightly different in your case: I have some ElasticSearch plugins installed, and my node name etc. will be different from yours.)







3) Now test it in your browser by opening http://localhost:9200/



















If you get status 200, it means everything is fine. Isn't it simple?
Let's look at each field of the JSON and see what it's about:

  • Ok: when it's true, it means that the request was successful.
  • Status: the HTTP status code that resulted from the request. 200 means OK.
  • Name: the name of our Elasticsearch instance. By default, it picks a random name from a huge list of names.
  • Version: The object here has a number field, which is the version of Elasticsearch you're currently running, and a snapshot_build field, which indicates if what you're running has been built from sources.
  • Tagline: this contains the first tagline of Elasticsearch: “You Know, for Search.”
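
The same check can also be done from the cygwin window instead of the browser. Here is a minimal sketch, assuming curl is installed. The helper name is_node_ok is something I made up for illustration, and it only looks at the status field described above. The sample JSON is illustrative too: it contains only fields from the list above, and the name value is a placeholder.

```shell
# Hypothetical helper: reads the JSON returned by the root URL on stdin
# and succeeds only if it reports "status" : 200.
is_node_ok() {
  grep -q '"status"[[:space:]]*:[[:space:]]*200'
}

# Illustrative response; "my-node" is a placeholder, your node reports its own name.
sample='{ "ok" : true, "status" : 200, "name" : "my-node", "tagline" : "You Know, for Search" }'

if printf '%s\n' "$sample" | is_node_ok; then
  echo "node is up"
else
  echo "node is not responding as expected"
fi

# Against a running node (needs ElasticSearch started as in step 2):
#   curl -s http://localhost:9200/ | is_node_ok && echo "node is up"
```

This is plain text matching, not real JSON parsing, so treat it as a quick smoke test only.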

4) Now let's install one of the ElasticSearch plugins, namely ElasticSearch Head, from http://mobz.github.io/elasticsearch-head/
It's very simple to install this plugin:
cd C:\elasticsearch-0.90.3\bin
plugin -install mobz/elasticsearch-head
This will install the elasticsearch-head plugin into your environment.


Tutorial Sample

We will be developing a very simple application, employees within a department, so that we can focus on the functionality rather than on the complexity of the example. After all, this blog is meant to help somebody get a jump start with ElasticSearch.
1) Now open up your cygwin window and enter this command:
curl -XPUT 'http://localhost:9200/dept/employee/31' -d '{ "empname": "emp31"}'
where dept is the index, employee is the type, and 31 is the id of the document we are adding to that type.
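
To make the index/type/id layout of the URL explicit, here is a small sketch. The helper name doc_url is hypothetical (it is not part of ElasticSearch or curl), and the host and port are simply the defaults used in this tutorial.

```shell
# Hypothetical helper: builds the document URL from index, type and id.
doc_url() {
  printf 'http://localhost:9200/%s/%s/%s\n' "$1" "$2" "$3"
}

# Prints the URL used in the command above.
doc_url dept employee 31

# Use it with curl like this (needs a running node):
#   curl -XPUT "$(doc_url dept employee 31)" -d '{ "empname": "emp31"}'
```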

You should see something like this in the cygwin window:






Here is that output as text:
========================================================================
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    91  100    70  100    21    448    134 --:--:-- --:--:-- --:--:--   500{"ok":true,"_index":"dept","_type":"employee","_id":"31","_version":1}
========================================================================
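
The table at the top of this output is curl's own progress meter, not part of the ElasticSearch reply. Adding -s to the curl command hides it. The JSON at the end is the part that matters. Below is a small sketch for pulling the _version out of it. The helper name doc_version is hypothetical, and the sed expression is only meant for a flat one-line response like the one shown here.

```shell
# Hypothetical helper: prints the number that follows "_version": in the
# JSON read from stdin. Not a real JSON parser.
doc_version() {
  sed -n 's/.*"_version":\([0-9][0-9]*\).*/\1/p'
}

# The response shown in the output above.
response='{"ok":true,"_index":"dept","_type":"employee","_id":"31","_version":1}'
printf '%s\n' "$response" | doc_version

# Against a running node:
#   curl -s -XPUT 'http://localhost:9200/dept/employee/31' -d '{ "empname": "emp31"}' | doc_version
```

If you send the same PUT for the same id again, the version should go up by one, since ElasticSearch treats it as an update of the existing document.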
Following the above command - please enter some more records:
curl -XPUT 'http://localhost:9200/dept/employee/1' -d '{ "empname": "emp1"}'
curl -XPUT 'http://localhost:9200/dept/employee/2' -d '{ "empname": "emp2"}'
...
...
curl -XPUT 'http://localhost:9200/dept/employee/30' -d '{ "empname": "emp30"}'
Note: for each record you need to increment the id at the end of the URL and the value of empname within the curly braces.
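
If you would rather not type all of these by hand, a loop in the cygwin shell can generate them. This is a sketch under the same assumptions as above (node on localhost:9200, index dept, type employee). The function name print_put_commands is my own. It only prints the commands, so you can look at them before running anything.

```shell
# Prints one curl command per employee id in the range first..last.
print_put_commands() {
  first=$1
  last=$2
  i=$first
  while [ "$i" -le "$last" ]; do
    printf "curl -XPUT 'http://localhost:9200/dept/employee/%s' -d '{ \"empname\": \"emp%s\"}'\n" "$i" "$i"
    i=$((i + 1))
  done
}

# Show the 30 commands for emp1..emp30.
print_put_commands 1 30

# To actually run them against a started node:
#   print_put_commands 1 30 | sh
```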

Once this is done, you have entered enough data into ElasticSearch and can start searching it using the head plugin.


Let's test it out

Please type this into the browser:
http://localhost:9200/_plugin/head/
It should look something like this:



















This is the overview, which shows the cluster health and the various indexes. Our newly created index is also displayed, as dept.

Now click on the Structured Query tab



















In the Search dropdown, select "dept" and then click on the "Search" button.
This will display all the records.

To search for specific items, say emp1, emp25 and emp7, keep clicking the rightmost "+" to add more search criteria, as shown in the figure, and then click on "Search". Please make sure the leftmost dropdown is set to "should" and the other search criteria match the diagram.
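
For those who prefer the command line, the same kind of search can be sent over HTTP as a bool query with "should" clauses. The sketch below is my guess at an equivalent request, not necessarily the exact query the plugin builds. The helper name should_query is hypothetical, and I am assuming a term query on the empname field for each value.

```shell
# Hypothetical helper: prints a bool query whose "should" clauses are
# term queries on one field, one clause per value given.
should_query() {
  field=$1
  shift
  clauses=''
  for value in "$@"; do
    clause=$(printf '{"term":{"%s":"%s"}}' "$field" "$value")
    if [ -n "$clauses" ]; then
      clauses="$clauses,$clause"
    else
      clauses=$clause
    fi
  done
  printf '{"query":{"bool":{"should":[%s]}}}\n' "$clauses"
}

# Print the query body for emp1, emp25 and emp7.
should_query empname emp1 emp25 emp7

# Send it to a running node:
#   curl -s -XGET 'http://localhost:9200/dept/_search?pretty=true' -d "$(should_query empname emp1 emp25 emp7)"
```

With "should", a document matches if any one of the clauses matches, which is why all three employees come back.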







Now you can play with this plugin and maybe use it in your own search projects.

You can try out this working application running on my Desktop at:

Please get back to me if you have any questions or concerns.
I hope this gets you started with a very good open-source enterprise search product, namely "ElasticSearch".