//Karthik Srinivasan

Product Engineer, CTO & a Beer Enthusiast
Experiments, thoughts and scripts documented for posterity.

Quirky Personal Projects

LinkedIn

Email me

Elasticsearch - Zen, AWS Cluster Setup

Sep, 2014

In a cluster environment, multiple elasticsearch nodes/servers join to form a cluster where the shards are distributed and replicated among these servers but to the outside world it is presented as a single system. For elasticsearch to connect to different nodes, ES provides two discovery methods. One being Zen discovery and other being cloud based discovery via plugins for Azure, AWS and Google Compute engine.

Zen Discovery

#elasticsearch.yml
# Cluster name
cluster.name: es-test-cluster

#discovery
discovery.type: zen

# Minimum nodes alive to constitute an operational cluster
discovery.zen.minimum_master_nodes: 2

# Unicast Discovery
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: [ "192.168.1.1", "192.168.1.2", "192.168.1.3" ]

#failure detection
discovery.zen.ping.fd.ping_interval: 60s
discovery.zen.ping.fd.ping_timeout: 60s
discovery.zen.ping.fd.ping_retries: 10
                
From the above snippet, it's pretty straightforward to understand that the discover.type is "zen" and the minimum number of nodes required to form a cluster is "2" and using "unicast" to find other hosts and provide some sort of recovery mechanism (fault detection) if a server went offline or some network problems. This is probably all that Zen discovery has to offer, simple and easy! More info at Zen discovery

AWS/EC2 Discovery

For EC2 discovery, we first need to install the cloud-aws plugin if not already installed
$ /usr/share/elasticsearch/bin/plugin -install elasticsearch/elasticsearch-cloud-aws/2.3.0
            
#elasticsearch.yml
# Cluster name
cluster.name: es-test-cluster

#discovery type
discovery.type: ec2

#optional -Region setting to discover nodes
cloud.aws.region: us-west-2

#optional - Security groups
discovery.ec2.groups: sg-xxxxxxx

#optional - to store aws attributes to the nodes - for node awareness
cloud.node.auto_attributes: true

#If not using IAM roles
cloud.aws.access_key: xxxxxxxxxxxxx
cloud.aws.secret_key: xxxxxxxxxxxxxxxxx
            
From the above config, the discovery type is ec2 and optionally given a region for the plugin to discover other nodes and security group. If there is no IAM role associated with the server, then AWS secret_key and access_key needs to be provided in-order for the plugin to query AWS for node information.

Having the node.auto_attributes set to true would add aws_availability_zone to the node attributes properties which helps in node awareness. What this means is that, given an index with replication factor of 1, ES uses this attribute to determine which node this particular shard is sitting on but makes sure the replicated shard is on a different box. More info at Shard Awareness
#running the following should provide you with region information in the attributes section
$ curl http://{ec2-esurl}/_nodes/process?pretty
{
  "cluster_name": "es-cluster-test",
  "nodes": {
    "Oaa7jVWNSeyHlmhyYCr32g": {
      "name": "Mister Fear",
      ...
      "attributes": {
        "aws_availability_zone": "us-west-2a",
        "master": "true"
      }
    },
    "QibV2okLRyuH0ti6bxvANA": {
      "name": "Blackheath",
      ...
      "attributes": {
        "aws_availability_zone": "us-west-2b",
        "master": "true"
      }
    },
    "bEL8b2-lRZ2Cif1nM4WmIQ": {
      "name": "Sayge",
      ...
      "attributes": {
        "aws_availability_zone": "us-west-2a",
        "master": "true"
      }
    }
  }
}
             
We can make the Elasticsearch node discovery a little bit faster by filtering the number of servers it needs to ping during the discovery process. This filter can achieved by using the ec2.tag if they are assigned to EC2 servers. In a enterprise environment where there are 100's of ec2 servers deployed on AWS, pinging every single one of them would take a very long time, this should help speed things up.
#elasticsearch.yml
discovery.ec2.tag.env: prod
More information for this EC2 discovery plugin at cloud-aws and various discovery at elasticsearch-dsicovery