Elasticsearch - Advanced settings and Tweaks

Sep, 2014

Now that we have Elasticsearch installed and confirmed working, we can start looking into more advanced settings and tweaks that improve Elasticsearch performance. For most use cases, the following three areas of Elasticsearch configuration need to be addressed:

Memory configuration
Threadpool configuration
Data Store configuration

Memory configuration:

By default, Elasticsearch is assigned a minimum heap size of 256MB and a maximum heap size of 1GB. On real-world servers with many gigabytes of memory available, a good rule of thumb is to give 50% of the server's memory to the Elasticsearch process. This can be set using:

$ export ES_HEAP_SIZE=2048m

Setting the heap size alone is not enough, because the OS can still swap that memory out. To prevent this, we need to lock the process address space assigned to Elasticsearch. This can be done by adding the following line to the elasticsearch.yml file and restarting Elasticsearch:


#elasticsearch.yml
bootstrap.mlockall: true
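
As a small illustration of the 50% rule of thumb for the heap size, here is a minimal Python sketch. The function names and the 4096MB example machine are assumptions made for this illustration; only the ES_HEAP_SIZE variable comes from the setup described here.

```python
def suggested_heap_mb(total_ram_mb, fraction=0.5):
    """Return a heap size in MB using the 'half of the server memory' rule of thumb."""
    if total_ram_mb <= 0:
        raise ValueError("total_ram_mb must be positive")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    return int(total_ram_mb * fraction)


def heap_export_line(total_ram_mb):
    """Build the shell line that sets ES_HEAP_SIZE for the given amount of RAM."""
    return "export ES_HEAP_SIZE={}m".format(suggested_heap_mb(total_ram_mb))


if __name__ == "__main__":
    # Hypothetical server with 4096MB of RAM.
    print(heap_export_line(4096))
```

For a server with 4096MB of RAM this gives the same 2048m value used in the export command.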

After starting Elasticsearch, you can check whether bootstrap.mlockall was applied successfully by looking at the value of mlockall in the output of this request:


$ curl 'http://localhost:9200/_nodes/process?pretty'
{
  "nodes": {
    "Oaa7jVWNSeyHlmhyYCr32g": {
      "name": "Mister Fear",
      "transport_address": "inet[/127.0.0.1:9300]",
      "host": "ip-127-0-0-1",
      "ip": "127.0.0.1",
      "version": "1.3.2",
      "build": "dee175d",
      "http_address": "inet[/127.0.0.1:9200]",
      "attributes": {
        "master": "true"
      },
      "process": {
        "refresh_interval_in_millis": 1000,
        "id": 3599,
        "max_file_descriptors": 100000,
        "mlockall": false
      }
    }
  }
}

In the output above, mlockall is false, which means that the mlockall request has failed. The most probable reason is that the user running Elasticsearch doesn't have permission to lock memory. This can be granted by running ulimit -l unlimited as root before starting Elasticsearch.
Note that ulimit only changes the limit for the current shell session and the processes started from it, so ulimit -l unlimited has to be run again before every Elasticsearch restart, or else mlockall goes back to false. This is probably needed because the user the Elasticsearch process runs as is not root.
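
The mlockall check can also be scripted. The following is a minimal Python sketch that reads a response shaped like the _nodes/process output shown earlier and lists the nodes where memory locking did not take effect. The function name and the trimmed-down sample string are assumptions for illustration; in practice the JSON would come from the curl request.

```python
import json

# Trimmed-down copy of the _nodes/process response, used as sample input.
SAMPLE_RESPONSE = """
{
  "nodes": {
    "Oaa7jVWNSeyHlmhyYCr32g": {
      "name": "Mister Fear",
      "process": {
        "id": 3599,
        "max_file_descriptors": 100000,
        "mlockall": false
      }
    }
  }
}
"""


def nodes_without_mlockall(nodes_info):
    """Return the names of nodes whose process section reports mlockall as false."""
    unlocked = []
    for node_id, node in nodes_info.get("nodes", {}).items():
        # A missing flag is treated as "not locked" to stay on the safe side.
        if not node.get("process", {}).get("mlockall", False):
            unlocked.append(node.get("name", node_id))
    return unlocked


if __name__ == "__main__":
    print(nodes_without_mlockall(json.loads(SAMPLE_RESPONSE)))
```

An empty list means that every node in the response reported mlockall as true.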

Threadpool Configuration:

Elasticsearch holds several thread pools, each with a queue bound to it that allows pending requests to be held instead of discarded. For example, by default the index operation has a fixed thread pool whose size equals the number of processors in the system, and a queue_size of 200. So once all threads are busy and 200 requests are already queued, new requests are rejected and the following exception is returned to the client: EsRejectedExecutionException[rejected execution (queue capacity 200)..]
To overcome this limitation and increase the concurrency with which Elasticsearch processes messages, the following settings can be tweaked:


#elasticsearch.yml
#for search operation
threadpool.search.type: fixed
threadpool.search.size: 50
threadpool.search.queue_size: 200

#for bulk operations
threadpool.bulk.type: fixed
threadpool.bulk.size: 10
threadpool.bulk.queue_size: 100

#for indexing operations
threadpool.index.type: fixed
threadpool.index.size: 60
threadpool.index.queue_size: 1000

So if the use case is primarily searching, i.e. more search operations than indexing operations, the thread pool for search can be increased and the thread pool for indexing can be much lower. Queuing up thousands of messages is probably not a wise decision, though, so tweak responsibly. More information about thread pool size and configuration can be found at Elasticsearch Threadpool.
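
To get a feel for how the size and queue_size values interact, here is a minimal Python sketch of a fixed thread pool with a bounded queue. It is a simplified model, not Elasticsearch code: it assumes that no request finishes while the burst arrives, and the function name and the 4-processor machine are made up for the example.

```python
def simulate_burst(pool_size, queue_size, burst):
    """Split a burst of simultaneous requests into (running, queued, rejected).

    Simplified model of a fixed pool: requests first occupy free threads,
    then fill the bounded queue, and anything beyond that is rejected.
    """
    if pool_size < 0 or queue_size < 0 or burst < 0:
        raise ValueError("all arguments must be non-negative")
    running = min(burst, pool_size)
    queued = min(burst - running, queue_size)
    rejected = burst - running - queued
    return running, queued, rejected


if __name__ == "__main__":
    # Default index pool on a hypothetical 4-processor machine: size 4, queue 200.
    print(simulate_burst(pool_size=4, queue_size=200, burst=300))
    # The tweaked index pool from the settings above: size 60, queue 1000.
    print(simulate_burst(pool_size=60, queue_size=1000, burst=300))
```

In this model the default pool rejects 96 of the 300 requests, while the tweaked pool accepts all of them at the cost of holding 240 requests in its queue.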

By default, ES assumes that you're going to use it mostly for searching and querying: only 10% of its total heap is set aside for the indexing buffer, which leaves the remaining 90% for everything else, mostly searching. This can be changed with the following setting. Note that the implications of this setting can be significant, as you are reducing the memory available for search purposes!


#elasticsearch.yml
indices.memory.index_buffer_size: 30%
#the above setting grants ES 30% of its heap memory for the index buffer
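
To make the percentage concrete, the following Python sketch converts an index_buffer_size percentage into megabytes for a given heap. The function name is an assumption for this illustration, and the 2048MB heap is simply the value from the ES_HEAP_SIZE example earlier.

```python
def index_buffer_mb(heap_mb, percent):
    """Return how many MB of heap a given index_buffer_size percentage covers."""
    if heap_mb <= 0:
        raise ValueError("heap_mb must be positive")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return heap_mb * percent / 100.0


if __name__ == "__main__":
    heap_mb = 2048  # heap size from the earlier ES_HEAP_SIZE example
    for percent in (10, 30):
        print("{}% of a {}MB heap: {:.1f}MB".format(
            percent, heap_mb, index_buffer_mb(heap_mb, percent)))
```

With a 2048MB heap, moving from 10% to 30% grows the index buffer from roughly 205MB to roughly 614MB, and that memory is no longer available for anything else.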

More at Indices Module

Store and indices Configuration:

The store module allows you to control how index data is stored. The index can either be stored in memory (no persistence) or on disk (the default). Unless your data is temporary, using the in-memory store is a bad idea, as you will lose the data upon restart. For disk-based storage, we need fast disk seeks whenever the data being looked up is not in memory. The best option here is to use mmapfs, which is based on memory-mapped files.


#elasticsearch.yml
index.store.type: mmapfs
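
For readers unfamiliar with memory-mapped files, here is a minimal Python sketch of the general idea: a file is mapped into the process address space and a slice is read from it like an in-memory byte array, with the OS paging data in as needed. This only illustrates memory mapping in general and is not how Elasticsearch implements mmapfs. The function names and the file contents are made up for the example.

```python
import mmap
import os
import tempfile


def read_slice_mmap(path, offset, length):
    """Read `length` bytes at `offset` from a file through a read-only memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the OS pages data in on access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset:offset + length]


def demo():
    """Write a small temporary file and read its middle part through mmap."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"segment-0|segment-1|segment-2")
        return read_slice_mmap(path, 10, 9)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The slice is read without an explicit seek and read call in the program, which is the property that makes memory-mapped stores attractive when lookups jump around in large index files.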

More information regarding storage options can be found at Elasticsearch Store