Product Engineer, CTO & a Beer Enthusiast
Experiments, thoughts and scripts documented for posterity.
Sep, 2014
Data in Elasticsearch can be indexed without providing any information about its content, because ES accepts dynamic properties and detects whether a property value is a string, integer, datetime, boolean, etc. In this article, let's set up dynamic mapping the right way, along with some commonly performed search operations.
$ curl -XPOST http://localhost:9200/keywords/keyword/61669 -d '{
"keywordId": 61669,
"keywordText": "Massaging",
"keywordType": "Submitted"
}'
Indexing the JSON document above into Elasticsearch results in the following mapping:
{
"keywords": {
"mappings": {
"keyword": {
"properties": {
"keywordId": {
"type": "long"
},
"keywordText": {
"type": "string"
},
"keywordType": {
"type": "string"
}
}
}
}
}
}
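As a rough illustration of how those types come about, the sketch below mimics dynamic type detection for a flat JSON document in Python. It is a simplification under stated assumptions: `infer_field_type` and `infer_mapping` are hypothetical helpers, not Elasticsearch APIs, and real detection also handles dates, objects, and arrays.

```python
import json

def infer_field_type(value):
    """Hypothetical helper: guess the ES 1.x field type for a JSON value."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise ValueError("unsupported value in this simplified sketch: %r" % (value,))

def infer_mapping(document):
    """Build a 'properties' section for a flat document, one entry per field."""
    return {name: {"type": infer_field_type(value)} for name, value in document.items()}

doc = json.loads('{"keywordId": 61669, "keywordText": "Massaging", "keywordType": "Submitted"}')
mapping = infer_mapping(doc)
print(json.dumps(mapping, indent=2))
```

The printed structure has the same field types as the "properties" section of the mapping shown above.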
It is great that Elasticsearch automatically detected keywordId as a long and keywordText and keywordType as strings. But if you look carefully, keywordText and keywordType get the default string index setting of "analyzed". This means that the values of those two fields are broken into terms at index time, which makes them available for partial (full-text) search. I want keywordType to be "not_analyzed" instead, as users would never run a partial text search against it. To achieve this while preserving the dynamic nature of the index, we can create the keywords index with an explicit mapping for certain fields:
$ curl -XPUT http://localhost:9200/keywords -d '{
"mappings": {
"keyword": {
"dynamic": "true",
"properties": {
"keywordType": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}'
As you can see above, we have set dynamic to "true" but told the index to use a specific mapping for the "keywordType" field instead of letting ES figure it out for us.
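To make the difference between "analyzed" and "not_analyzed" concrete, here is a small Python sketch of the terms each setting would put into the index. It is only an approximation: `analyzed_terms` is a hypothetical stand-in for the default analyzer (the real one uses Unicode-aware tokenization), and the sample value is made up for illustration.

```python
import re

def analyzed_terms(text):
    """Rough stand-in for the default analyzer: split into words, lowercase them."""
    return [token.lower() for token in re.findall(r"\w+", text)]

def not_analyzed_terms(text):
    """A not_analyzed field is indexed as one term, exactly as provided."""
    return [text]

value = "Deep Tissue Massaging"  # made-up sample value
print(analyzed_terms(value))      # ['deep', 'tissue', 'massaging']
print(not_analyzed_terms(value))  # ['Deep Tissue Massaging']
```

With the analyzed terms, a search for "tissue" finds the document. With the single not_analyzed term, only the exact original value does.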
$ curl -XPUT http://localhost:9200/keywords -d '{
"settings": {
"index": {
"analysis": {
"analyzer": {
"analyzer_keyword": {
"tokenizer": "keyword",
"filter": "lowercase"
}
}
}
}
},
"mappings": {
"keyword": {
"dynamic": "true",
"properties": {
"keywordType": {
"type": "string",
"analyzer": "analyzer_keyword"
}
}
}
}
}'
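Here we are basically using the "keyword" tokenizer that Elasticsearch provides, which treats the whole value as a single token for exact match search, together with the "lowercase" filter, which automatically converts the input to lower case, so the exact match does not depend on case. More info is available in the Elasticsearch tokenizers documentation.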
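The effect of that analyzer can be sketched in Python with a toy inverted index. This is an illustration only: the `analyzer_keyword` function imitates the analyzer of the same name from the settings above, the second document (61670, "Auto Generated") is invented, and `match` assumes a query that runs its input through the same analyzer.

```python
def analyzer_keyword(text):
    """Mimic the custom analyzer: keyword tokenizer (one token) + lowercase filter."""
    return [text.lower()]

# Toy inverted index for the keywordType field: term -> set of document ids.
inverted_index = {}
for doc_id, keyword_type in [(61669, "Submitted"), (61670, "Auto Generated")]:
    for term in analyzer_keyword(keyword_type):
        inverted_index.setdefault(term, set()).add(doc_id)

def match(query_text):
    """Look up the analyzed query text; assumes the query uses the same analyzer."""
    hits = set()
    for term in analyzer_keyword(query_text):
        hits |= inverted_index.get(term, set())
    return hits

print(match("SUBMITTED"))       # {61669}
print(match("auto generated"))  # {61670}
print(match("auto"))            # set(), because partial values do not match
```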
#mapping configuration from above
"mappings": {
"keyword": {
"_ttl" : { "enabled" : true, "default" : "5d" },
"dynamic": "true",
"_all": {
"enabled": false
},
...
}
OK great, the _all field is now disabled. But since dynamic is turned on, any new field can automagically be indexed. I don't want Elasticsearch to index binary blobs, as that would consume too much memory; it should just store them without indexing them. For this, the updated mapping would look like:
#mapping configuration from above
...
"properties": {
"keywordType": {
"type": "string",
"analyzer": "analyzer_keyword"
},
"blob": {
"type": "string",
"index": "no"
}
}
...
Setting "index": "no" lets Elasticsearch know that this field should not be indexed for search purposes but should still be part of the document result. So basically it's stored but not searchable. For object fields, "enabled": false has the equivalent effect.
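The "stored but not searchable" behaviour can be pictured with a small Python sketch. Everything here is a hypothetical model, not Elasticsearch code: `source_store` stands in for the stored document source, `inverted_index` for the searchable terms, `UNINDEXED_FIELDS` for the fields the mapping excludes from indexing, and the blob value is invented.

```python
source_store = {}    # doc id -> original document, returned with search hits
inverted_index = {}  # (field, term) -> set of doc ids, used for searching

UNINDEXED_FIELDS = {"blob"}  # assumption: fields the mapping excludes from indexing

def index_document(doc_id, document):
    source_store[doc_id] = document  # the whole document is kept
    for field, value in document.items():
        if field in UNINDEXED_FIELDS:
            continue  # kept in the source, but no terms are indexed
        inverted_index.setdefault((field, str(value).lower()), set()).add(doc_id)

def search(field, text):
    return inverted_index.get((field, text.lower()), set())

index_document(61669, {"keywordType": "Submitted", "blob": "aGVsbG8gd29ybGQ="})

print(search("keywordType", "submitted"))  # {61669}
print(search("blob", "aGVsbG8gd29ybGQ="))  # set()
print(source_store[61669]["blob"])         # aGVsbG8gd29ybGQ=
```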
#mapping configuration from above
...
"mappings": {
"keyword": {
"dynamic": "true",
"date_detection": false,
"dynamic_templates": [
{
"date_index": {
"mapping": {
"type": "date"
},
"match": ".*Date|date",
"match_pattern": "regex"
}
}
]
...
So basically, any property whose name ends in "Date", or is exactly "date", is treated as a date. For example, "createDate" or "updateDate" would match the above template. Also, as you may notice, "date_detection" is set to false, so Elasticsearch no longer guesses dates from the content of string values and only the template decides which fields are dates.
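A quick way to check which field names such a pattern covers is to try it in Python. This sketch assumes the regex has to match the entire field name, which is why it uses `fullmatch`. Python and Java regex syntax agree for a pattern this simple, and the field names other than "createDate" and "updateDate" are made up for the test.

```python
import re

DATE_FIELD_PATTERN = re.compile(r".*Date|date")

def matches_date_template(field_name):
    # fullmatch: the pattern must cover the whole name (assumed to mirror ES behaviour).
    return DATE_FIELD_PATTERN.fullmatch(field_name) is not None

for name in ["createDate", "updateDate", "date", "dateCreated", "update_date"]:
    print(name, matches_date_template(name))
# createDate True
# updateDate True
# date True
# dateCreated False
# update_date False
```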
#mapping configuration from above
...
"dynamic_templates": [
{
"date_index": {
"mapping": {
"type": "date"
},
"match": ".*Date|date",
"match_pattern": "regex"
}
},
{
"string_index": {
"mapping": {
"analyzer": "analyzer_keyword",
"type": "string"
},
"match": "*",
"match_mapping_type": "string"
}
}
]
...
So providing dynamic templates when the properties are unknown helps a lot, and avoids having every single field "analyzed", which takes up too much memory and extra processing time. The memory consumption analysis will be for another blog post.
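Since the order of the templates matters (the first one that matches a new field is applied), the resolution can be sketched in Python as follows. `resolve_mapping` is a hypothetical helper written for this post, not an Elasticsearch API. It assumes regex patterns must match the full field name and treats plain patterns as simple wildcards.

```python
import fnmatch
import re

DYNAMIC_TEMPLATES = [
    {"date_index": {
        "mapping": {"type": "date"},
        "match": ".*Date|date",
        "match_pattern": "regex",
    }},
    {"string_index": {
        "mapping": {"type": "string", "analyzer": "analyzer_keyword"},
        "match": "*",
        "match_mapping_type": "string",
    }},
]

def resolve_mapping(field_name, detected_type):
    """Return (template name, mapping) for the first matching template, else None."""
    for entry in DYNAMIC_TEMPLATES:
        for name, template in entry.items():
            wanted_type = template.get("match_mapping_type")
            if wanted_type is not None and wanted_type != detected_type:
                continue  # template only applies to another detected type
            pattern = template["match"]
            if template.get("match_pattern") == "regex":
                name_matches = re.fullmatch(pattern, field_name) is not None
            else:
                name_matches = fnmatch.fnmatchcase(field_name, pattern)
            if name_matches:
                return name, template["mapping"]
    return None  # no template matched: default dynamic mapping applies

print(resolve_mapping("createDate", "string"))   # ('date_index', {'type': 'date'})
print(resolve_mapping("keywordText", "string"))  # ('string_index', {'type': 'string', 'analyzer': 'analyzer_keyword'})
print(resolve_mapping("keywordId", "long"))      # None
```

Swapping the two templates would send "createDate" to string_index whenever its value is detected as a string, which is why the more specific date template comes first.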
#/etc/elasticsearch/templates/keywords_template.json
{
"keywords_template": {
"template": "keywords",
"order": 0,
"settings": {
"index.number_of_shards": 7,
"index.number_of_replicas": 1
},
"mappings": {
"keyword": {
"dynamic": "true",
"dynamic_templates": [
{
"disable_string_index": {
"mapping": {
"type": "string",
"index": "not_analyzed",
"enabled": false
},
"match": "*",
"match_mapping_type": "string"
}
}
],
"_all": {
"enabled": false
}
}
}
}
}