Elasticsearch Analyzer - Lowercase And Whitespace Tokenizer
How can I create a mapping that will tokenize the string on whitespace and also lowercase it for indexing? This is my current mapping, which tokenizes on whitespace, but I can't figure out how to lowercase the tokens as well.
Solution 1:
I managed to write a custom analyzer, and this works. The custom analyzer is defined in the index settings and then referenced by name from the mapping:
"settings":{"analysis":{"analyzer":{"lowercasespaceanalyzer":{"type":"custom","tokenizer":"whitespace","filter":["lowercase"]}}}},"mappings":{"my_type":{"properties":{"title":{"type":"string","analyzer":"lowercasespaceanalyzer","tokenizer":"whitespace","search_analyzer":"whitespace","filter":["lowercase"]}}}}
Solution 2:
You have two options:
Simple Analyser
The simple analyser will probably meet your needs. It lowercases the text and splits on any non-letter character (not just whitespace):
curl -XGET 'localhost:9200/myindex/_analyze?analyzer=simple&pretty' -d 'Some DATA'
{
"tokens" : [ {
"token" : "some",
"start_offset" : 0,
"end_offset" : 4,
"type" : "word",
"position" : 1
}, {
"token" : "data",
"start_offset" : 5,
"end_offset" : 9,
"type" : "word",
"position" : 2
} ]
}
To use the simple analyser in your mapping:
{"mappings":{"my_type":{"properties":{"title":{"type":"string","analyzer":"simple"}}}}}
Custom Analyser
The second option is to define your own custom analyser, specifying how to tokenise and filter the data, and then refer to that analyser in your mapping, as sketched below.
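A minimal sketch of that approach, with the index and analyser names as placeholders: it combines the whitespace tokeniser with the lowercase token filter, which is exactly the combination the question asks for.

# create the index with a custom analyser and apply it to the title field
curl -XPUT 'localhost:9200/myindex' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "whitespace_lowercase": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "whitespace_lowercase"
        }
      }
    }
  }
}'

Unlike the simple analyser, this splits only on whitespace, so digits and punctuation stay inside tokens: "Some DATA-123" becomes the tokens "some" and "data-123" rather than "some" and "data".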