
Elasticsearch Analyzer - Lowercase And Whitespace Tokenizer

How can I create a mapping that will tokenize the string on whitespace and also change it to lowercase for indexing? This is my current mapping, which tokenizes on whitespace, but I can't get it to lowercase the tokens as well.

Solution 1:

I managed to write a custom analyzer, and this works:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercasespaceanalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "lowercasespaceanalyzer"
        }
      }
    }
  }
}

Note that "tokenizer", "filter", and "search_analyzer" do not belong in the field mapping itself; the tokenizer and filter are configured inside the analyzer definition, and the field's "analyzer" is then applied at both index and search time unless a separate search analyzer is specified.
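You can check what the custom analyzer produces with the _analyze API, in the same style as the curl example further down (this assumes an index named myindex created with the settings above):

```shell
curl -XGET 'localhost:9200/myindex/_analyze?analyzer=lowercasespaceanalyzer&pretty' -d 'Some DATA'
```

Each whitespace-separated token should come back lowercased: "some" and "data".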

Solution 2:

You have two options:

Simple Analyser

The simple analyser will probably meet your needs:

curl -XGET 'localhost:9200/myindex/_analyze?analyzer=simple&pretty' -d 'Some DATA' 
{
  "tokens" : [ {
    "token" : "some",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "data",
    "start_offset" : 5,
    "end_offset" : 9,
    "type" : "word",
    "position" : 2
  } ]
}

To use the simple analyser in your mapping:

{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "simple"
        }
      }
    }
  }
}

Custom Analyser

The second option is to define your own custom analyser, specifying how to tokenise and filter the data, and then refer to this new analyser in your mapping.
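A minimal sketch of that approach, essentially the same as Solution 1: declare the analyser under the index settings, then reference it by name in the field mapping (the name lowercase_whitespace is an illustrative choice, not part of the question):

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_whitespace": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "lowercase_whitespace"
        }
      }
    }
  }
}
```

The advantage over the simple analyser is control: the whitespace tokenizer keeps digits and punctuation inside tokens, whereas the simple analyser splits on every non-letter character.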
