
Elasticsearch Analyzer - Lowercase And Whitespace Tokenizer

How can I create a mapping that will tokenize the string on whitespace and also change it to lowercase for indexing? This is my current mapping, which tokenizes on whitespace, but I can't get it to lowercase the tokens as well.

Solution 1:

I managed to write a custom analyzer, and this works:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercasespaceanalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "lowercasespaceanalyzer"
        }
      }
    }
  }
}

Note that "tokenizer", "filter", and "search_analyzer" do not belong in the field mapping itself; the tokenizer and filter are configured inside the analyzer definition, and the field's "analyzer" is then applied at both index and search time unless a separate search analyzer is specified.
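You can check what the custom analyzer produces with the _analyze API, in the same style as the curl example further down (this assumes an index named myindex created with the settings above):

```shell
curl -XGET 'localhost:9200/myindex/_analyze?analyzer=lowercasespaceanalyzer&pretty' -d 'Some DATA'
```

Each whitespace-separated token should come back lowercased: "some" and "data".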

Solution 2:

You have two options:

Simple Analyser

The simple analyser will probably meet your needs:

curl -XGET 'localhost:9200/myindex/_analyze?analyzer=simple&pretty' -d 'Some DATA' 
{
  "tokens" : [ {
    "token" : "some",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "data",
    "start_offset" : 5,
    "end_offset" : 9,
    "type" : "word",
    "position" : 2
  } ]
}

To use the simple analyser in your mapping:

{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "simple"
        }
      }
    }
  }
}

Custom Analyser

The second option is to define your own custom analyser, specifying how to tokenise and filter the data, and then refer to this new analyser in your mapping.
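A minimal sketch of that approach, essentially the same as Solution 1: declare the analyser under the index settings, then reference it by name in the field mapping (the name lowercase_whitespace is an illustrative choice, not part of the question):

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_whitespace": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "lowercase_whitespace"
        }
      }
    }
  }
}
```

The advantage over the simple analyser is control: the whitespace tokenizer keeps digits and punctuation inside tokens, whereas the simple analyser splits on every non-letter character.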
