Elasticsearch matches values with special characters only when querying .raw

I started working with Elasticsearch a few days ago. I created some analyzers and mappings and successfully indexed some data. The problem occurs when I query data that contains special characters. Initially I used the standard analyzer, but after reading about other options I settled on whitespace, because it keeps special characters in its tokens. However, I still could not query the data. If I query field.raw instead (where field is the actual property of the object), I get the results I need. But .raw bypasses the analyzers, and I wonder whether that defeats the purpose of defining them. Since whitespace didn't work for me, I reverted to the standard one.

Here's the analyzer I built:

PUT demoindex
{
  "settings": {
    "analysis": {
      "filter": {
        "ngram": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit"
          ]
        },
        "splcharfilter": {
          "type": "pattern_capture",
          "preserve_original": true,
          "patterns": [
            "([?/-])"
          ]
        }
      },
      "analyzer": {
        "my_field_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram",
            "splcharfilter"
          ]
        }
      }
    }
  }
}

Mapping I built:

PUT demoindex/_mapping
{
  "properties": {
    "name": {
      "type": "text",
      "analyzer": "my_field_analyzer",
      "search_analyzer": "simple",
      "fields": {
        "raw": {
          "type": "keyword"
        }
      }
    },
    "area": {
      "type": "text",
      "analyzer": "my_field_analyzer",
      "search_analyzer": "simple",
      "fields": {
        "raw": {
          "type": "keyword"
        }
      }
    }
  }
}
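
One way to check what this analyzer actually puts into the index is the _analyze API. The request below is a minimal sketch, and the sample text is only an illustrative value. With the standard tokenizer, a standalone hyphen is expected to be dropped before any token filter runs, so the output should contain only edge n-grams of the individual words and no single token that includes the hyphen or the spaces.

```
GET demoindex/_analyze
{
  "analyzer": "my_field_analyzer",
  "text": "is - application"
}
```

Running the same request with "analyzer": "simple" shows the tokens that analyzed queries such as match would look up at search time.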

Query that doesn't work:

GET /demoindex/_search?pretty
{
  "from": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "area": {
              "value": "is - application"
            }
          }
        },
        {
          "term": {
            "name": {
              "value": "hem"
            }
          }
        }
      ]
    }
  },
  "size": 15
}

Query that WORKS:

GET /demoindex/_search?pretty
{
  "from": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "area.raw": {
              "value": "is - application"
            }
          }
        },
        {
          "term": {
            "name": {
              "value": "hem"
            }
          }
        }
      ]
    }
  },
  "size": 15
}

As you can see, I had to use area.raw to match the content and return the document. Since name should not contain any special characters, it should be fine without .raw, but the other fields will contain special characters, which might not be limited to -.
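
For context, a term query looks up its value as one exact token and does not run any analyzer, so neither my_field_analyzer nor the search_analyzer is applied to "is - application". The keyword sub-field stores the whole string as a single token, which is why area.raw matches. A full-text query such as match does analyze its input. The following is a sketch for comparison only, not a drop-in replacement: it matches documents whose area contains both words in any order, and because of the edge n-grams it also matches longer words that merely start with them.

```
GET /demoindex/_search
{
  "query": {
    "match": {
      "area": {
        "query": "is - application",
        "operator": "and"
      }
    }
  }
}
```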

So, could someone please point out what I've done wrong or what I'm interpreting wrong? Or is there a better way to achieve this?

P.S: Version info

Elasticsearch : 7.10.1

Lucene : 8.7.0



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow