'Elasticsearch fielddata - should I use it?

Given an index with documents that have a brand property, we need to create a term aggregation that is case insensitive.

Index definition

Please note that the use of fielddata

PUT demo_products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "product": {
      "properties": {
        "brand": {
          "type": "text",
          "analyzer": "my_custom_analyzer",
          "fielddata": true,
        }
      }
    }
  }
}

Data

POST demo_products/product
{
  "brand": "New York Jets"
}

POST demo_products/product
{
  "brand": "new york jets"
}

POST demo_products/product
{
  "brand": "Washington Redskins"
}

Query

GET demo_products/product/_search
{
  "size": 0,
  "aggs": {
    "brand_facet": {
      "terms": {
        "field": "brand"
      }
    }
  }
}

Result

"aggregations": {
    "brand_facet": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "new york jets",
          "doc_count": 2
        },
        {
          "key": "washington redskins",
          "doc_count": 1
        }
      ]
    }
  }

If we use keyword instead of text we end up the 2 buckets for New York Jets because of the differences in casing.

We're concerned about the performance implications by using fielddata. However if fielddata is disabled we get the dreaded "Fielddata is disabled on text fields by default."

Any other tips to resolve this - or should we not be so concerned about fielddate?

Solution 1:^[1]

You can lowercase the aggregation at query time if you use a script. It won't perform as well as a normalized keyword field, but is still quite fast in my experience. For example, your query would be:

GET demo_products/product/_search
{
  "size": 0,
  "aggs": {
    "brand_facet": {
      "terms": {
        "script": "doc['brand'].value.toLowerCase()"
      }
    }
  }
}

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	ShibbyUK

'Elasticsearch fielddata - should I use it?

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]