Elasticsearch Token Position change

Recently I have been taking an interest in Elasticsearch analyzers. I understand what a token graph, start_offset, end_offset, position and positionLength are.

Index schema

PUT synonym_graph_index
{
  "settings": {
    "number_of_replicas": 0,
    "analysis": {
      "analyzer": {
        "synonym_graph_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["synonym_filter"]
        }
      },
      "filter": {
        "synonym_filter": {
          "type": "synonym_graph",
          "synonyms": ["wi fi => wifi,hotspot,fast network"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text_field": {
        "type": "text",
        "analyzer": "synonym_graph_analyzer"
      }
    }
  }
}

I run the analyzer on a sample sentence.

POST synonym_graph_index/_analyze
{
  "analyzer": "synonym_graph_analyzer",
  "text": "Airtel wi fi is up and down"
}

Result of analysis

{
  "tokens" : [
    {
      "token" : "Airtel",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "wifi",
      "start_offset" : 7,
      "end_offset" : 12,
      "type" : "SYNONYM",
      "position" : 1,
      "positionLength" : 2
    },
    {
      "token" : "hotspot",
      "start_offset" : 7,
      "end_offset" : 12,
      "type" : "SYNONYM",
      "position" : 1,
      "positionLength" : 2
    },
    {
      "token" : "fast",
      "start_offset" : 7,
      "end_offset" : 12,
      "type" : "SYNONYM",
      "position" : 1
    },
    {
      "token" : "network",
      "start_offset" : 7,
      "end_offset" : 12,
      "type" : "SYNONYM",
      "position" : 2
    },
    {
      "token" : "is",
      "start_offset" : 13,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "up",
      "start_offset" : 16,
      "end_offset" : 18,
      "type" : "<ALPHANUM>",
      "position" : 4
    },
    {
      "token" : "and",
      "start_offset" : 19,
      "end_offset" : 22,
      "type" : "<ALPHANUM>",
      "position" : 5
    },
    {
      "token" : "down",
      "start_offset" : 23,
      "end_offset" : 27,
      "type" : "<ALPHANUM>",
      "position" : 6
    }
  ]
}

To understand it better, I made a table.
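
A table of this kind can be rebuilt from the response with a short script. The sketch below is a minimal Python example, not part of the original post. It copies the tokens by hand (offsets and types left out) and assumes that a token without positionLength spans exactly one position, which is the Lucene default. The names TOKENS and token_spans are made up for illustration, and the columns are only one possible layout.

```python
# Tokens copied by hand from the _analyze response (offsets and types omitted).
TOKENS = [
    {"token": "Airtel", "position": 0},
    {"token": "wifi", "position": 1, "positionLength": 2},
    {"token": "hotspot", "position": 1, "positionLength": 2},
    {"token": "fast", "position": 1},
    {"token": "network", "position": 2},
    {"token": "is", "position": 3},
    {"token": "up", "position": 4},
    {"token": "and", "position": 5},
    {"token": "down", "position": 6},
]


def token_spans(tokens):
    """Return (token, start_position, end_position) for each token."""
    spans = []
    for t in tokens:
        # Assumption: a missing positionLength means the token spans 1 position.
        length = t.get("positionLength", 1)
        spans.append((t["token"], t["position"], t["position"] + length))
    return spans


print(f"{'token':<8} {'from':>4} {'to':>4}")
for token, start, end in token_spans(TOKENS):
    print(f"{token:<8} {start:>4} {end:>4}")
```

Read this way, wifi and hotspot each run from position 1 to position 3, while fast runs from 1 to 2 and network from 2 to 3.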

Using the table above, I also drew the graph.
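
The graph can also be checked in code by treating every token as an arc from its position to position + positionLength and listing all paths from the first to the last position. This is only a sketch under the same assumption as before (a length of 1 where the response omits positionLength). ARCS and token_paths are hypothetical names, not part of Elasticsearch.

```python
from collections import defaultdict

# (token, position, positionLength) copied from the _analyze output.
# A length of 1 is assumed where the response omits positionLength.
ARCS = [
    ("Airtel", 0, 1),
    ("wifi", 1, 2),
    ("hotspot", 1, 2),
    ("fast", 1, 1),
    ("network", 2, 1),
    ("is", 3, 1),
    ("up", 4, 1),
    ("and", 5, 1),
    ("down", 6, 1),
]


def token_paths(arcs):
    """List every token sequence that leads from the first to the last position."""
    outgoing = defaultdict(list)
    for token, pos, length in arcs:
        outgoing[pos].append((token, pos + length))
    start = min(pos for _, pos, _ in arcs)
    end = max(pos + length for _, pos, length in arcs)

    def walk(node):
        if node == end:
            return [[]]
        # Follow each arc leaving this node and extend it with the remaining path.
        return [[token] + rest
                for token, nxt in outgoing[node]
                for rest in walk(nxt)]

    return [" ".join(path) for path in walk(start)]


for phrase in token_paths(ARCS):
    print(phrase)
```

With the arcs above this gives three paths: "Airtel wifi is up and down", "Airtel hotspot is up and down" and "Airtel fast network is up and down".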

The network token has changed its position. Did that happen because I used the standard tokenizer and it split "fast network"? One more thing I would like to know: why is positionLength not shown for some tokens?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
