'Elasticsearch: How to aggregate from nested documents up to each main document with nested filtering?
Is there a syntax to aggregate nested documents up to each main document that respects nested filtering? For example, how would I get back the youngest player for each team that has at least 2 years of experience in the example below? I'm using Elasticsearch 7.10.
PUT sample
{
"mappings": {
"dynamic": "strict",
"properties": {
"teamId": { "type": "keyword", "index": true, "doc_values": true },
"members": {
"type": "nested",
"properties": {
"memberId": { "type": "keyword", "index": true, "doc_values": true },
"age": { "type": "integer", "index": true, "doc_values": true },
"experience": { "type": "integer", "index": true, "doc_values": true }
}
}
}
}
}
PUT sample/_doc/1
{
"teamId" : "A"
, "members" :
[
{ "memberId" : "A1" , "age" : "11" , "experience" : "1" }
, { "memberId" : "A2" , "age" : "21" , "experience" : "2" }
, { "memberId" : "A3" , "age" : "31" , "experience" : "3" }
]
}
PUT sample/_doc/2
{
"teamId" : "B"
, "members" :
[
{ "memberId" : "B1" , "age" : "12" , "experience" : "1" }
, { "memberId" : "B2" , "age" : "22" , "experience" : "2" }
, { "memberId" : "B3" , "age" : "32" , "experience" : "3" }
, { "memberId" : "B4" , "age" : "42" , "experience" : "4" }
]
}
PUT sample/_doc/3
{
"teamId" : "C"
, "members" :
[
{ "memberId" : "C1" , "age" : "13" , "experience" : "1" }
, { "memberId" : "C2" , "age" : "23" , "experience" : "2" }
, { "memberId" : "C3" , "age" : "33" , "experience" : "3" }
]
}
I can get it to work without filtering as follows:
POST sample/_search?filter_path=aggregations.teams.buckets.key,aggregations.teams.buckets.members.*.value,aggregations.teams.buckets.members.*.hits.hits._source.*
{
"aggs": {
"teams": {
"terms": {
"field": "teamId",
"size": 10
},
"aggs": {
"members": {
"nested": {
"path": "members"
} ,
"aggs": {
"min_age": { "min": { "field": "members.age" } }
, "max_age": { "max": { "field": "members.age" } }
}
}
}
}
}
}
But when I add filtering as shown below, I can get the inner_hits to filter correctly, but the aggs does not consider it. Is there a way to get aggs to consider the nested filter?
POST sample/_search?filter_path=aggregations.teams.buckets.key,aggregations.teams.buckets.members.*.value,aggregations.teams.buckets.members.*.hits.hits._source.*,hits.hits.inner_hits.members.hits.hits._source.*
{
"aggs": {
"teams": {
"terms": {
"field": "teamId",
"size": 10
},
"aggs": {
"members": {
"nested": {
"path": "members"
} ,
"aggs": {
"min_age": { "min": { "field": "members.age" } }
, "max_age": { "max": { "field": "members.age" } }
}
}
}
}
}
, "query": {
"bool": {
"filter": [
{ "nested": {
"path": "members",
"query": {
"bool": {
"must": [
{ "range": { "members.experience": { "gte": 3 } } }
]
}
}
, "inner_hits" : { "size" : 100 , "sort" : [ { "members.memberId" : { "order" : "asc" } } ] }
}
}
]
}
}
}
Output from above:
{
"hits" : {
"hits" : [
{
"inner_hits" : {
"members" : {
"hits" : {
"hits" : [
{
"_source" : {
"experience" : "3",
"age" : "33",
"memberId" : "C3"
}
}
]
}
}
}
},
{
"inner_hits" : {
"members" : {
"hits" : {
"hits" : [
{
"_source" : {
"experience" : "3",
"age" : "32",
"memberId" : "B3"
}
},
{
"_source" : {
"experience" : "4",
"age" : "42",
"memberId" : "B4"
}
}
]
}
}
}
},
{
"inner_hits" : {
"members" : {
"hits" : {
"hits" : [
{
"_source" : {
"experience" : "3",
"age" : "31",
"memberId" : "A3"
}
}
]
}
}
}
}
]
},
"aggregations" : {
"teams" : {
"buckets" : [
{
"key" : "A",
"members" : {
"max_age" : {
"value" : 31.0
},
"min_age" : {
"value" : 11.0
}
}
},
{
"key" : "B",
"members" : {
"max_age" : {
"value" : 42.0
},
"min_age" : {
"value" : 12.0
}
}
},
{
"key" : "C",
"members" : {
"max_age" : {
"value" : 33.0
},
"min_age" : {
"value" : 13.0
}
}
}
]
}
}
}
Solution 1:[1]
When you use nested filters in the query, elasticsearch filters main documents and aggregates them but it does not filter nested documents while aggregation. You should add same filter into the aggregation. This query should work for you.
{
"aggs": {
"teams": {
"terms": {
"field": "teamId",
"size": 10
},
"aggs": {
"members": {
"nested": {
"path": "members"
},
"aggs": {
"filtered": {
"filter": {
"range": {
"members.experience": {
"gte": 3
}
}
},
"aggs": {
"min_age": {
"min": {
"field": "members.age"
}
},
"max_age": {
"max": {
"field": "members.age"
}
}
}
}
}
}
}
}
},
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "members",
"query": {
"bool": {
"must": [
{
"range": {
"members.experience": {
"gte": 3
}
}
}
]
}
},
"inner_hits": {
"size": 100,
"sort": [
{
"members.memberId": {
"order": "asc"
}
}
]
}
}
}
]
}
}
}
Solution 2:[2]
YD9's answer above is perfect for the question as asked, but I found that when I needed to add more nested criteria, I had to modify it slightly. Below is the syntax that will allow you to do that. It requires changing the "filter" pattern in YD9's query to a "filter/bool/filter" pattern.
GET sample/_search?filter_path=hits.total.value,aggregations.teams.buckets.key,aggregations.teams.buckets.members.*.*.value,hits.hits.inner_hits.members.hits.hits._source.*
{
"aggs": {
"teams": {
"terms": {
"field": "teamId",
"size": 10
},
"aggs": {
"members": {
"nested": {
"path": "members"
},
"aggs": {
"filtered": {
"filter": {
"bool": {
"filter" :
[
// These conditions must match what is in nested query filter below
{ "match_all" : {} }
, { "range": { "members.experience": { "gte": 2 } } }
// More criteria can be added below
]
}
},
"aggs": {
"min_age": { "min": { "field": "members.age" } }
}
}
}
}
}
}
},
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "members",
"query": {
"bool": {
"must": [
{ "match_all" : {} }
, { "range": { "members.experience": { "gte": 2 } } }
// More criteria can be added below
]
}
},
"inner_hits": {
"size": 100,
"sort": [
{
"members.memberId": {
"order": "asc"
}
}
]
}
}
}
]
}
}
}
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | YD9 |
| Solution 2 | JohnnyM |
