AKS log query not returning correct results
I am trying to find the pods that failed in just the last hour in AKS. I used the sample query provided by Azure:
// List all the pods count with phase
// View pod phase counts based on all phases: Failed, Pending, Unknown, Running, or Succeeded.
// To create an alert for this query, click '+ New alert rule'
//Customize endDateTime, startDateTime to select different time range
let endDateTime = now();
let startDateTime = ago(1h);
let trendBinSize = 1m;
KubePodInventory
| where TimeGenerated < endDateTime
| where TimeGenerated >= startDateTime
| distinct ClusterName, TimeGenerated, _ResourceId
| summarize ClusterSnapshotCount = count() by bin(TimeGenerated, trendBinSize), ClusterName, _ResourceId
| join hint.strategy=broadcast (
KubePodInventory
| where TimeGenerated < endDateTime
| where TimeGenerated >= startDateTime
| distinct ClusterName, Computer, PodUid, TimeGenerated, PodStatus, _ResourceId
| summarize TotalCount = count(), //Calculating count for per pod status
PendingCount = sumif(1, PodStatus =~ 'Pending'),
RunningCount = sumif(1, PodStatus =~ 'Running'),
SucceededCount = sumif(1, PodStatus =~ 'Succeeded'),
FailedCount = sumif(1, PodStatus =~ 'Failed')
by ClusterName, bin(TimeGenerated, trendBinSize), _ResourceId
) on ClusterName, TimeGenerated, _ResourceId
| extend UnknownCount = TotalCount - PendingCount - RunningCount - SucceededCount - FailedCount
| project TimeGenerated, _ResourceId,
TotalCount = todouble(TotalCount) / ClusterSnapshotCount,
PendingCount = todouble(PendingCount) / ClusterSnapshotCount,
RunningCount = todouble(RunningCount) / ClusterSnapshotCount,
SucceededCount = todouble(SucceededCount) / ClusterSnapshotCount,
FailedCount = todouble(FailedCount) / ClusterSnapshotCount,
UnknownCount = todouble(UnknownCount) / ClusterSnapshotCount
But the counts the query returns, for example FailedCount, do not cover just the last hour; they match the total number of failed pods since the cluster was created.
I am not sure what is wrong with the query. If anyone has any suggestions, please reply.
Thanks in advance!
Solution 1:[1]
I tried the query below and got results for only the last hour.
Query used:
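// endDateTime is not referenced below; it is kept from the sample query.
let endDateTime = now();
// startDateTime holds the look-back window as a timespan; ago(startDateTime) turns it into the start time.
let startDateTime = 1h;
let trendBinSize = 1m;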
KubePodInventory
| where TimeGenerated >= ago(startDateTime)
| distinct ClusterName, TimeGenerated, _ResourceId
| summarize ClusterSnapshotCount = count() by bin(TimeGenerated, trendBinSize), ClusterName, _ResourceId
| join hint.strategy=broadcast (
KubePodInventory
| where TimeGenerated >= ago(startDateTime)
| distinct ClusterName, Computer, PodUid, TimeGenerated, PodStatus, _ResourceId
| summarize TotalCount = count(), //Calculating count for per pod status
PendingCount = sumif(1, PodStatus =~ 'Pending'),
RunningCount = sumif(1, PodStatus =~ 'Running'),
SucceededCount = sumif(1, PodStatus =~ 'Succeeded'),
FailedCount = sumif(1, PodStatus =~ 'Failed')
by ClusterName, bin(TimeGenerated, trendBinSize), _ResourceId
) on ClusterName, TimeGenerated, _ResourceId
| extend UnknownCount = TotalCount - PendingCount - RunningCount - SucceededCount - FailedCount
| project TimeGenerated, _ResourceId,
TotalCount = todouble(TotalCount) / ClusterSnapshotCount,
PendingCount = todouble(PendingCount) / ClusterSnapshotCount,
RunningCount = todouble(RunningCount) / ClusterSnapshotCount,
SucceededCount = todouble(SucceededCount) / ClusterSnapshotCount,
FailedCount = todouble(FailedCount) / ClusterSnapshotCount,
UnknownCount = todouble(UnknownCount) / ClusterSnapshotCount
Reference: https://docs.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-log-query
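If the goal is simply to see which pods are in the Failed phase within the last hour, rather than a per-minute trend, a shorter query may be easier to reason about. The following is a minimal sketch, not the sample query from the documentation. It assumes the standard KubePodInventory columns (PodUid, Name, Namespace, PodStatus, ClusterName); the variable name lookback is made up for illustration.

```kusto
// Hypothetical helper variable: how far back to look.
let lookback = 1h;
KubePodInventory
| where TimeGenerated >= ago(lookback)
// Keep only the most recent inventory record for each pod in the window.
| summarize arg_max(TimeGenerated, PodStatus, Name, Namespace) by ClusterName, PodUid
// Keep pods whose latest reported phase is Failed.
| where PodStatus =~ 'Failed'
| project ClusterName, Namespace, Name, PodUid, LastSeen = TimeGenerated
```

Appending `| summarize FailedPods = count() by ClusterName` gives a single count per cluster. Note that inventory records describe the state of each pod at collection time, so a pod that failed earlier and has not been deleted would presumably still show up here; that is an assumption about how the table is populated and is worth checking against your own data.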
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | MadhurajVadde-MT |

