GeoMesa: why do the stat methods not work?
My schema:
./geomesa-accumulo describe-schema -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder
INFO Describing attributes of feature 'SignalBuilder'
geo | Point (Spatio-temporally indexed) (Spatially indexed)
time | Date (Spatio-temporally indexed) (Attribute indexed)
cam | String (Attribute indexed) (Attribute indexed)
imei | String
dir | Double
alt | Double
vlc | Double
sl | Integer
ds | Integer
dir_y | Double
poi_azimuth_x | Double
poi_azimuth_y | Double
User data:
geomesa.attr.splits | 4
geomesa.feature.expiry | time(30 days)
geomesa.index.dtg | time
geomesa.indices | z3:7:3:geo:time,z2:5:3:geo,attr:8:3:time,attr:8:3:cam,attr:8:3:cam:time
geomesa.stats.enable | true
geomesa.table.partition | time
geomesa.z.splits | 4
geomesa.z3.interval | week
When I try to get a count using the stats methods, it returns 11:
./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='9f471340-dd70-4eca-a8dc-14553a4e708a'"
Estimated count: 11
But without the cache:
./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='9f471340-dd70-4eca-a8dc-14553a4e708a'" --no-cache
INFO Running stat query...
Count: 1436
Why do the stats methods not work properly and return only an estimated value?
With Redis everything is fine; the problem occurs only with Accumulo.
Question update:
I tried to recalculate the statistics:
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-analyze -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder
INFO Running stat analysis for feature type SignalBuilder...
INFO Stats analyzed:
Total features: 11527
Bounds for geo: [ 37.598007, 55.736623, 38.661036, 56.9189592 ] cardinality: 10634
Bounds for time: [ 2022-01-30T15:13:58.706Z to 2022-02-09T14:16:03.000Z ] cardinality: 3779
Bounds for cam: [ 3fe961e1-91dd-4931-b82e-d04fcaf24c3e to f767f0fa-dac5-4571-aa47-1ea6bf6e2c82 ] cardinality: 6
INFO Use 'stats-histogram', 'stats-top-k' or 'stats-count' commands for more details
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='9f471340-dd70-4eca-a8dc-14553a4e708a'"
Estimated count: 14
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='3fe961e1-91dd-4931-b82e-d04fcaf24c3e'"
Estimated count: 0
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='3fe961e1-91dd-4931-b82e-d04fcaf24c3e'" --no-cache
INFO Running stat query...
Count: 2675
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-analyze -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder
INFO Running stat analysis for feature type SignalBuilder...
INFO Stats analyzed:
Total features: 11767
Bounds for geo: [ 37.598007, 55.736623, 38.661036, 56.9189592 ] cardinality: 10942
Bounds for time: [ 2022-01-30T15:13:58.706Z to 2022-02-09T14:17:41.000Z ] cardinality: 3841
Bounds for cam: [ 3fe961e1-91dd-4931-b82e-d04fcaf24c3e to f767f0fa-dac5-4571-aa47-1ea6bf6e2c82 ] cardinality: 6
INFO Use 'stats-histogram', 'stats-top-k' or 'stats-count' commands for more details
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "1=1"
Estimated count: Unknown
Re-run with --no-cache to get an exact count
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "1=1" --no-cache
INFO Running stat query...
Count: 11872
But that did not help. Geo-events continue to arrive in GeoMesa, but the stats still do not work.
Maybe I'm not using stats-count properly; stats-top-k does show the gathered statistics.
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam like '3fe961e1-91dd-4931-b82e-d04fcaf24c3e'"
Estimated count: 0
~/bin/geomesa-accumulo_2.12-3.2.2/bin ./geomesa-accumulo stats-top-k -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder
Top values for 'geo':
unavailable
Top values for 'time':
unavailable
Top values for 'cam':
7c0cf8bc-e7e3-4023-8a00-a5f17bda3001 (2925)
9f471340-dd70-4eca-a8dc-14553a4e708a (2924)
f767f0fa-dac5-4571-aa47-1ea6bf6e2c82 (2922)
bfe55ad1-5b0a-405d-9ca9-3bed6aca9313 (2921)
3fe961e1-91dd-4931-b82e-d04fcaf24c3e (2920)
5798a065-d51e-47a1-b04b-ab48df9f1324 (2)
Top values for 'imei':
unavailable
Top values for 'dir':
unavailable
Top values for 'alt':
unavailable
Top values for 'vlc':
unavailable
Top values for 'sl':
unavailable
Top values for 'ds':
unavailable
Top values for 'dir_y':
unavailable
Top values for 'poi_azimuth_x':
unavailable
Top values for 'poi_azimuth_y':
unavailable
Or maybe the cause is in Accumulo. When I try to scan the data directly from the Accumulo table, it returns:
root@accumulo> scan -t myNamespace.geomesa_SignalBuilder_z3_geo_time_v7_02717
2022-02-09 17:55:12,909 [commands.ShellPluginConfigurationCommand] ERROR: Error: Could not determine the type of file "hdfs://10.200.217.27:9000/accumulo/classpath/myNamespace/[^.].*.jar".
2022-02-09 17:55:12,909 [shell.Shell] ERROR: Could not load the specified formatter. Using the DefaultFormatter
2022-02-09 17:55:12,929 [commands.ShellPluginConfigurationCommand] ERROR: Error: Could not determine the type of file "hdfs://10.200.217.27:9000/accumulo/classpath/myNamespace/[^.].*.jar".
\x01\x0A\x9Dt\x19\x84\xEF\xDD\xAF "5798a065-d51e-47a1-b04b-ab48df9f1324-1643555638706 d: [] \x03\x00\x0C\x02\x00\x1E\x000\x008\x00\\\x00g\x00o\x00w\x00\x7F\x00\x83\x00\x87\x00\x87\x00\x87\x00\x87\x00\x00\x0E\x00\x01\x01@CT\x9C\xD3\xE0\xBDE@Lu\xA0t\x7F-\xDE\x00\x00\x01~\xAB\x8C\xCD\xB25798a065-d51e-47a1-b04b-ab48df9f132\xB43333333333\xB1@f@\x00\x00\x00\x00\x00?\xF3\xAE\x14z\xE1G\xAE\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x01
\x01\x0A\x9Dt\x19\x84\xEF\xDD\xBD!\x065798a065-d51e-47a1-b04b-ab48df9f1324-1643555648706 d: [] \x03\x00\x0C\x02\x00\x1E\x000\x008\x00\\\x00g\x00o\x00w\x00\x7F\x00\x83\x00\x87\x00\x87\x00\x87\x00\x87\x00\x00\x0E\x00\x01\x01@CT\x9C\xD3\xE0\xBDE@Lu\xA0t\x7F-\xDE\x00\x00\x01~\xAB\x8C\xF4\xC25798a065-d51e-47a1-b04b-ab48df9f132\xB43333333333\xB1@f@\x00\x00\x00\x00\x00?\xF3\xAE\x14z\xE1G\xAE\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x01
Solution 1:[1]
Stats are gathered during ingestion, but are only written on a "best effort" basis (for example, if your ingest dies, stats may not be written). There are also code paths that don't update stats, for example if you disable them via system property or if you ingest through a bulk map/reduce job. In your particular case, it's hard to say why your stats don't match your data without a detailed description of everything you did to ingest it. However, if you want to re-calculate the cached statistics, you can always run the stats-analyze CLI command.
If you can re-create the issue, please feel free to file a ticket in the GeoMesa JIRA with the steps to re-create.
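To decide whether the cached stats have drifted far enough to be worth re-running stats-analyze, one option is to compare the cached estimate with the exact count from --no-cache. The following is a minimal Python sketch, assuming the stats-count output looks like the lines shown in the question ("Estimated count: N", "Estimated count: Unknown", "Count: N"). The function names and the 10% tolerance are hypothetical and not part of GeoMesa.

```python
import re
from typing import Optional

# Matches the output lines seen in the question's transcripts:
#   "Estimated count: 11"       (cached stats)
#   "Estimated count: Unknown"  (no cached value available)
#   "Count: 1436"               (exact count from --no-cache)
_COUNT_LINE = re.compile(r"^(Estimated count|Count):\s*(\S+)\s*$")


def parse_count(output: str) -> Optional[int]:
    """Return the count found in stats-count output, or None if it is
    'Unknown' or no count line is present."""
    for line in output.splitlines():
        match = _COUNT_LINE.match(line.strip())
        if match:
            value = match.group(2)
            return int(value) if value.isdigit() else None
    return None


def stats_look_stale(estimated_output: str, exact_output: str,
                     tolerance: float = 0.1) -> bool:
    """Return True when the cached estimate is missing or differs from the
    exact count by more than `tolerance` (a fraction of the exact count;
    the default is an arbitrary, hypothetical threshold)."""
    estimated = parse_count(estimated_output)
    exact = parse_count(exact_output)
    if exact is None:
        raise ValueError("no exact count found in the --no-cache output")
    if estimated is None:
        return True
    if exact == 0:
        return estimated != 0
    return abs(estimated - exact) / exact > tolerance


# The first pair of results from the question: 11 estimated vs. 1436 exact.
print(stats_look_stale("Estimated count: 11", "Count: 1436"))  # True
```

The two strings would come from running stats-count once with and once without --no-cache for the same query. Running the exact count scans the data, so a check like this is better suited to occasional monitoring than to every query.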
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Emilio Lahr-Vivaz |
