Calculating percentage of counts of a table in PySpark
I want to calculate threshold limits around a table's row count. For example, the count of my table is 100, as below:
spark.sql("""select count(*) from dev.my_table_metrics""").show(10,False)
+--------+
|count(1)|
+--------+
|100 |
+--------+
I want to derive a result as below, where lower_limit is 5% below the count and upper_limit is 5% above the count:
+--------+------------+-----------+
|count(1)|upper_limit |lower_limit|
+--------+------------+-----------+
|100 |105 |95 |
+--------+------------+-----------+
I tried using the percentile(100, 5) function but ran into the error below:
"cannot resolve 'percentile(100, CAST(5 AS DOUBLE), 1L)' due to data type mismatch
Can someone please help me with this?
Thanks in advance.
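As an aside, percentile is not the right tool here: it is an aggregate that computes a percentile over a column of values, with the percentage given as a fraction between 0 and 1, so percentile(100, 5) fails type resolution. The limits asked for are plain ±5% arithmetic, which can be sketched in ordinary Python (the function name here is illustrative, not from the original post):

```python
def threshold_limits(row_count, pct=5.0):
    """Return (lower_limit, upper_limit) at minus/plus pct percent of row_count."""
    delta = row_count * pct / 100  # 5% of the count
    return int(row_count - delta), int(row_count + delta)

print(threshold_limits(100))  # (95, 105)
```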
Solution 1:[1]
spark.sql("""
select count(*),
       cast(count(*) - count(*) * 5 / 100 as bigint) as lower_limit,
       cast(count(*) + count(*) * 5 / 100 as bigint) as upper_limit
from dev.my_table_metrics
""").show(10, False)
+---------+-----------+-----------+
|count(1) |lower_limit|upper_limit|
+---------+-----------+-----------+
|100 |95 |105 |
+---------+-----------+-----------+
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | bunnylorr |
