Calculating percentage of counts of a table in PySpark
I want to calculate threshold limits around a table's row count. For example, the count of my table is 100, as below:
spark.sql("""select count(*) from dev.my_table_metrics""").show(10,False)
+--------+
|count(1)|
+--------+
|100 |
+--------+
I want to derive a result as below, where lower_limit is 5% below the count and upper_limit is 5% above the count:
+--------+------------+-----------+
|count(1)|upper_limit |lower_limit|
+--------+------------+-----------+
|100 |105 |95 |
+--------+------------+-----------+
I tried using the percentile(100, 5) function but ran into the error below:
"cannot resolve 'percentile(100, CAST(5 AS DOUBLE), 1L)' due to data type mismatch
Can someone please help me with this?
Thanks in advance.
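As an aside, percentile is not the right tool here: it is an aggregate that computes a percentile over a column of values, with the percentage given as a fraction between 0 and 1, so percentile(100, 5) fails type resolution. The limits asked for are plain ±5% arithmetic, which can be sketched in ordinary Python (the function name here is illustrative, not from the original post):

```python
def threshold_limits(row_count, pct=5.0):
    """Return (lower_limit, upper_limit) at minus/plus pct percent of row_count."""
    delta = row_count * pct / 100  # 5% of the count
    return int(row_count - delta), int(row_count + delta)

print(threshold_limits(100))  # (95, 105)
```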
Solution 1:[1]
spark.sql("""
select count(*),
       cast(count(*) - count(*) * 5 / 100 as bigint) as lower_limit,
       cast(count(*) + count(*) * 5 / 100 as bigint) as upper_limit
from dev.my_table_metrics
""").show(10, False)
+---------+-----------+-----------+
|count(1) |lower_limit|upper_limit|
+---------+-----------+-----------+
|100 |95 |105 |
+---------+-----------+-----------+
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | bunnylorr |
