'How to increment/decrement column based on value of other column in Pyspark
I have audit log data in Pyspark that I am trying to figure out how many people are logged on to the system at any given time.
So for example the dataframe I am working has the following structure
time src_comp auth_orient
1 C1 LogOn
2 C1 LogOn
2 C1 LogOn
3 C2 LogOn
3 C1 LogOff
4 C1 LogOn
4 C2 LogOff
5 C1 LogOn
6 C2 LogOn
6 C3 LogOff
Then from my code I want to create another column that shows how many people are logged on, like below. I also don't want the loggedon column to ever go negative, so if someone logs out of a system that I wasn't tracking I don't want to decrement the column:
Expected Output:
time src_comp LoggedOn
1 C1 1
2 C1 2
3 C2 1
3 C1 1
3 C1 2
4 C2 0
5 C1 3
6 C2 1
6 C3 0
I have the following code that I tried and got an error.
auth_dataset.groupby(auth_dataset.time,auth_dataset.src_comp).agg(F.when(auth_dataset.strt=='LogOn',number+=1).otherwise(number-=1))
Solution 1:[1]
Try the following code. It sums up all values (1 and -1) and greatest(0,col) functions makes sure that it would never go negative.
auth_dataset.groupby('time','src_comp')\
.agg(F.sum(F.when(auth_dataset.auth_orient == 'LogOn', F.lit(1)).otherwise(F.lit(-1))).alias('LoggedOn'))\
.withColumn('LoggedOn', F.greatest(F.lit(0),'LoggedOn'))\
.sort('time','src_comp').show()
Solution 2:[2]
it Question not Answer
time src_comp auth_orient 1 C1 LogOn 2 C2 LogOn 2 C3 LogOn 3 C1 LogOn 3 C2 LogOff 4 C1 LogOn 4 C1 LogOff 5 C2 LogOn 6 C3 LogOn 6 C1 LogOff
in src_comp column values are reset to initial value then how to give a group number for that. Ex: C1, C2, C3 as group1. C1, C2 as group2. C1 as group3. C1, C2, C3 as group4. C1 as group5. when value resets to C1 from there i want to give a group number in another column in pyspark. how can i do that.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Manoj Singh |
| Solution 2 | Manohar Nookala |
