How do I add an aggregate column in Kotlin DataFrame that is based on the current row compared against other filtered rows?
val country by columnOf("UK", "UK", "FR", "FR", "DE")
val city by columnOf("London", "London", "Paris", "Paris", "Bonn")
val area by columnOf("Holborn", "Camden", "Barbes", "Eiffel", "Weststadt")
val population by columnOf(1100, 1200, 1300, 1400, 1500)
val df = dataFrameOf(country, city, area, population)
println(df)
How can I add a column that holds each row's population divided by the total population of its (country, city) group? The column would be calculated as follows:
Proportion
0.47826087 (i.e. 1100/2300)
0.52173913
0.481481481
0.518518519
1.00
Solution 1:[1]
This seems to work, but I am not sure whether there is a better way:
val country by columnOf("UK", "UK", "FR", "FR", "DE")
val city by columnOf("London", "London", "Paris", "Paris", "Bonn")
val area by columnOf("Holborn", "Camden", "Barbes", "Eiffel", "Weststadt")
val population by columnOf(1100, 1200, 1300, 1400, 1500)
val df = dataFrameOf(country, city, area, population)

// Accessor for the per-group total computed below
val total by column<Int>()

// One row per (country, city) with the summed population
val cities = df
    .groupBy { country and city }
    .aggregate { sum { population } into total }

val df2 = df.add("percentage") {
    val filtercity = get(city)
    // Note: the lookup matches on city only, so it assumes city names are unique across countries
    val cityTotal = cities.filter { city<String>() == filtercity }[total].first()
    get(population).div(cityTotal.toDouble())
}
println(df2)
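For comparison, here is a minimal sketch of the same calculation using only the Kotlin standard library, which may make the grouping step easier to follow on its own. It does not use the Kotlin DataFrame API. The `Row` class, the `proportions` function and the `sampleRows` value are hypothetical names introduced for illustration. Unlike the city-only filter above, it looks up each total by both country and city.

```kotlin
// Hypothetical row type for illustration; not part of the Kotlin DataFrame API.
data class Row(val country: String, val city: String, val area: String, val population: Int)

// Returns, for each row, its population divided by the total population
// of all rows sharing the same (country, city) pair.
fun proportions(rows: List<Row>): List<Double> {
    // Total population per (country, city) group.
    val totals: Map<Pair<String, String>, Int> = rows
        .groupBy { it.country to it.city }
        .mapValues { (_, group) -> group.sumOf { it.population } }

    // Look up each row's group total and divide.
    return rows.map { it.population.toDouble() / totals.getValue(it.country to it.city) }
}

// The sample data from the question.
val sampleRows = listOf(
    Row("UK", "London", "Holborn", 1100),
    Row("UK", "London", "Camden", 1200),
    Row("FR", "Paris", "Barbes", 1300),
    Row("FR", "Paris", "Eiffel", 1400),
    Row("DE", "Bonn", "Weststadt", 1500),
)

fun main() {
    // Print each area next to its share of the group total.
    sampleRows.zip(proportions(sampleRows)).forEach { (row, share) ->
        println("${row.area}: $share")
    }
}
```

Computing the totals once into a map avoids re-filtering the aggregated frame for every row, which is the part of the DataFrame version that a built-in grouped operation would ideally replace.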
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | thewaterwalker |
