'How to increase values of polars dataframe column by index
I have a data frame as follow
┌────────────┬──────────┬──────────┬──────────┬──────────┐
│ time ┆ open ┆ high ┆ low ┆ close │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞════════════╪══════════╪══════════╪══════════╪══════════╡
│ 1649016000 ┆ 46405.49 ┆ 47444.11 ┆ 46248.84 ┆ 46407.35 │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 1649030400 ┆ 46407.36 ┆ 46461.14 ┆ 45744.77 ┆ 46005.44 │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 1649044800 ┆ 46005.43 ┆ 46293.38 ┆ 45834.39 ┆ 46173.99 │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 1649059200 ┆ 46174.0 ┆ 46287.97 ┆ 45787.0 ┆ 46160.09 │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ ... ┆ ... ┆ ... ┆ ... ┆ ... │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 1653278400 ┆ 30171.32 ┆ 30670.51 ┆ 30101.07 ┆ 30457.01 │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 1653292800 ┆ 30457.01 ┆ 30616.18 ┆ 30281.89 ┆ 30397.11 │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 1653307200 ┆ 30397.12 ┆ 30625.98 ┆ 29967.07 ┆ 30373.53 │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 1653321600 ┆ 30373.53 ┆ 30529.9 ┆ 30042.09 ┆ 30121.02 │
└────────────┴──────────┴──────────┴──────────┴──────────┘
I want to count how many times each price (low and high) were local minimum/maximum in a window range of 2 to 50.
first I add two columns for count of being local min/max for each row and fill by zeros
raw_data["lmin_count"] = np.zeros(len(raw_data), dtype=np.int16)
raw_data["lmax_count"] = np.zeros(len(raw_data), dtype=np.int16)
then I iterate window length from 2 to 50 and find index of each local min/max by using:
for _order in range(2, 51):
local_minima = argrelextrema(raw_data["low"].to_numpy(), np.less, order=_order)[0]
local_maxima = argrelextrema(raw_data["high"].to_numpy(), np.greater, order=_order)[0]
which order is window length.
and in each iteration over window length I want to increase value of lmin_count and lmax_count by indices found in local_minima and local_maxima
I tried increasing value by this code:
if len(local_minima) > 1:
raw_data[local_minima,5] += 1
if len(local_maxima) > 1:
raw_data[local_minima,6] += 1
which local_minima and local_maxima are array of indices and 5,6 are index of lmin_count and lmax_count columns.
but got error not implemented.
So what is the best way to increase (or assign) value of column by row indices?
Update 2022/05/24
As answers were very helpful now I have other issues. I changed my code as follow:
min_expr_list = [
(
pl.col("price").rolling_min(
window_size=_order * 2 + 1, min_periods=_order + 2, center=True
)
== pl.col("price")
).cast(pl.UInt32)
for _order in range(200, 1001)
]
max_expr_list = [
(
pl.col("price").rolling_max(
window_size=_order * 2 + 1, min_periods=_order + 2, center=True
)
== pl.col("price")
).cast(pl.UInt32)
for _order in range(200, 1001)
]
raw_data = raw_data.with_columns(
[
pl.sum(min_expr_list).alias("min_freq"),
pl.sum(max_expr_list).alias("max_freq"),
]
)
first: is it possible to merge both min_expr_list and max_expr_list into one list? and if it is possible, in with_columns expression how can I add separate columns based on each element of list?
another issue I am facing is memory usage of this approach.
In previous example _order were limited but in action it is more wider than example.
currently I have datasets with millions of records (some of them have more than 10 million records) and _orders range can be from 2 to 1500 so calculating needs lots of GB of ram.
is there any better way to do that?
and one more side problem. when increasing _order to more than 1000 it seems it doesn't work. is there any limitation in source code?
Solution 1:[1]
You write very imperative code which is not really idiomatic polars. You typically should not even know where the index of a value is. Instead you assign by conditions, for instance with when(condition) -> then(value) -> otherwise(value) expressions.
Your condition in a when can still refer to an index. This snippet for instance is equal to assigning to a specific index, but then written more functional:
pl.DataFrame({
"letters": ["a", "b", "c", "d"]
}).with_column(
# use a condition to determine the index location
pl.when(pl.arange(0, pl.count()) == 2)
.then("idx_2")
.otherwise("letters").alias("letters")
)
shape: (4, 1)
???????????
? letters ?
? --- ?
? str ?
???????????
? letters ?
???????????
? letters ?
???????????
? idx_2 ?
???????????
? letters ?
???????????
Your intent, counting local minima/maxima
To help you in your cause, I'd like to show how you can find your local minima in an idiomatic polars way.
The local minima/maxima can be found by:
- taking the derivative
dy/dxof a functionx. - computing the
signof that derivative tells us where the functions slope is increasing and decreasing. - if we take the derivative of the sign of
dy/dxwe know where the signs are changing and thus where we have local minima/maxima.
Let's try this on a dummy DataFrame.
df = pl.DataFrame({
"x": [8, 4, 2, 7, 9, 6, 3, 0]
})
# find the local minima/maxima
df = df.with_columns([
(pl.col("x").diff().sign().diff().shift(-1) == -2).alias("local_maximum"),
(pl.col("x").diff().sign().diff().shift(-1) == 2).alias("local_minimum")
])
print(df)
shape: (8, 3)
???????????????????????????????????????
? x ? local_maximum ? local_minimum ?
? --- ? --- ? --- ?
? i64 ? bool ? bool ?
???????????????????????????????????????
? 8 ? false ? false ?
???????????????????????????????????????
? 4 ? false ? false ?
???????????????????????????????????????
? 2 ? false ? true ?
???????????????????????????????????????
? 7 ? false ? false ?
???????????????????????????????????????
? 9 ? true ? false ?
???????????????????????????????????????
? 6 ? false ? false ?
???????????????????????????????????????
? 3 ? false ? false ?
???????????????????????????????????????
? 0 ? false ? false ?
???????????????????????????????????????
Next we can take the cumulative sum to count the total seen local minima and maxima.
df.with_columns([
pl.col("local_maximum").cumsum().alias("local_max_count"),
pl.col("local_minimum").cumsum().alias("local_min_count")
])
shape: (8, 5)
???????????????????????????????????????????????????????????????????????????
? x ? local_maximum ? local_minimum ? local_max_count ? local_min_count ?
? --- ? --- ? --- ? --- ? --- ?
? i64 ? bool ? bool ? u32 ? u32 ?
???????????????????????????????????????????????????????????????????????????
? 8 ? false ? false ? 0 ? 0 ?
???????????????????????????????????????????????????????????????????????????
? 4 ? false ? false ? 0 ? 0 ?
???????????????????????????????????????????????????????????????????????????
? 2 ? false ? true ? 0 ? 1 ?
???????????????????????????????????????????????????????????????????????????
? 7 ? false ? false ? 0 ? 1 ?
???????????????????????????????????????????????????????????????????????????
? 9 ? true ? false ? 1 ? 1 ?
???????????????????????????????????????????????????????????????????????????
? 6 ? false ? false ? 1 ? 1 ?
???????????????????????????????????????????????????????????????????????????
? 3 ? false ? false ? 1 ? 1 ?
???????????????????????????????????????????????????????????????????????????
? 0 ? false ? false ? 1 ? 1 ?
???????????????????????????????????????????????????????????????????????????
I hope this helps nudge you in the right direction.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
