'Converting SAS's PROC SQL into Python pandas library (with where clause)
I am determining the weights of stocks at the end of each month (e.g., 2016-12-29), and invest in those stocks for the following month (e.g., from 2017-01-02 to 2017-01-31) and calculate portfolio's return (weights times ret for given tickers for a given month-end date).
In order to do so, I was wondering a left join given below, in SAS code;
proc sql;
create table merged as
select a.*, b.weights
from prices as a, weights as b
where
a.ticker = b.ticker
and
a.date between b.date_st and b.date_end;
quit;
where table formats (in DataFrames) are given as follows.
Table prices (other than first three observations are omitted for brevity):
PX_LAST ret
ticker date
A069500 2002-10-14 5631.0 NaN
2002-10-15 5769.0 0.024507
2002-10-16 5791.0 0.003813
...
which contains multiple tickers and multiple dates for a given ticker. (PX_LAST means stock's closing price and ret means its daily return.)
Table weights (cross-sectional table; no observation is omitted):
weight date_st date_end
ticker date
A069500 2016-12-29 2.00E-01 2017-01-02 2017-01-31
A130680 2016-12-29 1.00E-01 2017-01-02 2017-01-31
A132030 2016-12-29 1.45E-11 2017-01-02 2017-01-31
A136340 2016-12-29 1.34E-11 2017-01-02 2017-01-31
A138230 2016-12-29 9.22E-12 2017-01-02 2017-01-31
A143850 2016-12-29 2.00E-01 2017-01-02 2017-01-31
Is it possible to query the above SAS-SQL in Python's Pandas library? I searched for examples, but couldn't find the one with
- dealing with two (or more) tables, and
- "between" condition in where clause.
Maybe I could slice prices table and weights table by each month-end date and re-join them altogether, but that's seemingly highly ineffective.
Any thoughts or comments would be much appreciated.
(Sorry for not posting what I've tried; I couldn't come up with any valid Python-equivalent example of SAS SQL.)
(Edited) Expected output would be
PX_LAST ret weight
ticker date
A069500 2017-01-02 ... 2.00E-01
2017-01-03 ... 2.00E-01
2017-01-04 ... 2.00E-01
…
A069500 2017-01-31 ... 2.00E-01
A130680 2017-01-02 ... 1.00E-01
…
A130680 2017-01-31 ... 1.00E-01
(values of columns PX_LAST, ret of the table prices are omitted here for brevity.)
Results of prices.head(5).to_dict('list') and weights.head(5).to_dict('list') are
{'PX_LAST': [5631.0, 5769.0, 5791.0, 5871.0, 6125.0],
'ret': [nan,
0.024507192328183214,
0.0038134858727683074,
0.013814539803142845,
0.04326349855220579]},
and
{'weights': [0.1999999999329153,
0.09999999999852502,
1.4490077570181562e-11,
1.3382147933394332e-11,
9.215515264984838e-12],
'date_st': [Timestamp('2016-01-12 00:00:00'),
Timestamp('2016-01-12 00:00:00'),
Timestamp('2016-01-12 00:00:00'),
Timestamp('2016-01-12 00:00:00'),
Timestamp('2016-01-12 00:00:00')],
'date_end': [Timestamp('2016-12-29 00:00:00'),
Timestamp('2016-12-29 00:00:00'),
Timestamp('2016-12-29 00:00:00'),
Timestamp('2016-12-29 00:00:00'),
Timestamp('2016-12-29 00:00:00')]}
, respectively.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
