How do I select the last non-empty value for a given ID and date
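I'm trying to select the last earlier value in a column that isn't blank (technically, not the last non-null value) and repeat it for every later date until that value changes; then select the new value, and so on. In other words, previous_sales_stage should hold the most recent earlier sales_stage that differs from the current row's sales_stage.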
What I have:
| company_id | date | sales_stage | previous_sales_stage |
|---|---|---|---|
| 1 | 2022-05-20 00:00:00.000 | a | NULL |
| 1 | 2022-05-19 00:00:00.000 | b | NULL |
| 1 | 2022-05-18 00:00:00.000 | c | NULL |
| 1 | 2022-05-17 00:00:00.000 | c | NULL |
| 1 | 2022-05-16 00:00:00.000 | c | NULL |
| 1 | 2022-05-15 00:00:00.000 | d | NULL |
| 1 | 2022-05-14 00:00:00.000 | d | NULL |
| 1 | 2022-05-13 00:00:00.000 | d | NULL |
| 1 | 2022-05-12 00:00:00.000 | e | NULL |
| 1 | 2022-05-11 00:00:00.000 | e | NULL |
What I'd like to have:
| company_id | date | sales_stage | previous_sales_stage |
|---|---|---|---|
| 1 | 2022-05-20 00:00:00.000 | a | b |
| 1 | 2022-05-19 00:00:00.000 | b | c |
| 1 | 2022-05-18 00:00:00.000 | c | d |
| 1 | 2022-05-17 00:00:00.000 | c | d |
| 1 | 2022-05-16 00:00:00.000 | c | d |
| 1 | 2022-05-15 00:00:00.000 | d | e |
| 1 | 2022-05-14 00:00:00.000 | d | e |
| 1 | 2022-05-13 00:00:00.000 | d | e |
| 1 | 2022-05-12 00:00:00.000 | e | NULL |
| 1 | 2022-05-11 00:00:00.000 | e | NULL |
This is going into a summary table that is shared by all companies (so there will be multiple company IDs and stages for a given date) and is recalculated daily by a stored procedure. If there is no previous value, NULL is okay.
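The rule behind the desired output can be sketched in plain Python before writing it in T-SQL. This illustrates the logic only; it assumes rows arrive as (company_id, date, sales_stage) tuples with ISO-format date strings (so they sort chronologically), and the function name fill_previous_stage is hypothetical. Each company is processed separately, oldest row first, and the stage that was just left is carried forward until the stage changes again.

```python
from itertools import groupby


def fill_previous_stage(rows):
    """Return (company_id, date, sales_stage, previous_sales_stage) tuples.

    previous_sales_stage is the most recent earlier sales_stage of the same
    company that differs from the current one, or None if there is none.
    """
    result = []
    # Sort by company, then oldest date first, so each company forms one group.
    ordered = sorted(rows, key=lambda row: (row[0], row[1]))
    for company_id, company_rows in groupby(ordered, key=lambda row: row[0]):
        last_stage = None      # stage on the immediately preceding row
        previous_stage = None  # value carried forward until the stage changes
        for _, date, stage in company_rows:
            if last_stage is not None and stage != last_stage:
                # The stage just changed, so the stage we left becomes "previous".
                previous_stage = last_stage
            result.append((company_id, date, stage, previous_stage))
            last_stage = stage
    return result


# Rows from the "What I have" table, with dates as ISO strings.
SAMPLE_ROWS = [
    (1, "2022-05-20", "a"), (1, "2022-05-19", "b"), (1, "2022-05-18", "c"),
    (1, "2022-05-17", "c"), (1, "2022-05-16", "c"), (1, "2022-05-15", "d"),
    (1, "2022-05-14", "d"), (1, "2022-05-13", "d"), (1, "2022-05-12", "e"),
    (1, "2022-05-11", "e"),
]

for row in reversed(fill_previous_stage(SAMPLE_ROWS)):  # newest first
    print(row)
```

Because the carried-forward value resets per company, the same function works for the shared multi-company table.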
Here is some T-SQL that creates a temp table recreating this example:
```sql
DROP TABLE IF EXISTS #blah;

CREATE TABLE #blah
(
      company_id           INT
    , [date]               DATETIME
    , sales_stage          VARCHAR(50)
    , previous_sales_stage VARCHAR(50)
);

-- Fixed dates instead of CAST(GETDATE() AS DATE) minus N days,
-- so the UPDATE statements below match whatever day the script is run.
INSERT INTO #blah (company_id, sales_stage, [date]) VALUES
      (1, 'a', '2022-05-20')
    , (1, 'b', '2022-05-19')
    , (1, 'c', '2022-05-18')
    , (1, 'c', '2022-05-17')
    , (1, 'c', '2022-05-16')
    , (1, 'd', '2022-05-15')
    , (1, 'd', '2022-05-14')
    , (1, 'd', '2022-05-13')
    , (1, 'e', '2022-05-12')
    , (1, 'e', '2022-05-11');

-- What I have
SELECT * FROM #blah;

-- Hand-filled previous_sales_stage: what I'd like to have
UPDATE #blah SET previous_sales_stage = 'b'  WHERE company_id = 1 AND [date] = '2022-05-20 00:00:00.000';
UPDATE #blah SET previous_sales_stage = 'c'  WHERE company_id = 1 AND [date] = '2022-05-19 00:00:00.000';
UPDATE #blah SET previous_sales_stage = 'd'  WHERE company_id = 1 AND [date] = '2022-05-18 00:00:00.000';
UPDATE #blah SET previous_sales_stage = 'd'  WHERE company_id = 1 AND [date] = '2022-05-17 00:00:00.000';
UPDATE #blah SET previous_sales_stage = 'd'  WHERE company_id = 1 AND [date] = '2022-05-16 00:00:00.000';
UPDATE #blah SET previous_sales_stage = 'e'  WHERE company_id = 1 AND [date] = '2022-05-15 00:00:00.000';
UPDATE #blah SET previous_sales_stage = 'e'  WHERE company_id = 1 AND [date] = '2022-05-14 00:00:00.000';
UPDATE #blah SET previous_sales_stage = 'e'  WHERE company_id = 1 AND [date] = '2022-05-13 00:00:00.000';
UPDATE #blah SET previous_sales_stage = NULL WHERE company_id = 1 AND [date] = '2022-05-12 00:00:00.000';
UPDATE #blah SET previous_sales_stage = NULL WHERE company_id = 1 AND [date] = '2022-05-11 00:00:00.000';

SELECT * FROM #blah;
```
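One set-based way to compute this is the gaps-and-islands pattern: LAG finds the stage on the previous row, a running SUM of change flags gives each run of identical stages its own id, and MAX over each run spreads the stage that preceded it to every row of the run. The sketch below runs that query against an in-memory SQLite database from Python so it can be checked end to end. The table name blah and the helper previous_stages are stand-ins for #blah, and it assumes SQLite 3.25 or later (needed for window functions). SQL Server 2012 and later support the same LAG, SUM() OVER and MAX() OVER constructs, but the T-SQL translation has not been tested here.

```python
import sqlite3

# Gaps-and-islands query in SQLite syntax; "blah" stands in for #blah.
QUERY = """
WITH flagged AS (
    SELECT company_id, "date", sales_stage,
           -- stage on the previous row of the same company (NULL on the first row)
           LAG(sales_stage) OVER (PARTITION BY company_id ORDER BY "date") AS prior_stage
    FROM blah
),
islands AS (
    SELECT company_id, "date", sales_stage,
           -- the stage that was left, only on rows where the stage changes
           CASE WHEN prior_stage <> sales_stage THEN prior_stage END AS changed_from,
           -- running count of changes: one id per run of identical stages
           SUM(CASE WHEN prior_stage <> sales_stage THEN 1 ELSE 0 END)
               OVER (PARTITION BY company_id ORDER BY "date"
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS island
    FROM flagged
)
SELECT company_id, "date", sales_stage,
       -- each run has at most one non-NULL changed_from; spread it over the run
       MAX(changed_from) OVER (PARTITION BY company_id, island) AS previous_sales_stage
FROM islands
ORDER BY company_id, "date" DESC
"""


def previous_stages(rows):
    """Load (company_id, date, sales_stage) rows into SQLite and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('CREATE TABLE blah (company_id INTEGER, "date" TEXT, sales_stage TEXT)')
        conn.executemany(
            'INSERT INTO blah (company_id, "date", sales_stage) VALUES (?, ?, ?)', rows
        )
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


# Rows from the example, with dates as ISO strings.
SAMPLE_ROWS = [
    (1, "2022-05-20", "a"), (1, "2022-05-19", "b"), (1, "2022-05-18", "c"),
    (1, "2022-05-17", "c"), (1, "2022-05-16", "c"), (1, "2022-05-15", "d"),
    (1, "2022-05-14", "d"), (1, "2022-05-13", "d"), (1, "2022-05-12", "e"),
    (1, "2022-05-11", "e"),
]

for row in previous_stages(SAMPLE_ROWS):
    print(row)
```

Partitioning every window by company_id keeps companies independent, which matters for the shared summary table.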
Solution 1:[1]
G is a graph created with four nodes, and thus G_adj is a (4, 4) sparse matrix.
adata is a scanpy AnnData object with 6 observations and 4 variables. The scanpy Louvain algorithm clusters observations, so it expects an adjacency matrix of shape (6, 6).
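The shape rule can be checked without scanpy: a graph's adjacency matrix is n_nodes × n_nodes, and that must equal n_obs × n_obs for the AnnData object it is paired with. The plain-Python sketch below builds a dense 0/1 matrix from the edge list [(0, 1), (1, 2), (2, 3)]; the helper names adjacency_matrix and adjacency_matches are hypothetical, and this is not scanpy's own validation.

```python
def adjacency_matrix(n_nodes, edges):
    """Dense, symmetric 0/1 adjacency matrix (list of lists) for an undirected graph."""
    matrix = [[0] * n_nodes for _ in range(n_nodes)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1
    return matrix


def adjacency_matches(matrix, n_obs):
    """True if the matrix is (n_obs, n_obs), i.e. one row/column per observation."""
    return len(matrix) == n_obs and all(len(row) == n_obs for row in matrix)


edges = [(0, 1), (1, 2), (2, 3)]
print(adjacency_matches(adjacency_matrix(4, edges), 6))  # False: (4, 4) vs 6 observations
print(adjacency_matches(adjacency_matrix(6, edges), 6))  # True: nodes 4 and 5 are isolated
```

With 6 nodes, nodes 4 and 5 have no edges, so their rows in the matrix are all zeros.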
I'm not sure what you meant to do, so here are both options.
If you really have 6 nodes, change the code that builds the graph so it contains all 6 nodes:
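```python
import networkx as nx
import scanpy as sc

# `features` comes from the question's code (assumed shape (6, 4)).
print(features.shape)
edgelist = [(0, 1), (1, 2), (2, 3)]
G = nx.Graph()
G.add_nodes_from(range(6))  # add all 6 nodes, including isolated nodes 4 and 5
G.add_edges_from(edgelist)
# Transform to a (6, 6) SciPy sparse matrix. to_scipy_sparse_matrix was removed
# in NetworkX 3.0; newer versions provide nx.to_scipy_sparse_array instead.
G_adj = nx.convert_matrix.to_scipy_sparse_matrix(G)
adata = sc.AnnData(features.numpy())
```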
If you really have 4 nodes, transpose the features when creating adata, so that each node becomes an observation:
```python
adata = sc.AnnData(features.numpy().T)  # shape (4, 6): 4 observations, one per node
```
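AnnData stores observations as rows (X has shape n_obs × n_vars), so transposing a (6, 4) feature array yields 4 observations, matching the 4-node graph. The sketch below shows the same transpose on plain nested lists; the 6 × 4 features list is a made-up stand-in for features.numpy(), and transpose is a hypothetical helper.

```python
def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


# Made-up stand-in for features.numpy(): 6 rows (observations) x 4 columns (variables).
features = [[r * 4 + c for c in range(4)] for r in range(6)]
features_t = transpose(features)

print(len(features), len(features[0]))      # 6 4
print(len(features_t), len(features_t[0]))  # 4 6
```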
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | YotamW Constantini |
