'T-SQL vlookup with fake calendar table?

I am rather new in T-SQL and I have to create a view, where the output will be as shown below: enter image description here

But my sales table doesn't have any data about sales in February and May for customer ABC and no data in January for customer XYZ, but I really want to have 0 for these months. How to do it in T-SQL?



Solution 1:[1]

This is great question about a very important topic that, even many experienced developers need to touch up on. Being "relatively new at SQL" I wont just offer a solution, I'll explain the key concepts involved.

The Auxiliary Table Numbers

First lets learn about what a tally table, aka numbers table is all about.

What does this do?

SELECT N = 1 ;

It returns the number 1.

N
-----
1

How about this?

SELECT N = 1 FROM (VALUES(0)) AS e(N);

Same thing:

N
-----
1

What does this return?

SELECT N = 1 FROM (VALUES(0),(0),(0),(0),(0),(0)) AS e(n);

Here I'm leveraging the VALUES table constructer which allows for a list of values to be treated like a view. This returns:

N
-------
1
1
1
1
1

We don't need the ones, we need the rows. This will make more sense in a moment. Now, what does this do?

WITH e(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0)) AS e(n))
SELECT N = 1 FROM e e1;

It returns the same thing, five 1's, but I've wrapped the code into a CTE named e. Think of CTEs as inline unnamed views that you can reference multiple times. Now lets CROSS JOIN e to itself. This returns for 25 dummy rows (5*5).

WITH e(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0)) AS e(n))
SELECT N = 1 FROM e e1, e e2;

Next we leverage ROW_NUMBER() over our set of dummy values.

WITH E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0)) AS e(n))
SELECT N = ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a;

Returns (truncated for brevity):

N
--------------------
1
2
3
...
24
25

Using as an auxiliary numbers table @OneToTen is a table with random numbers 1 to 10. I need to count how many there are, returning 0 when there aren't any. NOTE MY COMMENTS:

;--== 2. Simple Use Case - Counting all numbers, including missing ones (missing = 0)
DECLARE @OneToTen TABLE (N INT);
INSERT  @OneToTen VALUES(1),(2),(2),(2),(4),(8),(8),(10),(10),(10);

WITH E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS e(n)),
 iTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a)
SELECT    
  N       = i.N,
  Wrong   = COUNT(*),    -- WRONG!!! Don't do THIS, this counts ALL rows returned
  Correct = COUNT(t.N)   -- Correct, this counts numbers from @OneToTen AKA "t.N"
FROM      iTally    AS i -- Aux Table of numbers
LEFT JOIN @OneToTen AS t -- Table to evaluate
  ON      i.N  = t.N     -- LEFT JOIN @OneToTen numbers to our Aux table of numbers
WHERE     i.N <= 10      -- We only need the numbers 1 to 10
GROUP BY  i.N;           -- Group by with no Sort!!!

This returns:

N     Wrong       Correct
----- ----------- -----------
1     1           1
2     3           3
3     1           0
4     1           1
5     1           0
6     1           0
7     1           0
8     2           2
9     1           0
10    3           3

Note that I show you the wrong and right way to do this. Note how COUNT(*) is wrong for this, you need COUNT(whatever you are counting).

Auxiliary table of Dates (AKA calendar table)

My we use our numbers table to create a calendar table.

;--== 3. Auxilliary Month/Year Calendar Table
DECLARE @Start DATE = '20191001',
        @End   DATE = '20200301';

WITH E1(N) AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS e(n)),
 iTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a)
SELECT TOP(DATEDIFF(MONTH,@Start,@End)+1)
  TheDate        = f.Dt,
  TheYear        = YEAR(f.Dt),
  TheMonth       = MONTH(f.Dt),
  TheWeekday     = DATEPART(WEEKDAY,f.Dt),
  DayOfTheYear   = DATEPART(DAYOFYEAR,f.Dt),
  LastDayOfMonth = EOMONTH(f.Dt)
  FROM         iTally AS i
CROSS APPLY (VALUES(DATEADD(MONTH, i.N-1, @Start))) AS f(Dt)

This returns:

TheDate    TheYear     TheMonth    TheWeekday  DayOfTheYear LastDayOfMonth
---------- ----------- ----------- ----------- ------------ --------------
2019-10-01 2019        10          3           274          2019-10-31
2019-11-01 2019        11          6           305          2019-11-30
2019-12-01 2019        12          1           335          2019-12-31
2020-01-01 2020        1           4           1            2020-01-31
2020-02-01 2020        2           7           32           2020-02-29
2020-03-01 2020        3           1           61           2020-03-31

You will only need the YEAR and MONTH.

The Auxiliary Customer table

Because you are performing aggregations (SUM,COUNT,etc.) against multiple customers we will also need an Auxiliary table of customers, more commonly known as a lookup or dimension.

SAMPLE DATA:

;--== Sample Data
DECLARE @sale TABLE
(
  Customer  VARCHAR(10),
  SaleYear  INT,
  SaleMonth TINYINT,
  SaleAmt   DECIMAL(19,2),
  INDEX idx_cust(Customer)
);

INSERT @sale 
VALUES('ABC',2019,12,410),('ABC',2020,1,668),('ABC',2020,1,50),  ('ABC',2020,3,250),
      ('CDF',2019,10,200),('CDF',2019,11,198),('CDF',2020,1,333),('CDF',2020,2,5000),
      ('CDF',2020,2,325),('CDF',2020,3,1105),('FRED',2018,11,1105);

Distinct list of customers for an "Auxilliary Table of Customers"

SELECT DISTINCT s.Customer FROM @sale AS s;

For my sample data we get:

Customer
----------
ABC
CDF
FRED

Putting it all together

Here I'm going to:

  1. Create a numbers table
  2. Use my numbers table to create a calendar table
  3. Create an auxiliary Customer table from @sale
  4. CROSS JOIN (combine) both tables for a "junk dimension"
  5. LEFT JOIN our sales data to our calendar/customer auxiliary tables/junk dimension
  6. Group by the auxiliary table values

SOLUTION:

;--==== SAMPLE DATA
DECLARE @sale TABLE
(
  Customer  VARCHAR(10),
  SaleYear  INT,
  SaleMonth TINYINT,
  SaleAmt   DECIMAL(19,2),
  INDEX idx_cust(Customer)
);

INSERT @sale 
VALUES('ABC',2019,12,410),('ABC',2020,1,668),('ABC',2020,1,50),  ('ABC',2020,3,250),
      ('CDF',2019,10,200),('CDF',2019,11,198),('CDF',2020,1,333),('CDF',2020,2,5000),
      ('CDF',2020,2,325),('CDF',2020,3,1105),('FRED',2018,11,1105);

;--==== START/END DATEs
DECLARE @Start DATE = '20191001',
        @End   DATE = '20200301';

;--==== FINAL SOLUTION
WITH     -- 6.1. Auxilliary Table of numbers:
E1(N)     AS (SELECT 1 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS e(n)),
iTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM E1, E1 a),
         -- 6.2. Use numbers table to create an "Auxilliary Date Table" (Calendar Table):
MonthYear(SaleYear,SaleMonth) AS
(
  SELECT TOP(DATEDIFF(MONTH,@Start,@End)+1) YEAR(f.Dt), MONTH(f.Dt)
  FROM         iTally AS i
  CROSS APPLY (VALUES(DATEADD(MONTH, i.N-1, @Start))) AS f(Dt)
)
SELECT
  Customer  = cust.Customer,
  MonthYear = CONCAT(cal.SaleYear,'-',cal.SaleMonth),
  Sales     = ISNULL(SUM(s.SaleAmt),0)
         -- Auxilliary Table of Customers
FROM       (SELECT DISTINCT s.Customer FROM @sale AS s) AS cust -- 6.3. Aux Customer Table
CROSS JOIN MonthYear AS cal -- 6.4. Cross join to create Calendar/Customer Junk Dimension
LEFT JOIN  @sale     AS s   -- 6.5. Join @sale to Junk Dimension on Year,Month and Customer
  ON       s.SaleYear  = cal.SaleYear
 AND       s.SaleMonth = cal.SaleMonth
 AND       s.Customer = cust.Customer
GROUP BY   cust.Customer, cal.SaleYear, cal.SaleMonth  -- 6.6. Group by Junk Dim values
ORDER BY   cust.Customer, cal.SaleYear, cal.SaleMonth; -- Order by not required

RESULTS:

Customer   MonthYear    Sales
---------- ------------ ------------
ABC        2019-10      0.00
ABC        2019-11      0.00
ABC        2019-12      410.00
ABC        2020-1       718.00
ABC        2020-2       0.00
ABC        2020-3       250.00
CDF        2019-10      200.00
CDF        2019-11      198.00
CDF        2019-12      0.00
CDF        2020-1       333.00
CDF        2020-2       5325.00
CDF        2020-3       1105.00
FRED       2019-10      0.00
FRED       2019-11      0.00
FRED       2019-12      0.00
FRED       2020-1       0.00
FRED       2020-2       0.00
FRED       2020-3       0.00

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Alan Burstein