'SQL in Databricks: Only keep oldest records from each month

I have a table tab1 with several columns including a date column. I want to create a view v1 with only the oldest records for each month and remove any records that came before. Duplicates should be kept and a new month_year column created.

INPUT TABLE tab1

Provider report_date other columns
Apple 20220101
Apple 20220105
Apple 20220120
Apple 20220120
Apple 20220205
Apple 20220210

OUTPUT VIEW v1

Provider report_date month_year other columns
Apple 20220120 01/2022
Apple 20220120 01/2022
Apple 20220210 02/2022

I've used the below code, but it does not work because report_date (yyyymmdd) is an int.

CREATE OR REPLACE TEMPORARY VIEW v1 AS
SELECT 
  *
FROM
  tab1
WHERE
  report_date IN (
    SELECT   MAX(report_date)
    FROM     tab1
    GROUP BY MONTH(report_date), YEAR(report_date)
  )

I've tried to cast report_date to char in the SELECT statement part so that the above works, but that gives me an error as well:

  CAST(dataset_day_id AS varchar),

I've also tried CONVERT function, but I get the following error:

AnalysisException: Undefined function: 'CONVERT'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source