'Using dbplyr and across to find the number of missing values in a SQL table

I'm trying to use dbplyr to calculate the number of missing rows across all columns in a database. The database is an Azure Synapse SQL database.

I have a simple table with one column, "MyColumn" for testing purposes and use:

b = a %>% summarise(across(everything(), ~ sum(is.na(.))))

But unfortunately get the error:

Error: nanodbc/nanodbc.cpp:1655: 42000: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server] Parse error at line: 1, column: 45: Incorrect syntax near 'IS'. 'SELECT TOP 11 SUM(CAST(IIF((("MyColumn") IS NULL), 1, 0) AS BIT)) AS "MyColumn" FROM (SELECT "MyColumn" FROM "dbo"."MyTable") "q01"'

Any thoughts on what I might be doing wrong?

I'm using dbplyr 2.1.1

Thanks



Solution 1:[1]

Based on the error message this is the SQL query that your dplyr commands are being translated into:

SELECT TOP 11 SUM(CAST(IIF((("MyColumn") IS NULL), 1, 0) AS BIT)) AS "MyColumn"
FROM (
    SELECT "MyColumn"
    FROM "dbo"."MyTable"
) "q01"

If you can run this query directly in SQL, then it will tell you why it is failing.

Two possibilities come to mind:

  1. Some flavors of SQL object to SUM on a column of type BIT.
  2. across may not translate perfectly with dbplyr.

Of these, it is most likely the first point, the work around is to use ifelse:

b = a %>% summarise(across(everything(), ~ sum(ifelse(is.na(.), 1, 0))))

Because ifelse will return numeric values rather than BIT values.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Simon.S.A.