'Using dbplyr and across to find the number of missing values in a SQL table
I'm trying to use dbplyr to calculate the number of missing rows across all columns in a database. The database is an Azure Synapse SQL database.
I have a simple table with one column, "MyColumn" for testing purposes and use:
b = a %>% summarise(across(everything(), ~ sum(is.na(.))))
But unfortunately get the error:
Error: nanodbc/nanodbc.cpp:1655: 42000: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server] Parse error at line: 1, column: 45: Incorrect syntax near 'IS'. 'SELECT TOP 11 SUM(CAST(IIF((("MyColumn") IS NULL), 1, 0) AS BIT)) AS "MyColumn" FROM (SELECT "MyColumn" FROM "dbo"."MyTable") "q01"'
Any thoughts on what I might be doing wrong?
I'm using dbplyr 2.1.1
Thanks
Solution 1:[1]
Based on the error message this is the SQL query that your dplyr commands are being translated into:
SELECT TOP 11 SUM(CAST(IIF((("MyColumn") IS NULL), 1, 0) AS BIT)) AS "MyColumn"
FROM (
SELECT "MyColumn"
FROM "dbo"."MyTable"
) "q01"
If you can run this query directly in SQL, then it will tell you why it is failing.
Two possibilities come to mind:
- Some flavors of SQL object to
SUMon a column of typeBIT. acrossmay not translate perfectly with dbplyr.
Of these, it is most likely the first point, the work around is to use ifelse:
b = a %>% summarise(across(everything(), ~ sum(ifelse(is.na(.), 1, 0))))
Because ifelse will return numeric values rather than BIT values.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Simon.S.A. |
