'GROUP BY ORDER BY migrating from SQLite to SQL Server

The third weekend now, still no success migrating all my queries to SQL Server. I have to move to SQL Server because my SQlite is larger than 50 GB now, hundreds of millions of rows and still growing. Migration went well with a C# console app and SqlBulkCopy.

I struggle now to migrate all my queries. What is easy with SQLite, all what I know from SQL seems to work differently with SQL Server. I did not find the right way so far. I still think in the SQLite way and don't know how to translate to SQL Server.

Issues are:

  1. "The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified."

  2. "Column 'view_Data.Timestamp' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."

I tried with TOP 100 PERCENT, OFFSET 0 ROWS, ROW_NUMBER() OVER, PARTITION BY, ..., but never had the same result like with SQLite. I don't know when to use what.

Here the two queries I have to "translate" to T-SQL. Group by ComputerName as it has multiple data and only the newest is required in the specific date range.

SQLite query for 1 day like '2021-11-05':

SELECT DATE(Timestamp), AppVersion, COUNT(*) AS count 
FROM
    (SELECT Timestamp, AppVersion 
     FROM view_Data 
     WHERE DATE(Timestamp) BETWEEN DATE('2021-11-05', '-24 day') 
                               AND DATE('2021-11-05') 
     GROUP BY ComputerName 
     ORDER BY Timestamp DESC)
GROUP BY AppVersion
ORDER BY count DESC

SQLite query for all DISTINCT DATE(Timestamp):

SELECT t1.Timestamp, AppVersion, COUNT(*) AS count
FROM
    (SELECT DISTINCT DATE(Timestamp) Timestamp 
     FROM view_Data) t1
INNER JOIN 
    view_Data t2 ON UniqueKey IN (SELECT UniqueKey 
                                  FROM view_Data 
                                  WHERE DATE(Timestamp) BETWEEN DATE(t1.Timestamp, '-24 day') 
                                                            AND DATE(t1.Timestamp) 
                                  GROUP BY ComputerName 
                                  ORDER BY Timestamp DESC)
GROUP BY t1.Timestamp, AppVersion
ORDER BY t1.Timestamp DESC, count DESC;


Solution 1:[1]

"The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified."

This one is pretty clear. You cannot define sort conditions on dependent queries unless you explicitly specify the portion of rows you select.

That's because subqueries (etc.) are meant to give a pool of rows, and not pinning their order gives the query optimizer more choices how (or when) to fetch those rows. Sorting a subquery forces the DB engine to materialize it on the spot, which might result in a less efficient query plan.

For example, here the ORDER BY in the subquery does not make sense:

SELECT DATE(Timestamp), AppVersion, COUNT(*) AS count FROM
(
    SELECT Timestamp, AppVersion FROM view_Data 
    WHERE DATE(Timestamp) BETWEEN DATE('2021-11-05', '-24 day') AND DATE('2021-11-05') 
    GROUP BY ComputerName
    ORDER BY Timestamp DESC
)
GROUP BY AppVersion
ORDER BY count DESC

You're ordering in the main query, you can drop the inner ORDER BY.

Also, all the aggregates are missing, e.g. you can't really select Timestamp when you're not grouping by Timestamp, because there is more than one TimeStamp value in each group. If this happens to give you the right value in SQLite, that's sheer luck(1).

This is precisely what

"Column 'view_Data.Timestamp' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."

tells you.

It seems you're trying to solve the max-per-group problem: You're trying to get the latest AppVersion per ComputerName in the given time frame. A common way that works in both SQLite and SQL Server is calculating row numbers:

SELECT
    AppVersion,
    ComputerName,
    Timestamp,
    ROW_NUMBER() OVER (PARTITION BY ComputerName ORDER BY Timestamp DESC) rn
FROM
    view_Data 
WHERE 
    Timestamp BETWEEN DATE('2021-11-05', '-24 day') AND DATE('2021-11-05')

Now the latest row per ComputerName will have rn = 1, which will be convenient later.

Of course date arithmetic is different in SQL Server. This should work:

WHERE 
    Timestamp BETWEEN DATEADD(dd, -24, '2021-11-05') AND '2021-11-05'

Now you can use this - either as a subquery, or a CTE, your choice - to count the versions across all computers:

SELECT
    AppVersion,
    MIN(TimeStamp) MinTimeStamp,
    MAX(TimeStamp) MaxTimeStamp,
    COUNT(*) AS count
FROM (
    -- the above query
) AS version_data
WHERE
    rn = 1
GROUP BY
    AppVersion
ORDER BY
    COUNT(*) DESC;

or, as a CTE

WITH vesion_data AS (
    -- the above query
)
SELECT
    AppVersion,
    MIN(TimeStamp) MinTimeStamp,
    MAX(TimeStamp) MaxTimeStamp,
    COUNT(*) AS count
FROM
    version_data
WHERE
    rn = 1
GROUP BY
    AppVersion
ORDER BY
    COUNT(*) DESC;

Note how everything except AppVersion is in an aggregate function.


(1) Technically, it's documented what happens. But it is not obvious from the query, and it's an implementation detail of SQLite that has changed at least once through the lifetime of SQLite. For all practical purposes, it's best to consider a "bare" column (one that is not grouped and not aggregated in some way) as unordered, and its value as random. Behavior is certainly not transferable to other DB engines, so being explicit about your ordering preferences is wise, even if a DB engine lets you get away with queries like this.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1