Select clause with limit and order by
If you run a SELECT query with LIMIT and ORDER BY on a column, are the results guaranteed to be deterministic, even if the column contains many duplicate values, such as a boolean column? Or is determinism guaranteed only if each row has a unique value in that column?
Solution 1:[1]
The sort has to be stable, meaning the ORDER BY columns leave no ties, to get reproducible results. For instance, given:
CREATE TABLE t(id INT, col VARCHAR);
INSERT INTO t
VALUES (1, 'a'), (2, 'a'), (3, 'b');
Query:
SELECT *
FROM t
ORDER BY col
LIMIT 1;
This could return either (1, 'a') or (2, 'a'). There is a tie on col that is not resolved, so another column should be added to the ORDER BY to provide a stable sort.
To check whether a list of columns provides a stable sort, the following query can be used; it returns every row whose combination of values appears more than once:
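A minimal sketch of the tie-breaking fix, assuming id is unique in t. It uses Python's built-in sqlite3 module with an in-memory database purely for illustration; the question may target a different engine, and the helper name first_row is hypothetical.

```python
import sqlite3

def first_row(order_by: str):
    """Return the first row of t under the given ORDER BY list (illustrative helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t(id INT, col VARCHAR)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "a"), (3, "b")])
    # order_by is interpolated only because it is a fixed, trusted string in this sketch.
    row = conn.execute(
        f"SELECT id, col FROM t ORDER BY {order_by} LIMIT 1"
    ).fetchone()
    conn.close()
    return row

# Adding the unique id column as a tie-breaker pins down which 'a' row comes first.
print(first_row("col, id"))
```

With ORDER BY col alone, either of the two 'a' rows would be a valid answer, so no particular output should be relied on in that case.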
SELECT *
FROM t
QUALIFY COUNT(*) OVER(PARTITION BY col_list_here) > 1
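QUALIFY is not available in every SQL dialect. A rough alternative, sketched here with GROUP BY and HAVING, lists each combination of values that occurs more than once rather than the rows themselves; an empty result suggests the column list resolves all ties. The example again uses Python's sqlite3 with the sample table, and duplicate_keys is a hypothetical helper name.

```python
import sqlite3

def duplicate_keys(columns: str):
    """Return (key values..., count) for each key combination seen more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t(id INT, col VARCHAR)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "a"), (3, "b")])
    # columns is a fixed, trusted string in this sketch, so interpolation is acceptable.
    query = (
        f"SELECT {columns}, COUNT(*) AS n FROM t "
        f"GROUP BY {columns} HAVING COUNT(*) > 1 ORDER BY {columns}"
    )
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

print(duplicate_keys("col"))      # 'a' is shared by two rows, so col alone leaves a tie
print(duplicate_keys("col, id"))  # no duplicates: (col, id) is unique in the sample data
```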
Solution 2:[2]
I did a quick test of this and it returned non-deterministic results (i.e. the output changed on every run).
Adding more columns to the ORDER BY clause to increase the cardinality still produced inconsistent results.
I was only able to get deterministic results when the ORDER BY columns produced unique combinations of values.
Solution 3:[3]
If a non-unique column is used in the ORDER BY clause, the order of rows that tie on that column is non-deterministic.
Moreover, for LIMIT / TOP, note that the ORDER BY must be at the same query level as the LIMIT / TOP; an ORDER BY inside a subquery does not guarantee the order of the outer result.
Another example of non-deterministic ordering occurs when the combination of keys in the OVER clause of a window function does not form a composite unique key in the table.
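The window-function case can be sketched the same way. This is only an illustration, assuming the sample table t from Solution 1 and an SQLite build with window-function support (3.25 or later); numbered_rows is a hypothetical helper name.

```python
import sqlite3

def numbered_rows(window_order: str):
    """Number the rows of t with ROW_NUMBER() using the given window ORDER BY list."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t(id INT, col VARCHAR)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "a"), (3, "b")])
    # The outer ORDER BY rn only fixes the order in which the numbered rows are returned.
    rows = conn.execute(
        f"SELECT id, col, ROW_NUMBER() OVER (ORDER BY {window_order}) AS rn "
        "FROM t ORDER BY rn"
    ).fetchall()
    conn.close()
    return rows

# (col, id) is unique, so each row always receives the same row number.
print(numbered_rows("col, id"))
```

With OVER (ORDER BY col) alone, the two 'a' rows tie, so which of them receives row number 1 is up to the engine.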
Solution 4:[4]
You should preprocess your DataFrame before using it in a model. Suppose you have a CSV file that contains a column 'Like' (values: yes, no), as below:
import pandas as pd
df=pd.read_csv("/content/GoogleDrive/MyDrive/yes_no example - Sheet1.csv") # my sample csv
df.head()
Output:
Name Like
0 Mohan yes
1 Shyam no
2 Renu yes
3 vivek yes
4 sohna no
You can define a function that converts the strings to numbers and map it over the DataFrame column:
def liking(yes_no):
    # Encode 'yes' as 1 and 'no' as 2; any other value falls through and returns None.
    if yes_no == 'yes':
        return 1
    elif yes_no == 'no':
        return 2

df.Like = df.Like.map(liking)
df.head()
Output:
Name Like
0 Mohan 1
1 Shyam 2
2 Renu 1
3 vivek 1
4 sohna 2
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Clark Perucho |
| Solution 3 | Anika Shahi |
| Solution 4 | TFer2 |
