'SQL query to extract metadata based on two conditions in second dataset
I have two data sets with the following format:
m - acc | sra_study | bioproject |
tax - acc | tax_id | total_count
Each row of m represents a biological sample (acc). The tax table represents which organisms (tax_id) were found in each biological sample (acc) and how many times they were observed (total_count).
The tax_id and total_count columns values are integers, while the rest of the columns are strings.
I would like to filter rows in m based on whether tax_id=9606 is present with a total_count > 10000000 AND tax_id=2 is present with a total_count>1000000 for a given sample (acc).
I have tried to do this with the following SQL query:
SELECT m.acc, m.sra_study, m.bioproject
FROM `nih-sra-datastore.sra.metadata` as m,
`nih-sra-datastore.sra_tax_analysis_tool.tax_analysis` as tax
WHERE m.acc = tax.acc
AND (tax.tax_id = 9606 AND tax.total_count > 10000000)
AND (tax.tax_id = 2 AND tax.total_count > 1000000)
However, the query does not return any results. I suspect this is because of a problem with the syntax my SQL query.
Solution 1:[1]
You have specified what appears to be a mutually exclusive condition in your where clause
tax.tax_id = 9606 .. AND .. tax.tax_id = 2
Modify Like
SELECT m.acc, m.sra_study, m.bioproject
FROM `nih-sra-datastore.sra.metadata` as m, `nih-sra-datastore.sra_tax_analysis_tool.tax_analysis` as tax
WHERE m.acc=tax.acc
AND ((tax.tax_id=9606 AND tax.total_count > 10000000)
OR
(tax.tax_id=2 AND tax.total_count>1000000)
)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Bernard Dy |
