'Limit results of a joined table to the number of entries in the base table
I use Lansweeper to keep track of all server assets. At the moment I am working on a report, which shows which webservers are installed and running. For this I have to join to tables, Table A contains the installed webserver software, table B contains the names of the services
The expected result should like this:
| VM | Version | Class | Servicename |
|---|---|---|---|
| VM1 | 9 | Tomcat | tomcat9 |
| VM2 | 9 | Tomcat | tomcat9 |
| VM2 | 7 | Tomcat | tomcat7 |
| VM3 | 9 | Tomcat | tomcat9 |
It is working fine, except for servers where multiple webservers are installed. If I have two webservers on a machine I get 4 results in the table. This is because I join based on the VM-ID, so the result looks like this:
| VM | Version | Class | Servicename |
|---|---|---|---|
| VM1 | 9 | Tomcat | tomcat9 |
| VM2 | 9 | Tomcat | tomcat9 |
| VM2 | 7 | Tomcat | tomcat7 |
| VM2 | 9 | Tomcat | tomcat7 |
| VM2 | 7 | Tomcat | tomcat9 |
| VM3 | 9 | Tomcat | tomcat9 |
This is the SQL statement for this result:
select a.VM, a.Version, a.Class, b.Servicename
from Installedsoftware as a
inner join Services as b on a.VM = b.VM
I totally understand that this happens because of two entries in table A und table B which results in 4 entries because of the join.
Is there a way to avoid this without adding a second join condition? Because this would only help in this scenario but not if I have two tomcat9 servers on the same machine. Is there a way to limit the results in a way like "if Table A has only two entries for VM1, then show only two lines of the joined table"?
Solution 1:[1]
This seems like a groupwise-maximum problem, which has several potential approaches, but all encourage a predictable, deterministic row as opposed to "I don't care which row." One way is:
select a.VM, a.Version, a.Class, b.Servicename
from dbo.Installedsoftware as a
left outer join
(
SELECT VM, Servicename, rn = ROW_NUMBER() OVER
(PARTITION BY VM ORDER BY Servicename)
FROM dbo.Services
) as b on a.VM = b.VM
AND b.rn = 1;
You could get a little closer to "I don't care" by changing:
(PARTITION BY VM ORDER BY @@SPID)
Or to a pseudo-random row (at a performance cost) using:
(PARTITION BY VM ORDER BY NEWID())
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Aaron Bertrand |
