'PROC SQL: Warning variable already exists on multiple dataset join
I have this data check integrity code for an oncology study I'm working on. This is ostensibly to confirm TU,TR and RS are consistent.
proc sql ;
create table tu_tr_rs as
select tu.*,tr.*,rs.*
from trans.tu as tu
left join trans.tr as tr on tu.usubjid=tr.usubjid and tu.TULNKID
=tr.TRLNKID and tu.tudtc=tr.trdtc
left join trans.rs as rs on tr.usubjid=rs.usubjid and tr.trdtc=
rs.rsdtc
;
quit;
However, when I run this code I get the warning
"Variable XXXX already exists on file WORK.TU_TR_RS."
When I add the feedback option to PROC SQL to get a more granular look I get this
So I know if it's one variable that brings this warning up you can use a rename/DROP combination to work around it but for this case is it just the case that I have to explicitly state the variables for each dataset in the select statement or is there something fundamentally wrong with the code?
Solution 1:[1]
Yes, if you want to select columns with the same name from 2 (or more) data sets, you simply need to select them explicitly and give them distinct names. Something like this:
create table tu_tr_rs as
select
tu.ColA as tu_ColA
,tu.ColB as tu_ColB
/* etc */
,tr.ColA as tr_ColA
,tr.ColB as tr_ColB
/* etc */
,rs.ColA as rs_ColA
,rs.ColB as rs_ColB
/* etc */
from trans.tu as tu
/* etc */
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Caledon |

