Translate SPSS output into R script
I am trying to translate the following SPSS output into an R script, but given my lack of experience with SPSS, I'm struggling to work out exactly what was done. As far as I'm aware, the steps were intended to:
- Select distinct cases by ID and Date
- Identify duplicate cases
SORT CASES BY ID(A) Date(A).
MATCH FILES
/FILE=*
/BY ID Date
/FIRST=PrimaryFirst
/LAST=PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE MatchSequence=1-PrimaryLast.
ELSE.
COMPUTE MatchSequence=MatchSequence+1.
END IF.
LEAVE MatchSequence.
FORMATS MatchSequence (f7).
COMPUTE InDupGrp=MatchSequence>0.
SORT CASES InDupGrp(D).
MATCH FILES
/FILE=*
/DROP=PrimaryFirst InDupGrp MatchSequence.
VARIABLE LABELS PrimaryLast 'Indicator of each last matching case as Primary'.
VALUE LABELS PrimaryLast 0 'Duplicate Case' 1 'Primary Case'.
VARIABLE LEVEL PrimaryLast (ORDINAL).
FREQUENCIES VARIABLES=PrimaryLast.
EXECUTE.
Any advice or assistance to translate the above segment would be appreciated.
Solution 1:[1]
The syntax (not output ;)) you posted does indeed find and mark the rows where the same combination of ID and Date appears in more than one row. You can replicate this easily in R; start by looking up the duplicated() function.
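As a rough illustration of that suggestion, here is a minimal base R sketch of what the SPSS syntax appears to do. The data frame `df`, its `value` column, and the sample rows are made up for the example; only `ID`, `Date`, and `PrimaryLast` come from the posted syntax. It assumes that the row order within a tied ID/Date group does not matter beyond what the sort gives you.

```r
# Hypothetical example data; only the names ID and Date come from the SPSS syntax.
df <- data.frame(
  ID    = c(3, 1, 2, 3, 2, 2),
  Date  = as.Date(c("2020-01-15", "2020-01-01", "2020-02-01",
                    "2020-01-15", "2020-03-01", "2020-02-01")),
  value = c(30, 10, 20, 31, 22, 21)
)

# SORT CASES BY ID(A) Date(A).
df <- df[order(df$ID, df$Date), ]

# PrimaryLast: 1 for the last row of each ID/Date combination, 0 otherwise.
# duplicated(..., fromLast = TRUE) is TRUE for rows that have a later row
# with the same key, so negating it marks the last row of each group.
key <- df[, c("ID", "Date")]
df$PrimaryLast <- as.integer(!duplicated(key, fromLast = TRUE))

# InDupGrp: TRUE for every row whose ID/Date combination occurs more than once.
in_dup_grp <- duplicated(key) | duplicated(key, fromLast = TRUE)

# SORT CASES InDupGrp(D): rows belonging to duplicate groups come first.
df <- df[order(!in_dup_grp), ]

# VALUE LABELS + FREQUENCIES VARIABLES=PrimaryLast.
print(table(factor(df$PrimaryLast, levels = c(0, 1),
                   labels = c("Duplicate Case", "Primary Case"))))

# Keeping only the primary cases gives one row per ID/Date combination.
primary <- df[df$PrimaryLast == 1, ]
```

If all you need is one row per ID/Date pair, the last line is the part that matters; the helper variables `PrimaryFirst` and `MatchSequence` in the SPSS syntax are dropped at the end anyway, so they are not reproduced here.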
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | eli-k |
