'How to match/compare two dataframes searching for similar columns/rows
I have two dataframes:
df1 =
| CRATER_ID | LAT_CIRC_IMG | LON_CIRC_IMG |
|---|---|---|
| 01-1-000003 | 79.257599 | 211.909949 |
| 01-1-000004 | 78.696512 | 207.162344 |
| 01-1-000012 | 77.791422 | 186.586023 |
| 01-1-000013 | 76.655040 | 194.807812 |
| 01-1-000014 | 76.976285 | 195.615795 |
df2 =
CRATER_ID
02-000239
02-000311
02-000320
02-000325
02-000443
...
25-002058
25-002387
25-002484
25-002650
26-001661
I need to search for the Crater_IDs from df2 in df1, but as you can see the numbers don't match perfectly, bc in df2 the middle numbers are missing. I need a Code that gives me a third dataframe with all Crater_IDs and Coordinates (like df1) but only with the IDs from df2. I already tried regular expressions and some other stuff but nothing seems to work.
Solution 1:[1]
You can use a regex to rework your CRATER_ID from df1:
# remove dash+number (only 1st occurrence)
df1['CRATER_ID'].str.replace(r'-\d+', '', n=1, regex=True)
output:
0 01-000003
1 01-000004
2 01-000012
3 01-000013
4 01-000014
Name: CRATER_ID, dtype: object
Then replace it in df1 or use it directly to merge:
df1.merge(df2,
left_on=df1['CRATER_ID'].str.replace('-\d+', '', n=1, regex=True),
right_on='CRATER_ID'
)
output: None here as no match with the provided sample.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mozway |
