'How to match/compare two dataframes searching for similar columns/rows

I have two dataframes:

df1 =

CRATER_ID LAT_CIRC_IMG LON_CIRC_IMG
01-1-000003 79.257599 211.909949
01-1-000004 78.696512 207.162344
01-1-000012 77.791422 186.586023
01-1-000013 76.655040 194.807812
01-1-000014 76.976285 195.615795

df2 =

CRATER_ID            
02-000239                
02-000311                             
02-000320               
02-000325                       
02-000443                       
   ...                         
25-002058                        
25-002387                        
25-002484                      
25-002650                 
26-001661

I need to search for the Crater_IDs from df2 in df1, but as you can see the numbers don't match perfectly, bc in df2 the middle numbers are missing. I need a Code that gives me a third dataframe with all Crater_IDs and Coordinates (like df1) but only with the IDs from df2. I already tried regular expressions and some other stuff but nothing seems to work.



Solution 1:[1]

You can use a regex to rework your CRATER_ID from df1:

# remove dash+number (only 1st occurrence)
df1['CRATER_ID'].str.replace(r'-\d+', '', n=1, regex=True)

output:

0    01-000003
1    01-000004
2    01-000012
3    01-000013
4    01-000014
Name: CRATER_ID, dtype: object

Then replace it in df1 or use it directly to merge:

df1.merge(df2,
          left_on=df1['CRATER_ID'].str.replace('-\d+', '', n=1, regex=True),
          right_on='CRATER_ID'
         )

output: None here as no match with the provided sample.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 mozway