How to find the intersection of array-filled columns in two DataFrames
Problem Statement
I have two related DataFrames: an employee table and a job catalog table. Each has a skill_set column whose values are arrays. I want to find the intersection of the two skill_set arrays (I am using np.intersect1d) and return the result to the employee DataFrame for each id in it.
For each id in the employee DataFrame, I loop over every job_name in the job catalog DataFrame that has the same job rank as the employee and compute the intersection. The final output should be the 5 jobs with the largest intersection for each employee (measured with len, since np.intersect1d returns an array).
employee_data
+----+--------+----------+----------+
| id|emp_name| job_rank| skill_set|
+----+--------+----------+----------+
| 2| c | 1|[a1,a2,a3]|
| 2| a | 2|[a1,a2,a3]|
| 1| c | 3|[a1,a2,a3]|
| 1| j | 4|[a1,a2,a3]|
| 3| k | 5|[a1,a2,a3]|
| 1| l | 6|[a1,a2,a3]|
+----+--------+----------+----------+
job_data
+----+--------+----------+----------+
| id|job_name| job_rank| skill_set|
+----+--------+----------+----------+
| 2| c | 1|[a1,a2,a3]|
| 2| a | 2|[a1,a2,a3]|
| 1| c | 1|[a1,a2,a3]|
| 1| b | 4|[a1,a2,a3]|
| 3| r | 3|[a1,a2,a3]|
| 1| a | 6|[a1,a2,a3]|
| 1| m | 2|[a1,a2,a3]|
| 1| g | 4|[a1,a2,a3]|
+----+--------+----------+----------+
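For reference, the loop described above could look roughly like the following sketch. It assumes pandas DataFrames (the question does not say which library is used), an id that is unique per employee, and made-up skill values so that the overlaps differ. The function name top_jobs_loop is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the skill values are invented for illustration.
employee_data = pd.DataFrame({
    "id": [1, 2],
    "emp_name": ["c", "a"],
    "job_rank": [1, 2],
    "skill_set": [["a1", "a2", "a3"], ["a2", "a4"]],
})
job_data = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "job_name": ["c", "a", "m", "g"],
    "job_rank": [1, 2, 1, 2],
    "skill_set": [["a1", "a2"], ["a4"], ["a1", "a2", "a3"], ["a2", "a4", "a5"]],
})


def top_jobs_loop(employees, jobs, n=5):
    """For each employee, rank the same-rank jobs by number of shared skills."""
    results = {}
    for emp in employees.itertuples(index=False):
        # Only jobs with the same rank as the current employee are candidates.
        same_rank = jobs[jobs["job_rank"] == emp.job_rank]
        scored = [
            (job.job_name, len(np.intersect1d(emp.skill_set, job.skill_set)))
            for job in same_rank.itertuples(index=False)
        ]
        # Largest overlap first, then keep at most n jobs.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        results[emp.id] = scored[:n]  # assumes id is unique per employee
    return results


top_jobs = top_jobs_loop(employee_data, job_data)
```

This works but scans job_data once per employee, which gets slow as the tables grow.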
Solution 1:[1]
Here is an idea for solving this, assuming the employee and job data are not too big.
Join employee_data and job_data. A full (cross) join pairs every employee with every job, so the joined data has len(employee_data) * len(job_data) rows; an inner join on job_rank keeps only the pairs with matching ranks, if that is what you need. Either way, each row carries the skills from both tables along with the employee details:
| emp_details | emp_skills | job_details | job_skills |
Operate on this table with (lambda) functions to find which emp_skills match job_skills. Functions make it easy to work with array objects.
Then select the employee details from each row.
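The steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes pandas DataFrames, uses an inner join on job_rank (since the question only compares jobs of the same rank), and uses invented skill values. The column names id_emp, skill_set_emp, skill_set_job and n_common come from the suffixes and names chosen here, not from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the skill values are invented for illustration.
employee_data = pd.DataFrame({
    "id": [1, 2],
    "emp_name": ["c", "a"],
    "job_rank": [1, 2],
    "skill_set": [["a1", "a2", "a3"], ["a2", "a4"]],
})
job_data = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "job_name": ["c", "a", "m", "g"],
    "job_rank": [1, 2, 1, 2],
    "skill_set": [["a1", "a2"], ["a4"], ["a1", "a2", "a3"], ["a2", "a4", "a5"]],
})

# Step 1: the inner join on job_rank pairs each employee with same-rank jobs.
# Columns present in both tables (id, skill_set) get the suffixes below.
pairs = employee_data.merge(job_data, on="job_rank", suffixes=("_emp", "_job"))

# Step 2: count the shared skills in each employee/job pair.
pairs["n_common"] = [
    len(np.intersect1d(emp_skills, job_skills))
    for emp_skills, job_skills in zip(pairs["skill_set_emp"], pairs["skill_set_job"])
]

# Step 3: keep, per employee, the 5 jobs with the largest overlap.
top5 = (
    pairs.sort_values(["id_emp", "n_common"], ascending=[True, False])
    .groupby("id_emp")
    .head(5)[["id_emp", "emp_name", "job_name", "n_common"]]
    .reset_index(drop=True)
)
```

Joining on job_rank instead of building the full product keeps the intermediate table much smaller, which matters because the join is the expensive part of this approach.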
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
