How to get a list of two words before or after a keyword in Python?
I collected this data, and when the keyword exists in a row I am trying to find which two words come before it and which two come after it.
data = pd.read_csv( 'jobs.csv')
view(data)
| Job | Description |
|---|---|
| Engineer | the job requires x,y,z..... |
| Driver | this job need a high-school and Communication skills |
The data has about 10k rows.
For example, for the keyword "Communication", can I find the words before and after it and make the results look like this?
| Job | Description | after | before |
|---|---|---|---|
| Engineer | the job requires x,y,z | NA | NA |
| Driver | this job need a high-school and Communication skills | skills | high-school, and |
NA because the keyword doesn't exist in that row.
I tried pandas and regex, but nothing is working for me :/
I would really appreciate the help
Solution 1:[1]
You can use Series.map to build a new column by applying a function to every element of an existing column.
Once a description is split into a list of words with str.split, you can use list.index to find the position i of the keyword you're looking for, then the list slice sentence[i-2:i] to get the two words before it and sentence[i+1:i+3] to get the two words after it.
import pandas as pd

data = pd.DataFrame({
    'Job': ['Engineer', 'Driver'],
    'Description': ['the job requires x,y,z', 'this job need a high-school and Communication skills']
})

def get_two_words_before(sentence, word):
    # Split the description into whitespace-separated words
    sentence = sentence.split()
    if word in sentence:
        i = sentence.index(word)
        # Clamp the start at 0: a negative start would count from the end of the list
        return sentence[max(i-2, 0):i]
    else:
        return []

def get_two_words_after(sentence, word):
    sentence = sentence.split()
    if word in sentence:
        i = sentence.index(word)
        # Slicing past the end is safe; it just returns fewer words
        return sentence[i+1:i+3]
    else:
        return []

# Apply each helper to every description to build the new columns
data['before'] = data['Description'].map(lambda x: get_two_words_before(x, 'Communication'))
data['after'] = data['Description'].map(lambda x: get_two_words_after(x, 'Communication'))
print(data)
Output:
Job Description before after
0 Engineer the job requires x,y,z [] []
1 Driver this job need a hig... [high-school, and] [skills]
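The answer above matches the keyword exactly and splits on whitespace, so "communication" in lower case or "Communication," with a trailing comma would be missed. Below is a minimal sketch of a more tolerant variant, not part of the original answer. The helper name `words_around`, its parameter `n`, and the tokenizing pattern are all assumptions made for illustration. It returns comma-joined strings, or `None` when the keyword is absent, which is closer to the NA cells in the question's desired table.

```python
import re

def words_around(text, keyword, n=2):
    """Return (before, after): up to n words on each side of the first
    case-insensitive match of keyword, or (None, None) if it is absent.

    Hypothetical helper, not from the original answer.
    """
    # Assumption: a "word" is a run of letters, digits, underscores,
    # apostrophes or hyphens, so "high-school" stays whole and "skills," -> "skills"
    tokens = re.findall(r"[\w'-]+", text)
    lowered = [t.lower() for t in tokens]
    key = keyword.lower()
    if key not in lowered:
        return None, None
    i = lowered.index(key)
    # Clamp the start at 0 so a keyword near the beginning still works
    before = tokens[max(i - n, 0):i]
    after = tokens[i + 1:i + 1 + n]
    return ", ".join(before), ", ".join(after)

print(words_around("this job need a high-school and Communication skills", "Communication"))
print(words_around("the job requires x,y,z", "Communication"))
```

With the DataFrame from the answer, something like `data['Description'].map(lambda s: words_around(s, 'Communication')[0])` should fill a `before` column. Rows without the keyword get `None`, which pandas treats as a missing value.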
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Stef |
