Use a file to search lines in another file in Python
I have two files: one has one word per line and the other has three words per line. They look like this:
List file:

```text
Gene1
Gene2
Gene3
Gene4
```

Master file:

```text
Gene8 Gene3 2.1
Gene10 Gene5 3
Gene1 Gene20 2.1
Gene3 Gene2 3.3
Gene48 Gene95 2
```
What I want is to use the List file to find the lines in the Master file that contain a word from the List, and write those lines to a third, new file. The desired output would be:
New file:

```text
Gene8 Gene3 2.1
Gene1 Gene20 2.1
Gene3 Gene2 3.3
```
I've tried using regular expressions with re.search, but I couldn't get it right: whenever there was a match, it wrote the whole document instead of only the matching lines.
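The question doesn't show the regex code, but a common cause of this symptom is calling `re.search` on the entire file contents: a match then only says the file contains a word somewhere, so the whole text gets written. A minimal sketch of the per-line alternative, assuming the list entries are plain words: escape each word, join them into one alternation wrapped in `\b` word boundaries (so `Gene1` does not match `Gene10`), and test each line on its own. The function name `regex_matching_lines` is hypothetical.

```python
import re


def regex_matching_lines(words, lines):
    """Return the lines that contain at least one of `words` as a whole word."""
    if not words:
        # An empty alternation would match almost every line, so bail out early.
        return []
    # re.escape makes characters such as '.' match literally; \b on both sides
    # keeps "Gene1" from matching inside "Gene10" or "Gene1x".
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b")
    # Search each line separately, so only the matching lines are kept.
    return [line for line in lines if pattern.search(line)]


list_words = ["Gene1", "Gene2", "Gene3", "Gene4"]
master_lines = [
    "Gene8 Gene3 2.1\n",
    "Gene10 Gene5 3\n",
    "Gene1 Gene20 2.1\n",
    "Gene3 Gene2 3.3\n",
    "Gene48 Gene95 2\n",
]
print("".join(regex_matching_lines(list_words, master_lines)), end="")
```

With the sample data this keeps the first, third and fourth Master lines; `Gene10`, `Gene20` and `Gene48` are not counted as matches because of the word boundaries.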
I also tried loading the files, converting them to strings and using a double for loop, but it seems to match letter by letter instead of word by word, which makes the output file hard to manage.
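The letter-by-letter behaviour is what Python does when you loop over a string: iteration yields one character at a time. Splitting each line with `str.split()` gives whole words instead, and an `in` test against that list compares complete words rather than substrings. A short illustration using one line from the Master file:

```python
line = "Gene8 Gene3 2.1\n"

# Looping over (or listing) a string yields single characters.
chars = list(line.strip())
print(chars[:5])  # ['G', 'e', 'n', 'e', '8']

# str.split() with no argument splits on runs of whitespace and drops the newline.
words = line.split()
print(words)  # ['Gene8', 'Gene3', '2.1']

# `in` on a string is a substring test; `in` on the list is a whole-word test.
print("Gene" in line, "Gene" in words)  # True False
```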
Yes, I saw the post "Use Python to search lines of file for list entries", but I can't make it work properly: the resulting files still need more formatting, which complicates the process, and I seem to be losing some information. The List file has thousands of entries and the Master file has several hundred thousand lines, so it is hard to keep track.
I'm asking here because I expect there is a much more efficient and simpler way to do this, and it needs to be run several times.
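For thousands of list entries and hundreds of thousands of lines, one common approach is to load the List file into a `set`, whose membership test takes constant time on average, and to write each matching line while the Master file is being read, so the matches never have to be held in memory. A minimal sketch, assuming whitespace-separated columns; the function name `filter_master` is hypothetical, and the demo writes the question's sample data to a temporary directory rather than using real files.

```python
import os
import tempfile


def filter_master(list_path, master_path, out_path):
    """Copy every line of master_path that contains a word from list_path."""
    # Build a set of search words; blank lines in the list file are skipped.
    with open(list_path) as list_file:
        search_words = {word.strip() for word in list_file if word.strip()}

    count = 0
    with open(master_path) as master_file, open(out_path, "w") as out_file:
        for line in master_file:
            # Assumes whitespace-separated columns. isdisjoint() is False
            # when at least one column of the line is in the set.
            if not search_words.isdisjoint(line.split()):
                out_file.write(line)
                count += 1
    return count  # number of lines written, useful as a sanity check


# Demo on the question's sample data, using a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    list_path = os.path.join(tmp, "list.txt")
    master_path = os.path.join(tmp, "master.txt")
    out_path = os.path.join(tmp, "new_file.txt")
    with open(list_path, "w") as f:
        f.write("Gene1\nGene2\nGene3\nGene4\n")
    with open(master_path, "w") as f:
        f.write("Gene8 Gene3 2.1\nGene10 Gene5 3\nGene1 Gene20 2.1\n"
                "Gene3 Gene2 3.3\nGene48 Gene95 2\n")
    written = filter_master(list_path, master_path, out_path)
    with open(out_path) as f:
        result = f.read()

print(written)  # 3
print(result, end="")
```

Returning the number of written lines gives a quick way to check that nothing is being silently lost between runs.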
Solution 1:[1]
This should do it. I used both of the sample data files you provided, and the code below produces the desired output you posted. If this process is going to be repeated often and you need to speed it up, you might want to consider a different search algorithm. If so, let me know which operations will be most common (inserting into the list, searching the list, deleting items from the list), and we can pick the most appropriate one.
```python
# open the list of words to search for
search_words = []
with open('list.txt') as list_file:
    # loop through the words in the search list
    for word in list_file:
        # save each word in a list and strip whitespace
        search_words.append(word.strip())

# this is where the matching lines will be stored
matches = []
# open the master file
with open('master.txt') as master_file:
    # loop through each line in the master file
    for line in master_file:
        # split the current line into a list of words, so the "in" operator
        # below matches whole words instead of substrings
        current_line = line.split()
        # loop through each search word
        for search_word in search_words:
            # check if the search word is in the current line
            if search_word in current_line:
                # if found then save the line as we found it in the file
                matches.append(line)
                # once found then stop searching the current line
                break

# create the new file
with open('new_file.txt', 'w') as new_file:
    # loop through all of the matched lines
    for line in matches:
        # write the current matched line to the new file
        new_file.write(line)
```
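The loop above compares each line against every search word in turn. As one example of the "different search algorithm" the answer mentions, the search list can be sorted once and then queried with binary search from the standard `bisect` module, which needs O(log n) comparisons per lookup instead of a full scan. A minimal sketch; the helper names `contains_sorted` and `matching_lines` are hypothetical.

```python
from bisect import bisect_left


def contains_sorted(sorted_words, word):
    """Binary search: True if word is in the already-sorted list."""
    i = bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word


def matching_lines(search_words, lines):
    # Sort once up front; each lookup is then O(log n) instead of O(n).
    sorted_words = sorted(search_words)
    return [
        line for line in lines
        if any(contains_sorted(sorted_words, col) for col in line.split())
    ]


sample = ["Gene8 Gene3 2.1\n", "Gene10 Gene5 3\n", "Gene1 Gene20 2.1\n",
          "Gene3 Gene2 3.3\n", "Gene48 Gene95 2\n"]
print(matching_lines(["Gene1", "Gene2", "Gene3", "Gene4"], sample))
```

Keeping the list sorted while inserting (for example with `bisect.insort`) costs O(n) per insertion, so if inserts and deletes are frequent, a `set` is usually the simpler choice.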
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Derek Morgan |
