'Filter rows in csv file based on another csv file and save the filtered data in a new file

Good day all

so I was trying to filter file2 based on file1, where file1 is a subset from file2. But file2 has a description column that I need to be able to analyse the data in file1. What I'm trying to do is to filter file2, and get only the titles that are in file1 with their description. I tried this, but I'm not quite sure if it is totally right, plus it is working but I don't get any file saved on my computer.

import re
import mmap
from pandas import DataFrame
output = []
with open('file2.csv', 'r') as f2:
    mm = mmap.mmap(f2.fileno(), 0, access=mmap.ACCESS_READ)
    for line in open('file1.csv', 'r'):
        Title = bytes("")
        nameMatch = re.search(Title, mm)
        if nameMatch:
            # output.append(str(""))
            fulltypes = [ 'O*NET-SOC Code', 'Title' , 'Discription' ]
            final = DataFrame(columns=fulltypes)
            final.to_csv(output.append(str("")))
    mm.close()

Any idea?



Solution 1:[1]

Assuming your csv files aren't too huge, you can do this by reading both into pandas and using the join method. Take the following example:

import pandas as pd

file1 = pd.DataFrame({'Title': ['file1.csv', 'file2.csv', 'file3.csv']})
file2 = pd.DataFrame({'Title': ['file1.csv', 'file2.csv', 'file4.csv'],
                      'Description': ['List of files', 'List of descriptions', 'Something unrelated']})

joined = pd.merge(file1, file2, left_on='Title', right_on='Title')

print joined

This prints:

       Title           Description
0  file1.csv         List of files
1  file2.csv  List of descriptions

i.e. just the files that exist in both.

As pandas can natively read a csv into a dataframe, in your case you could do:

import pandas as pd

file1 = pd.DataFrame.from_csv('file1.csv')
file2 = pd.DataFrame.from_csv('file2.csv')

joined = pd.merge(file1, file2, left_on='Title', right_on='Title')

joined.to_csv('Output.csv', index=False)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 asongtoruin