'How do I use pandas to compare a list of numbers to a column in an excel document?

I have this simple script that creates a list by taking the values out of the column of excel_1 and compares it to the values of a column in excel_2. I want it to return any rows that aren't in the column 'ext_addr_id' in excel_2.

If I hard code the list, it creates the document just fine. But when I create the list with a for loop, then it returns the whole list and doesnt seem to do the comparison, or at least when it does the comparison, it doesn't recognize the numbers in the list as the same as the numbers in excel_2.

I've tried everything and I'm out of ideas, what am I missing?

import pandas as pd
import os

filePath = r'/outputs/coupa_address_id_translation_results.xlsx'
if os.path.exists(filePath):
    os.remove(filePath)

cust_results = pd.read_excel(r'./data_sources/excel_1.xlsx')
# maybe convert to csv to get access to other libs
# aws_results = pd.read_excel(r'./data_sources/excel_2.xlsx')
aws_results = pd.read_csv(r'./data_sources/excel_2.csv')

missing_addresses = pd.DataFrame()
cust_addr_id_list = []

for cust_row, cust_col in cust_results.iterrows():
    cust_ext_addr_id_column_number = 0  # index begins at 0
    cust_addr_id_list.append(str(cust_results.values[cust_row][cust_ext_addr_id_column_number]).strip())
    # cust_addr_id_list.append(cust_results.values[cust_row][cust_ext_addr_id_column_number])
    
print(cust_addr_id_list)
filter = ~aws_results['ext_addr_id'].isin(cust_addr_id_list)
# filter = ~aws_results['ext_addr_id'].isin(['1253', '2177', '2258'])

# print(filter)

missing_addresses = aws_results[filter]

missing_addresses.to_excel("./outputs/coupa_address_id_translation_results.xlsx")

Thanks for any help you can provide

EDIT: Responding to Durtal re: the contents of the cust_addr_id_list, I've tried it with both strings and integers, and actually this was the crux of my question. The outputs are

['2339', '2534', '2234', '2340', '2363', '2297', '2834', '2371', '2190', '2373', '2284', '2287', '1821', '1898', '2342', '2374', '2311', '2343', '2344', '2376', '2193', '2192', '2345', '2346', '2191', '2379', '2383', '2293', '1887', '2513', '675', '1888', '2235', '2269', '2270', '2272', '2224', '2223', '1769', '1897', '1771', '2286', '2537', '1899', '2382', '2225', '2283', '2276', '2285', '683', '2161', '2271', '2174', '2173', '1781', '2335', '2347', '2317', '1780', '2521', '1906', '2349', '2328', '2348', '2327', '1782', '2207', '2210', '2840', '2208', '2260', '2184', '2321', '2183', '2182', '2322', '2333', '2350', '2692', '2717', '2690', '2720', '2718', '2719', '2310', '2266', '2313', '1893', '2265', '2314', '2337', '2308', '2309', '2315', '2351', '2352', '1541', '1890', '2298', '1889', '1891', '2240', '2242', '2370', '2217', '2244', '2501', '2301', '2331', '2319', '2238', '1911', '2329', '2689', '2728', '2729', '2688', '2727', '2730', '2731', '2723', '2290', '2687', '2724', '2725', '2726', '2330', '2166', '2168', '2167', '2257', '2169', '1910', '2282', '2209', '2336', '2517', '2171', '2362', '2307', '1883', '2197', '2299', '2320', '1913', '2303', '2318', '2316', '2858', '1903', '1901', '1902', '2300', '1905', '2304', '2326', '2302', '2250', '2393', '2355', '2356', '2366', '2252', '2357', '1894', '2375', '2843', '1909', '2359', '2377', '1895', '2433', '2196', '1907', '1738', '1786', '2237', '2175', '2261', '1784', '2365', '2236', '2259', '2323', '1846', '2443', '2248', '2369', '2204', '2253', '2216', '2212', '2457', '2214', '2213', '1790', '2258', '1920', '2215', '2177', '2178', '2188', '2189', '2636', '2754', '2202', '2263', '2427', '2222', '2262', '2220', '2281', '1900', '2305', '1904', '1908', '2221', '2438', '2360', '2425', '2334', '2761', '2295', '2312', '2361', '2288', '2650', '2543', '2251', '2860', '2735', '2368', '2338', '2228', '2227', '2258']

and

[2339, 2534, 2234, 2340, 2363, 2297, 2834, 2371, 2190, 2373, 2284, 2287, 1821, 1898, 2342, 2374, 2311, 2343, 2344, 2376, 2193, 2192, 2345, 2346, 2191, 2379, 2383, 2293, 1887, 2513, 675, 1888, 2235, 2269, 2270, 2272, 2224, 2223, 1769, 1897, 1771, 2286, 2537, 1899, 2382, 2225, 2283, 2276, 2285, 683, 2161, 2271, 2174, 2173, 1781, 2335, 2347, 2317, 1780, 2521, 1906, 2349, 2328, 2348, 2327, 1782, 2207, 2210, 2840, 2208, 2260, 2184, 2321, 2183, 2182, 2322, 2333, 2350, 2692, 2717, 2690, 2720, 2718, 2719, 2310, 2266, 2313, 1893, 2265, 2314, 2337, 2308, 2309, 2315, 2351, 2352, 1541, 1890, 2298, 1889, 1891, 2240, 2242, 2370, 2217, 2244, 2501, 2301, 2331, 2319, 2238, 1911, 2329, 2689, 2728, 2729, 2688, 2727, 2730, 2731, 2723, 2290, 2687, 2724, 2725, 2726, 2330, 2166, 2168, 2167, 2257, 2169, 1910, 2282, 2209, 2336, 2517, 2171, 2362, 2307, 1883, 2197, 2299, 2320, 1913, 2303, 2318, 2316, 2858, 1903, 1901, 1902, 2300, 1905, 2304, 2326, 2302, 2250, 2393, 2355, 2356, 2366, 2252, 2357, 1894, 2375, 2843, 1909, 2359, 2377, 1895, 2433, 2196, 1907, 1738, 1786, 2237, 2175, 2261, 1784, 2365, 2236, 2259, 2323, 1846, 2443, 2248, 2369, 2204, 2253, 2216, 2212, 2457, 2214, 2213, 1790, 2258, 1920, 2215, 2177, 2178, 2188, 2189, 2636, 2754, 2202, 2263, 2427, 2222, 2262, 2220, 2281, 1900, 2305, 1904, 1908, 2221, 2438, 2360, 2425, 2334, 2761, 2295, 2312, 2361, 2288, 2650, 2543, 2251, 2860, 2735, 2368, 2338, 2228, 2227, 2258]

depending on whether I wrap (cust_results.values[cust_row][cust_ext_addr_id_column_number] in a str() (such as in the example) or not.

In neither case does it work. However, if I take a few entries that appear in both lists and pass them directly into the filter variable, such as the # filter = line commented out, then the script works and returns the correct lines.



Solution 1:[1]

Your Code has several issues, but we will stick to your logic for this time.

It is a bad idea to use a name of a built-in-function for a variable. Use filter_ instead of filter, or better: give it an appropriate name like index_not_matched.

A recipe for self-help:

If the whole dataframe is returned, this probably means, that your filter-variable contains only True. Check this and than inspect why this happens!

Obviously filter = ~aws_results['ext_addr_id'].isin(cust_addr_id_list) is not working as expected. Inspect why!

Are the items of both lists of the same datatype? Look at aws_results.dtypes and [type(el) for el in cust_addr_id_list].

Possible types are: String, Integer, Float. Make sure both lists are of the same type. Then ensure that both list contain the same values. This should lead you where you want.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Durtal