Getting an error when trying to do a VLOOKUP with pandas in Python

Here is what I am trying to do. I have a large dataframe named newdf. It has several columns, but the relevant ones here are the year (YEAR) and the product name (Product Name). I need to count the number of times each product name appears in each year (from 2018 to 2021), and create a new dataframe that looks like the one below.

Product Name    2018    2019    2020    2021
abc             0       5       10      8
xyz             2       0       0       5
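
A table of this shape can usually be built in one step with `pd.crosstab`, without a separate lookup per year. The following is only a minimal sketch: the sample rows are made up, and it assumes that YEAR is stored as strings (as the `query` calls in the attempt below suggest).

```python
import pandas as pd

# Hypothetical stand-in for newdf; only the two relevant columns are included.
newdf = pd.DataFrame({
    "Product Name": ["abc", "abc", "xyz", "xyz", "xyz", "abc"],
    "YEAR": ["2019", "2019", "2018", "2018", "2021", "2020"],
})

# Count how often each product name occurs in each year.
counts = pd.crosstab(newdf["Product Name"], newdf["YEAR"])

# Make sure every year from 2018 to 2021 has a column, filled with 0
# when a year never occurs in the data.
years = ["2018", "2019", "2020", "2021"]
counts = counts.reindex(columns=years, fill_value=0)

print(counts)
```

If YEAR holds integers instead, the `years` list would need to hold integers as well.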

Here is what I have done so far:

    import pandas as pd

    # Copy only the product name column to a new dataframe, df_target
    df_target = pd.DataFrame({'Product Name': newdf['Product Name']})

    # Delete duplicates from this dataframe
    df_target.drop_duplicates(subset='Product Name', keep='first')

    # Add empty columns to the dataframe where results can later be added
    df_target["2018"] = ""
    df_target["2019"] = ""
    df_target["2020"] = ""
    df_target["2021"] = ""

    # Set Product Name as the index
    df_target.set_index("Product Name", inplace=True)

    # Create a new dataframe for each year by filtering the original one
    df_2018 = newdf.query('YEAR == "2018"')
    df_2019 = newdf.query('YEAR == "2019"')
    df_2020 = newdf.query('YEAR == "2020"')
    df_2021 = newdf.query('YEAR == "2021"')

    # Count the number of times a product name appears in each year
    counts_2018 = pd.DataFrame(df_2018['Product Name'].value_counts().reset_index())
    counts_2019 = pd.DataFrame(df_2019['Product Name'].value_counts().reset_index())
    counts_2020 = pd.DataFrame(df_2020['Product Name'].value_counts().reset_index())
    counts_2021 = pd.DataFrame(df_2021['Product Name'].value_counts().reset_index())

    # Label the columns in the count dataframes
    counts_2018.columns = ['Product Name', ' 2018']
    counts_2019.columns = ['Product Name', ' 2019']
    counts_2020.columns = ['Product Name', ' 2020']
    counts_2021.columns = ['Product Name', ' 2021']

    # Map data from the count dataframe to the target dataframe
    df_target["2018"] = df_target.index.map(counts_2018["2018"])

This last line of code is where I get the error, when I try to map data from the count dataframe to the target dataframe that I created earlier. The error is below.

    KeyError                                  Traceback (most recent call last)
    C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
       2392             try:
    -> 2393                 return self._engine.get_loc(key)
       2394             except KeyError:

    pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)()

    pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)()

    pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)()

    pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)()

    KeyError: '2018'

    During handling of the above exception, another exception occurred:

    KeyError                                  Traceback (most recent call last)
    <ipython-input-18-cf92c30b79a3> in <module>()
    ----> 1 df_target["2018"] = df_target.index.map(counts_2018["2018"])

    C:\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
       2060             return self._getitem_multilevel(key)
       2061         else:
    -> 2062             return self._getitem_column(key)
       2063
       2064     def _getitem_column(self, key):

    C:\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
       2067         # get column
       2068         if self.columns.is_unique:
    -> 2069             return self._get_item_cache(key)
       2070
       2071         # duplicate columns & possible reduce dimensionality

    C:\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
       1532         res = cache.get(item)
       1533         if res is None:
    -> 1534             values = self._data.get(item)
       1535             res = self._box_item_values(item, values)
       1536             cache[item] = res

    C:\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
       3588
       3589             if not isnull(item):
    -> 3590                 loc = self.items.get_loc(item)
       3591             else:
       3592                 indexer = np.arange(len(self.items))[isnull(self.items)]

    C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
       2393                 return self._engine.get_loc(key)
       2394             except KeyError:
    -> 2395                 return self._engine.get_loc(self._maybe_cast_indexer(key))
       2396
       2397         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

    pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)()

    pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)()

    pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)()

    pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)()

    KeyError: '2018'


The traceback is long, and I can't find a way to resolve the error. Can anyone please advise?
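
A possible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: the frames in frame.py show the KeyError being raised by the column lookup `counts_2018["2018"]`, before `map` is ever called. The count dataframes were labelled with a leading space (' 2018'), so the label "2018" does not exist in them. Separately, `Index.map` given a Series looks values up in that Series's index, so the counts would need to be indexed by product name rather than by the default integer index. The example below uses made-up data, and `counts_frame` and the product "def" are hypothetical names introduced for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the 2018 slice of newdf.
df_2018 = pd.DataFrame({"Product Name": ["xyz", "xyz", "abc"]})

# value_counts() already returns a Series indexed by product name.
counts_2018 = df_2018["Product Name"].value_counts()

# Reproducing the label mismatch: ' 2018' (with a space) is not '2018'.
counts_frame = counts_2018.reset_index()
counts_frame.columns = ["Product Name", " 2018"]
print("2018" in counts_frame.columns)   # False
print(" 2018" in counts_frame.columns)  # True

# Target frame with one row per product; "def" never occurs in 2018.
df_target = pd.DataFrame(
    index=pd.Index(["abc", "xyz", "def"], name="Product Name")
)

# Index.map looks each product name up in the index of counts_2018;
# names that are missing come back as NaN, replaced with 0 here.
df_target["2018"] = df_target.index.map(counts_2018)
df_target["2018"] = df_target["2018"].fillna(0).astype(int)

print(df_target)
```

Passing the Series from `value_counts()` directly avoids both the `reset_index()` call and the column relabelling step where the stray space crept in.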



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
