Getting an error when trying to do a vlookup with pandas (Python)
So here is what I am trying to do. I have a large DataFrame named `newdf`. It has several columns, but the relevant ones here are the year (`YEAR`) and the product name (`Product Name`). I need to count how many times each product name appears in each year (2018 to 2021) and create a new DataFrame that looks like this:
Product Name | 2018 | 2019 | 2020 | 2021 |
---|---|---|---|---|
abc | 0 | 5 | 10 | 8 |
xyz | 2 | 0 | 0 | 5 |
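For the counting itself, `pd.crosstab` builds a table of this shape (one row per product, one column per year) in a single call. A minimal sketch, using hypothetical sample data in place of `newdf` and assuming `YEAR` is stored as strings such as `"2018"` (if it is numeric, use integer year labels instead):

```python
import pandas as pd

# Hypothetical stand-in for newdf; the real frame has more columns.
newdf = pd.DataFrame({
    "YEAR": ["2019", "2019", "2021", "2018", "2018", "2021"],
    "Product Name": ["abc", "abc", "abc", "xyz", "xyz", "xyz"],
})

years = ["2018", "2019", "2020", "2021"]  # assumed string labels

# Count rows per (product, year) pair: products become the index, years the columns.
counts = pd.crosstab(newdf["Product Name"], newdf["YEAR"])

# Ensure every year is present as a column; years with no rows get 0.
counts = counts.reindex(columns=years, fill_value=0)
counts.columns.name = None  # drop the leftover "YEAR" axis label
print(counts)
```

`reindex(columns=..., fill_value=0)` is what produces explicit 0 entries for years with no rows at all; `crosstab` on its own only creates columns for years that actually occur in the data.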
Here is what I have done so far:

```python
import pandas as pd

# Copy only the product name column to a new DataFrame, df_target
df_target = pd.DataFrame({'Product Name': newdf['Product Name']})
# Delete duplicates (drop_duplicates returns a new DataFrame, so assign it back)
df_target = df_target.drop_duplicates(subset='Product Name', keep='first')

# Add empty columns to the DataFrame where results can later be added
df_target["2018"] = ""
df_target["2019"] = ""
df_target["2020"] = ""
df_target["2021"] = ""
df_target.set_index("Product Name", inplace=True)  # Set Product Name as the index

# Create a new DataFrame for each year by filtering the original one
df_2018 = newdf.query('YEAR == "2018"')
df_2019 = newdf.query('YEAR == "2019"')
df_2020 = newdf.query('YEAR == "2020"')
df_2021 = newdf.query('YEAR == "2021"')

# Count the number of times each product name appears in each year
counts_2018 = pd.DataFrame(df_2018['Product Name'].value_counts().reset_index())
counts_2019 = pd.DataFrame(df_2019['Product Name'].value_counts().reset_index())
counts_2020 = pd.DataFrame(df_2020['Product Name'].value_counts().reset_index())
counts_2021 = pd.DataFrame(df_2021['Product Name'].value_counts().reset_index())

# Label the columns in the count DataFrames
counts_2018.columns = ['Product Name', ' 2018']
counts_2019.columns = ['Product Name', ' 2019']
counts_2020.columns = ['Product Name', ' 2020']
counts_2021.columns = ['Product Name', ' 2021']

# Map the counts from the count DataFrame onto the target DataFrame
df_target["2018"] = df_target.index.map(counts_2018["2018"])
```

This last line is where I get the error, when I try to map data from the count DataFrame onto the target one I created earlier. The error is below:
```
KeyError                                  Traceback (most recent call last)
C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2392             try:
-> 2393                 return self._engine.get_loc(key)
   2394             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)()

KeyError: '2018'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-18-cf92c30b79a3> in <module>()
----> 1 df_target["2018"] = df_target.index.map(counts_2018["2018"])

C:\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2060             return self._getitem_multilevel(key)
   2061         else:
-> 2062             return self._getitem_column(key)
   2063
   2064     def _getitem_column(self, key):

C:\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
   2067         # get column
   2068         if self.columns.is_unique:
-> 2069             return self._get_item_cache(key)
   2070
   2071         # duplicate columns & possible reduce dimensionality

C:\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
   1532         res = cache.get(item)
   1533         if res is None:
-> 1534             values = self._data.get(item)
   1535             res = self._box_item_values(item, values)
   1536             cache[item] = res

C:\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
   3588
   3589             if not isnull(item):
-> 3590                 loc = self.items.get_loc(item)
   3591             else:
   3592                 indexer = np.arange(len(self.items))[isnull(self.items)]

C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2393                 return self._engine.get_loc(key)
   2394             except KeyError:
-> 2395                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2396
   2397         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)()

KeyError: '2018'
```
The traceback is long, and I can't find a way to resolve it. Can anyone please advise?
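A likely cause, judging from the code in the question: the count columns were labelled `' 2018'` (with a leading space), so `counts_2018["2018"]` asks for a column that does not exist and raises `KeyError: '2018'`. Fixing the label alone is probably not enough, because `Index.map` looks each product name up in the *index* of the Series it is given, and after `reset_index()` that index is just 0, 1, 2, ... The sketch below uses hypothetical sample data and assumes `YEAR` holds strings (as the `query('YEAR == "2018"')` calls imply). It first reproduces the label mismatch, then maps a `value_counts()` Series that is still indexed by product name.

```python
import pandas as pd

# Hypothetical sample data standing in for newdf.
newdf = pd.DataFrame({
    "YEAR": ["2018", "2018", "2019", "2021"],
    "Product Name": ["xyz", "xyz", "abc", "abc"],
})

# Reproduce the label problem: ' 2018' (leading space) is a different key from '2018'.
counts_2018 = newdf.query('YEAR == "2018"')["Product Name"].value_counts().reset_index()
counts_2018.columns = ["Product Name", " 2018"]
print(" 2018" in counts_2018.columns, "2018" in counts_2018.columns)  # True False

# One row per unique product name, used as the index (the "lookup keys").
df_target = pd.DataFrame(index=pd.Index(newdf["Product Name"].unique(), name="Product Name"))

for year in ["2018", "2019", "2020", "2021"]:
    # Keep value_counts() as a Series indexed by product name (no reset_index),
    # so map() can find each product name in the mapper's index.
    per_year = newdf.loc[newdf["YEAR"] == year, "Product Name"].value_counts()
    # Products with no rows in this year map to NaN; treat those as 0.
    df_target[year] = df_target.index.map(per_year).fillna(0).astype(int)

print(df_target)
```

The same loop works with the question's `df_target`, as long as its index holds the product names and the year labels match the values in `YEAR` exactly.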
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow