ValueError: Missing column provided to 'parse_dates': 'date'

I am working on this ML project; here is a look at the training dataset:

Since the training dataset is very large, I am trying to load a random 1% sample of its rows using the following code:

```python
import random

import pandas as pd
from numpy import float32

# data_dir and selected_cols are assumed to be defined elsewhere in the notebook (not shown).
sample_fraction = 0.01  # keep roughly 1% of the rows

dtypes = {
    'id': float32,
    'store_nbr': float32,
    'item_nbr': float32,
    'unit_sales': float32,
    'onpromotion': bool,
}


def skip_row(row_idx):
    # Never skip row 0, which holds the column names.
    if row_idx == 0:
        return False
    # random.random() returns a float in [0.0, 1.0). For about 1% of the rows it is
    # <= sample_fraction, so this returns False (keep the row); for the other ~99%
    # it returns True, meaning the row is skipped.
    return random.random() > sample_fraction


# Setting the seed makes the random sample identical every time the notebook runs.
random.seed(42)

df = pd.read_csv(data_dir + "/train.csv",
                 usecols=selected_cols,
                 parse_dates=['date'],
                 dtype=dtypes,
                 skiprows=skip_row)
```

However, when I run this, I get the following error:

ValueError: Missing column provided to 'parse_dates': 'date'
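A minimal reproduction, assuming a tiny in-memory CSV in place of train.csv (the data and the column list below are hypothetical, for illustration only). It suggests that pandas raises this ValueError whenever a column named in parse_dates has been excluded by usecols, because the date parsing is checked against the columns that survive the usecols filter.

```python
import io

import pandas as pd

# Hypothetical stand-in for train.csv.
CSV = """id,date,store_nbr,item_nbr,unit_sales,onpromotion
0,2013-01-01,25,103665,7.0,False
1,2013-01-01,25,105574,1.0,False
"""

# 'date' is deliberately missing here, mirroring the bug in the question.
cols_without_date = ["id", "store_nbr", "item_nbr", "unit_sales", "onpromotion"]


def reproduce_error():
    """Return the ValueError message raised when parse_dates names a column
    that usecols has excluded, or None if no error is raised."""
    try:
        pd.read_csv(io.StringIO(CSV), usecols=cols_without_date, parse_dates=["date"])
    except ValueError as exc:
        return str(exc)
    return None


print(reproduce_error())
```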

Solution 1:[1]

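I rechecked, and it turns out I hadn't selected the date column: 'date' was not in the columns passed to usecols, so parse_dates=['date'] referred to a column that was never loaded. Adding it to the selected columns solved the error.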
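A self-contained sketch of the corrected call, assuming a small generated CSV in place of train.csv (the data, the column list and the row count are hypothetical). The change that matters for the error is that 'date' is part of the list passed to usecols; the callable skiprows works as in the question, keeping the header row and a random ~1% of the data rows.

```python
import io
import random

import pandas as pd

# Hypothetical stand-in for train.csv: a header plus 1,000 data rows.
rows = ["id,date,store_nbr,unit_sales"]
rows += [f"{i},2013-01-01,{i % 54},{i * 0.5}" for i in range(1000)]
CSV = "\n".join(rows) + "\n"

sample_fraction = 0.01  # keep roughly 1% of the data rows

# The fix: 'date' is included in the selected columns, so parse_dates can find it.
selected_cols = ["id", "date", "store_nbr", "unit_sales"]


def skip_row(row_idx):
    # Row 0 is the header; never skip it.
    if row_idx == 0:
        return False
    # Skip the row unless the random draw falls at or below sample_fraction.
    return random.random() > sample_fraction


random.seed(42)  # reproducible sample
df = pd.read_csv(
    io.StringIO(CSV),
    usecols=selected_cols,
    parse_dates=["date"],
    skiprows=skip_row,
)
print(len(df), df["date"].dtype)
```

With 1,000 rows and sample_fraction = 0.01, about 10 rows are expected to survive; the exact number depends on the seeded random draws.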

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution | Source
Solution 1 | Vishnu
