'Saving data with DataCatalog

I was looking at iris project example provided by kedro. Apart from logging the accuracy I also wanted to save the predictions and test_y as a csv.

This is the example node provided by kedro.

def report_accuracy(predictions: np.ndarray, test_y: pd.DataFrame) -> None:
    """Node for reporting the accuracy of the predictions performed by the
    previous node. Notice that this function has no outputs, except logging.
    """
    # Get true class index
    target = np.argmax(test_y.to_numpy(), axis=1)
    # Calculate accuracy of predictions
    accuracy = np.sum(predictions == target) / target.shape[0]
    # Log the accuracy of the model
    log = logging.getLogger(__name__)
    log.info("Model accuracy on test set: %0.2f%%", accuracy * 100)

I added the following to save the data.

data = pd.DataFrame({"target": target , "prediction": predictions})
data_set = CSVDataSet(filepath="data/test.csv")
data_set.save(data)

This works as intended, however, my question is "is it the kedro way of doing thing" ? Can I provide the data_set in catalog.yml and later save data to it? If I want to do it, how do I access the data_set from catalog.yml inside a node.

Is there a way to save data without creating a catalog inside a node like this data_set = CSVDataSet(filepath="data/test.csv") ? I want this in catalog.yml, if possible and if it follows kedro convention!.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source