Importing specific columns of a csv file into a 2D Numpy array format

A university assignment has tasked us with writing code that analyses CSV files and selects certain columns based on their headers. Here is the question:


Write a function load_metrics(filename) that, given filename (a string, always a csv file with the same columns as given in the sample metric data file), extracts the columns in the following order:

created_at
tweet_ID
valence_intensity
anger_intensity
fear_intensity
sadness_intensity
joy_intensity
sentiment_category
emotion_category

The extracted data should be stored in the NumPy array format (i.e., produces <class 'numpy.ndarray'>). No other post-processing is needed at this point. The resulting output will now be known as data.

Note: when importing, set the delimiter to be ',' (i.e., a comma) and the quotechar to be '"' (i.e., a double quotation mark).
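Those two settings correspond to the delimiter and quotechar parameters of csv.reader in Python's built-in csv module. A small sketch (the sample row is made up, not taken from the real metric data file) shows why the quotechar matters: a quoted field that contains a comma stays in one piece instead of being split.

```python
import csv
import io

# Hypothetical sample data: the quoted text field contains a comma
sample = 'tweet_ID,text,joy_intensity\n1,"happy, really happy",0.8\n'

reader = csv.reader(io.StringIO(sample), delimiter=",", quotechar='"')
rows = list(reader)

# The quoted field is kept as a single value, with the quotes removed
print(rows[1])  # ['1', 'happy, really happy', '0.8']
```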


And here is the code I've written so far:

import csv
import numpy as np

def load_metrics(filename):

    """Loads data from csv files"""

    col_list = ["created_at","tweet_ID","valence_intensity",
                "anger_intensity","fear_intensity","sadness_intensity",
                "joy_intensity","sentiment_category",
                "emotion_category"]

    with open(filename, 'r') as csvfile:
        data = np.loadtxt(csvfile, delimeter=",", quotechar='"', usecols=col_list)
    
    return data
    

Are there any improvements I can make? Thank you.



Solution 1:[1]

I haven't tried it myself, but from the documentation it looks like the usecols parameter of numpy.loadtxt expects an integer or a sequence of integers (column positions, with 0 being the first column), not column names:

https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html
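As a minimal illustration with a made-up in-memory CSV, usecols is given column positions rather than names. Two assumptions here that are not in the question: dtype=str, so that text columns don't fail the default float conversion, and skiprows=1, to skip the header line.

```python
import io
import numpy as np

# Made-up CSV: a header row plus two data rows
text = "a,b,c,d\n1,x,2.5,y\n3,z,4.5,w\n"

# usecols takes integer positions (0 = first column);
# the columns come back in the order they are listed
arr = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1,
                 usecols=[2, 0], dtype=str)
print(arr)
```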

So you need to find the position of each column you want among all the columns in the file, and pass those integers to usecols instead.
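One way to find those positions is to read just the header row and look up each name. This is a sketch, not a verified answer to the assignment. It assumes the file has a header row containing exactly these column names, keeps every value as a string (the assignment asks for no post-processing), and needs NumPy 1.23 or later, where loadtxt accepts quotechar. It also corrects the "delimeter" spelling from the question's code, because loadtxt rejects unknown keyword arguments. comments=None is an extra assumption: by default loadtxt treats "#" as the start of a comment, and tweet text may contain hashtags.

```python
import csv
import numpy as np

# Column order required by the assignment
COLUMNS = ["created_at", "tweet_ID", "valence_intensity",
           "anger_intensity", "fear_intensity", "sadness_intensity",
           "joy_intensity", "sentiment_category", "emotion_category"]


def load_metrics(filename):
    """Load the required columns of a metrics CSV into a 2D NumPy array of strings."""
    # Read only the header row so each column name can be mapped to its position
    with open(filename, newline="") as csvfile:
        header = next(csv.reader(csvfile, delimiter=",", quotechar='"'))
    # list.index raises ValueError if a required column is missing
    indices = [header.index(name) for name in COLUMNS]

    # usecols needs integers; dtype=str because the columns mix text and numbers.
    # comments=None stops '#' (e.g. in hashtags) from being treated as a comment.
    # ndmin=2 keeps the result 2D even if the file has a single data row.
    return np.loadtxt(filename, delimiter=",", quotechar='"', skiprows=1,
                      usecols=indices, dtype=str, comments=None, ndmin=2)
```

Because the positions are looked up from the header each time, the function still returns the columns in the required order even if a file lists them in a different order or has extra columns.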

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Edward