Adding label in AutoML for text classification

I am trying to create a text dataset in a pipeline for text classification, but I believe I am doing it the wrong way, or at least I don't understand it. The CSV I pass in contains only two columns: message and label, where label is either true or false.

Inside my pipeline I create the dataset like this, but I am not sure how the dataset recognizes that the label column is the target variable.

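# Imports assumed from the aliases used below (not shown in the original post)
from google.cloud import aiplatform
from google_cloud_pipeline_components import aiplatform as gcp_aip

dataset = gcp_aip.TextDatasetCreateOp(
    project=project,  # my project id
    display_name=display_name,  # reference name
    gcs_source=src_uris,  # path to my data in GCS
    import_schema_uri=aiplatform.schema.dataset.ioformat.text.single_label_classification,
)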

Once the dataset is created, I run training like this within the pipeline:

# training
model = gcp_aip.AutoMLTextTrainingJobRunOp(
    project = project,
    display_name = display_name,
    prediction_type = "classification",
    multi_label = False,   
    dataset = dataset.outputs["dataset"],
)

I am not sure whether dataset creation and training are working correctly, since I never specified that label is my label column and that message should be used as the feature.

In Vertex AI, the created dataset looks like this:

[image: screenshot of the created dataset]

But in the training section, the results from AutoML look like this. I don't know why a label named "label" with 0% is there, which makes me doubt whether the data was imported correctly.

[image: screenshot of the AutoML training results]



Solution 1:[1]

When preparing the CSV file, you don't need to specify which column is the feature and which is the label. Vertex AI AutoML automatically reads the first column as the feature and the second column as the label. You may refer to the Vertex AI documentation on preparing CSV data for more details.

Below is a sample CSV file. All values in the first column (column A) are treated as the feature, and all values in the second column (column B) are treated as the labels.

[image: sample CSV file]
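As a rough illustration of that two-column layout, the sketch below writes message/label pairs as CSV text with no header row. The function name `rows_to_csv` and the sample messages are made up for this example and are not part of Vertex AI; only Python's standard `csv` module is used.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize (message, label) pairs as CSV text with no header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for message, label in rows:
        # Column 1 holds the text (feature), column 2 holds the label.
        writer.writerow([message, label])
    return buffer.getvalue()


# Hypothetical sample data, for illustration only.
sample_rows = [
    ("Win a free prize now", "True"),
    ("Meeting moved to 3pm, see you there", "False"),
]
csv_text = rows_to_csv(sample_rows)
print(csv_text)
```

Messages that contain a comma are quoted by the `csv` writer, so they still occupy a single column.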

Check your CSV file for the word "label" in the second column and replace it with either "True" or "False", since based on your data you intend to have only two labels, "True" and "False". If the word "label" appears in the second column and its row has no value in the first column, simply remove the word "label".

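In the screenshot you provided, there is a count of 1 for the word "label", which means a "label" value exists in the second column of your CSV data. One likely cause is a header row such as message,label, which would be imported as a regular data row.

[image: screenshot showing the label counts]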
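One way to check for such a stray value before uploading is to count the values in the second column and keep only the rows with an expected label. This is a minimal sketch, assuming a two-column CSV held in memory; the helper names `count_labels` and `keep_allowed_rows` and the sample data are hypothetical.

```python
import csv
import io
from collections import Counter


def count_labels(csv_text):
    """Count how often each value appears in the second column."""
    reader = csv.reader(io.StringIO(csv_text))
    return Counter(row[1].strip() for row in reader if len(row) >= 2)


def keep_allowed_rows(csv_text, allowed=("True", "False")):
    """Return only the rows whose second column is an allowed label."""
    reader = csv.reader(io.StringIO(csv_text))
    return [row for row in reader if len(row) >= 2 and row[1].strip() in allowed]


# Hypothetical sample with a header row that would be imported as data.
raw = "message,label\nWin a free prize now,True\nSee you at lunch,False\n"
counts = count_labels(raw)
print(counts)

cleaned = keep_allowed_rows(raw)
print(cleaned)
```

In this sample the header row shows up as a "label" value with a count of 1, and filtering on the allowed labels drops it.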

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Scott B