Vertex AI Pipeline Failed Precondition

I have been following this video: https://www.youtube.com/watch?v=1ykDWsnL2LE&t=310s

Code located at: https://codelabs.developers.google.com/vertex-pipelines-intro#5 (I have done the last two steps as per the video, which isn't an issue for google_cloud_pipeline_components version 0.1.1)

I have created a pipeline in Vertex AI, which ran, and I used the following code to create the pipeline run (from the video, not the code extract in the link above):

    # run pipeline
    response = api_client.create_run_from_job_spec(
        "tab_classif_pipeline.json",
        pipeline_root = PIPELINE_ROOT,
        parameter_values = {
            "project" : PROJECT_ID,
            "display_name" : DISPLAY_NAME,
        },
    )
    

and in the GCP logs I get the following error:

"google.api_core.exceptions.FailedPrecondition: 400 BigQuery Dataset location `eu` must be in the same location as the service location `us-central1`.

I get the error at the dataset_create_op stage:

    dataset_create_op = gcc_aip.TabularDatasetCreateOp(
        project = project, display_name = display_name, bq_source = bq_source
    )

My dataset is configured in EU (the whole region) so I don't understand where us-central1 is coming from (or what the service location is?).
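
For reference, one quick way to confirm where the source dataset actually lives is to read its metadata with the BigQuery client. This is just a sanity-check sketch, assuming the project and dataset ID (MLOp_pipeline_temp) taken from the bq_source used further down:

    from google.cloud import bigquery

    # Project and dataset ID assumed from the bq_source in the pipeline code below.
    bq_client = bigquery.Client(project = "marketingtown")
    dataset = bq_client.get_dataset("marketingtown.MLOp_pipeline_temp")

    # Prints the dataset's location, e.g. the "EU" multi-region.
    print(dataset.location)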

Here is all the code I have used:

PROJECT_ID = "marketingtown"
BUCKET_NAME = "gs://lookalike_model"

from typing import NamedTuple
import kfp
from kfp import dsl
from kfp.v2 import compiler
from kfp.v2.dsl import (Artifact, Input, InputPath, Model, Output,
                        OutputPath, ClassificationMetrics,
                        Metrics, component)
from kfp.v2.components.types.artifact_types import Dataset
from kfp.v2.google.client import AIPlatformClient
from google.cloud import aiplatform
from google_cloud_pipeline_components import aiplatform as gcc_aip

# set environment variables
PATH = %env PATH
%env PATH={PATH}:/home/jupyter/.local/bin
REGION = "europe-west2"

# cloud storage path where pipeline artifacts are created
PIPELINE_ROOT = f"{BUCKET_NAME}/pipeline_root/"
PIPELINE_ROOT

import time
DISPLAY_NAME = f"lookalike_model_pipeline_{str(int(time.time()))}"
print(DISPLAY_NAME)
 
@kfp.dsl.pipeline(name = "lookalike-model-training-v2",
                  pipeline_root = PIPELINE_ROOT)
def pipeline(
    bq_source : str = f"bq://{PROJECT_ID}.MLOp_pipeline_temp.lookalike_training_set",
    display_name : str = DISPLAY_NAME,
    project : str = PROJECT_ID,
    gcp_region : str = "europe-west2",
    api_endpoint : str = "europe-west2-aiplatform.googleapis.com",
    thresholds_dict_str : str = '{"auPrc" : 0.3}'
):
    dataset_create_op = gcc_aip.TabularDatasetCreateOp(
        project = project, display_name = display_name, bq_source = bq_source
    )
    
    training_op = gcc_aip.AutoMLTabularTrainingJobRunOp(
        project=project,
        display_name=display_name,
        optimization_prediction_type="classification",
        budget_milli_node_hours=1000,
        column_transformations=[
            {"categorical": {"column_name": "agentId"}},
            {"categorical": {"column_name": "postcode"}},
            {"categorical": {"column_name": "isMobile"}},
            {"categorical": {"column_name": "gender"}},
            {"categorical": {"column_name": "timeOfDay"}},
            {"categorical": {"column_name": "sale"}},
        ],
        dataset=dataset_create_op.outputs["dataset"], #dataset from previous step
        target_column="sale",
    )
    
    # evaluation metrics from the custom classification_model_eval_metrics component (defined in the codelab, not shown here)
    model_eval_task = classification_model_eval_metrics(
        project,
        gcp_region,
        api_endpoint,
        thresholds_dict_str,
        training_op.outputs["model"],
    )
    
    # if the deployment threshold is met, deploy
    with dsl.Condition(
        model_eval_task.outputs["dep_decision"] == "true",
        name="deploy_decision",
    ):
        endpoint_op = gcc_aip.EndpointCreateOp(
            project=project,
            location=gcp_region,
            display_name="train-automl-beans",
        )

        # deploy the model to the endpoint
        gcc_aip.ModelDeployOp(
            model=training_op.outputs["model"],
            endpoint=endpoint_op.outputs["endpoint"],
            min_replica_count=1,
            max_replica_count=1,
            machine_type="n1-standard-4",
        )
   

compiler.Compiler().compile(
    pipeline_func = pipeline, package_path = "tab_classif_pipeline.json"
)

# client for submitting the compiled pipeline (instantiated as in the codelab)
api_client = AIPlatformClient(project_id = PROJECT_ID, region = REGION)

# run pipeline
response = api_client.create_run_from_job_spec(
    "tab_classif_pipeline.json",
    pipeline_root = PIPELINE_ROOT,
    parameter_values = {
        "project" : PROJECT_ID,
        "display_name" : DISPLAY_NAME,
    },
)


Solution 1:[1]

I solved this issue by adding the location to the TabularDatasetCreateOp:

    dataset_create_op = gcc_aip.TabularDatasetCreateOp(
        project=project,
        display_name=display_name,
        bq_source=bq_source,
        location=gcp_region,
    )

I now have the same issue with the model training job, but I have learnt that many of the components in the code above take a location parameter and default to us-central1 when it is omitted, as shown in the sketch below. I will update if I get any further.
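
For example, here is a minimal sketch of pinning the training and endpoint steps to the same region, assuming these components also expose a location parameter in the installed version of google_cloud_pipeline_components (column transformations omitted for brevity):

    # Pass the pipeline's gcp_region to the other components as well,
    # so that nothing falls back to the us-central1 default.
    training_op = gcc_aip.AutoMLTabularTrainingJobRunOp(
        project=project,
        location=gcp_region,  # keep training in europe-west2
        display_name=display_name,
        optimization_prediction_type="classification",
        budget_milli_node_hours=1000,
        dataset=dataset_create_op.outputs["dataset"],
        target_column="sale",
    )

    endpoint_op = gcc_aip.EndpointCreateOp(
        project=project,
        location=gcp_region,  # create the endpoint in the same region as the model
        display_name="train-automl-beans",
    )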

Solution 2:[2]

As @scottlucas confirmed, this question was solved by upgrading to the latest version of google-cloud-aiplatform, which can be done with pip install --upgrade google-cloud-aiplatform.

Upgrading to the latest version of the library ensures that the official documentation used as a reference is aligned with the actual product.
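
As a quick sanity check after upgrading, the installed version can be printed from inside the notebook (a small sketch, nothing specific to this pipeline):

    # Print the installed google-cloud-aiplatform version.
    from google.cloud import aiplatform

    print(aiplatform.__version__)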

Posting the answer as community wiki for the benefit of the community that might encounter this use case in the future.

Feel free to edit this answer for additional information.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: scott lucas
Solution 2: community wiki