'Vertex AI batch prediction location
When I initiate a batch prediction job on Vertex AI of google cloud, I have to specify a cloud storage bucket location. Suppose I provided the bucket location, 'my_bucket/prediction/', then the prediction files are stored in something like: gs://my_bucket/prediction/prediction-test_model-2022_01_17T01_46_39_898Z, which is a subdirectory within the bucket location I provided. The prediction files are stored within that subdirectory and are named:
prediction.results-00000-of-00002
prediction.results-00001-of-00002
Is there any way to programmatically get the final export location from the batch prediction name, id or any other parameter as shown below in the details of the batch prediction job?

Solution 1:[1]
Not only with those parameters because and you can run the same job multiple times, new folders based on the execution date will be create, but you can get it from the API using your job id (don't forget to set the credentials by GOOGLE_APPLICATION_CREDENTIALS if you are not running on cloud sdk):
Get the output directory by the Vertex AI - Batch prediction API by the job ID:
curl -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) "https://us-central1-aiplatform.googleapis.com/v1/projects/[PROJECT_NAME]/locations/us-central1/batchPredictionJobs/[JOB_ID]"
Output: (Get the value from gcsOutputDirectory )
{
...
"gcsOutputDirectory": "gs://my_bucket/prediction/prediction-test_model-2022_01_17T01_46_39_898Z"
...
}
EDIT: Getting batchPredictionJobs via Python API:
from google.cloud import aiplatform
#-------
def get_batch_prediction_job_sample(
project: str,
batch_prediction_job_id: str,
location: str = "us-central1",
api_endpoint: str = "us-central1-aiplatform.googleapis.com",
):
client_options = {"api_endpoint": api_endpoint}
client = aiplatform.gapic.JobServiceClient(client_options=client_options)
name = client.batch_prediction_job_path(
project=project, location=location, batch_prediction_job=batch_prediction_job_id
)
response = client.get_batch_prediction_job(name=name)
print("response:", response)
#-------
get_batch_prediction_job_sample("[PROJECT_NAME]","[JOB_ID]","us-central1","us-central1-aiplatform.googleapis.com")
Solution 2:[2]
Just adding a cherry on top of @ewertonvsilva's answer...
If you are following Google's example on programmatically getting the batch prediction,
The object response from response = client.get_batch_prediction_job(name=name) has the output_config attribute that you need. All you need to do is to call response.output_info.gcs_output_directory once the prediction job is complete.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Donny |
