'Can't Schedule Query in BigQuery Via Python SDK
I'll preface this by saying I'm fairly new to BigQuery. I'm running into an issue when trying to schedule a query using the Python SDK. I used the example on the documentation page and modified it a bit but I'm running into errors.
Note that my query does use scripting to set some variables, and it's using a MERGE statement to update one of my tables. I'm not sure if that makes a huge difference.
def create_scheduled_query(dataset_id, project, name, schedule, service_account, query):
parent = transfer_client.common_project_path(project)
transfer_config = bigquery_datatransfer.TransferConfig(
destination_dataset_id=dataset_id,
display_name=name,
data_source_id="scheduled_query",
params={
"query": query
},
schedule=schedule,
)
transfer_config = transfer_client.create_transfer_config(
bigquery_datatransfer.CreateTransferConfigRequest(
parent=parent,
transfer_config=transfer_config,
service_account_name=service_account,
)
)
print("Created scheduled query '{}'".format(transfer_config.name))
I was able to successfully create a query with the function above. However the query errors out with the following message:
Error code 9 : Dataset specified in the query ('') is not consistent with Destination dataset '{my_dataset_name}'.
I've tried changing passing in "" as the dataset_id parameter, but I get the following error from the Python SDK:
google.api_core.exceptions.InvalidArgument: 400 Cannot create a transfer with parent projects/{my_project_name} without location info when destination dataset is not specified.
Interestingly enough I was able to successfully create this scheduled query in the GUI; the same query executed without issue.
I saw that the GUI showed the scheduled query's "Resource name" referenced a transferConfig, so I used the following command to see what that transferConfig looked like, to see if I could apply the same parameters using my Python script:
bq show --format=prettyjson --transfer_config {my_transfer_config}
Which gave me the following output:
{
"dataSourceId": "scheduled_query",
"datasetRegion": "us",
"destinationDatasetId": "",
"displayName": "test_scheduled_query",
"emailPreferences": {},
"name": "{REDACTED_TRANSFER_CONFIG_ID}",
"nextRunTime": "2021-06-18T00:35:00Z",
"params": {
"query": ....
So it looks like the GUI was able to use "" for destinationDataSetId but for whatever reason the Python SDK won't let me use that value.
Any help would be appreciated, since I prefer to avoid the GUI whenever possible.
UPDATE:
This does appear to be related to the scripting I used in my query. I removed the scripts from the query and it's working. I'm going to leave this open because I feel like this should be possible using the SDK since the query with scripting works in the console without issue.
Solution 1:[1]
This same thing also threw me through a loop but I managed to figure out what was wrong. The problem is with the
parent = transfer_client.common_project_path(project)
line that is given in the example query. By default, this returns something of the form projects/{project_id}. However, the CreateTransferConfigRequest documentation says of the parent parameter:
The BigQuery project id where the transfer configuration should be created. Must be in the format projects/{project_id}/locations/{location_id} or projects/{project_id}. If specified location and location of the destination bigquery dataset do not match - the request will fail.
Sure enough, if you use the projects/{project_id}/locations/{location_id} format instead, it resolves the error and allows you to pass a null destination_dataset_id.
Solution 2:[2]
I had the exact same issue. the fix for the issue is as below.
The below method returns Projects/{projectid}
parent = transfer_client.common_project_path(project_id)
instead use the below method , which returns projects/{project}/locations/{location}
parent = transfer_client.common_location_path(project_id , "EU")
I had tried with the above change , i am able to schedule a script in BQ.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Teddy |
| Solution 2 | Naeem Khan |
