'pandas to gbq claims a schema mismatch while the schema's are exactly the same. On github all the issues are claimed to have been solved in 2017
I am trying to append a table to a different table through pandas, pulling the data from BigQuery and sending it to a different BigQuery dataset. While the table schema is exactly the same i get the error " "Please verify that the structure and " pandas_gbq.gbq.InvalidSchema: Please verify that the structure and data types in the DataFrame match the schema of the destination table."
This error occurred earlier where I went for table overwrites but in this case the datasets are too large to do that (and that is not a sustainable solution).
df = pd.read_gbq(query, project_id="my-project", credentials=bigquery_key,
dialect='standard')
pd.io.gbq.to_gbq(df, dataset, projectid,
if_exists='append',
table_schema=[{'name': 'Date','type': 'STRING'},
{'name': 'profileId','type': 'STRING'},
{'name': 'Opco','type': 'STRING'},
{'name': 'country','type': 'STRING'},
{'name': 'deviceType','type': 'STRING'},
{'name': 'userType','type': 'STRING'},
{'name': 'users','type': 'INTEGER'},
{'name': 'sessions','type': 'INTEGER'},
{'name': 'bounceRate','type': 'FLOAT'},
{'name': 'sessionsPerUser','type': 'FLOAT'},
{'name': 'avgSessionDuration','type': 'FLOAT'},
{'name': 'pageviewsPerSession','type': 'FLOAT'}
],
credentials=bigquery_key)
The schema in BigQuery is as follows:
Date STRING
profileId STRING
Opco STRING
country STRING
deviceType STRING
userType STRING
users INTEGER
sessions INTEGER
bounceRate FLOAT
sessionsPerUser FLOAT
avgSessionDuration FLOAT
pageviewsPerSession FLOAT
I then get the following error:
Traceback (most recent call last): File "..file.py", line 63, in
<module>
main()
File "..file.py", line 57, in main
updating_general_data(bigquery_key)
File "..file.py", line 46, in updating_general_data
credentials=bigquery_key)
File
"..\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\io\gbq.py",
line 162, in to_gbq
credentials=credentials, verbose=verbose, private_key=private_key)
File
"..\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas_gbq\gbq.py",
line 1141, in to_gbq
"Please verify that the structure and " pandas_gbq.gbq.InvalidSchema: Please verify that the structure and
data types in the DataFrame match the schema of the destination table.
To me it seems that there is a 1 on 1 match. I've seen other threads talk about this and these threads are mainly talking about date formats even though the date format is already a string in this case and is then with the table_schema still made as string.
Solution 1:[1]
I had a column in my dataframe called "No." Deleting the period got rid of the issue and the schema was inferred.
Solution 2:[2]
I had real trouble with this, and fixed it by getting pandas-gbq to create the database rather than making it in UI and trying to match the schema
Solution 3:[3]
Ultimate solution to this is instead of manually specifying schema which can always be prone to type casting/naming errors, Its always best top get the schema from the very table. So have a client using latest version of API:
from google.cloud import bigquery
from google.oauth2 import service_account
credentials = service_account.Credentials.from_service_account_file(
'credentials.json')
project_id = 'your_project_id',
client = bigquery.Client(credentials= credentials,project=project_id)
Get the table you want to write/append to:
table = client.get_table('your_dataset.your_table')
table
Generate schema from the table:
generated_schema = [{'name':i.name, 'type':i.field_type} for i in table.schema]
generated_schema
Rename your dataframe accordingly:
data.columns = [i.name for i in table.schema]
Pass the same schema while pushing it to BigQuery:
data.to_gbq(project_id = 'your_project_id',
destination_table = 'your_dataset.your_table',
credentials = service_account.Credentials.from_service_account_file(
'credentials.json'),
table_schema = generated_schema,
progress_bar = True,
if_exists = 'replace')
Solution 4:[4]
Most likely, the problem arises because the column names in the DataFrame and Schema do not match
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | aleinikov |
| Solution 2 | Matthew Kaye |
| Solution 3 | |
| Solution 4 | Oleksandr Nagornyi |
