Best way to get Pub/Sub JSON data into BigQuery
I am currently trying to ingest numerous types of Pub/Sub data (JSON format) into GCS and BigQuery using a Cloud Function. I am just wondering what the best way to approach this is.
At the moment I am just dumping the events to GCS (each event type is on its own directory path) and was trying to create an external table, but there are issues since the JSON isn't newline-delimited.
Or would it be better to just write the data as JSON strings into BigQuery and do the parsing there?
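For context, BigQuery external tables over JSON files on GCS expect newline-delimited JSON (one object per line), so a file containing a single JSON array or pretty-printed objects will fail to load. Below is a minimal sketch of writing a batch of events in that format from a Cloud Function; the bucket name and path layout are hypothetical placeholders:

```python
import json
from google.cloud import storage

BUCKET = "my-events-bucket"  # hypothetical bucket name

def write_ndjson(events, event_type, object_name):
    """Write a batch of event dicts as newline-delimited JSON so a
    BigQuery external table (format JSON) can read the file."""
    client = storage.Client()
    blob = client.bucket(BUCKET).blob(f"{event_type}/{object_name}.json")
    # One JSON object per line -- this is the key requirement.
    blob.upload_from_string("\n".join(json.dumps(e) for e in events))
```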
Solution 1:
With BigQuery, you now have a native JSON data type. It makes querying JSON data much easier, and it could be the solution if you store your events in BigQuery.
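For example, you can declare a JSON column and drill into the payload with BigQuery's JSON functions without defining a fixed schema up front. A sketch using the BigQuery Python client (dataset, table, and field names are hypothetical):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table with a native JSON column for the event payload.
client.query("""
    CREATE TABLE IF NOT EXISTS my_dataset.events (
        event_time TIMESTAMP,
        payload JSON
    )
""").result()

# JSON_VALUE extracts scalar fields from the payload at query time.
query = client.query("""
    SELECT JSON_VALUE(payload, '$.user.id') AS user_id
    FROM my_dataset.events
    WHERE JSON_VALUE(payload, '$.event_type') = 'purchase'
""")
for row in query.result():
    print(row.user_id)
```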
As for your question about using Cloud Functions: it depends. If you have only a few events, Cloud Functions are great and not very expensive.
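A minimal sketch of such a function (a 1st-gen background Cloud Function triggered by Pub/Sub; the table name is a placeholder, and the payload is stored as a raw JSON string to be parsed in BigQuery later):

```python
import base64
import json
from google.cloud import bigquery

client = bigquery.Client()
TABLE = "my_project.my_dataset.raw_events"  # hypothetical table

def pubsub_to_bq(event, context):
    """Decode the Pub/Sub message and stream it to BigQuery."""
    raw = base64.b64decode(event["data"]).decode("utf-8")
    json.loads(raw)  # validate it is well-formed JSON before inserting
    errors = client.insert_rows_json(TABLE, [{"payload": raw}])
    if errors:
        raise RuntimeError(f"BigQuery streaming insert failed: {errors}")
```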
If you have a higher event rate, Cloud Run can be a good alternative: it lets you leverage concurrency (one container instance handling many messages at once) and keeps the cost low.
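With Cloud Run, Pub/Sub would deliver messages over HTTP through a push subscription. A minimal sketch of the push endpoint (Flask; the BigQuery insert would be the same as in the Cloud Function sketch above):

```python
import base64
import json
from flask import Flask, request

app = Flask(__name__)

@app.route("/", methods=["POST"])
def handle_push():
    # Pub/Sub push envelope: {"message": {"data": "<base64>", ...}, "subscription": "..."}
    envelope = request.get_json()
    raw = base64.b64decode(envelope["message"]["data"]).decode("utf-8")
    payload = json.loads(raw)
    # ...stream `raw` / `payload` to BigQuery as in the Cloud Function sketch...
    return ("", 204)  # 2xx response acknowledges the message
```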
If you have millions of events per hour or per minute, consider Dataflow with the Pub/Sub to BigQuery template.
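Google hosts this as a ready-made classic template ("Pub/Sub Topic to BigQuery"). One way to launch it, sketched here with the Dataflow REST API from Python, where the project, region, topic, and table are all placeholders:

```python
from googleapiclient.discovery import build

dataflow = build("dataflow", "v1b3")
launch = dataflow.projects().locations().templates().launch(
    projectId="my-project",
    location="us-central1",
    # Google-hosted template: reads a Pub/Sub topic, writes rows to BigQuery.
    gcsPath="gs://dataflow-templates/latest/PubSub_to_BigQuery",
    body={
        "jobName": "pubsub-to-bq",
        "parameters": {
            "inputTopic": "projects/my-project/topics/my-topic",
            "outputTableSpec": "my-project:my_dataset.raw_events",
        },
    },
)
response = launch.execute()
print(response["job"]["id"])
```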
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | guillaume blaquiere |
