Can Google Dataflow connect to an API data source and insert data into BigQuery?

We are exploring a few use cases where we might have to ingest data generated by SCADA/PIMS devices. For security reasons, we are not allowed to connect directly to the OT devices or data sources. Instead, the data is exposed through REST APIs that we can use to consume it. Please suggest whether Dataflow or any other GCP service can be used to capture this data and load it into BigQuery or another suitable target service. If possible, please share any relevant documentation or links for this kind of requirement.



Solution 1:[1]

Yes!

Here is what you need to know: when you write an Apache Beam pipeline, your processing logic lives in the DoFns that you write. These functions can call any code you want, including a REST client. If your data source is unbounded or simply large, you will want to author a "splittable DoFn", whose work can be split across multiple worker machines, read in parallel, and checkpointed. You will need to figure out how to get exactly-once ingestion from your REST API and how to avoid overwhelming the service; that is usually the hardest part.
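As a rough illustration of calling a REST API from pipeline code, here is a minimal batch sketch using the Beam Python SDK. It is not a splittable DoFn and does not give exactly-once ingestion. The paginated response shape (`items`, `next_page`), the `?page=` query parameter, and the fixed pause between pages are assumptions made for this example, not properties of any particular SCADA/PIMS API. The BigQuery table is assumed to exist already.

```python
import json
import time
import urllib.request


def http_get_json(url):
    """Fetch a URL and decode the JSON body (standard library only)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def fetch_readings(base_url, get_json=http_get_json, pause_s=0.0):
    """Yield every record from a paginated REST endpoint.

    Hypothetical response shape assumed by this sketch:
        {"items": [...], "next_page": <int or null>}
    """
    page = 1
    while page is not None:
        body = get_json(f"{base_url}?page={page}")
        for item in body["items"]:
            yield item
        page = body.get("next_page")
        if page is not None and pause_s:
            # Crude throttle so the API is not overwhelmed.
            time.sleep(pause_s)


def run(base_urls, table, pipeline_args=None):
    """Build and run a batch pipeline: REST API -> BigQuery."""
    # Imported here so the helpers above can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    with beam.Pipeline(options=PipelineOptions(pipeline_args)) as p:
        (
            p
            | "Endpoints" >> beam.Create(base_urls)
            # FlatMap wraps the function in a DoFn; it runs once per endpoint.
            | "CallApi" >> beam.FlatMap(fetch_readings)
            | "ToBigQuery" >> beam.io.WriteToBigQuery(
                table,  # "project:dataset.table"; must already exist
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )
```

Each element of `base_urls` is handled by one call, so parallelism here is limited to the number of endpoints. A retried bundle would call the API again and could append duplicate rows, which is where the exactly-once concern comes in.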

That said, you may wish to use a different approach, such as pushing the data into Cloud Pub/Sub first and then using Cloud Dataflow to read it from Cloud Pub/Sub. This gives you a naturally scalable queue between your devices and your data processing.
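A small poller that pushes REST records into Pub/Sub could look roughly like the sketch below, which uses the `google-cloud-pubsub` client library. The JSON encoding, the `source` attribute, and the function names are choices made for this example. How the records are obtained from the REST API is left to the caller.

```python
import json


def encode_record(record):
    """Serialize one API record as UTF-8 JSON bytes for the message body."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def publish_records(records, publish):
    """Send each record through `publish(data: bytes, **attributes)`.

    Returns the number of records handed to the publisher.
    """
    count = 0
    for record in records:
        # The attribute name "source" is an arbitrary choice for this sketch.
        publish(encode_record(record), source="rest-poller")
        count += 1
    return count


def main(project_id, topic_id, records):
    """Publish `records` to an existing Pub/Sub topic and wait for the acks."""
    # Imported lazily so the helpers above work without the client library.
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    futures = []

    def publish(data, **attributes):
        futures.append(publisher.publish(topic_path, data, **attributes))

    sent = publish_records(records, publish)
    for future in futures:
        future.result()  # block until Pub/Sub has accepted the message
    return sent
```

Keeping the publish call behind a plain callable makes the poller easy to exercise without network access. It could run on any host that is allowed to reach both the REST API and Pub/Sub.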

Solution 2:[2]

You can capture the data with Pub/Sub, process it in Dataflow, and then write it to BigQuery (or to storage) using the corresponding I/O connector.

Stream messages from Pub/Sub by using Dataflow: https://cloud.google.com/pubsub/docs/stream-messages-dataflow

Google-provided streaming templates for Dataflow (Pub/Sub -> Dataflow -> BigQuery): https://cloud.google.com/dataflow/docs/guides/templates/provided-streaming

Whole solution: https://medium.com/codex/a-dataflow-journey-from-pubsub-to-bigquery-68eb3270c93
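If a Google-provided template does not fit, the Pub/Sub to BigQuery path can also be written by hand. The following is a minimal sketch with the Beam Python SDK. It assumes each message body is a single JSON object whose keys match the columns of an existing BigQuery table. The function names and the error handling are illustrative only.

```python
import json


def parse_message(data):
    """Decode a Pub/Sub payload (UTF-8 JSON bytes) into a BigQuery row dict."""
    row = json.loads(data.decode("utf-8"))
    if not isinstance(row, dict):
        raise ValueError("expected one JSON object per message")
    return row


def run(subscription, table, pipeline_args=None):
    """Build and run a streaming pipeline: Pub/Sub -> BigQuery."""
    # Imported here so parse_message can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(pipeline_args, streaming=True)
    with beam.Pipeline(options=options) as p:
        (
            p
            # subscription: "projects/<project>/subscriptions/<name>"
            | "ReadPubSub" >> beam.io.ReadFromPubSub(subscription=subscription)
            | "Parse" >> beam.Map(parse_message)
            | "ToBigQuery" >> beam.io.WriteToBigQuery(
                table,  # "project:dataset.table"; must already exist
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )
```

To run this on Dataflow rather than locally, the usual runner flags (for example `--runner=DataflowRunner`, `--project`, `--region`, `--temp_location`) would be passed in `pipeline_args`. In a real pipeline, malformed messages would more likely be routed to a dead-letter output than raised as errors.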

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1: Kenn Knowles
[2] Solution 2: razimbres