Error during data ingestion from SFTP to GCS or BigQuery using Cloud Data Fusion

I am trying to move CSV files from an SFTP folder to GCS using Cloud Data Fusion, but the pipeline fails with the error shown further down.

Here are the properties of both the FTP and GCS plugins. Surprisingly, I can see the data in Preview mode at every stage, but the pipeline fails when I try to deploy it. I also tried adding a CSVParser transform between the source (FTP) and the sink (GCS), but it still shows the same error. I am using version 3.0.0 of the FTP plugin from the Hub. Please help me solve this.

[Screenshot: FTP and GCS plugin properties]

The error below appears when I try to deploy the pipeline, even though I was able to see the data in Preview.

[Screenshot: deployment error message]



Solution 1:[1]

I dug into this quite a bit and found that there are issues when running the FTP plugins, so at the moment there is not much you can do about the plugin itself. Fortunately, there are workarounds. Here are a few:

  • You can use an older Dataproc image version (1.5 or 1.3), as indicated in the public issue that tracks this problem. For more details, see the issue "SFTP Source fails when deployed (SftpExecption) but not in preview". Don't forget to upvote it and leave a comment too.

  • Another way is to use the SFTPCopy plugin (once you pick it up from the Hub, it should appear under Conditions and Actions). It lets you copy the file from your SFTP server to a local path, and you can then use the File source to continue processing the file. There is a small guide on this: "Reading from SFTP and writing to BigQuery".

  • This one is a bit extreme, but you can also use a different workflow management platform, such as Apache Airflow, for file processing.
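
If you go the last route and move the copy step outside Data Fusion, the task itself can be quite small. Below is a minimal sketch of what such a task might run, for example from an Airflow Python task or any other scheduler. It is not taken from the linked issue or guide. It assumes the `paramiko` and `google-cloud-storage` packages are installed, that Application Default Credentials are configured for GCS, and that password authentication is enough for the SFTP server. The function names and the host, folder, bucket, and prefix values are hypothetical.

```python
import io
import posixpath


def is_csv(filename: str) -> bool:
    """Return True for names ending in .csv (case-insensitive)."""
    return filename.lower().endswith(".csv")


def gcs_object_name(prefix: str, filename: str) -> str:
    """Build the destination object name, e.g. 'landing/sales.csv'."""
    prefix = prefix.strip("/")
    return f"{prefix}/{filename}" if prefix else filename


def copy_csvs_from_sftp_to_gcs(host, username, password, remote_dir,
                               bucket_name, prefix="", port=22):
    """Copy every CSV file in remote_dir to the given GCS bucket and prefix."""
    # Third-party packages (assumed installed). They are imported here so the
    # helpers above can be used and tested without them.
    import paramiko
    from google.cloud import storage

    # Uses Application Default Credentials (assumption).
    bucket = storage.Client().bucket(bucket_name)

    # Note: this sketch does not verify the server's host key.
    transport = paramiko.Transport((host, port))
    try:
        transport.connect(username=username, password=password)
        sftp = paramiko.SFTPClient.from_transport(transport)
        copied = []
        for filename in sftp.listdir(remote_dir):
            if not is_csv(filename):
                continue
            # Buffer the whole file in memory; fine for small CSVs only.
            buffer = io.BytesIO()
            sftp.getfo(posixpath.join(remote_dir, filename), buffer)
            buffer.seek(0)
            name = gcs_object_name(prefix, filename)
            bucket.blob(name).upload_from_file(buffer, content_type="text/csv")
            copied.append(name)
        return copied
    finally:
        transport.close()
```

A call such as `copy_csvs_from_sftp_to_gcs("sftp.example.com", "user", "secret", "/outbound", "my-bucket", "landing")` would return the list of object names it wrote. Once the files are in GCS, the rest of the pipeline (for example a GCS source feeding BigQuery) does not depend on the FTP plugin at all.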

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Betjens