HDFS commands in a PySpark script

I am writing a simple PySpark script to copy HDFS files and folders from one location to another. I have gone through many docs and answers available online, but I could not find a way to copy folders and files using PySpark, or to execute HDFS commands (particularly copying folders and files) from PySpark.

Below is my code:

hadoop = sc._jvm.org.apache.hadoop
Path = hadoop.fs.Path
FileSystem = hadoop.fs.FileSystem
# Reuse Spark's Hadoop configuration so fs.defaultFS and other cluster
# settings are picked up, rather than building a blank Configuration()
conf = sc._jsc.hadoopConfiguration()
fs = FileSystem.get(conf)
source = Path('/user/xxx/data')
destination = Path('/user/xxx/data1')

if fs.exists(source):
    for f in fs.listStatus(source):
        print('File path:', str(f.getPath()))
        # **** how to use a copy command here?

Thanks in advance



Solution 1:[1]

Create a new Java object for the org.apache.hadoop.fs.FileUtil class and use its copy methods, rather than shelling out to hdfs script commands.

How to move or copy file in HDFS by using JAVA API
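A minimal sketch of what that looks like from PySpark, assuming the fs, source, destination, and conf objects already defined in the question's code:

FileUtil = sc._jvm.org.apache.hadoop.fs.FileUtil

# copy(srcFS, src, dstFS, dst, deleteSource, conf) copies files or whole
# directories and returns True on success; deleteSource=True makes it a move
copied = FileUtil.copy(fs, source, fs, destination, False, conf)
print('Copied:', copied)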

It might be better to just use distcp rather than Spark for this, though; otherwise, you'll run into race conditions if you try to run that code with multiple executors.
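For reference, the distcp invocation for the paths in the question would look something like:

hadoop distcp /user/xxx/data /user/xxx/data1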

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source

Solution 1: OneCricketeer