How to delete all files in folder except CSV?
I wrote a DataFrame to CSV in PySpark and got the following output files in the directory:
._SUCCESS.crc
.part-00000-6cbfdcfd-afff-4ded-802c-6ccd67f3804a-c000.csv.crc
part-00000-6cbfdcfd-afff-4ded-802c-6ccd67f3804a-c000.csv
How do I keep only the CSV file in the directory and delete the rest of the files, using Python?
Solution 1:[1]
import os
directory = "/path/to/directory/with/files"
files_in_directory = os.listdir(directory)
filtered_files = [file for file in files_in_directory if not file.endswith(".csv")]
for file in filtered_files:
    path_to_file = os.path.join(directory, file)
    os.remove(path_to_file)
First, list all files in the directory. Then keep only those whose names do not end with .csv. Finally, remove each of the files left in the list.
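One caveat with the steps above: os.listdir also returns the names of subdirectories, and os.remove raises an error when it is given a directory. A minimal sketch that guards against this is shown here. The helper name remove_non_csv and its return value are assumptions made for illustration, not part of the original answer.

```python
import os


def remove_non_csv(directory):
    """Delete every regular file in `directory` whose name does not end with .csv.

    Subdirectories are left untouched. Returns the names of the removed files.
    """
    removed = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # Skip subdirectories: os.remove() raises an error for them.
        if os.path.isfile(path) and not name.endswith(".csv"):
            os.remove(path)
            removed.append(name)
    return removed
```

Calling remove_non_csv("/path/to/directory/with/files") on the PySpark output directory from the question should leave only the part-...csv file in place.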
Solution 2:[2]
Try iterating over the files in the directory, and then os.remove only those files that do not end with .csv.
import os
dir_path = "path/to/the/directory/containing/files"
dir_list = os.listdir(dir_path)
for item in dir_list:
    if not item.endswith(".csv"):
        os.remove(os.path.join(dir_path, item))
Solution 3:[3]
You can also have fun with a list comprehension to do this in a single line:
import os
dir_path = 'output/'
[os.remove(os.path.join(dir_path, item)) for item in os.listdir(dir_path) if not item.endswith('.csv')]
Solution 4:[4]
I would recommend using pathlib (Python >= 3.4) and the built-in type set() to subtract all CSV filenames from the set of all files. I would argue this is easy to read, fast to process, and a good Pythonic solution.
>>> from pathlib import Path
>>> p = Path('/path/to/directory/with/files')
>>> # Get all file names
>>> # https://stackoverflow.com/a/65025567/4865723
>>> set_all_files = set(filter(Path.is_file, p.glob('**/*')))
>>> # Get all csv filenames (BUT ONLY with lower case suffix!)
>>> set_csv_files = set(filter(Path.is_file, p.glob('**/*.csv')))
>>> # Create a file list without csv files
>>> set_files_to_delete = set_all_files - set_csv_files
>>> # Iterate over that set and delete each file
>>> for file_name in set_files_to_delete:
...     Path(file_name).unlink()
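As the comment in the session above notes, the '**/*.csv' pattern only catches a lower-case suffix. If files such as DATA.CSV should be kept as well, one possible variation is to compare the lower-cased suffix instead of using a second glob. This is only a sketch; the function name delete_non_csv_recursive is hypothetical and not from the original answer.

```python
from pathlib import Path


def delete_non_csv_recursive(root):
    """Delete all files below `root` whose suffix is not .csv (case-insensitive).

    Returns the list of deleted paths.
    """
    deleted = []
    # Materialize the listing first so nothing is deleted while iterating.
    for path in list(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() != ".csv":
            path.unlink()
            deleted.append(path)
    return deleted
```

Like the set-based version, this descends into subdirectories, so it should only be pointed at a directory whose whole tree may be cleaned up.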
Solution 5:[5]
import os
for root, dirs, files in os.walk('Test', topdown=True):
    for name in files:
        fp = os.path.join(root, name)
        # Delete everything that is not a CSV file
        if not name.endswith(".csv"):
            os.remove(fp)
What is the advantage of os.walk? It also visits every subdirectory of the directory you pass in, so files in nested folders are handled too.
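Because os.walk descends into every subdirectory, it can be worth checking which files would be removed before deleting anything. The sketch below separates the two steps. The function names find_non_csv and delete_files are assumptions chosen for illustration.

```python
import os


def find_non_csv(root):
    """Return the paths of all files under `root` that do not end with .csv."""
    targets = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".csv"):
                targets.append(os.path.join(dirpath, name))
    return targets


def delete_files(paths):
    """Remove each file path in `paths`."""
    for path in paths:
        os.remove(path)
```

A typical use would be to print the result of find_non_csv('Test'), review it, and then pass the same list to delete_files.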
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | theoctober19th |
| Solution 3 | Synthase |
| Solution 4 | buhtz |
| Solution 5 | Faraaz Kurawle |
