Preprocessing large data in Databricks Community Edition

I have a 16 GB dataset and want to use it in Databricks. However, in Community Edition the DBFS limit is 10 GB. Could you please assist me in preprocessing the data so I can move it from the driver to DBFS?



Solution 1:[1]

The simplest way is not to use DBFS at all (it's designed only for temporary data), but to host the data and results in your own environment, such as an AWS S3 bucket or ADLS (this may incur higher transfer costs).

If you can't do that, then the solution depends on other factors: what the input file format is, whether it is compressed or uncompressed, and so on.
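For example, one common preprocessing step on the driver is to compress the data and split it into parts that each fit comfortably under the size limit before copying them to DBFS. Below is a minimal sketch for a plain-text/CSV input; the paths, the `max_lines` threshold, and the helper function names are hypothetical and would need tuning to your actual data:

```python
import gzip
import os


def _write_part(out_dir, idx, header, lines):
    # Write one gzip-compressed part, repeating the header so each
    # part is a valid standalone CSV file.
    path = os.path.join(out_dir, f"part-{idx:04d}.csv.gz")
    with gzip.open(path, "wt") as out:
        out.write(header)
        out.writelines(lines)
    return path


def split_and_compress(src_path, out_dir, max_lines=1_000_000):
    """Split a large CSV file into gzip-compressed parts of at most
    `max_lines` data rows each, so every part stays small enough to
    upload to DBFS individually."""
    os.makedirs(out_dir, exist_ok=True)
    parts = []
    with open(src_path, "rt") as src:
        header = src.readline()  # keep the header for every part
        part_idx, buffered = 0, []
        for line in src:
            buffered.append(line)
            if len(buffered) >= max_lines:
                parts.append(_write_part(out_dir, part_idx, header, buffered))
                part_idx, buffered = part_idx + 1, []
        if buffered:  # flush the final, possibly short, part
            parts.append(_write_part(out_dir, part_idx, header, buffered))
    return parts
```

Because each part repeats the header line, Spark can later read the whole directory of parts back as a single dataset, e.g. `spark.read.csv("dbfs:/.../out", header=True)`.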

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Alex Ott