Upload contents of a Spark DataFrame as CSV to a REST API using Python?

I'm trying to piece together the code required to run a query on a Hive/HDFS database (i.e. the same query I could run in Hive or Impala, using Zeppelin or Hue), then upload the result set as CSV to a REST API endpoint. I'm a very experienced developer but new to Python, DataFrames, Spark, HDFS etc.

I've got my SQL query that returns the correct data (e.g. using Impala or Hive). I've got Python code that will connect to a REST API endpoint for upload:

import requests
# url and my_data are placeholders for the endpoint and the payload to upload
response = requests.post(url, data=my_data)
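
Fleshed out a little, the upload step might look like the sketch below. It assumes the endpoint accepts raw CSV in the request body with a `text/csv` content type, which I have not confirmed; `upload_csv` and its parameters are names made up for illustration. If the API expects a multipart file upload instead, the `files=` argument of `requests.post` would be the thing to use.

```python
def upload_csv(url, csv_text, session=None, timeout=30):
    """POST CSV text as the raw request body and return the response.

    Assumption: the API takes the CSV directly in the body. This is a
    hypothetical helper, not part of the requests library.
    """
    if session is None:
        # Imported here so the helper can also be exercised with a stub session.
        import requests
        session = requests.Session()

    response = session.post(
        url,
        data=csv_text.encode("utf-8"),
        headers={"Content-Type": "text/csv; charset=utf-8"},
        timeout=timeout,
    )
    # Raise an exception for 4xx/5xx responses instead of failing silently.
    response.raise_for_status()
    return response
```

Passing an explicit `session` is optional; it mainly helps when authentication headers or connection reuse are needed.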

I know that the Python pandas library can write a DataFrame out as CSV: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html#pandas.DataFrame.to_csv

I'm not sure how to get Python to run the query though, and what else I might be missing here...

The execution environment is Python or PySpark running in Apache Zeppelin; the table is in Hadoop/HDFS.
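
For the query step, one possible shape is sketched below, under a few assumptions: that the Zeppelin `%pyspark` interpreter provides a `spark` session with access to the Hive tables, and that the result is small enough to collect onto the driver. `query_to_csv_text` is a hypothetical helper name. It builds the CSV with the standard-library `csv` module, so pandas is not required; if pandas is available, `spark.sql(query).toPandas().to_csv(index=False)` should give a similar string.

```python
import csv
import io


def query_to_csv_text(spark, query):
    """Run a SQL query through a SparkSession and return the result as CSV text.

    Assumption: the result set fits in driver memory, because collect()
    pulls every row out of the cluster into this Python process.
    """
    df = spark.sql(query)

    buffer = io.StringIO()
    writer = csv.writer(buffer)
    # Header row taken from the DataFrame's column names.
    writer.writerow(df.columns)
    # Each collected Row behaves like a tuple, so it can be written directly.
    for row in df.collect():
        writer.writerow(row)
    return buffer.getvalue()
```

In a Zeppelin paragraph this would be called as `csv_text = query_to_csv_text(spark, "SELECT ...")`, with the same SQL that already works in Hive or Impala, and the resulting string passed to `requests.post` as the request data.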

Apologies if I'm misusing terms here, just trying to get my head around this :)

Thanks



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
