Function in Pandas dataframe, equivalent to Spark SQL
I work with Microsoft Databricks, and there is a simple way to save a PySpark DataFrame as a table:
table_name = 'location.table_name'
df.write.saveAsTable(table_name)
However, this does not work with a pandas DataFrame, and converting it is problematic.
What I need is a function that takes only two arguments, the DataFrame and the table name, and does the same thing.
It should look like this:
def save_pandas_to_SQL(df, table_name):
    """Save the pandas DataFrame `df` as the table `table_name`, e.g. 'location.table_name'."""
Solution 1:[1]
import pandas as pd
data = [['Scott', 50], ['Jeff', 45], ['Thomas', 54],['Ann',34]]
# Create the pandas DataFrame
pandasDF = pd.DataFrame(data, columns = ['Name', 'Age'])
First, convert your pandas DataFrame to a Spark DataFrame, then save it as a table.
sparkDF = spark.createDataFrame(pandasDF)
sparkDF.printSchema()
sparkDF.show()
table_name = 'location.table_name'
sparkDF.write.saveAsTable(table_name)
root
 |-- Name: string (nullable = true)
 |-- Age: long (nullable = true)
+------+---+
|  Name|Age|
+------+---+
| Scott| 50|
|  Jeff| 45|
|Thomas| 54|
|   Ann| 34|
+------+---+
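The question asks for a single function, so the two steps above could be wrapped roughly as follows. This is only a sketch, not a tested Databricks recipe: the name `save_pandas_to_sql` and the optional `spark` and `mode` parameters are my own additions, and it assumes a Spark session is available (Databricks notebooks normally predefine one as `spark`).

```python
def save_pandas_to_sql(pandas_df, table_name, spark=None, mode="errorifexists"):
    """Save a pandas DataFrame as a Spark table (hypothetical helper).

    pandas_df  -- the pandas DataFrame to save
    table_name -- target table, e.g. 'location.table_name'
    spark      -- optional SparkSession; looked up if not given
    mode       -- Spark save mode: 'errorifexists', 'append', 'overwrite' or 'ignore'
    """
    if spark is None:
        # Imported lazily so that pyspark is only required when no session is passed in.
        from pyspark.sql import SparkSession
        spark = SparkSession.builder.getOrCreate()

    # Step 1: convert pandas -> Spark (column types are inferred by Spark).
    spark_df = spark.createDataFrame(pandas_df)

    # Step 2: write the Spark DataFrame as a table; `mode` decides what
    # happens when the table already exists.
    spark_df.write.mode(mode).saveAsTable(table_name)
    return spark_df
```

With the example data above, `save_pandas_to_sql(pandasDF, 'location.table_name')` would then be the two-argument call the question asks for. If the automatic type inference gives the wrong column types, `spark.createDataFrame` also accepts an explicit `schema` argument.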
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | JAdel |
